
Let us see some set of m elements wÎW. It is known a partition is a set of mutually non–intersected subsets W_{1},...W_{K}, being united give initial set W. Let elements w_{i} be presented as vectors x_{i} = (x_{i}_{1},…x_{in}) in some n–dimensional space. We call them as objects in the n–dimensional space of features (measured characteristics). Objects form rows of data matrix X(m,n), and their partitioning is called as clustering (classification). In the cluster–analysis of features such an elements are features in the n–dimensional space of objects (acts of measuring). It is obvious features form columns in the same matrix X(m,n). Here it is convenient for us to consider features form rows in another matrix X(m,n) after initial matrix have been transposed, and their dimensionalities have been redenoted. Feature partitioning is usually called as grouping. Similarity of a couple of objects is usually estimated by Euclidean distance between them, and similarity of a couple of features is usually estimated by mutual correlation. Let matrix D(m,m) be the matrix of paired similarity between elements w in corresponding space. Let some partitioning of the set W have been done. It is usually possible for each subset W_{i} to point some element v_{i} as its most typical one in some sense. Let’s call such an element as a representative. Representative v_{i} can be noncoinciding with all elements from W_{i} in a common case. For example, in clustering problem the representative is usually the mean vector z_{i} for submatrix X(m_{i},n), and in some grouping problems (extreme grouping) the representative is the first principal component of sub–matrix D(m_{i},m_{i}). In this case components of the representative vector are calculated slightly more complicated.
Let’s say representatives be “centers” of corresponding sets. It is obvious the sense of the center depends on the contents of problem in a common case. Let’s call some partition as unbiased one, if all its representatives coincide with own centers. Elements w arrangement in some space is usually estimated by some characteristic of arrangement density (concentration). For example, partition density is described by dispersion of clusters W_{i} in cluster–analysis. Partition of extreme grouping is described, for example, by sum of correlation squares (or absolute values) of each group W_{i} representative with its own features. Now we can describe base algorithm to split some set W to a priori defined number K of subsets W_{i}. Algorithm 1. Let K representatives be defined, for example, as K most dissimilar elements. Let for current step s splitting be produced by scanning all elements w to mark each of them as from that set W_{i}, if corresponding representative is most similar to it. If partition appears to be biased, let centers of sets W_{i} be their representatives. Let’s go to the next step s=s+1. This algorithm stops for unbiased partition. It is easy to see we can get right now well–known base clustering (K–means), and extreme grouping algorithms. By using the same way we can get another wellknown forms of these algorithms. Let’s note for K=2 in base algorithm representatives are defined by simplest way as a pair of elements for extreme value from matrix D of similarity (maximum for distances, and minimum for absolute value of correlations). Let’s build non–hierarchical divisive partitioning algorithm. Algorithm 2. Let for each step from K=1 a pair of most dissimilar objects from the least compact subset W_{k} be its representatives. Let W_{k} be split by Algorithm 1 on two subsets denoted as W_{k}, and W_{K+1}. For K+1 subsets from new partitioning let representatives be defined as K1 previous representatives v_{1},…v_{k1}, v_{k+1},…v_{K}, and two new ones v_{k}, and v_{K+1}. Let the set W be split by Algorithm 1 on K+1 subsets again. Let’s go to the next step s=s+1. This algorithm stops for K = m. This algorithm builds full sequence of partitions from the single set W_{1}=W to singleelement sets W_{1},...W_{m}. Not all partitions from this sequence are hierarchically nested in a common case. It is obvious two subsequent partitions on K, and K+1 subsets are hierarchically nested, if partitioning of the least compact subset W_{k} on two ones W_{k}, and W_{K+1 }gives straight away unbiased partition W_{1},...W_{K+1}. Let’s call previous partition on K subsets as stable one. Let’s see elements w as objects. It is obvious a graph of shortest nonclosed path (minimal spanning tree) is defined on the set W. If elements w are called as features then a graph of maximal nonclosed correlation path is defined on the set W. Let’s say the nonclosed graph of maximal similarity is defined on the set W. Theorem. Let tree of partitions be built by non–hierarchical divisive partitioning algorithm. Then this tree is equivalent to nonclosed graph of maximal similarity.
Corollary. If tree of partitions is built, then non–hierarchical divisive partitioning algorithm is equivalent to the “nearest neighbor” algorithm in cluster–analysis, and is equivalent to the “correlation pleiades” algorithm in grouping problem. In a common case this algorithm is equivalent to some set of hierarchical divisive algorithms, starting from single cluster (group), forming one of subsequences of hierarchically nested partitions, and finishing with single–element clusters (groups). The considered method of non–hierarchical partitioning combines two essentially different approaches to clustering. From one side, wellknown algorithms are used, and from other side, the result is evident like the dendrogram. Here we try to directly use intuitive assumption about “stability” of a goodquality partitioning. It is obvious, some stable partition needs to be used as the optimal one from one of subsequences of hierarchically nested partitions. It is obvious too, the general sequence can be limited. We can throw off partitions with weakly filled clusters (groups) starting from some K near of sample size m. We also can throw off partitions with friable clusters for small value of K. For example, for these partitions the average dispersion of clusters exceeds the dispersion of their centers. Then remained partitions will represent compact clusters corresponding to generally accepted heuristic cluster definition. At the same time using only single type of compact cluster or group representing undoubtedly constricts a field of applicability of this method. But proposed algorithm appears to be quite simple, and fast to get at least initial information about structure of data under investigation. It isn’t so hard to build agglomerative version of non–hierarchical partitioning algorithm. Its behavior is sufficiently evident. We will show below using of non–hierarchical partitioning method for another important problem of restoring of missing data.
