Let us see some set of m elements wÎW. It is known a partition is a set of mutually non–intersected subsets W1,...WK, being united give initial set W. Let elements wi be presented as vectors xi = (xi1,…xin) in some n–dimensional space. We call them as objects in the n–dimensional space of features (measured characteristics). Objects form rows of data matrix X(m,n), and their partitioning is called as clustering (classification).
In the cluster–analysis of features such an elements are features in the n–dimensional space of objects (acts of measuring). It is obvious features form columns in the same matrix X(m,n). Here it is convenient for us to consider features form rows in another matrix X(m,n) after initial matrix have been transposed, and their dimensionalities have been re-denoted. Feature partitioning is usually called as grouping.
Similarity of a couple of objects is usually estimated by Euclidean distance between them, and similarity of a couple of features is usually estimated by mutual correlation. Let matrix D(m,m) be the matrix of paired similarity between elements w in corresponding space.
Let some partitioning of the set W have been done. It is usually possible for each subset Wi to point some element vi as its most typical one in some sense. Let’s call such an element as a representative. Representative vi can be non-coinciding with all elements from Wi in a common case. For example, in clustering problem the representative is usually the mean vector zi for sub-matrix X(mi,n), and in some grouping problems (extreme grouping) the representative is the first principal component of sub–matrix D(mi,mi). In this case components of the representative vector are calculated slightly more complicated.
Let’s say representatives be “centers” of corresponding sets. It is obvious the sense of the center depends on the contents of problem in a common case.
Let’s call some partition as unbiased one, if all its representatives coincide with own centers.
Elements w arrangement in some space is usually estimated by some characteristic of arrangement density (concentration). For example, partition density is described by dispersion of clusters Wi in cluster–analysis. Partition of extreme grouping is described, for example, by sum of correlation squares (or absolute values) of each group Wi representative with its own features.
Now we can describe base algorithm to split some set W to a priori defined number K of subsets Wi.
Algorithm 1. Let K representatives be defined, for example, as K most dissimilar elements. Let for current step s splitting be produced by scanning all elements w to mark each of them as from that set Wi, if corresponding representative is most similar to it. If partition appears to be biased, let centers of sets Wi be their representatives. Let’s go to the next step s=s+1. This algorithm stops for unbiased partition.
It is easy to see we can get right now well–known base clustering (K–means), and extreme grouping algorithms. By using the same way we can get another well-known forms of these algorithms.
Let’s note for K=2 in base algorithm representatives are defined by simplest way as a pair of elements for extreme value from matrix D of similarity (maximum for distances, and minimum for absolute value of correlations). Let’s build non–hierarchical divisive partitioning algorithm.
Algorithm 2. Let for each step from K=1 a pair of most dissimilar objects from the least compact subset Wk be its representatives. Let Wk be split by Algorithm 1 on two subsets denoted as Wk, and WK+1. For K+1 subsets from new partitioning let representatives be defined as K-1 previous representatives v1,…vk-1, vk+1,…vK, and two new ones vk, and vK+1. Let the set W be split by Algorithm 1 on K+1 subsets again. Let’s go to the next step s=s+1. This algorithm stops for K = m.
This algorithm builds full sequence of partitions from the single set W1=W to single-element sets W1,...Wm. Not all partitions from this sequence are hierarchically nested in a common case. It is obvious two subsequent partitions on K, and K+1 subsets are hierarchically nested, if partitioning of the least compact subset Wk on two ones Wk, and WK+1 gives straight away unbiased partition W1,...WK+1. Let’s call previous partition on K subsets as stable one.
Let’s see elements w as objects. It is obvious a graph of shortest non-closed path (minimal spanning tree) is defined on the set W. If elements w are called as features then a graph of maximal non-closed correlation path is defined on the set W. Let’s say the non-closed graph of maximal similarity is defined on the set W.
Theorem. Let tree of partitions be built by non–hierarchical divisive partitioning algorithm. Then this tree is equivalent to non-closed graph of maximal similarity.
Corollary. If tree of partitions is built, then non–hierarchical divisive partitioning algorithm is equivalent to the “nearest neighbor” algorithm in cluster–analysis, and is equivalent to the “correlation pleiades” algorithm in grouping problem. In a common case this algorithm is equivalent to some set of hierarchical divisive algorithms, starting from single cluster (group), forming one of subsequences of hierarchically nested partitions, and finishing with single–element clusters (groups).
The considered method of non–hierarchical partitioning combines two essentially different approaches to clustering. From one side, well-known algorithms are used, and from other side, the result is evident like the dendrogram. Here we try to directly use intuitive assumption about “stability” of a good-quality partitioning.
It is obvious, some stable partition needs to be used as the optimal one from one of subsequences of hierarchically nested partitions.
It is obvious too, the general sequence can be limited. We can throw off partitions with weakly filled clusters (groups) starting from some K near of sample size m. We also can throw off partitions with friable clusters for small value of K. For example, for these partitions the average dispersion of clusters exceeds the dispersion of their centers. Then remained partitions will represent compact clusters corresponding to generally accepted heuristic cluster definition.
At the same time using only single type of compact cluster or group representing undoubtedly constricts a field of applicability of this method. But proposed algorithm appears to be quite simple, and fast to get at least initial information about structure of data under investigation.
It isn’t so hard to build agglomerative version of non–hierarchical partitioning algorithm. Its behavior is sufficiently evident.
We will show below using of non–hierarchical partitioning method for another important problem of restoring of missing data.