
One possible way to restore missing data is based directly on the fundamental idea of cluster analysis: if the set of elements can be partitioned into subsets that are compact in some sense, then it is natural to restore the missing values of attributes of an element as the most typical values within the appropriate compact subset. Obviously, different methods arise from different ways of forming such a partition (for example, the linguistic algorithms of E.M. Braverman and I.B. Muchnik, or the family of ZET algorithms of N.G. Zagoruiko). The problem of restoring missing data appears to have been investigated thoroughly in Russia by N.G. Zagoruiko's team under the heading of so-called "local" methods. In fact, the family of ZET algorithms is a highly detailed and well-tested technology for experimental data verification based on the hypothesis of data redundancy. In the ZET algorithm, each separate missing value x_{ij} in the data matrix X(m,n), i.e. the missing j^{th} attribute of the i^{th} object, is estimated by a predictor matrix. This matrix consists of the m_{0} most similar objects and the n_{0} most similar features, where the values m_{0} and n_{0} are given a priori. The missing value is restored by a regression equation for each pair of predictor-matrix rows in which one row of the pair contains the missing value (the blank); the same blank is restored in the same way for each pair of predictor-matrix columns. The final estimate is a weighted average of these preliminary estimates, where the weights depend on certain parameters. The need to set the predictor-matrix size via m_{0} and n_{0}, and to set other important parameters, entails the need to be sure of the reliability of the restored values. For precisely this reason, modern versions of the ZET family present a multilevel, multi-step technology for setting all necessary parameters based on quality verification of predicted values.
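The ZET-style pairwise-regression idea can be illustrated with a minimal sketch. This is not the ZET algorithm itself (which builds a full m_{0}×n_{0} predictor matrix and combines row-pair and column-pair estimates): the function below only performs the row-pair part, regressing the row containing the blank on each of its m_{0} most similar rows and averaging the predictions weighted by similarity. The function name, the use of |r| as a weight, and the NaN encoding of blanks are illustrative assumptions.

```python
import numpy as np

def zet_estimate(X, i, j, m0=3):
    """Row-pair sketch of a ZET-style estimate for the blank X[i, j]:
    regress row i on each row q with a known value in column j, over
    complete pairs only, then average the m0 most similar predictions."""
    m, n = X.shape
    target = X[i]
    sims, estimates = [], []
    for q in range(m):
        if q == i or np.isnan(X[q, j]):
            continue
        # use only complete pairs; never use the blank position itself
        mask = ~np.isnan(target) & ~np.isnan(X[q])
        mask[j] = False
        if mask.sum() < 2:
            continue
        r = np.corrcoef(target[mask], X[q, mask])[0, 1]
        if np.isnan(r):
            continue
        # simple linear regression of row i on row q: target = a*X[q] + b
        a, b = np.polyfit(X[q, mask], target[mask], 1)
        sims.append(abs(r))
        estimates.append(a * X[q, j] + b)
    order = np.argsort(sims)[::-1][:m0]   # keep the m0 most similar rows
    w = np.array(sims)[order]
    e = np.array(estimates)[order]
    return float(np.sum(w * e) / np.sum(w))
```

For near-collinear rows the regression estimates agree and the weighted average simply reproduces the common trend.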
Such a technology has merit in its own right and is suitable for a range of prognostic problems. On the other hand, it is convenient to define the predictor-matrix size, for example, by an appropriate cluster and group after a clustering of objects and a grouping of features have been built; so we again need to select the number of clusters and groups. Let us apply the method of non-hierarchical partitioning to restore blanks in the data matrix. Denote a blank by the symbol ·. Let all incomplete pairs (x_{il}, x_{jl}), l = 1,…,n be excluded while the similarity d_{ij} of elements w_{i} and w_{j} is calculated; as is well known, an incomplete pair has at least one missing value. Let d_{ij} = d^{·}_{ij} for correlation, and d_{ij} = d^{·}_{ij} n/(n−n^{·}) for Euclidean distance, where d^{·}_{ij} is the similarity computed without incomplete pairs and n^{·} is the number of incomplete pairs, n^{·} < n. Let a partitioning of the set W be built. For the clustering problem we restore the missing value x_{ij} of object x_{i} for the j^{th} feature as the average c^{k}_{j} of this j^{th} feature in the cluster W_{k}, where x_{i} ∈ W_{k}. For the grouping problem we restore the missing value x_{ij} of feature x_{i} for the j^{th} object (observation) as a weighted average c^{k}_{j} of this j^{th} observation over the features in the group W_{k}, where x_{i} ∈ W_{k}. Recall that it is convenient for us to use the matrix X(m,n) with features as its rows. The missing value x_{ij} can be restored by the linear regression equation (x_{ij} − μ(x_{i}))/σ(x_{i}) = d_{iq}(x_{qj} − μ(x_{q}))/σ(x_{q}) for features x_{i} and x_{q}, i.e. via the normalized value x^{q}_{ij} = d_{iq}(x_{qj} − μ(x_{q}))/σ(x_{q}), where μ(x_{q}) and σ(x_{q}) are the average and the mean square deviation of the feature x_{q} (over its row). Then the missing value is estimated as x_{ij} = μ(x_{i}) + c^{k}_{j} σ(x_{i}), with c^{k}_{j} = Σ x^{q}_{ij} d_{iq} / Σ d_{ip} taken over x_{q} ∈ W_{k}, x_{p} ∈ W_{k}, where x_{qj} ≠ · and x_{pj} ≠ ·.
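The similarity correction for incomplete pairs and the cluster-mean restoration rule can be sketched as follows. This is a minimal reading of the text, with blanks encoded as NaN; in particular it applies the rescaling d_{ij} = d^{·}_{ij} n/(n−n^{·}) to the Euclidean distance literally, and it assumes a row partition (`labels`) has already been produced by some clustering, since the partitioning method itself is described elsewhere.

```python
import numpy as np

def euclid_with_blanks(x, y):
    """Euclidean-style similarity between two rows with blanks (NaN):
    incomplete pairs are skipped and the partial distance is rescaled
    by n / (n - n_blank), following d_ij = d'_ij * n / (n - n')."""
    n = x.size
    mask = ~np.isnan(x) & ~np.isnan(y)
    n_blank = n - mask.sum()
    d_partial = np.sqrt(np.sum((x[mask] - y[mask]) ** 2))
    return d_partial * n / (n - n_blank)

def restore_by_cluster_mean(X, labels):
    """Restore each blank X[i, j] as the mean of feature j over the
    cluster containing object i (the clustering-problem rule from the
    text). `labels` is any precomputed partition of the rows."""
    X = X.copy()
    for k in np.unique(labels):
        rows = labels == k
        col_means = np.nanmean(X[rows], axis=0)  # per-column cluster means
        blanks = np.isnan(X[rows])
        X[rows] = np.where(blanks, col_means, X[rows])
    return X
```

The grouping-problem variant would apply the same idea to the transposed matrix with the similarity-weighted averaging from the regression formula above.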
If the values x_{lj} turn out to be missing for all x_{l} ∈ W_{k}, then the missing value x_{ij} cannot be restored. Algorithm 3. Build the complete sequence of partitions by Algorithm 2 and form a table of restored blanks. Let the columns of this table correspond to blanked elements x_{ij}, and the rows correspond to the restored values for each stable partition in the sequence. Scan the table by columns from bottom to top, and restore each blank x_{ij} as the first non-blank value in the appropriate column. Obviously, this algorithm tends to restore a blank attribute of an element w_{i} simply as the value of the corresponding attribute of the element w_{q} most similar to w_{i}; this is immediately evident for the agglomerative version of Algorithm 3. This algorithm has one important property: all blanks in the table are naturally ordered by the moments of their restoration, and, obviously, the earlier a blank is restored the more suitable its restored value is to use. We can additionally require that, for each restored value x_{ij}, the element w_{i} belong to a subset W_{k} of size |W_{k}| = m_{0}, m_{0} < m. Restored values can simultaneously be ordered by decreasing density of the sets W_{k}, where w_{i} ∈ W_{k}. It can be supposed that for a sufficiently large number of single blanks a general data-distortion effect can arise, so it is natural to use the next algorithm. Algorithm 4. Let all blanks be ordered by Algorithm 3. Restore the first blank, or the first few blanks, then repeat these actions for the remaining blanks until all of them are restored. Let for some matrix X(m,n) the number of objects (rows) considerably exceed the number of features (columns), i.e. m >> n. Then it is natural to use Algorithm 4 to restore blanks in this matrix. If n >> m, then it is natural to use Algorithm 4 for the transposed matrix X^{T}. If m ≈ n, then the next algorithm can be used. Algorithm 5. Let all blanks be ordered by Algorithm 3 both in the matrix X and in the matrix X^{T}.
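The bottom-to-top column scan of Algorithm 3 can be sketched as below. The sequence of stable partitions is assumed to be given (in the text it comes from Algorithm 2, which is not reproduced here), ordered from the coarsest partition to the finest, each as a label array over the rows; blanks are NaN, and the cluster-mean rule stands in for the table entries.

```python
import numpy as np

def restore_by_partition_sequence(X, partitions):
    """Algorithm 3 sketch: for every blank, scan the sequence of stable
    partitions from the finest upward (the bottom of the table first)
    and take the first cluster-mean estimate that is defined, i.e. the
    first cluster containing at least one known value in that column."""
    X_out = X.copy()
    for i, j in np.argwhere(np.isnan(X)):
        for labels in reversed(partitions):   # bottom-to-top scan
            members = labels == labels[i]
            vals = X[members, j]
            vals = vals[~np.isnan(vals)]
            if vals.size:                     # first non-blank table entry
                X_out[i, j] = vals.mean()
                break
    return X_out
```

Because the scan starts at the finest partition, a blank is typically filled from the smallest (hence most similar) cluster around w_{i}, matching the observation that the method tends to copy the attribute of the most similar element.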
Let some blank x_{ij} be first in the first sequence and at the k^{th} place in the second sequence. Use the sequence whose first blank has the minimal sum of places 1+k. Restore this blank as the weighted average of the two values from the two sequences: x_{ij} = x^{1}_{ij} k/(1+k) + x^{k}_{ij}/(1+k). Repeat these actions for the remaining blanks until all of them are restored. So, using the method of non-hierarchical partitioning, we have defined a family of simple algorithms to restore missing values. Note that no parameters need to be defined in advance: all blanks can be restored automatically, in a natural way, without any pre-tuned parameters. On the other hand, a quantitative estimate of the prediction error for the missing values may be required for a more detailed analysis of the data matrix. In that case one must use both the scheme of non-hierarchical partitioning itself and the algorithms proposed here within the framework of the technology for quality verification of predicted values described above. The ways of applying the proposed algorithms in such a technology are sufficiently evident.
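The Algorithm 5 weighting for a single blank reduces to a one-line combination of the two estimates; a small sketch, with the function name chosen for illustration:

```python
def blend_estimates(x_first, x_kth, k):
    """Algorithm 5 weighting for one blank: x_first is its estimate in
    the ordering where it comes first, x_kth its estimate in the other
    ordering where it sits at place k. The weights k/(1+k) and 1/(1+k)
    favour the earlier-restored, hence more reliable, estimate:
        x_ij = x_first * k/(1+k) + x_kth / (1+k)."""
    return x_first * k / (1 + k) + x_kth / (1 + k)
```

For k = 1 the two estimates are averaged equally; as k grows, the estimate from the ordering where the blank is restored first dominates, which is consistent with the ordering property of Algorithm 3.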
