Three more papers

If you do much exploratory data analysis you’re likely aware of two papers published back-to-back in Science in 2000 – one which introduced Isometric Feature Mapping (Isomap) and the other which introduced Locally Linear Embedding (LLE).  If you’re not familiar with them, both papers describe methods for creating low-dimensional representations of high-dimensional data that are able to learn the global structure of nonlinear manifolds.  It’s that latter capability which makes them noteworthy.  Indeed, according to Google Scholar they’ve each been cited over 5,500 times!  There were already good methods available for creating compact representations of high-dimensional data where the global structure is linear, e.g., defined by a plane or hyperplane.  The ability to learn nonlinear structure was a big deal.

From the Abstract of the Isomap paper (emphasis added):

“Unlike classical techniques such as principal component analysis and multidimensional scaling, [Isomap] is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution…”

And from the Abstract of the LLE paper (emphasis added):

“[LLE] is an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE… maps its inputs into a single global coordinate system of lower dimensionality… LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.”

The Isomap and LLE papers are, respectively:

“A Global Geometric Framework for Nonlinear Dimensionality Reduction,” J. B. Tenenbaum, V. de Silva, and J. C. Langford, Science, vol. 290 (22 Dec 2000), pp. 2319-2323.

“Nonlinear Dimensionality Reduction by Locally Linear Embedding,” S. T. Roweis and L. K. Saul, Science, vol. 290 (22 Dec 2000), pp. 2323-2325.
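If you want to see what these methods do in practice, both have long been available in scikit-learn.  Here’s a minimal sketch (my own illustration, not code from either paper) that applies them to a synthetic “S-curve” – 3-D points lying on a curved 2-D sheet, the sort of nonlinear manifold a linear method like PCA cannot unroll:

```python
# Sketch: Isomap and LLE via scikit-learn on a synthetic S-curve manifold.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# 1000 points in 3-D that actually live on a curved 2-D surface.
X, color = make_s_curve(n_samples=1000, random_state=0)

# Isomap: estimates geodesic distances along the manifold using a
# k-nearest-neighbor graph, then applies classical MDS to those distances.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: reconstructs each point as a weighted combination of its neighbors,
# then finds 2-D coordinates that preserve those reconstruction weights.
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                               random_state=0).fit_transform(X)

print(X_iso.shape, X_lle.shape)  # both embeddings are (1000, 2)
```

Plotting either embedding colored by the `color` variable shows the S-curve flattened into a plane, which is exactly the “global structure of a nonlinear manifold” the papers advertise.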

The third paper has received nowhere near the attention of the first two.  Why do I like it?  Because it introduces a significant improvement to LLE: it makes the method robust against outliers.  This modification makes LLE significantly more useful in practice, since most real-life data sets contain outliers, often lots of them.  From the Abstract of Chang and Yeung’s paper:

“Despite their appealing properties, … [nonlinear dimensionality reduction] methods [such as Isomap and LLE] are not robust against outliers in the data, yet so far very little has been done to address the robustness problem. In this paper, we address this problem in the context of … LLE. Based on robust estimation techniques, we propose an approach to make LLE more robust… Experimental results on both synthetic and real-world data show that [our algorithm] is very robust against outliers.”

The full paper:

“Robust Locally Linear Embedding,” H. Chang and D.-Y. Yeung, Pattern Recognition, vol. 39 (2006), pp. 1053-1065.
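Chang and Yeung’s robust LLE reweights points inside the LLE fit itself; that exact algorithm isn’t in scikit-learn.  A rough practical stand-in, sketched below under that substitution, is to flag and discard outliers first (here with Local Outlier Factor) and then run ordinary LLE on the cleaned data – this is not their method, just an illustration of why robustness matters:

```python
# Sketch: a crude robustness workaround (NOT Chang and Yeung's algorithm) --
# remove outliers with Local Outlier Factor, then run standard LLE.
import numpy as np
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X, _ = make_s_curve(n_samples=1000, random_state=0)

# Contaminate the manifold with 50 uniformly scattered outlier points.
outliers = rng.uniform(low=X.min(axis=0), high=X.max(axis=0), size=(50, 3))
X_noisy = np.vstack([X, outliers])

# Keep only the points LOF labels as inliers (+1), then embed the rest.
inlier_mask = LocalOutlierFactor(n_neighbors=20).fit_predict(X_noisy) == 1
X_clean = X_noisy[inlier_mask]
embedding = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                   random_state=0).fit_transform(X_clean)

print(X_clean.shape[0], embedding.shape[1])
```

Without a step like this, even a handful of outliers can corrupt the neighborhood weights and badly distort the LLE embedding – which is precisely the problem the paper solves inside the algorithm rather than as a preprocessing hack.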

 

[Note:  I’ve created a page with a complete list of all the papers I’ve noted in posts as “particularly instructive or useful”.]