A button triggers an action in the visualizations. Underlined words highlight something in the visualizations.

For example, the Iris button loads the Iris dataset.

t-SNE highlights the visualization of the t-SNE algorithm.

Activate the proximity view here, or temporarily by pressing P.

Activate the brush here, or temporarily by pressing B.

Activate the component planes to show the distribution of a dimension in the projection, by clicking on the respective density plot here.

You can hover over a projected point to see its position in the other projections and in the overview, and the details of the data point in the DIMENSIONS section.

WHO AND WHY

Who uses Dimensionality Reduction (DR) techniques, and why?

Data represent characteristics of the real world as numbers and categories. Analysts, data scientists and domain experts look for patterns in the data that might reveal interesting properties of the real world. A dataset may contain many data points but also many attributes, usually called features or dimensions. If the data have one, two or three dimensions, you can imagine them as a point cloud in that space or visualize them as a scatterplot. Data patterns appear as particular shapes in the scatterplot, such as clusters of points, outliers, or points aligned along lines or curves. With more than three dimensions, it is harder to discover patterns in the data. You can encode additional dimensions with the color, size or shape of the points, but this quickly reaches its limits. Visual metaphors like parallel coordinates or scatterplot matrices have been proposed to visualize high-dimensional data. DR is one of them.

DR summarizes the high-dimensional data as a two-dimensional scatterplot. This summary attempts to preserve some notion of similarity, so that data that are similar in the high-dimensional space are represented by nearby points in the scatterplot, and vice versa. The rationale is that most of the sought-after data patterns (clusters, outliers, curves…) depend only on data similarities, and our visual perception system is very good at detecting such patterns in scatterplots. Researchers working on DR techniques therefore believe that transforming data similarities into point proximities makes these patterns easier to perceive than other visual metaphors do. Researchers in this field propose solutions to technical issues such as reducing computing time and increasing trustworthiness. They also try to quantify how far their driving assumption holds, in particular for which analytic tasks, which kinds of patterns and which kinds of data.

The dataset consists of a sample of songs from the Spotify database. Each entry has seven dimensions. These dimensions are features of the song computed by Spotify. Acousticness, for instance, is the estimated probability that the song is purely acoustic. Danceability is a computed value, based on the monotony of the song, that describes how suitable the song is for dancing.

t-SNE
There is a dense group of songs that are live.

LLE, MDS, ISOMAP
There are five songs isolated from the other points. They are all more and .

WHAT

Visualization is used to analyze data. Here is a short description of the available datasets.

DATA

Artificial datasets with non-linear and manifold structure, which illustrate the advantages of LLE, MDS and ISOMAP. You can inspect the dataset in 3D in the overview, which you can rotate.

Simple artificial datasets, which are generated randomly each time. These datasets show the advantages and limits of t-SNE and PCA.

A somewhat realistic dataset, which contains a sample of songs from the Spotify database. Spotify computes some features of the songs, like their “Acousticness” or “Danceability”.

A well-known benchmark dataset, which contains fifty examples from each of the three species of iris flower. The four dimensions are the length and width of sepal and petal.

A well-known benchmark dataset and prime example for t-SNE, which contains handwritten digits from 0 to 9 on a 28 px × 28 px grid. The grey value of each pixel is one dimension. Here, it doesn't make sense to visualize the data with parallel coordinates or a scatterplot matrix, because there are too many dimensions (28 × 28 = 784) and the value of a single pixel carries no meaningful information.

HOW

There are different kinds of DR techniques, which preserve different features of the data:

linear methods, like PCA (Principal Component Analysis)

nonlinear methods, like LLE (Locally Linear Embedding), MDS (Multidimensional Scaling), ISOMAP, t-SNE (t-distributed Stochastic Neighbor Embedding)

PCA

PCA computes the linear projection of the data that has the maximum extent (variance).

You can imagine a dataset as a solid multidimensional point cloud. You light it from the direction that casts the widest shadow on a wall. This shadow on the wall is the result of PCA.
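The shadow metaphor can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code used by this tool: the function name `pca_project` and the toy data are assumptions, and the principal directions are obtained from a singular value decomposition of the centred data.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the directions of largest variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre the point cloud
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
# Hypothetical toy data: a 3-D cloud stretched along its first axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Y, components, mean = pca_project(X)       # Y is the "widest shadow"
```

In this toy cloud the first principal direction should line up closely with the stretched axis, which is the direction in which the shadow is widest.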

LLE

LLE computes the positions of the two-dimensional points by reconstructing each of them with the same linear combination of its K nearest neighbors as in the multidimensional space.
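The two steps described above (find the reconstruction weights, then find 2-D points that are reconstructed by the same weights) could look roughly like this. It is a sketch under assumptions: the names `lle_weights` and `lle_embed`, the regularisation constant and the helix-shaped toy data are illustrative choices, not taken from this tool.

```python
import numpy as np

def lle_weights(X, k):
    """Weights that reconstruct each point from its k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[nbrs] - X[i]                     # neighbours relative to point i
        G = Z @ Z.T                            # local Gram matrix (k x k)
        G += 1e-3 * np.trace(G) * np.eye(k)    # small regularisation (assumed value)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()               # weights of each point sum to one
    return W

def lle_embed(X, k, n_components=2):
    """Low-dimensional points that are reconstructed by the same weights."""
    W = lle_weights(X, k)
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    return W, vecs[:, 1:n_components + 1]      # skip the constant eigenvector

# Hypothetical toy data: 60 points along a helix in 3-D
t = np.linspace(0.0, 3.0 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t), t])
W, Y = lle_embed(X, k=6)
```

Because the weights of each point sum to one, the constant vector is always a trivial solution; that is why the first eigenvector is skipped.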

You can discover a manifold in this dataset. The data is folded like a Swiss roll. LLE tries to unroll this manifold.

LLE tries to flatten the Waves from the dataset.

MDS

MDS tries to reproduce, on the two-dimensional plane, the distances between all pairs of points in the multidimensional space. In MDS, however, preserving larger distances is more important than preserving smaller ones.

You can imagine that each point is connected to every other point by a spring. The points are then pressed onto a two-dimensional plane, which compresses or stretches the springs. MDS releases the tension in the springs. In this way, the points end up, as far as possible, at the same distances from each other as in the multidimensional space.
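The spring picture can be turned into a small gradient-descent sketch. This is one possible variant of MDS, chosen here for illustration (minimising the raw squared difference between distances); the names `stress` and `relax_springs`, the step size and the toy data are assumptions. Because errors are measured in absolute terms, large distances dominate the total tension, which matches the remark above about larger distances.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance between every pair of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D_high, Y):
    """Total spring tension: sum over pairs of (d_2D - d_high)^2."""
    iu = np.triu_indices(len(Y), k=1)
    return float(((pairwise_distances(Y) - D_high)[iu] ** 2).sum())

def relax_springs(D_high, Y0, n_iter=200):
    """Move the 2-D points step by step so that the springs relax."""
    Y = Y0.copy()
    step = 1.0 / (2 * len(Y))                  # assumed step size
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(d, 1.0)               # avoid dividing by zero on the diagonal
        tension = (d - D_high) / d             # > 0: stretched, < 0: compressed
        grad = 2.0 * (tension[:, :, None] * diff).sum(axis=1)
        Y -= step * grad
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                   # hypothetical 5-D data
D_high = pairwise_distances(X)
Y0 = rng.normal(size=(30, 2))                  # random initial positions
Y = relax_springs(D_high, Y0)
```

Comparing `stress(D_high, Y0)` with `stress(D_high, Y)` shows how much tension was released; a different `Y0` can lead to a different final layout.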

ISOMAP

ISOMAP applies MDS to a modified set of distances. In MDS, you can imagine that every point is connected to every other point, but in ISOMAP points are only connected to their K nearest neighbors in the sense of the Euclidean norm, forming a K-nearest-neighbor graph. It is as if some of the longest springs in MDS were cut and the distance between their end points were replaced by the sum of the distances along the intermediate springs on the shortest path in the graph, which follows manifold structures more closely. MDS is then applied to the distances measured in this proximity graph. If K is set to the number of data points, then the graph is complete and ISOMAP is identical to MDS. K is a scale parameter that the user has to choose.
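The modified distances can be sketched as follows: build the K-nearest-neighbor graph, then compute shortest paths in it. This is a minimal illustration under assumptions (the function name `graph_distances`, the Floyd-Warshall shortest-path loop and the half-circle toy data are choices made here, not details of this tool).

```python
import numpy as np

def graph_distances(X, k):
    """Shortest-path distances in the k-nearest-neighbour graph of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances
    G = np.full((n, n), np.inf)                # inf means "no spring"
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]       # skip the point itself
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]                # keep the graph symmetric
    for m in range(n):                         # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G, D

# Hypothetical toy data: 50 points on a half circle of radius 1
t = np.linspace(0.0, np.pi, 50)
X = np.column_stack([np.cos(t), np.sin(t)])

G_small, D = graph_distances(X, k=2)           # paths follow the curve
G_full, _ = graph_distances(X, k=len(X) - 1)   # complete graph
```

With a small K the distance between the two ends of the half circle is measured along the curve and is longer than the straight-line distance; with the complete graph the graph distances equal the Euclidean ones, so ISOMAP reduces to MDS.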

t-SNE

t-SNE tries to keep faraway points apart and nearby points together. It relies on the probabilities that points are neighbours, given their relative distances, and tries to preserve these probabilities in the projection. In high dimensions, the average distance between any pair of points tends to increase with the dimension, while the variance of these distances tends towards a constant. Distances are therefore far greater in high dimensions, and far more similar to each other, than in two dimensions. This is one of the phenomena referred to as the curse of dimensionality, which DR techniques have to face. The probabilities used by t-SNE to represent data similarities are especially robust to this phenomenon, which makes t-SNE widely used when it is important to preserve cluster patterns.
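The behaviour of distances in high dimensions can be checked numerically. The sketch below assumes standard normal data, which is only one convenient choice; the function name `distance_stats` and the list of dimensions are illustrative.

```python
import numpy as np

def distance_stats(dim, n_points=200, seed=0):
    """Mean and standard deviation of pairwise distances of random points."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_points, dim))       # assumed: standard normal data
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)                   # guard against rounding below zero
    d = np.sqrt(D2[np.triu_indices(n_points, k=1)])
    return d.mean(), d.std()

stats = {dim: distance_stats(dim) for dim in (2, 10, 100, 1000)}
for dim, (mean, std) in stats.items():
    print(f"dim={dim:5d}  mean={mean:7.2f}  std={std:5.2f}  std/mean={std / mean:.3f}")
```

The mean distance grows with the dimension while the spread stays of the same order, so the ratio of spread to mean shrinks: all points look almost equally far from each other.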

For the handwritten digits dataset, t-SNE delivers the best results. It keeps similar digits together and separates them from other digits. If a digit is in the wrong cluster, a look at the image usually explains why.

PCA delivers no result for this dataset in this tool because of technical limitations of the browser.

YOUR TURN

There are important aspects of DR techniques you can discover.

Linearity

If the dimensions of the projection are linear combinations of the original dimensions, then the DR technique is linear; otherwise it is non-linear. Linearity is useful for linking the projection back to the original dimensions.

For the dataset :

The projected rays of PCA are on a straight line as they are in the original data space, so PCA is a linear DR technique.

The projected rays of t-SNE are on a curve, so t-SNE is a non-linear DR technique.

Explanatory Dimensions

Original dimensions are very meaningful in many cases.

The dimensions of the dataset are meaningful: they say something about the song.

On the other hand, for the dataset a single dimension (a pixel's grey value) says little about the digit itself.

Iterative Optimization

Some DR techniques use iterative optimization algorithms to find an optimum for the projection. That usually means that no closed-form solution exists. The most common problem with optimization is local optima, which mean that we cannot be sure to have the "best" possible projection and need to try several different initial conditions to get closer to it.

You see the iterative progress for t-SNE. PCA, on the other hand, can be solved in one shot.

Stability

If the projection depends on initial conditions, so that the result changes each time we run the DR technique, then the DR technique is not stable. Stability matters because it increases users' trust in the projection: different results for the same data suggest something unusual and uncontrollable, so trust in the DR technique decreases.

PCA and LLE (with the same parameterization) give the same result every time on the same dataset.

MDS, ISOMAP and t-SNE, on the other hand, give a different result each time, because the points of the projection are randomly initialized before they are optimized.

Locating and Identifying Distortions

Distortions are very problematic, because patterns we see might not exist in the data (false patterns) and patterns that exist in the data might be missed (missed patterns). The map is meant to summarize the data without losing too many interesting patterns, so we need to understand

i. whether distortions exist,

ii. where they are,

iii. what kind they are,

iv. how strong they are,

v. how we can overcome them.

Dataset :

To examine distortions (i. - iv.), activate the proximity visualization. If a ray is crossed by another ray in a DR result, the points on the two cut ends are missed neighbors. If points of a ray are separated from the ray, they are called false neighbors.
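One way to locate such distortions numerically is to compare each point's nearest neighbors before and after projection. The sketch below assumes a k-nearest-neighbor reading of the two terms (false neighbors are close in the projection but not in the original space, missed neighbors the other way round); the function names, the value of k and the toy data are assumptions, and this is not how the proximity view of this tool is computed.

```python
import numpy as np

def knn_sets(P, k):
    """For each row of P, the set of indices of its k nearest neighbours."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    return [set(np.argsort(row)[:k].tolist()) for row in d2]

def neighbour_distortions(X, Y, k):
    """False and missed neighbours of every point, as lists of index sets."""
    high = knn_sets(np.asarray(X, dtype=float), k)
    low = knn_sets(np.asarray(Y, dtype=float), k)
    false_nb = [l - h for h, l in zip(high, low)]    # near in 2-D only
    missed_nb = [h - l for h, l in zip(high, low)]   # near in original only
    return false_nb, missed_nb

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # hypothetical 3-D data
Y = X[:, :2]                                   # crude projection: drop one axis
false_nb, missed_nb = neighbour_distortions(X, Y, k=5)
n_distorted = sum(1 for f in false_nb if f)    # points with at least one false neighbour
```

Since both neighbor sets have size k, every point has as many false neighbors as missed ones under this definition; a projection identical to the data would have none of either.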

Continuous Mapping between Projections and Original Dimensions

For some DR techniques the mapping from the original data to the projection is known, so you can project new points without recomputing the whole map. This mapping can be linear, as with PCA, or non-linear.
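For PCA, the known mapping is simply "subtract the mean, then multiply by the principal directions", so a new point can be placed in an existing map directly. The following is a minimal sketch with assumed names (`project`, `x_new`) and made-up data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # hypothetical original data

# Fit the map once
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                            # the two leading principal directions

def project(x):
    """Map one or more original points into the existing 2-D PCA map."""
    return (np.asarray(x, dtype=float) - mean) @ components.T

Y = project(X)                                 # the map of the original data
x_new = rng.normal(size=4)                     # a point that was not in the data
y_new = project(x_new)                         # placed without recomputing the map
```

Because the mapping is linear (up to the shift by the mean), the midpoint of two original points lands on the midpoint of their projections.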

If you activate the component planes for a dimension, you see the gradient in the background of the DR results.

If the mapping is linear, then the DR technique is linear, and if the mapping is non-linear, then the DR technique is non-linear.

But the reverse does not always hold: if a DR technique is non-linear, then it is possible that no mapping exists.

Connections between DR Techniques

Some DR techniques are related to each other.

If the number of neighbors K is equal to the number of data points, the result of ISOMAP is the same as that of MDS.

This concept can be further examined in the EXPLORE mode.

Sensitivity to Parameters

Parameters have to be tuned by the end user, so understanding their influence is crucial: they affect the type of distortions that appear, the speed of convergence, and the scale at which the data are summarized.

This concept can be further examined in the EXPLORE mode.


OVERVIEW

B brush

P proximity view

Points with bright cells are near the hovered point in the original high-dimensional dataset; points in dark cells are far away.