DREDA

Dimensionality-Reduced Exploratory Data Analysis

What

Explore high-dimensional data by reducing it to three dimensions and viewing the result in the browser with Three.js.

Why

Sometimes your data has too many fields, features, columns... dimensions.

Wikipedia:

In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.

Say you start to build a document-term matrix. There might be hundreds of thousands of terms, a.k.a. dimensions. Even if you truncate to the 10,000 most important terms, you're still stuck with completely unplottable data. Dimensionality reduction is used in many scenarios to visualize high-dimensional data like this.
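To make the document-term matrix concrete, here is a minimal sketch using only the Python standard library. The function name `document_term_matrix` and the whitespace tokenization are assumptions for illustration; a real pipeline would likely use a proper tokenizer and a sparse matrix.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts how often
    vocabulary[j] occurs in documents[i]. Hypothetical helper."""
    # Naive tokenization: lowercase and split on whitespace.
    counts = [Counter(doc.lower().split()) for doc in documents]
    # Every distinct term becomes one column, i.e. one dimension.
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix
```

Each distinct term adds a column, which is why even a modest corpus quickly ends up with thousands of dimensions.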

If you reduce your data to three dimensions, say via singular value decomposition (SVD), principal component analysis (PCA), or t-distributed stochastic neighbor embedding (t-SNE), you can visualize it with this tool. Currently Dreda supports four dimensions: x, y, z, and color. Color assignments are chosen randomly from seaborn's HUSL color space (colors for humans).
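As a rough illustration of the reduction step, the sketch below projects rows of numbers onto their first three principal components using power iteration with deflation, in pure Python. It is not how this project produces its data; the names `pca_project`, `leading_direction`, `center`, and `dot` are hypothetical. In practice you would more likely reach for a library such as scikit-learn.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def leading_direction(rows, iterations=500, seed=0):
    """Power iteration: approximate the top eigenvector of rows^T rows."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = [dot(row, v) for row in rows]        # X v
        w = [dot(scores, col) for col in zip(*rows)]  # X^T (X v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_project(rows, k=3):
    """Return (projected, components): each row reduced to k coordinates,
    plus the k unit-length directions that were used."""
    centered = center(rows)
    residual = centered
    components = []
    for i in range(k):
        v = leading_direction(residual, seed=i)
        components.append(v)
        # Deflate: remove the part of every row that lies along v.
        deflated = []
        for row in residual:
            score = dot(row, v)
            deflated.append([x - score * c for x, c in zip(row, v)])
        residual = deflated
    projected = [[dot(row, v) for v in components] for row in centered]
    return projected, components
```

The three coordinates of each projected row would then serve as x, y, and z, with a cluster or category label supplying the color dimension.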

I had been doing some visualizations in IPython and they left much to be desired. The data input here is just the output of a pandas DataFrame's to_json(): a JSON object with one inner object per column, each holding index and value pairs.

E.g.

{
  "x": {
    "0": 117.0501353217,
    "1": 63.4789054268,
    "2": -92.4110611211
  },
  "y": {
    "0": -33.8277679817,
    "1": 120.3959209587,
    "2": 172.3790645372
  },
  "z": {
    "0": -17.441790351,
    "1": -34.4737315608,
    "2": -224.6172323059
  },
  "cid": {
    "0": 4.0,
    "1": 1.0,
    "2": 4.0
  }
}
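If you are not using pandas, the same layout can be produced by hand. The sketch below builds and reads back this column-oriented structure with the standard `json` module. The function names `points_to_dreda_json` and `dreda_json_to_points` are hypothetical and not part of this project. It assumes the four columns `x`, `y`, `z`, and `cid` shown above.

```python
import json


def points_to_dreda_json(points):
    """Serialize (x, y, z, cid) tuples into one inner object per column,
    keyed by the row index as a string. Hypothetical helper."""
    columns = {"x": {}, "y": {}, "z": {}, "cid": {}}
    for index, (x, y, z, cid) in enumerate(points):
        key = str(index)
        columns["x"][key] = float(x)
        columns["y"][key] = float(y)
        columns["z"][key] = float(z)
        columns["cid"][key] = float(cid)
    return json.dumps(columns)


def dreda_json_to_points(text):
    """Parse the column-oriented JSON back into (x, y, z, cid) tuples,
    ordered by numeric row index."""
    columns = json.loads(text)
    keys = sorted(columns["x"], key=int)
    return [
        (columns["x"][k], columns["y"][k], columns["z"][k], columns["cid"][k])
        for k in keys
    ]
```

With pandas, calling `to_json()` on a DataFrame that has these four columns gives the same shape, so a helper like this is only needed when the data comes from somewhere else.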

How

You can see it all running live at http://metasyn.pw/dreda. However, if you want to change things around, you might want to clone the repository, run python -m SimpleHTTPServer (or python3 -m http.server on Python 3), and open localhost:8000 in your browser.


Authors

License