2014-04-24-Visualization.markdown

name: inverse
layout: true
class: left, top, inverse


Visualization in Data Mining


Your Brain

.left-column[

  • Pattern detector
  • Visualizations help you search for possible models
  • Help you understand the data intuitively
]
.right-column[

]

???

Visual

  • For most people, vision is the strongest sense
  • Recall improves by 55 percentage points (10% => 65%) with the addition of a picture
  • We've talked about the need to understand the data before using algorithms on it. Visualization can speed that process up.

Patterns

  • Use visualizations that surface patterns and relationships
  • Know the context for the visualization
  • Verify results

???

Steps

  • For gaining intuition, focus on simple visualizations that help you see relationships in the data.
  • At this stage, labels, titles, etc. are not very important. Multiple dimensions in multiple windows? Fine!
  • We'll discuss this later, but the context a visualization is going to be used in matters a lot. Don't feel like you have to import every cool infographic into your project.
  • Clustering, classification, and outlier selection can be verified visually, e.g., by highlighting points. Use it to gut check conclusions, even if you have to drastically reduce dimensionality.

Scatter

  • Great for multidimensional data
  • For more than 2 dimensions, just plot pairs of dimensions in separate plots
  • Reveals correlation, clustering, distribution, ...

???

Data Mining

  • DM bread and butter. Often deal with high dimensionality, so scatter is one of the best ways to visualize
  • Wide variety of patterns can be searched
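
One way to act on "plot more than 2 dimensions in different plots" is to enumerate every pair of dimensions and give each pair its own scatter panel. The sketch below uses only the standard library: it lists the pairs and computes a Pearson correlation per pair as a rough numeric companion to the visual check. The dimension names and values are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def scatter_panels(columns):
    """Return one (x_name, y_name, correlation) entry per pair of dimensions."""
    return [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]


# Hypothetical data: three dimensions, five observations each.
columns = {
    "dim_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "dim_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "dim_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}

panels = scatter_panels(columns)
for x_name, y_name, r in panels:
    print(f"{x_name} vs {y_name}: r = {r:+.2f}")
```

With d dimensions this gives d(d-1)/2 panels, so for very wide data you would likely pick a subset of pairs to look at first.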

Multiple Dimensions

???

vp

  • This data is for body positions over time
  • Dimensions are the different angles for different body parts, like hip, ankle, knee, over time
  • We can see some strong patterns. Maybe we'll need to kernelize them to make them learnable, but we have a good understanding of whether there are, or are not, relationships in the data

Geographic

.center[ ]

???

Trade-offs

  • Coordinates intuitively understandable
  • Visual emphasis depends on geographical area (e.g., when you'd like it to depend on human impact instead)
  • Lots of ways to bucket/aggregate
  • 2004 Presidential election: Bush won 50.7% of the popular vote and Kerry 48.3%. Does it look like that here?
  • img: http://www-personal.umich.edu/~mejn/election/2004/

Geographic

.center[ ]

???

  • Don't be afraid to "bend" things to get different insights
  • Each pixel represents 1,000 votes in the 2004 Presidential election
    • Red ==> Bush
    • Blue ==> Kerry
    • Green ==> Nader
  • Because some parts of the country have far more than 1,000 votes per pixel, each pixel is drawn on the closest part of the map that isn't already used
  • You lose precise vote locations, but you see how mixed the results actually are and how population density is involved
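
The "closest part of the map that isn't already used" rule can be sketched as a greedy placement on a grid. This is only an illustration under assumptions: the original map's actual algorithm and processing order are not given here, and the function name, grid size, and tallies below are hypothetical.

```python
def place_pixels(width, height, tallies, votes_per_pixel=1000):
    """Greedily assign one pixel per `votes_per_pixel` votes.

    `tallies` is a list of ((x, y), candidate, votes) entries. Each pixel goes
    to the free cell closest (Euclidean distance) to where the votes were cast.
    """
    free = {(x, y) for x in range(width) for y in range(height)}
    placed = {}
    for (cx, cy), candidate, votes in tallies:
        for _ in range(votes // votes_per_pixel):
            if not free:
                return placed  # map is full; remaining votes are not drawn
            # Ties are broken by cell coordinates so the result is deterministic.
            cell = min(
                free,
                key=lambda c: ((c[0] - cx) ** 2 + (c[1] - cy) ** 2, c),
            )
            free.remove(cell)
            placed[cell] = candidate
    return placed


# Hypothetical tallies: one dense location on a 3x3 map.
placed = place_pixels(
    3, 3,
    [((1, 1), "red", 3000), ((1, 1), "blue", 2000)],
)
for cell in sorted(placed):
    print(cell, placed[cell])
```

Only the first pixel lands on the true location; the rest spill onto neighboring cells, which is the loss of precise location the notes describe.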

Other Chart Types

  • Box plot
    • aggregate data
  • Bar charts
    • simple summaries
  • Pie charts
    • compound proportions

???

Types

  • Box plots, for real data, still carry a lot of information
  • Bar charts are nice for summarizing, not great for exploring
  • Same for pie charts. Pie charts are mostly bad, but can be useful in particular circumstances
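
To make concrete what a box plot aggregates, here is a minimal sketch of the five-number summary behind it, using the standard library. The 1.5 × IQR whisker rule is a common convention and an assumption here, not something these slides specify; the data is invented.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def outliers(values, whisker=1.5):
    """Points beyond `whisker` times the interquartile range from the box."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low = s["q1"] - whisker * iqr
    high = s["q3"] + whisker * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = five_number_summary(sample)
flagged = outliers(sample)
print(summary)
print(flagged)
```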

Aesthetics

  • The visual aesthetics you use should be tied to the data

???

Aesthetics


Larger Value?

  • Position
  • Length / Angle
  • Area / Volume
  • Color: Chroma / Luminance

???

Slide Switch


Color: HCL

.left-column[

  • Hue
    • color type
  • Chroma
    • colorfulness, perceived color intensity
  • Luminance
    • brightness, light-dark
]
.right-column[

]

???

Color Spaces
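
As a rough illustration of working in hue, chroma, and luminance, the sketch below converts an HCL triple to sRGB. It assumes "HCL" means the polar form of CIELAB with a D65 white point; some tools use the polar form of CIELUV instead, so treat the numbers as one plausible reading. Out-of-gamut colors are simply clipped, and the function names are hypothetical.

```python
from math import cos, radians, sin

WHITE_D65 = (0.95047, 1.0, 1.08883)  # reference white (X, Y, Z)


def _lab_f_inverse(t):
    """Inverse of the CIELAB companding function."""
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def _gamma_encode(c):
    """Clip linear RGB to [0, 1], then apply the sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def hcl_to_srgb(hue_deg, chroma, luminance):
    """Convert hue (degrees), chroma, luminance (0-100) to sRGB in [0, 1]."""
    # Polar -> rectangular CIELAB.
    a = chroma * cos(radians(hue_deg))
    b = chroma * sin(radians(hue_deg))
    # CIELAB -> XYZ.
    fy = (luminance + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200
    x, y, z = (w * _lab_f_inverse(f) for w, f in zip(WHITE_D65, (fx, fy, fz)))
    # XYZ -> linear sRGB -> gamma-encoded sRGB.
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    bl = 0.0557 * x - 0.2040 * y + 1.0570 * z
    return tuple(_gamma_encode(c) for c in (r, g, bl))


# Four hues at the same chroma and luminance: a categorical palette whose
# colors differ in color type but not in perceived brightness.
palette = [hcl_to_srgb(h, 40, 70) for h in (0, 90, 180, 270)]
for rgb in palette:
    print(tuple(round(c, 3) for c in rgb))
```

Holding chroma and luminance fixed while varying hue is a design choice for unordered categories; for ordered values you would vary luminance or chroma instead.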


ColorBrewer


Careful

???

Line Lengths

  • Lines can appear shorter when they are extended instead of placed right next to each other

Careful

http://www.youtube.com/embed/FWSxSQsspiQ

???

Comparisons

  • We're good at comparing things side by side.
  • We're bad at comparing things from memory.

Grammar of Graphics

  • Geom
    • Graphic element
  • Aesthetics
    • appearance of a geom
  • Data
    • raw, context, statistical aggregations of data
  • Mapping
    • functions which map data to geom properties or aesthetics

???

Bringing Together

  • We've talked about different aesthetics for showing data and about the data itself. All that's needed is to bring them together
  • Wilkinson, L. (2005), The Grammar of Graphics (2nd ed.). Statistics and Computing, New York: Springer.
  • Rigorous way of describing graphics beyond "scatter plot" or "bar chart"
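
The four pieces (geom, aesthetics, data, mapping) can be sketched as a small data structure. This is a toy illustration of the idea, not Wilkinson's formal specification or any particular library's API; the `Layer` class and the fruit data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Layer:
    geom: str                                     # graphic element, e.g. "point" or "bar"
    data: List[dict]                              # one dict per observation
    mapping: Dict[str, Callable[[dict], object]]  # aesthetic name -> function of a row

    def render(self):
        """Apply every mapping to every row: one geom description per row."""
        return [
            {"geom": self.geom, **{aes: fn(row) for aes, fn in self.mapping.items()}}
            for row in self.data
        ]


# Hypothetical data for a bar chart.
fruit = [
    {"name": "apple", "count": 3},
    {"name": "banana", "count": 5},
]

bars = Layer(
    geom="bar",
    data=fruit,
    mapping={
        "x": lambda row: row["name"],        # category -> horizontal position
        "height": lambda row: row["count"],  # value -> bar length
    },
)

marks = bars.render()
for mark in marks:
    print(mark)
```

Swapping `geom="bar"` for `geom="point"` while keeping the data and mapping is the kind of recombination the grammar is meant to make easy to describe.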

Scatter Plot

???

Ice Cream

  • Plot shows hypothetical sales of ice cream vs temperature
  • Geoms
    • points (actually, ticks are geoms, too)
  • Data
    • sales, temperature (and context: how large is the potential plot size)
  • Mapping
    • sales ==> y, temp ==> x
  • img: http://www.mathsisfun.com/data/scatter-xy-plots.html
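
The mapping "sales ==> y, temp ==> x" plus the context of plot size can be sketched as a pair of linear scales from data values to pixel positions. The observations and the 400 × 300 plot size below are invented for illustration and are not taken from the linked image.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping values in `domain` linearly onto `pixel_range`."""
    d0, d1 = domain
    p0, p1 = pixel_range

    def scale(value):
        return p0 + (value - d0) / (d1 - d0) * (p1 - p0)

    return scale


# Hypothetical (temperature, sales) observations.
observations = [(10, 100), (20, 300), (30, 500)]

# Context: a 400 x 300 pixel plot area. Screen y grows downward, so the
# y range is reversed to put larger sales higher on the plot.
x_scale = linear_scale((10, 30), (0, 400))
y_scale = linear_scale((100, 500), (300, 0))

# Mapping: temp ==> x, sales ==> y. Each result is the position of one point geom.
points = [(x_scale(temp), y_scale(sales)) for temp, sales in observations]
print(points)
```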

Bar Plot

.white-background[ ]

???

Fruit


Hipmunk

???

Fruit

  • Shows travel options from SFO to Ithaca, connecting flights, airports, etc.
  • More complex, but still expressible via Grammar
  • Geoms?
    • rectangles, text, ticks
  • Data?
    • Carrier, flight time, layover time, cost, wifi available, airports
  • Mapping?
    • travel time ==> bar length, flight times ==> sub-bars, airline ==> color
  • img: http://www.hipmonk.com

Recursive

???

Complex

  • The reading goes into a further extension of this, where the geoms are themselves other plots

Tufte

  • Clarity from data
  • Avoid chart junk
  • Techniques for displaying many types of data

???

Tufte

  • No talk on visualization would be complete without mentioning Tufte
  • Great examples

Break