name: inverse
layout: true
class: left, top, inverse
.left-column[
- Pattern detector
- Visualizations help you search for possible models
- Help intuitively understand the data
]
.right-column[
???
- For most people, vision is the strongest sense
- Recall improves by 55 percentage points (10% => 65%) when a picture is added
- We've talked about the need to understand the data before using algorithms on it. Visualization can speed that process up.
- Use visualizations that surface patterns and relationships
- Know the context for the visualization
- Verify results
???
- For gaining intuition, focus on simple visualizations that help you see relationships in the data.
- At this stage, labels, titles, etc. are not very important. Multiple dimensions in multiple windows? Fine!
- We'll discuss this later, but the context in which a visualization will be used matters a lot. Don't feel like you have to import every cool infographic into your project
- Clustering, classification, and outlier selection can be verified visually, e.g., by highlighting points. Use this to gut-check conclusions, even if you have to drastically reduce dimensionality
- Great for multidimensional data
- For more than 2 dimensions, just plot pairs of dimensions in separate plots
- Reveals correlation, clustering, distribution, ...
???
- DM bread and butter. We often deal with high-dimensional data, so scatter plots are one of the best ways to visualize it
- A wide variety of patterns can be searched for
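One way to act on "plot pairs of dimensions separately" is to enumerate the pairs first and rank them, so the most promising scatter plots get looked at first. The sketch below is stdlib-only and makes no plots; the dimension names (`x1`, `x2`, `x3`) and values are invented for illustration. Pearson correlation only captures linear relationships, so a low score does not mean a panel is uninteresting.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def scatter_panels(columns):
    """Return (dim_a, dim_b, r) for every pair of dimensions, strongest first."""
    panels = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(panels, key=lambda p: abs(p[2]), reverse=True)


# Hypothetical data set: three dimensions, five observations each.
columns = {
    "x1": [10.0, 20.0, 30.0, 40.0, 50.0],
    "x2": [21.0, 39.0, 62.0, 80.0, 99.0],
    "x3": [5.0, 3.0, 6.0, 2.0, 5.0],
}

for a, b, r in scatter_panels(columns):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Each tuple corresponds to one scatter plot panel; with k dimensions there are k(k-1)/2 of them, which is why ranking helps as dimensionality grows.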
???
- This data is for body positions over time
- Dimensions are the angles of different body parts, like hip, ankle, and knee, over time
- We can see some strong patterns. Maybe we'll need to kernelize them to make them learnable, but we get a good sense of whether or not there are relationships in the data
???
- Coordinates intuitively understandable
- Visual weight depends on geographical area (a problem when you'd like it to depend on human impact instead)
- Lots of ways to bucket/aggregate
- 2004 Presidential election: Bush won 50.7% of the popular vote and Kerry 48.3%. Does it look like that here?
- img: http://www-personal.umich.edu/~mejn/election/2004/
???
- Don't be afraid to "bend" things to get different insights
- Each pixel represents 1,000 votes in the 2004 Presidential election
- Red ==> Bush
- Blue ==> Kerry
- Green ==> Nader
- Because some parts of the country have far more than 1,000 votes per pixel, each extra pixel is drawn on the closest part of the map that isn't already used
- You lose precise vote locations, but you see how mixed the results actually are and how population density is involved
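The placement rule in these notes (draw on the closest part of the map that isn't already used) can be sketched as a breadth-first search over a pixel grid. This is only an illustration under stated assumptions: a rectangular grid, distance measured in 4-neighbor steps, and made-up vote counts. The function names are hypothetical, and nothing here claims to be how the original map was produced.

```python
from collections import deque


def place_pixel(used, start, width, height):
    """Mark and return the free cell closest to `start` (4-neighbor steps).

    `used` is a set of (x, y) cells already drawn. Returns None if the
    grid is full. `start` is assumed to lie inside the grid.
    """
    queue = deque([start])
    seen = {start}
    while queue:
        x, y = queue.popleft()
        if (x, y) not in used:
            used.add((x, y))
            return (x, y)
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return None


def draw_votes(votes_by_cell, width, height, votes_per_pixel=1000):
    """Place one pixel per `votes_per_pixel` votes, spilling into nearby cells."""
    used = set()
    placed = {}  # (x, y) -> candidate color
    for (cell, candidate), votes in votes_by_cell.items():
        for _ in range(votes // votes_per_pixel):
            spot = place_pixel(used, cell, width, height)
            if spot is not None:
                placed[spot] = candidate
    return placed


# Hypothetical counts: a dense "city" cell at (2, 2) on a 5x5 map.
votes = {
    ((2, 2), "blue"): 3000,
    ((2, 2), "red"): 2000,
    ((0, 0), "red"): 1000,
}
pixels = draw_votes(votes, width=5, height=5)
print(len(pixels), "pixels drawn")
```

The dense cell spills into its neighbors, which is the trade-off the notes describe: exact locations are lost, but every 1,000 votes stays visible.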
- Box plot
  - aggregate data
- Bar charts
  - simple summaries
- Pie charts
  - compound proportions
???
- Box plots of real data still convey a lot of information
- Bar charts are nice for summarizing, not great for exploring
- The same goes for pie charts. Pie charts are mostly bad, but can be used in particular circumstances
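To make concrete how much a box plot encodes, here is a minimal sketch of the numbers behind one. It assumes the common Tukey convention (whiskers end at the most extreme points within 1.5 IQR of the quartiles) and the default "exclusive" method of `statistics.quantiles`; plotting tools differ on both choices. The sample values are invented.

```python
from statistics import median, quantiles


def box_plot_stats(values):
    """Return the quantities a Tukey-style box plot draws."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "whisker_low": inside[0],    # lowest point that is not an outlier
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_high": inside[-1],  # highest point that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical sample with one extreme value.
sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 30]
stats = box_plot_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

A single box therefore shows center, spread, skew, and outliers at once, which is more than a bar of the mean would.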
- The visual aesthetics you use should be tied to the data
???
- What are some of the techniques we can use to tie data to a visual representation?
- img: Kevin Lynagh, http://keminglabs.com/talks/
- Position
- Length / Angle
- Area / Volume
- Color: Chroma / Luminance
???
- Hadley Wickham slides, OSCON: http://cdn.oreillystatic.com/en/assets/1/event/80/Designing%20effective%20visualisations_%20matching%20data%20problems%20to%20our%20perceptual%20strengths%20%20Presentation.pdf
.left-column[
- Hue
  - color type
- Chroma
  - colorfulness, perceived color intensity
- Luminosity
  - brightness, light-dark
]
.right-column[
???
- There are many other color spaces; you're probably most familiar with RGB
- HCL is useful because it separates the properties of a color into ones that can be mapped to data
  - Hue: nominal, values can't be compared
  - Chroma, Luminosity: numerical / comparable value
- Chroma vs. saturation: chroma is colorfulness judged relative to a similarly lit white; saturation is colorfulness relative to the color's own brightness
- http://colorbrewer2.org/
- Type of comparison => type of color difference
- Lots of other practical features
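The rule "type of comparison => type of color difference" can be sketched in code: vary hue for nominal categories and vary lightness for ordered values. Note the big assumption: the standard library's `colorsys` offers HLS, which is not the perceptually based HCL space discussed here, so this is only a stand-in. The function names and default parameters are hypothetical.

```python
import colorsys


def to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


def nominal_palette(categories, lightness=0.5, saturation=0.6):
    """One evenly spaced hue per category; lightness is held constant
    so that no category reads as 'more' than another."""
    n = len(categories)
    return {
        cat: to_hex(colorsys.hls_to_rgb(i / n, lightness, saturation))
        for i, cat in enumerate(categories)
    }


def sequential_color(value, lo, hi, hue=0.6, saturation=0.6):
    """Single hue; larger values map to darker colors."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))    # clamp to the [lo, hi] range
    lightness = 0.9 - 0.6 * t    # 0.9 (light) down to 0.3 (dark)
    return to_hex(colorsys.hls_to_rgb(hue, lightness, saturation))


palette = nominal_palette(["a", "b", "c"])
print(palette)
print([sequential_color(v, 0, 100) for v in (0, 50, 100)])
```

For real work, a palette from ColorBrewer or a true HCL implementation would be a better choice than HLS, whose lightness is not perceptually uniform across hues.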
- Some aesthetics can combine to form illusions
- http://www.michaelbach.de/ot/sze_sineIllusion/
???
- Lines can appear shorter when they are spread out instead of right next to each other
http://www.youtube.com/embed/FWSxSQsspiQ
???
- We're good at comparing things side by side.
- We're bad at comparing things from memory.
- Geom
  - graphic element
- Aesthetics
  - appearance of a geom
- Data
  - raw, context, statistical aggregations of data
- Mapping
  - functions which map data to geom properties or aesthetics
???
- We've talked about different aesthetics for showing data and we've talked about data; all that's needed is to bring them together
- Wilkinson, L. (2005), The Grammar of Graphics (2nd ed.). Statistics and Computing, New York: Springer.
- Rigorous way of describing graphics beyond "scatter plot" or "bar chart"
???
- Plot shows hypothetical sales of ice cream vs temperature
- Geoms
  - points (actually, ticks are geoms, too)
- Data
  - sales, temperature (and context: how large is the potential plot size)
- Mapping
  - sales ==> y, temp ==> x
- img: http://www.mathsisfun.com/data/scatter-xy-plots.html
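The breakdown above (geom, data, mapping) can be written down as a small data structure. This is a toy sketch, not Wilkinson's formalism and not the API of any plotting library; `Layer`, `linear_scale`, the plot size, and the observations are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Layer:
    geom: str                                     # graphic element, e.g. "point"
    data: List[dict]                              # one dict per observation
    mapping: Dict[str, Callable[[dict], float]]   # aesthetic -> function of a row

    def render(self):
        """Apply the mapping to every row, giving one geom description per row."""
        return [
            {"geom": self.geom, **{aes: fn(row) for aes, fn in self.mapping.items()}}
            for row in self.data
        ]


def linear_scale(field, domain, pixels):
    """Map a data field from `domain` (lo, hi) onto `pixels` (lo, hi)."""
    (d_lo, d_hi), (p_lo, p_hi) = domain, pixels
    return lambda row: p_lo + (row[field] - d_lo) / (d_hi - d_lo) * (p_hi - p_lo)


# Hypothetical observations: temperature in degrees C, sales in dollars.
observations = [
    {"temp": 14.0, "sales": 200.0},
    {"temp": 20.0, "sales": 400.0},
    {"temp": 26.0, "sales": 600.0},
]

scatter = Layer(
    geom="point",
    data=observations,
    mapping={
        "x": linear_scale("temp", domain=(10.0, 30.0), pixels=(0.0, 400.0)),
        # Screen y grows downward, so the pixel range is flipped.
        "y": linear_scale("sales", domain=(0.0, 800.0), pixels=(300.0, 0.0)),
    },
)

for mark in scatter.render():
    print(mark)
```

The plot size enters only through the `pixels` ranges, which is the "context" part of the data; swapping `geom` or adding a `"color"` entry to `mapping` changes the chart without touching the observations.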
???
- Plot shows fruit popularity
- Geoms
  - bars (and ticks, text)
- Data
  - fruit to popularity
- Mapping
  - popularity ==> height, fruit type ==> x, color
- img: http://www.mathsisfun.com/data/bar-graphs.html
???
- Shows travel options from SFO to Ithaca: connecting flights, airports, etc.
- More complex, but still expressible via Grammar
- Geoms?
  - rectangles, text, ticks
- Data?
  - carrier, flight time, layover time, cost, wifi available, airports
- Mapping?
  - travel time ==> bar length, flight times ==> sub-bars, airline ==> color
- img: http://www.hipmonk.com
???
- The reading goes into a further extension of this, where the geoms are themselves other plots
- Clarity from data
- Avoid chart junk
- Techniques for displaying many types of data
???
- No talk on visualization would be complete without mentioning Tufte
- Great examples