Hive presentation at CloudCon 2013 in San Francisco
- Grover Hive presentation.pdf: Slides used in the presentation
- 2008.tar.gz: Flight delay dataset from 2008.
- airports.csv: Dataset linking airport codes to their full names. See the Introduction section for more details.
- README.md: This file.

There are two datasets in the repo:
a) The first dataset contains on-time flight performance data from 2008, originally released by the Research and Innovative Technology Administration (RITA). The source of this dataset is http://stat-computing.org/dataexpo/2009/the-data.html. The dataset
b) The second dataset contains a listing of airport codes in the continental US, Puerto Rico, and the US Virgin Islands. The source of this dataset is http://www.world-airport-codes.com/. The data was scraped from this website and then cleansed into its present CSV form.
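
As a rough illustration of how the two datasets fit together, the sketch below looks up each flight's origin airport code in the airport listing and averages arrival delay per airport. It is only a sketch under stated assumptions: the inline sample rows, the two-column layout assumed for airports.csv, and the `Origin` and `ArrDelay` column names are assumptions made for the example and may not match the actual files in this repo.

```python
import csv
import io

# Hypothetical sample rows; the real files' column layouts may differ.
AIRPORTS_CSV = "SFO,San Francisco International\nJFK,John F Kennedy International\n"
FLIGHTS_CSV = "Origin,Dest,ArrDelay\nSFO,JFK,12\nJFK,SFO,-3\nSFO,JFK,30\nJFK,SFO,NA\n"


def load_airport_names(text):
    """Map airport code -> full name from two-column CSV text (assumed layout)."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def average_arrival_delay_by_origin(flights_text, names):
    """Average ArrDelay per origin airport, keyed by full airport name."""
    totals = {}
    for row in csv.DictReader(io.StringIO(flights_text)):
        if row["ArrDelay"] == "NA":
            continue  # skip rows with no recorded arrival delay
        # Fall back to the raw code when the airport is not in the listing.
        name = names.get(row["Origin"], row["Origin"])
        total, count = totals.get(name, (0, 0))
        totals[name] = (total + int(row["ArrDelay"]), count + 1)
    return {name: total / count for name, (total, count) in totals.items()}


names = load_airport_names(AIRPORTS_CSV)
delays = average_arrival_delay_by_origin(FLIGHTS_CSV, names)
for airport, delay in sorted(delays.items()):
    print(f"{airport}: {delay:.1f}")
```

The same lookup maps naturally onto a join between a flights table and an airports table in Hive; the Python version is used here only to keep the example self-contained.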