Skip to content

Latest commit

 

History

History
 
 

setup

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
This directory contains Pig Latin scripts and code for UDFs used to prepare the
data for these examples.  

Baseball setup:
The tomap UDF, used to convert batting statistics into a map, can be compiled
by doing:

javac -cp <path>/pig.jar tomap.java

where <path> is the path to a copy of pig.jar.  It can then be placed in the
jar by doing 

jar -cf udf.jar tomap.class 

Once you have downloaded the baseball data (see ../data/README for information
on obtaining the data) you can run

<path>/bin/pig -x local baseball.pig 

NYSE setup:
The total NYSE data was too large to host and would take too long for convient 
examples.  So the example data contains only information for ticker symbols
beginning with 'C' from the year 2009. 

If you would like to load the entire data set into your cluster
for slightly more realistic testing, the examples should work once you adjust
the file paths for load and store.  I did not change the schema of the data.