forked from alanfgates/programmingpig
This directory contains Pig Latin scripts and code for UDFs used to prepare the
data for these examples.
Baseball setup:
The tomap UDF, used to convert batting statistics into a map, can be compiled
with:
javac -cp <path>/pig.jar tomap.java
where <path> is the directory containing a copy of pig.jar. The compiled class
can then be packaged into a jar with:
jar -cf udf.jar tomap.class
Once you have downloaded the baseball data (see ../data/README for information
on obtaining the data), you can run:
<path>/bin/pig -x local baseball.pig
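The three baseball commands above could be wrapped in one small script. This is
only a sketch: PIG_HOME-style usage, the DRY_RUN variable, and the function
names are hypothetical and not part of this directory. It also assumes that the
same <path> holds both pig.jar and bin/pig.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run, and baseball_setup are hypothetical names.
# The single argument stands in for <path>, assumed here to be a directory
# that contains both pig.jar and bin/pig.

run() {
    # Print the command, then execute it unless DRY_RUN=1.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

baseball_setup() {
    pig_home=$1
    # Compile the UDF, package it, then run the prep script in local mode.
    # Each step runs only if the previous one succeeded.
    run javac -cp "$pig_home/pig.jar" tomap.java &&
    run jar -cf udf.jar tomap.class &&
    run "$pig_home/bin/pig" -x local baseball.pig
}
```

After sourcing the file, something like `baseball_setup /path/to/pig` would run
the steps in order. Setting DRY_RUN=1 first only prints the commands, which is
a cheap way to check the paths before compiling anything.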
NYSE setup:
The full NYSE data set was too large to host and would take too long to process
for convenient examples, so the example data contains only records for ticker
symbols beginning with 'C' from the year 2009.
If you would like to load the entire data set into your cluster
for slightly more realistic testing, the examples should work once you adjust
the file paths in the load and store statements. I did not change the schema
of the data.
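For illustration, a subset like the one described above (symbols beginning with
'C', year 2009) could be cut from the full data with a one-line filter. This is
not necessarily how the example data was produced. The column layout is an
assumption: tab-separated records with the ticker symbol in field 2 and a
YYYY-MM-DD date in field 3. The function name filter_nyse is hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumed layout (not confirmed by this README):
#   field 1 = exchange, field 2 = ticker symbol, field 3 = date as YYYY-MM-DD,
#   remaining fields = price and volume data; fields separated by tabs.

filter_nyse() {
    # Keep records whose symbol starts with 'C' and whose date is in 2009.
    # Reads the named files, or standard input if none are given.
    awk -F'\t' '$2 ~ /^C/ && $3 ~ /^2009-/' "$@"
}
```

It could be used as `filter_nyse full_data.tsv > subset.tsv`, where both file
names are placeholders. Records pass through unmodified, so the schema of the
subset matches the schema of the input.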