forked from uwescience/datasci_course_materials
Instructions on how to run example.pig.
================================================================
STEP 1:
Import the myudfs.jar file into Pig. You need this because
example.pig uses the function RDFSplit3(...), which is defined in myudfs.jar.
Choose one of two options:
OPTION 1: Do nothing. example.pig is already configured to read
myudfs.jar from S3, through the line:
register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar
OPTION 2: Do it yourself. Run this on your local machine:
cd pigtest
ant
(The ant command should create the file myudfs.jar.)
Next, change the register line in example.pig to:
register ./myudfs.jar
Then, after you start the AWS cluster, copy myudfs.jar to the AWS
Master Node (see hw6-awsusage.html).
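If you would rather not edit example.pig by hand, the register change for OPTION 2 can be sketched as a small shell function. This is only a sketch: the function name use_local_udfs is made up for illustration, and it assumes that example.pig has a register line written in lowercase at the start of a line, as shown in OPTION 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of the course materials): point a Pig
# script at a locally built myudfs.jar instead of the copy on S3.
# Assumption: the register statement starts its line and is lowercase.
use_local_udfs() {
    script="$1"
    # Keep the original next to the edited file, as <script>.bak.
    cp "$script" "$script.bak" &&
    # Replace the whole register line; all other lines pass through unchanged.
    sed 's|^register .*|register ./myudfs.jar|' "$script.bak" > "$script"
}
```

Calling use_local_udfs example.pig rewrites the file in place and leaves the original in example.pig.bak, so you can switch back to OPTION 1 by restoring the backup.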
================================================================
STEP 2:
Start an AWS cluster (see hw6-awsusage.html), start Pig interactively,
and cut and paste the contents of example.pig. I prefer to do this line by line.
Note: The program may appear to hang at 0% completion. If it does, check the job tracker and scroll down: you should see a MapReduce job running with non-zero progress.
Also note that the script will generate more than one MapReduce job.