Instructions on how to run example.pig.

================================================================

STEP 1:

Register the myudfs.jar file in Pig.  You need this because
example.pig uses the function RDFSplit3(...), which is defined in
myudfs.jar.  There are two ways to do this:

OPTION 1: Do nothing.  example.pig is already configured to read
myudfs.jar from S3, through the line:

register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar
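
For orientation, this is roughly how a registered UDF gets invoked in
a Pig script.  The sketch below is illustrative only: the input path
is a placeholder, and the package prefix "myudfs." and the assumption
that RDFSplit3 takes one line of text and returns a
(subject, predicate, object) tuple are guesses -- example.pig contains
the actual invocation.

register s3n://uw-cse-344-oregon.aws.amazon.com/myudfs.jar

-- Illustrative sketch; see example.pig for the real script.
raw = LOAD 's3n://<your-input-path>' USING TextLoader() AS (line:chararray);
-- ASSUMPTION: RDFSplit3 splits one raw line into a 3-field tuple.
triples = FOREACH raw GENERATE FLATTEN(myudfs.RDFSplit3(line))
          AS (subject:chararray, predicate:chararray, object:chararray);
DUMP triples;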


OPTION 2:  Do it yourself.  Run this on your local machine:

cd pigtest
ant     # this should create the file myudfs.jar

Next, modify the register statement in example.pig to:

register ./myudfs.jar

Next, after you start the AWS cluster, copy myudfs.jar to the AWS
Master Node (see hw6-awsusage.html).
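
The copy step might look like the following, run from your local
machine once ant has produced myudfs.jar.  The key file name, the
"hadoop" user, and the Master Node address are placeholders; take the
actual connection details from hw6-awsusage.html.

# Copy the freshly built jar to the AWS Master Node.
# <your-keypair>.pem and <master-public-dns> are placeholders.
scp -i <your-keypair>.pem pigtest/myudfs.jar hadoop@<master-public-dns>:~/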

================================================================

STEP 2:

Start an AWS cluster (see hw6-awsusage.html), start Pig interactively,
and copy and paste the contents of example.pig into the Pig shell.
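
Connecting and launching Pig might look like this; the key file, the
"hadoop" user, and the host are placeholders, and hw6-awsusage.html has
the exact connection details for your cluster.

# SSH to the Master Node, then start the interactive Pig shell (grunt).
ssh -i <your-keypair>.pem hadoop@<master-public-dns>
pig
# At the grunt> prompt, paste the statements from example.pig.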


Note: The program may appear to hang at 0% completion.  If that happens, check the job tracker and scroll down; you should see a MapReduce job running with non-zero progress.

Also note that the script will generate more than one MapReduce job.
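
If you prefer the command line to the job tracker web UI, the standard
Hadoop job client on the Master Node can also show progress.  This is
just a convenience, not something the assignment requires:

# List MapReduce jobs and check the progress of one of them.
hadoop job -list
hadoop job -status <job-id>   # <job-id> is an ID printed by -list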