Lab03

Put the input file into HDFS:

hdfs dfs -put pg100.txt input

Running Python code with Hadoop Streaming:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.1.jar -input input -output output -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
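The contents of mapper.py and reducer.py are not shown here, so the following is only a minimal sketch of the kind of logic such a pair might contain, assuming a plain word count. The function names (map_lines, reduce_records, run_locally) are hypothetical. It illustrates the Hadoop Streaming contract: both scripts read lines from standard input and write tab-separated key/value records to standard output, and the reducer receives its input sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (assumed word count): emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_records(records):
    """Reducer step: sum the counts for each key.

    Assumes records arrive sorted by key, which is what Hadoop Streaming
    provides to the reducer after the shuffle phase.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce without a cluster."""
    return list(reduce_records(sorted(map_lines(lines))))
```

In an actual job, mapper.py would print the records from map_lines(sys.stdin) and reducer.py would print those from reduce_records(sys.stdin). The local simulation is handy for checking the logic on a few lines before submitting to the cluster.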

Remember to delete the output directory if it already exists; Hadoop refuses to write to an existing output directory.
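One way to do this cleanup from a script is sketched below. It wraps the standard `hdfs dfs -rm -r -f` command; the helper names (cleanup_command, remove_output_dir) are hypothetical, and the relative path `output` is assumed to match the `-output` argument used above.

```python
import subprocess


def cleanup_command(output_dir):
    """Build the HDFS command that removes output_dir recursively.

    -r deletes the directory and its contents; -f suppresses the error
    when the directory does not exist.
    """
    return ["hdfs", "dfs", "-rm", "-r", "-f", output_dir]


def remove_output_dir(output_dir):
    """Run the cleanup command; raises CalledProcessError on failure."""
    subprocess.run(cleanup_command(output_dir), check=True)
```

Calling remove_output_dir("output") before each run makes the job repeatable. The same command can also be typed directly in the shell.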

Running mrjob code:

python3 mr_CoTermNSStripe.py -r hadoop hdfs:///user/comp9313/input/pg100.txt -o hdfs:///user/comp9313/output

-r hadoop runs the job on the Hadoop cluster, and -o sets the output directory.