docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 7180 4239cd2958c6 /usr/bin/docker-quickstart
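-p 7180 alone publishes Cloudera Manager's port on a random host port; a sketch of pinning it to 7180 on the host instead (same image id as above), then checking the mapping:
docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 7180:7180 4239cd2958c6 /usr/bin/docker-quickstart
docker port <container-id> 7180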
hdfs dfs -help
hdfs dfs -help copyFromLocal
hdfs dfs -help ls
hdfs dfs -help cat
hdfs dfs -help setrep
hdfs dfs -ls /user/root/input
hdfs dfs -ls hdfs://hadoop-local:9000/data
output example:
-rw-r--r--   1 root supergroup 5107 2017-10-27 12:57 hdfs://hadoop-local:9000/data/Iris.csv
             ^ replication factor (the number right after the permissions)
set replication factor to 4 (-w waits until re-replication finishes):
hdfs dfs -setrep -w 4 /data/file.txt
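to verify the current replication factor afterwards (%r is the replication field of -stat):
hdfs dfs -stat %r /data/file.txt
hdfs dfs -ls /data/file.txt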
hdfs dfs -mkdir /data
hdfs dfs -put /home/root/tmp/Iris.csv /data/
hdfs dfs -copyFromLocal /home/root/tmp/Iris.csv /data/
put with an explicit replication factor:
hdfs dfs -D dfs.replication=2 -put /path/to/local/file /path/to/hdfs
copy within HDFS (small files only: the data is read from DataNodes and written back through the client):
hdfs dfs -cp /home/root/tmp/Iris.csv /data/
for large copies use distcp, which runs as a distributed MapReduce job:
hadoop distcp /home/root/tmp/Iris.csv /data/
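distcp is most often used for cluster-to-cluster copies; a minimal sketch (the NameNode addresses nn1/nn2 are placeholders):
hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/data
hadoop distcp -update hdfs://nn1:8020/data hdfs://nn2:8020/data   (copy only missing/changed files)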
hdfs dfs -get /path/to/hdfs /path/to/local/file
hdfs dfs -copyToLocal /path/to/hdfs /path/to/local/file
hdfs dfs -rm -r /path/to/hdfs-folder
hdfs dfs -rm -r -skipTrash /path/to/hdfs-folder
hdfs dfs -expunge
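note: without -skipTrash, deleted files first land in the user's trash directory, which is what -expunge empties; e.g. for root:
hdfs dfs -ls /user/root/.Trash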
hdfs dfs -du -h /path/to/hdfs-folder
check that a path exists (-test needs a flag; exit code 0 on success):
hdfs dfs -test -e /path/to/hdfs-folder
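the other -test flags, with a usage sketch (the exit code drives the &&):
hdfs dfs -test -d /path/to/hdfs-folder && echo "is a directory"
hdfs dfs -test -f /path/to/hdfs/file && echo "is a file"
hdfs dfs -test -z /path/to/hdfs/file && echo "is zero length"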
hdfs dfs -ls /
hdfs dfs -ls hdfs://192.168.1.10:8020/path/to/folder
the same as the previous command when fs.defaultFS (formerly fs.default.name) = hdfs://192.168.1.10:8020
hdfs dfs -ls /path/to/folder
hdfs dfs -ls file:///local/path   (equivalent to: ls /local/path)
list recursively (all sub-folders):
hdfs dfs -ls -R /path/to/folder
other useful commands: -touchz, -cat (-text), -tail, -mkdir, -chmod, -chown, -count ...
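a few of these in action (paths reuse the /data examples above):
hdfs dfs -touchz /data/_SUCCESS
hdfs dfs -chmod 644 /data/Iris.csv
hdfs dfs -chown root:supergroup /data/Iris.csv
hdfs dfs -count -h /data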
hdfs dfs -df -h
hdfs fsck /
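fsck can be narrowed to a path and asked for block-level detail:
hdfs fsck /data -files -blocks -locations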
hdfs balancer
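the balancer moves blocks until each DataNode's usage is within a threshold (in percent) of the cluster average, e.g.:
hdfs balancer -threshold 5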
hdfs dfsadmin -help
show cluster statistics:
hdfs dfsadmin -report
put HDFS into safe mode (read-only: client writes are rejected):
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave
hdfs dfsadmin -upgrade query|finalize
download the latest fsimage from the NameNode (e.g. for backup):
hdfs dfsadmin -fetchImage /local/backup/dir
hadoop jar {path to jar} {classname}
yarn jar {path to jar} {classname}
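typical run using the stock MapReduce examples jar (the jar path below is the usual HDP location and may differ on your install):
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /data/Iris.csv /data/wordcount-out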
- shell web client (aka shell-in-a-box): localhost:4200, login root / hadoop
- ssh root@localhost -p 2222
- ambari-admin-password-reset
- ambari-agent restart
- log in to Ambari: localhost:8080 admin/{your password}
https://hortonworks.com/hadoop-tutorial/using-ipython-notebook-with-apache-spark/
Error in pyspark startup:
SPARK_MAJOR_VERSION is set to 2, using Spark2
IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.
quick workaround: force Spark 1 inside the script: export SPARK_MAJOR_VERSION=1
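the Spark 2 way, as the error message itself suggests, is to point pyspark at Jupyter via environment variables (a sketch following the tutorial linked above):
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
pyspark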