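Spark Streaming iQIYI Real-Time Stream Statistics Project: Hands-On Notes
- Clone the project to your local machine
- Import the Aiqiyi_SparkStreaming and Aiqiyi_Web projects into IDEA separately
- Set the paths of the files in Aiqiyi_Data correctly
- Read the code and run the project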
- hadoop-2.6.4
- zookeeper-3.4.5
- kafka_2.12-0.11.0.2
- apache-flume-1.6.0-bin
- hbase-0.99.2-bin
- spark-2.1.0-bin-hadoop2.6
- Count the visits to each iQIYI video category
- Count the visits to each category that arrive via search engines
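The two statistics above are plain group-and-count aggregations. The following is a minimal sketch in ordinary Python rather than Spark Streaming, assuming each cleaned record is a hypothetical `(category_id, search_host)` pair; the record layout and the sample values are illustrative and not taken from the project code.

```python
from collections import Counter

# Hypothetical cleaned records: (category_id, search_host), where
# search_host is None for visits that did not come from a search engine.
records = [
    ("1", "www.baidu.com"),
    ("2", None),
    ("1", None),
    ("2", "www.sogou.com"),
    ("1", "www.baidu.com"),
]


def count_by_category(records):
    """Visits per video category, regardless of where the visit came from."""
    return Counter(category for category, _ in records)


def count_by_search_engine(records):
    """Visits per (search_host, category), skipping visits without a search referer."""
    return Counter(
        (host, category) for category, host in records if host is not None
    )


print(count_by_category(records))
print(count_by_search_engine(records))
```

In the streaming job the same grouping would be applied to every micro-batch, with the per-batch counts added to the totals already stored in HBase.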
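- Write a Python script that simulates log generation
- Flume collects the logs and forwards them to Kafka
- The StreamingApp main program reads the logs from Kafka and cleans them
- From the cleaned logs, count the visits per category and save the result to HBase
- From the cleaned logs, count the visits per category referred by search engines and save the result to HBase
- Read the data from HBase and visualize it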
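The first and third steps of this pipeline can be sketched as follows. The notes do not show the real log format, so this assumes a hypothetical tab-separated line of `ip`, `time`, `request`, `referer`, and `status`, with category pages addressed as `www/<id>`; the category paths, search sites, and keywords are placeholders.

```python
import random
import time
from urllib.parse import urlparse

# Hypothetical values; the real generator's categories and sites are not shown in these notes.
CATEGORY_PATHS = ["www/1", "www/2", "www/3", "www/4", "www/6"]
SEARCH_REFERERS = [
    "https://www.baidu.com/s?wd={query}",
    "https://www.sogou.com/web?query={query}",
    "https://cn.bing.com/search?q={query}",
]
KEYWORDS = ["movie", "drama", "anime"]


def generate_line(rng):
    """Build one fake access-log line: ip, time, request, referer, status (tab-separated)."""
    ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
    request = '"GET {} HTTP/1.0"'.format(rng.choice(CATEGORY_PATHS))
    # Roughly half of the visits carry no referer, written as "-".
    if rng.random() < 0.5:
        referer = "-"
    else:
        referer = rng.choice(SEARCH_REFERERS).format(query=rng.choice(KEYWORDS))
    status = rng.choice(["200", "404", "500"])
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    return "\t".join([ip, now, request, referer, status])


def clean_line(line):
    """Return (category_id, search_host), or None if the line is malformed.

    search_host is None when the visit has no referer.
    """
    fields = line.split("\t")
    if len(fields) != 5:
        return None
    request_parts = fields[2].strip('"').split(" ")
    if len(request_parts) != 3:
        return None
    path = request_parts[1]
    if not path.startswith("www/"):
        return None
    category_id = path[len("www/"):]
    if not category_id.isdigit():
        return None
    referer = fields[3]
    search_host = None if referer == "-" else urlparse(referer).netloc
    return category_id, search_host


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        line = generate_line(rng)
        print(line, "->", clean_line(line))
```

Dropping malformed lines during cleaning keeps the later counting steps simple, since they only ever see well-formed `(category_id, search_host)` pairs.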
[hadoop@mini1 hadoop]$ sbin/start-all.sh
[hadoop@mini1 zookeeper]$ bin/zkServer.sh start
- Start Kafka (on all three machines)
[hadoop@mini1 kafka]$ bin/kafka-server-start.sh config/server.properties &
- Create the topic
[hadoop@mini1 kafka]$ bin/kafka-topics.sh \
--create \
--zookeeper mini1:2181 \
--replication-factor 1 \
--partitions 1 \
--topic flumeTopic
- Start a consumer
[hadoop@mini1 kafka]$ bin/kafka-console-consumer.sh \
--zookeeper mini1:2181 \
--topic flumeTopic \
--from-beginning
- Add the configuration file a1.conf
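The notes do not show the contents of a1.conf. A minimal sketch, assuming an exec source that tails the generated log file, a memory channel, and the Kafka sink that ships with Flume 1.6; the log path `/home/hadoop/aiqiyi_logs/access.log` and the broker address `mini1:9092` are placeholders, not values taken from the project:

```properties
# Name the components of agent a1 (must match the -n a1 flag used at startup).
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow the log file written by the generator (path is a placeholder).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/aiqiyi_logs/access.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: publish each event to the flumeTopic topic (broker address is a placeholder).
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.brokerList = mini1:9092
a1.sinks.k1.topic = flumeTopic
a1.sinks.k1.channel = c1
```

A memory channel is the simplest choice for a demo, but it loses buffered events if the agent stops; a file channel would be the more durable alternative.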
- Start Flume
[hadoop@mini1 flume]$ bin/flume-ng agent \
-c conf \
-f conf/a1.conf \
-n a1 \
-Dflume.root.logger=INFO,console
- Start HBase
[hadoop@mini1 hbase]$ bin/start-hbase.sh
- Start the HBase shell
[hadoop@mini1 hbase]$ bin/hbase shell
- Create the HBase tables
hbase(main):001:0> create 'type','info'
hbase(main):002:0> create 'search','info'
Run the StreamingApp program so that it is ready to receive, process, and save data
[hadoop@mini1 aiqiyi_logs]$ ./log_generator.sh
hbase(main):003:0> scan 'type'
hbase(main):004:0> scan 'search'
- File → Project Structure → Artifacts → "+" → JAR → From modules with dependencies → Main Class: StreamingApp → OK
- Build → Build Artifacts → Build
[hadoop@mini1 spark]$ bin/spark-submit \
--master yarn \
--class main.StreamingApp \
/home/hadoop/aiqiyi_logs/Aiqiyi_SparkStreaming.jar
[hadoop@mini1 ~]$ jps
8458 Main // hbase shell
7426 HMaster
4325 NameNode
4470 SecondaryNameNode
2076 QuorumPeerMain
2941 Kafka
4605 ResourceManager
3517 Application // flume
7662 Jps
3230 ConsoleConsumer
[hadoop@mini2 ~]$ jps
2194 Kafka
1556 QuorumPeerMain
2582 NodeManager
2508 DataNode
4334 Jps
4063 HRegionServer
[hadoop@mini3 ~]$ jps
2497 DataNode
3749 Jps
3607 HRegionServer
2184 Kafka
2588 NodeManager
1647 QuorumPeerMain
localhost:8080/count
- Package and upload the JAR
Maven Projects -> Lifecycle -> package
- Run the job
[hadoop@mini1 aiqiyi_logs]$ java -jar aiqiyiweb-0.0.1-SNAPSHOT.jar
- Open in a browser
mini1:8080/count