TwitterStreaming

Fetches live tweets from Twitter using the Twitter API.

Real Time Data Streaming with Spark: Twitter Hashtag Count Analysis

Project Flow

1. Query data from Twitter using the Twitter API.
2. Send the data streams into Spark and process them using Spark Streaming.
3. Push the processed data to a live dashboard.
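The processing step in the flow above boils down to counting hashtags in each batch of tweets. As a rough illustration only, here is a minimal plain-Python sketch of that idea. It is not the code from sparkstreams_app.py; the function name `count_hashtags`, the hashtag pattern, and the sample tweets are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(tweets):
    """Count hashtag occurrences (case-insensitive) across a batch of tweet texts."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG_RE.findall(text):
            counts["#" + tag.lower()] += 1
    return counts


# Hypothetical sample batch, standing in for tweets received from the stream.
sample_batch = [
    "Learning #Spark streaming today",
    "#spark and #Kibana make a nice pair",
    "No tags here",
]
batch_counts = count_hashtags(sample_batch)
print(batch_counts.most_common(2))
```

In the real pipeline the same extract-and-count logic would presumably run inside Spark Streaming on each micro-batch rather than on an in-memory list.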

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development purposes. See deployment for notes on how to deploy the project on a system.

Prerequisites: what you need to install to run the software, and how to install it

Install Homebrew and Scala, which Spark requires

Install brew with the steps below:

$ xcode-select --install
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Installing:

Install Scala 2.11.8

$ brew install scala

Install Java (version 1.8.0_144) to support the Apache Spark installation

Download the jre-8u65-macosx-x64.pkg file, launch the file, and complete the installation

Install Apache Spark 2.2.0 for data stream processing

$ brew install apache-spark

Make sure a JDK is installed before beginning the steps above.

Install Elasticsearch

https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html

View in a browser

http://127.0.0.1:9200/

Setup Elasticsearch

Delete the index (if it already exists)

$ curl -XDELETE localhost:9200/twitter

Create the index

$ curl -XPUT 'http://127.0.0.1:9200/twitter' -d '
{
  "settings": {
    "index": { "number_of_shards": 5, "number_of_replicas": 1 }
  }
}'

Add the field mappings

$ curl -XPUT 'http://127.0.0.1:9200/twitter/_mapping/tweets' -d '
{
  "properties": {
    "count": { "type": "long" },
    "hashtag": {
      "type": "text",
      "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
    },
    "timestamp": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss" }
  }
}'

Check the settings and field details of the index (open in a browser)

http://127.0.0.1:9200/twitter?pretty=true

Check the data in the index (open in a browser)

http://127.0.0.1:9200/twitter/_search?pretty=true
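The mapping above defines three fields: `count` (long), `hashtag` (text with a keyword sub-field), and `timestamp` (a date in `yyyy/MM/dd HH:mm:ss` format). A document written to the index has to match that shape, and the timestamp string in particular must follow the declared format. The following is a minimal sketch of building such a document in Python; `make_document` and `TIMESTAMP_FORMAT` are hypothetical names introduced for this example, not taken from the project's code.

```python
import json
from datetime import datetime

# strftime equivalent of the "yyyy/MM/dd HH:mm:ss" date format used in the mapping.
TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M:%S"


def make_document(hashtag, count, when):
    """Build one document containing the three mapped fields."""
    return {
        "hashtag": hashtag,
        "count": int(count),
        "timestamp": when.strftime(TIMESTAMP_FORMAT),
    }


# Hypothetical values, for illustration only.
doc = make_document("#spark", 42, datetime(2017, 12, 7, 5, 42, 18))
print(json.dumps(doc, sort_keys=True))
```

A timestamp in any other layout (for example ISO 8601 with a `T` separator) would not match the `format` declared for the `timestamp` field.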

Setup System

Create a virtual environment

$ virtualenv ve --no-site-packages

Activate virtual environment

$ source ve/bin/activate

Install pyspark (spark-2.2.0)

$ pip install -r REQUIREMENTS

Copy the file elasticsearch-spark-20_2.11-5.5.0.jar into the pyspark jars folder

$ cp jars/elasticsearch-spark-20_2.11-5.5.0.jar ve/lib/python/site-packages/pyspark/jars/

Run "twitterfetch_app.py" to get tweets

$ python twitterfetch_app.py

Run "sparkstreams_app.py" to store the data in Spark RDDs and transfer it to Elasticsearch

$ python sparkstreams_app.py
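This README does not show how sparkstreams_app.py keeps hashtag counts between batches, so the following is only a plain-Python sketch of the general idea: each micro-batch produces per-hashtag counts, and those are merged into running totals before being sent on. In Spark Streaming a stateful operation such as `updateStateByKey` can play a similar role. The names `update_totals` and `top_hashtags` and the sample batches are assumptions made for this example.

```python
from collections import Counter


def update_totals(totals, batch_counts):
    """Merge one micro-batch of hashtag counts into the running totals."""
    merged = Counter(totals)
    merged.update(batch_counts)
    return merged


def top_hashtags(totals, n):
    """Return the n most frequent hashtags, breaking ties alphabetically."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical per-batch counts, standing in for the output of each streaming interval.
batches = [
    {"#spark": 2, "#kibana": 1},
    {"#spark": 1, "#elastic": 3},
]
totals = Counter()
for batch in batches:
    totals = update_totals(totals, batch)
print(top_hashtags(totals, 2))
```

Returning a new `Counter` instead of mutating the input keeps each update free of side effects, which mirrors how state update functions are usually written for streaming frameworks.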

Spark UI

http://127.0.0.1:4042/

Install Kibana

https://www.elastic.co/downloads/kibana

View in a browser

http://127.0.0.1:5601/

Create various visualizations and add them to a Kibana dashboard.

nagask/Spark-with-Kibana-and-twitter-streaming
