Fetch live tweets from Twitter using the Twitter API.
The project queries data from Twitter using the Twitter API, sends the data streams into Spark and processes them with Spark Streaming, then pushes the processed data to a live dashboard.
These instructions will get you a copy of the project up and running on your local machine for development purposes. See deployment for notes on how to deploy the project on a system.
$ xcode-select --install
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew install scala
## Install Java (version 1.8.0_144) to support the Apache Spark installation
$ brew install apache-spark
Install Elasticsearch by following the official guide: https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html
- Delete the index (if it already exists)
$ curl -XDELETE localhost:9200/twitter
- Create the index
$ curl -XPUT 'http://127.0.0.1:9200/twitter' -d '
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}'
- Add the field mappings
$ curl -XPUT 'http://127.0.0.1:9200/twitter/_mapping/tweets' -d '
{
  "properties": {
    "count": { "type": "long" },
    "hashtag": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      }
    },
    "timestamp": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss" }
  }
}'
- Check the settings and field details of the index (open in a browser): http://127.0.0.1:9200/twitter?pretty=true
- Check the data in the index (open in a browser): http://127.0.0.1:9200/twitter/_search?pretty=true
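A document written to this index has to match the mapping above, in particular the `yyyy/MM/dd HH:mm:ss` date format. The following is a minimal Python sketch of building such a document. The helper name `make_tweet_doc` and the sample values are hypothetical and are not taken from the project code.

```python
from datetime import datetime


def make_tweet_doc(hashtag, count, when):
    """Build a dict shaped like the `tweets` mapping (hypothetical helper)."""
    return {
        "hashtag": hashtag,
        "count": int(count),
        # strftime equivalent of the mapping's "yyyy/MM/dd HH:mm:ss" format
        "timestamp": when.strftime("%Y/%m/%d %H:%M:%S"),
    }


# Sample values, for illustration only
doc = make_tweet_doc("#spark", 3, datetime(2017, 9, 1, 12, 0, 5))
print(doc)
```

A timestamp in any other format would likely be rejected by Elasticsearch at index time, since the mapping declares an explicit date format.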
$ virtualenv ve --no-site-packages
$ source ve/bin/activate
$ pip install -r REQUIREMENTS
$ cp jars/elasticsearch-spark-20_2.11-5.5.0.jar ve/lib/python/site-packages/pyspark/jars/
$ python twitterfetch_app.py
$ python sparkstreams_app.py
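The source of the two apps is not reproduced here. As a rough illustration of the kind of per-batch processing the streaming job could perform, given that the index stores a `hashtag` and a `count`, the sketch below counts hashtags in a batch of tweet texts using plain Python. The function `count_hashtags` and the sample tweets are assumptions for illustration; the actual job would express a similar idea with Spark Streaming operations.

```python
import re
from collections import Counter

# Matches a '#' followed by one or more word characters
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(tweets):
    """Return a Counter of lowercased hashtags across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


# Sample batch, for illustration only
batch = [
    "Learning #Spark streaming with #python",
    "#spark and #Elasticsearch dashboards",
    "no tags here",
]
counts = count_hashtags(batch)
print(counts.most_common(3))
```

Lowercasing the tags before counting treats `#Spark` and `#spark` as the same hashtag, which is a design choice of this sketch rather than a documented behavior of the project.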