The purpose of this project is to:
- Get Apple's stock data for the past 30 days using Barchart OnDemand's free market data APIs.
- Get Apple's tweets for the past 30 days (if possible).
- Load data into BigQuery.
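As a rough illustration of the first goal, the sketch below computes the 30-day window and builds a request URL for daily price history. The endpoint URL and the parameter names (`apikey`, `symbol`, `type`, `startDate`, `endDate`) are assumptions based on Barchart OnDemand's getHistory API and should be checked against the current Barchart documentation. The function names are hypothetical and are not taken from Apple_data.py.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed endpoint for the free getHistory API; verify before relying on it.
BARCHART_HISTORY_URL = "https://marketdata.websol.barchart.com/getHistory.json"


def last_n_days(n=30, today=None):
    """Return (start, end) dates covering the past n days, ending today."""
    end = today or date.today()
    return end - timedelta(days=n), end


def build_history_url(symbol, api_key, n_days=30, today=None):
    """Build a daily-history request URL for the given symbol and window."""
    start, end = last_n_days(n_days, today)
    params = {
        "apikey": api_key,
        "symbol": symbol,
        "type": "daily",
        # Assumed date format: YYYYMMDD.
        "startDate": start.strftime("%Y%m%d"),
        "endDate": end.strftime("%Y%m%d"),
    }
    return f"{BARCHART_HISTORY_URL}?{urlencode(params)}"
```

The resulting URL could then be fetched with any HTTP client, for example `requests.get(build_history_url("AAPL", "YOUR_BARCHART_API_KEY"))`.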
In this repository, you will find three files of interest:
- Apple_data.py: contains the code that gets the stock data from the Barchart API, gets the tweet data from the Twitter API, and loads both datasets into BigQuery.
- Data Audit & Analysis - Stock Data.ipynb: contains a data audit and analysis of Apple's stock market data.
- Tweets_analysis.ipynb: contains a data audit, analysis, and insights from @realDonaldTrump's tweets.
- The BigQuery dataset can be viewed publicly here: https://console.cloud.google.com/bigquery?project=test-project-datalab-225214&p=test-project-datalab-225214&d=AAPL&t=tweets_realdonaldtrump&page=table
- Clone the repository and install the necessary packages.
- You will also need a Google Cloud Platform account and project.
- You will need a service account with at least the Editor role in BigQuery.
- You will need a Barchart API key (API_KEY) and Twitter API credentials: CONSUMER_KEY, CONSUMER_SECRET, TOKEN_KEY, and TOKEN_SECRET.
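To show how the service account fits in, here is a minimal sketch of loading rows into BigQuery with the `google-cloud-bigquery` client library. It assumes a service-account key file has been downloaded as JSON. The function names, `table_id`, and `key_path` are hypothetical, and Apple_data.py may load data differently.

```python
from datetime import date, datetime


def to_json_rows(records):
    """Convert records to JSON-serializable dicts (dates become ISO strings)."""
    def convert(value):
        if isinstance(value, (date, datetime)):
            return value.isoformat()
        return value

    return [{key: convert(value) for key, value in record.items()}
            for record in records]


def load_rows(rows, table_id, key_path):
    """Append rows to table_id, authenticating with a service-account key file."""
    # Imported here so to_json_rows works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    job_config = bigquery.LoadJobConfig(
        autodetect=True,  # let BigQuery infer the schema from the rows
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_json(rows, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish; raises on failure
    return job.output_rows
```

A call might look like `load_rows(to_json_rows(records), "your-project.AAPL.stock_data", "/path/to/key.json")`, where the table name and key path are placeholders.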
In your terminal:
python Apple_data.py --symbol='COMPANY_SYMBOL' --consumerKey='YOUR_CONSUMER_KEY' --consumerSecret='YOUR_CONSUMER_SECRET' --tokenKey='YOUR_TOKEN_KEY' --tokenSecret='YOUR_TOKEN_SECRET' --twitterHandle="YOUR_TWITTER_HANDLE_CHOICE" --barchartKey='YOUR_BARCHART_API_KEY'
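For reference, the flags in the command above could be parsed with `argparse` roughly as follows. This is a sketch that only mirrors the flag names shown in the command. The help texts and the choice to make every flag required are assumptions, and the actual parsing in Apple_data.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags used to run the ETL script."""
    parser = argparse.ArgumentParser(
        description="Load stock data and tweets into BigQuery."
    )
    parser.add_argument("--symbol", required=True, help="Stock symbol, e.g. AAPL")
    parser.add_argument("--consumerKey", required=True, help="Twitter consumer key")
    parser.add_argument("--consumerSecret", required=True, help="Twitter consumer secret")
    parser.add_argument("--tokenKey", required=True, help="Twitter access token key")
    parser.add_argument("--tokenSecret", required=True, help="Twitter access token secret")
    parser.add_argument("--twitterHandle", required=True, help="Twitter handle to fetch tweets for")
    parser.add_argument("--barchartKey", required=True, help="Barchart API key")
    # argv=None makes argparse read the flags from sys.argv[1:].
    return parser.parse_args(argv)
```

Each value is then available as an attribute named after its flag, for example `args.symbol` or `args.twitterHandle`.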
To make the job recurring, add the Python script to crontab:
crontab -e
Add an entry similar to the following, which runs the script every 10 minutes:
*/10 * * * * /usr/bin/python /path/to/your/python/script.py
Restart cron so the job takes effect:
sudo systemctl restart cron
- @Apple has not posted any tweets, nor has it been mentioned in other users' tweets. To demonstrate working ETL code, I have used @realDonaldTrump instead.
- This BigQuery dataset will be deleted after 2 weeks.
- Check for duplicates before ingestion so that they are not stored in BigQuery.
- Improve the most-frequent-word count for tweets by troubleshooting the stop words used.
- For portability, create a Python package for the code, along with setup and requirements files.
- To run the recurring job in the cloud, use the App Engine Cron Service.
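The first two items above could be approached along these lines. This is a minimal sketch, not code from the repository: the helper names, the `id` key, and the short stop-word list are all hypothetical, and a real stop-word list would be much larger.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; "rt" and "amp" are common tweet artifacts.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on", "rt", "amp"}


def drop_duplicates(rows, existing_ids, key="id"):
    """Keep rows whose key is neither already stored nor repeated in the batch."""
    seen = set(existing_ids)
    fresh = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            fresh.append(row)
    return fresh


def top_words(texts, n=10, stop_words=STOP_WORDS):
    """Count the most frequent words, ignoring URLs, case, and stop words."""
    counts = Counter()
    for text in texts:
        # Remove links first so URL fragments are not counted as words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in stop_words and len(w) > 1)
    return counts.most_common(n)
```

Here `existing_ids` would come from a query against the destination table (for example, the tweet IDs already stored), so only new rows are passed on to the load step.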