- A streaming-processing project about NBA games based on the Twitter API.
- It mainly focuses on sentiment analysis by location and time.
- You can browse our website from here, but once we run out of GCP credit the app will be taken down.
- Install all dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Run Flask only:
  ```shell
  cd webpage/flask
  python3 __init__.py
  ```
- Note for BigQuery (if you see BigQuery errors):
  - locally, authenticate your gcloud CLI with your Google account,
  - or on GCP, grant your account the needed IAM permissions and export the key like this:
  ```shell
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key-file.json"
  ```
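A quick way to catch the BigQuery credential errors mentioned above before they surface inside the app is to check the environment variable up front. This is a minimal stdlib sketch (the function name is ours, not part of the project); the real code would then hand off to the BigQuery client library:

```python
import os

def check_bigquery_credentials():
    """Verify that a service-account key is discoverable before
    constructing a BigQuery client (which would otherwise fail with
    a less obvious authentication error)."""
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path is None:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set; "
            "export it to point at your service-account key file"
        )
    if not os.path.isfile(key_path):
        raise RuntimeError(f"Key file not found: {key_path}")
    return key_path
```

When running locally with `gcloud auth application-default login` instead of a key file, this check is unnecessary: the client library finds the credentials on its own.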
- Data/Input
- What data sources will you use?
- Will the data be stored and replayed, or pulled in live?
- Do you need to create new connectors to access the data?
- Techniques/Application
- What is the problem you will solve?
- Is there a set of references that motivate this?
- Do you need to integrate other tools?
- Results
- How will you present your results?
- What will you show as a demo?
- Next Steps
- Will you use any algorithms from class?
- Will you use any optimizations?
- We queried several team names as hashtags to collect streaming data from 4/13/2023 to 4/21/2023 (our Twitter API access is Elevated level, which can only search roughly the most recent 7 days), and we use this csv for the further analysis.
- Also, because the Twitter API is rate-limited, whenever we exceed the limit we wait 15 minutes and retry, repeating until all teams are done.
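The wait-and-retry loop described above can be sketched as follows. The fetch function and exception name are placeholders for the real Twitter search call (e.g. via Tweepy), not the project's actual code:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit error raised by the real Twitter client."""

def collect_team_tweets(teams, fetch_fn, wait_seconds=15 * 60):
    """Fetch tweets for each team hashtag, pausing 15 minutes whenever
    the API rate limit is hit, until every team has been collected.

    `fetch_fn(team)` is a hypothetical stand-in for the real search call.
    """
    results = {}
    for team in teams:
        while True:
            try:
                results[team] = fetch_fn(team)
                break
            except RateLimitError:
                # Twitter's search rate limit resets every 15 minutes.
                time.sleep(wait_seconds)
    return results
```

Tweepy users can get similar behavior from the client's built-in wait-on-rate-limit option instead of hand-rolling the loop.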
- We use Spark Streaming with a 30 min slide interval and a 1 h window duration to analyze the data, as you can see here.
- We also tried a smaller window, with a 30 min slide interval and a 30 min duration, as you can see here.
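With a 30 min slide and a 1 h duration, the windows overlap, so each tweet lands in two windows; with a 30 min duration they tile exactly and each tweet lands in one. A stdlib sketch of that bucketing logic (the real job would use Spark's windowing, this just illustrates the assignment rule):

```python
from datetime import datetime, timedelta

def assign_windows(timestamp, slide_minutes=30, window_minutes=60):
    """Return the start times of every sliding window covering `timestamp`.

    A new window starts every `slide_minutes` and lasts `window_minutes`,
    so with slide=30 and window=60 each tweet falls into two overlapping
    windows; with slide=30 and window=30 it falls into exactly one.
    """
    slide = timedelta(minutes=slide_minutes)
    window = timedelta(minutes=window_minutes)
    # Align to the most recent slide boundary at or before the timestamp.
    epoch = datetime(1970, 1, 1)
    offset = (timestamp - epoch) % slide
    latest_start = timestamp - offset
    starts = []
    start = latest_start
    while start > timestamp - window:
        starts.append(start)
        start -= slide
    return sorted(starts)
```

For example, a tweet at 10:45 with slide=30/window=60 belongs to the windows starting at 10:00 and 10:30.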
- We use Flask as our web framework, with HTML front pages written in Jinja templates.
- For the frontend design, we use this as our template.
- For plot design, we POST the form data to '/plot', generate the graphs with this Python script, and show them on the same page.
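Behind the '/plot' request, the plotting script has to reduce the queried rows to one value per team before drawing. A minimal sketch of that aggregation step, assuming (team, sentiment score) pairs; the column names and shapes are illustrative, not the project's actual schema:

```python
from collections import defaultdict

def sentiment_by_team(rows):
    """Aggregate (team, sentiment_score) pairs into per-team averages,
    the kind of summary a plotting script would turn into a bar chart."""
    totals = defaultdict(lambda: [0.0, 0])
    for team, score in rows:
        totals[team][0] += score
        totals[team][1] += 1
    return {team: s / n for team, (s, n) in totals.items()}
```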
- We run a cluster (3 masters, 2 workers) on GCP to analyze our streaming data, save the results to Google Cloud Storage (GCS), and combine them into one csv.
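Spark writes its output as many part files, so the combine step amounts to concatenating them while keeping a single header. A stdlib sketch under that assumption (file names and layout are examples, not the project's actual bucket structure):

```python
import csv
import glob

def combine_csv_parts(pattern, out_path):
    """Merge Spark output part files matching `pattern` into one CSV,
    writing the header row from the first non-empty part only."""
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for part in sorted(glob.glob(pattern)):
            with open(part, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty part files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
```

On GCS the same idea can be done without downloading anything via `gsutil compose`, at the cost of handling the duplicated headers afterwards.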
- We then import the data into BigQuery, save it as a table, and write a .py file to query the results.
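The query script's core is just a SQL statement handed to the BigQuery client. A sketch of how such a statement might be built; the table and column names are placeholders for the project's actual schema, and the real code would run it via the `google-cloud-bigquery` client's `query()` method:

```python
def build_sentiment_query(table, team, start_date, end_date):
    """Build a BigQuery SQL string for per-window average sentiment of
    one team over a date range. Schema names here are hypothetical."""
    return (
        "SELECT window_start, AVG(sentiment) AS avg_sentiment "
        f"FROM `{table}` "
        f"WHERE team = '{team}' "
        f"AND DATE(window_start) BETWEEN '{start_date}' AND '{end_date}' "
        "GROUP BY window_start ORDER BY window_start"
    )
```

In production code, the team and date values should be passed as query parameters rather than interpolated into the string, to avoid injection issues.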
- We run a VM instance to deploy our web app and set up its environment.
- Some code and graphs showing the information we collected
- An interactive web app to query the streaming data and generate graphs
- You can browse our website from here