- A streaming-processing project about NBA games based on the Twitter API.
- It mainly focuses on sentiment analysis by location and time.
- You can browse our website from here, but once we run out of GCP credit the app will be taken down.
- Install all dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Run Flask only:
  ```shell
  cd webpage/flask
  python3 __init__.py
  ```
- Note for BigQuery (if you see BigQuery errors):
  - locally, authenticate your gcloud CLI with your Google account,
  - or on GCP, grant your account the needed IAM permissions and export the key like this:
  ```shell
  export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key-file.json"
  ```
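A quick way to catch the BigQuery credential errors mentioned above before they surface inside the app is to check the environment variable up front. This is a minimal stdlib sketch (the function name is ours, not part of the project); the real code would then hand off to the BigQuery client library:

```python
import os

def check_bigquery_credentials():
    """Verify that a service-account key is discoverable before
    constructing a BigQuery client (which would otherwise fail with
    a less obvious authentication error)."""
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path is None:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set; "
            "export it to point at your service-account key file"
        )
    if not os.path.isfile(key_path):
        raise RuntimeError(f"Key file not found: {key_path}")
    return key_path
```

When running locally with `gcloud auth application-default login` instead of a key file, this check is unnecessary: the client library finds the credentials on its own.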
- Data/Input
- What data sources will you use?
- Will the data be stored and replayed, or pulled in live?
- Do you need to create new connectors to access the data?
- Techniques/Application
- What is the problem you will solve?
- Is there a set of references that motivate this?
- Do you need to integrate other tools?
- Results
- How will you present your results?
- What will you show as a demo?
- Next Steps
- Will you use any algorithms from class?
- Will you use any optimizations?
- We queried several team names as hashtags to collect streaming data from 4/13/2023 to 4/21/2023 (our Twitter API access is Elevated level, which can only search roughly the most recent 7 days), and we use this csv for the further analysis.
- Also, because the Twitter API is rate-limited, whenever we exceed the limit we wait 15 minutes and retry, repeating until all teams are done.
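The wait-and-retry loop described above can be sketched as follows. The fetch function and exception name are placeholders for the real Twitter search call (e.g. via Tweepy), not the project's actual code:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit error raised by the real Twitter client."""

def collect_team_tweets(teams, fetch_fn, wait_seconds=15 * 60):
    """Fetch tweets for each team hashtag, pausing 15 minutes whenever
    the API rate limit is hit, until every team has been collected.

    `fetch_fn(team)` is a hypothetical stand-in for the real search call.
    """
    results = {}
    for team in teams:
        while True:
            try:
                results[team] = fetch_fn(team)
                break
            except RateLimitError:
                # Twitter's search rate limit resets every 15 minutes.
                time.sleep(wait_seconds)
    return results
```

Tweepy users can get similar behavior from the client's built-in wait-on-rate-limit option instead of hand-rolling the loop.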
- We use Spark Streaming with a 30 min slide interval and a 1 h window duration to analyze the data, as you can see here.
- We also tried a smaller window, with a 30 min slide interval and a 30 min duration, as you can see here.
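With a 30 min slide and a 1 h duration, the windows overlap, so each tweet lands in two windows; with a 30 min duration they tile exactly and each tweet lands in one. A stdlib sketch of that bucketing logic (the real job would use Spark's windowing, this just illustrates the assignment rule):

```python
from datetime import datetime, timedelta

def assign_windows(timestamp, slide_minutes=30, window_minutes=60):
    """Return the start times of every sliding window covering `timestamp`.

    A new window starts every `slide_minutes` and lasts `window_minutes`,
    so with slide=30 and window=60 each tweet falls into two overlapping
    windows; with slide=30 and window=30 it falls into exactly one.
    """
    slide = timedelta(minutes=slide_minutes)
    window = timedelta(minutes=window_minutes)
    # Align to the most recent slide boundary at or before the timestamp.
    epoch = datetime(1970, 1, 1)
    offset = (timestamp - epoch) % slide
    latest_start = timestamp - offset
    starts = []
    start = latest_start
    while start > timestamp - window:
        starts.append(start)
        start -= slide
    return sorted(starts)
```

For example, a tweet at 10:45 with slide=30/window=60 belongs to the windows starting at 10:00 and 10:30.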
- We use Flask as our web framework, with HTML front pages written in Jinja templates.
- For the frontend design, we use this as our template.
- For plot design, we POST the form data to '/plot', generate the graphs with this Python script, and show them on the same page.
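Behind the '/plot' request, the plotting script has to reduce the queried rows to one value per team before drawing. A minimal sketch of that aggregation step, assuming (team, sentiment score) pairs; the column names and shapes are illustrative, not the project's actual schema:

```python
from collections import defaultdict

def sentiment_by_team(rows):
    """Aggregate (team, sentiment_score) pairs into per-team averages,
    the kind of summary a plotting script would turn into a bar chart."""
    totals = defaultdict(lambda: [0.0, 0])
    for team, score in rows:
        totals[team][0] += score
        totals[team][1] += 1
    return {team: s / n for team, (s, n) in totals.items()}
```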
- We run a cluster (3 masters, 2 workers) on GCP to analyze our streaming data, save the results to Google Cloud Storage (GCS), and combine them into one csv.
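Spark writes its output as many part files, so the combine step amounts to concatenating them while keeping a single header. A stdlib sketch under that assumption (file names and layout are examples, not the project's actual bucket structure):

```python
import csv
import glob

def combine_csv_parts(pattern, out_path):
    """Merge Spark output part files matching `pattern` into one CSV,
    writing the header row from the first non-empty part only."""
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for part in sorted(glob.glob(pattern)):
            with open(part, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty part files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
```

On GCS the same idea can be done without downloading anything via `gsutil compose`, at the cost of handling the duplicated headers afterwards.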
- We then import the data into BigQuery, save it as a table, and write a .py file to query the results.
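The query script's core is just a SQL statement handed to the BigQuery client. A sketch of how such a statement might be built; the table and column names are placeholders for the project's actual schema, and the real code would run it via the `google-cloud-bigquery` client's `query()` method:

```python
def build_sentiment_query(table, team, start_date, end_date):
    """Build a BigQuery SQL string for per-window average sentiment of
    one team over a date range. Schema names here are hypothetical."""
    return (
        "SELECT window_start, AVG(sentiment) AS avg_sentiment "
        f"FROM `{table}` "
        f"WHERE team = '{team}' "
        f"AND DATE(window_start) BETWEEN '{start_date}' AND '{end_date}' "
        "GROUP BY window_start ORDER BY window_start"
    )
```

In production code, the team and date values should be passed as query parameters rather than interpolated into the string, to avoid injection issues.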
- We run a VM instance to deploy our web app and set up its environment.
- Some code and graphs showing the information we collected
- An interactive web app to query the streaming data and generate graphs
- You can browse our website from here