Iris1e27/ELEN6889_Final_Project

ELEN6889_Final_Project

  • A stream-processing project about NBA games, based on the Twitter API.
  • Mainly focuses on sentiment analysis by location and time.
  • You can browse our website here, but the app will be taken down once we run out of GCP credit.

How to Run

  1. Install all dependencies
  • pip install -r requirements.txt
  2. Run Flask only
  • cd webpage/flask
  • python3 __init__.py
  3. Note for BigQuery (if you hit BigQuery errors):
  • locally, connect your CLI to your Google account
  • on GCP, grant your account the necessary IAM permissions and export the credentials like this:
  • export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key-file.json"
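The two auth paths in step 3 can be sketched as shell commands (the key-file path is a placeholder; this assumes the standard gcloud CLI):

```shell
# Local development: link the CLI to your Google account so client
# libraries pick up Application Default Credentials automatically.
gcloud auth application-default login

# On a GCP VM: point client libraries at a service-account key file
# (the service account needs the BigQuery IAM permissions).
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key-file.json"
```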

About Requirements

  • Data/Input
    • What data sources will you use?
    • Will the data be stored and replayed, or pulled in live?
    • Do you need to create new connectors to access the data?
  • Techniques/Application
    • What is the problem you will solve?
    • Is there a set of references that motivate this?
    • Do you need to integrate other tools?
  • Results
    • How will you present your results?
    • What will you show as a demo?
  • Next Steps
    • Will you use any algorithms from class?
    • Will you use any optimizations?

About Implementation

Dataset

  • We queried several team names as hashtags to collect streaming data from 4/13/2023 to 4/21/2023 (we have ELEVATED-level Twitter API access, which only allows searching roughly the most recent 7 days), and saved the results in this csv for further analysis.
  • Because the Twitter API is rate-limited, whenever a request exceeded the limit we waited 15 minutes and retried, repeating until all teams were done.
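The collection loop above can be sketched as follows. This is a hedged sketch, not the project's actual code: `fetch` stands in for the Twitter search call, and `RateLimitError` is a hypothetical exception raised when the quota is hit.

```python
import time

class RateLimitError(Exception):
    """Raised when the API reports the rate limit was exceeded."""

def collect_team_tweets(fetch, teams, wait_seconds=15 * 60):
    """Query each team hashtag; on a rate limit, wait 15 minutes
    and retry the same team until every team has been collected."""
    results = {}
    pending = list(teams)
    while pending:
        team = pending[0]
        try:
            results[team] = fetch(f"#{team}")
            pending.pop(0)            # done with this team
        except RateLimitError:
            time.sleep(wait_seconds)  # back off, then retry same team
    return results
```

In a test you can pass `wait_seconds=0` and a fake `fetch` to exercise the retry path without waiting.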

Analysis

  • We use Spark Streaming, with a 30-minute start interval and a 1-hour window duration, to analyze the data, as you can see here.
  • We also tried a smaller window, using a 30-minute start interval and a 30-minute duration, as you can see here.
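The sliding-window scheme above (windows starting every 30 minutes, each lasting 1 hour, so consecutive windows overlap) can be illustrated without Spark. This toy helper, written for this README rather than taken from the project, assigns a tweet timestamp (in minutes) to every window that covers it:

```python
def windows_for(ts_min, slide=30, duration=60):
    """Return (start, end) of every sliding window, in minutes,
    that contains timestamp ts_min. Windows start every `slide`
    minutes and last `duration` minutes (half-open intervals)."""
    # Earliest window start that could still contain ts_min.
    first = ((ts_min - duration) // slide + 1) * slide
    starts = range(max(first, 0), ts_min + 1, slide)
    return [(s, s + duration) for s in starts if s <= ts_min < s + duration]
```

With the default 30/60 setting a tweet lands in two overlapping windows; with slide == duration (the 30/30 variant) the windows tumble and each tweet lands in exactly one.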

Webpage

  • We use Flask as our web framework, with some HTML front pages written in Jinja.
  • For the frontend design, we use this as our template.
  • For the plots, we POST data to '/plot', generate graphs with this Python script, and show them on the same page.
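A minimal sketch of that '/plot' flow, assuming a hypothetical `make_plot` helper (the real script builds the actual charts):

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

def make_plot(team):
    # Hypothetical stand-in: the real project would render a chart
    # (e.g. with matplotlib) for the requested team.
    return f"<svg><!-- sentiment plot for {team} --></svg>"

@app.route("/plot", methods=["POST"])
def plot():
    # Read the posted form field and show the graph on the same page.
    team = request.form.get("team", "")
    return render_template_string("<div>{{ graph|safe }}</div>",
                                  graph=make_plot(team))
```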

About GCP deployment

  1. We run a cluster (3 masters, 2 workers) on GCP to analyze our streaming data, save the results in Google Cloud Storage (GCS), and combine them into one csv.

  2. We then import the data into BigQuery, save it as a table, and write a .py file to query the result.

  3. We run a VM instance to deploy our web app and set up the environment.

About Results

  • Some code and graphs showing the information we extracted
  • An interactive web app to query the streaming data and generate graphs
  • You can browse our website here

