
Big Data project analyzing forum questions on Stack Exchange. Predicts accepted answers using the Stack Exchange Data Dump from archive.org.


KMChris/bigdata


Forum Question Analyzer

This project uses Big Data techniques to analyze questions from a Stack Exchange forum. Using the Stack Exchange Data Dump from archive.org, it predicts whether a question will receive an accepted answer.
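The prediction can be framed as binary classification. Each question is labeled 1 if it has an accepted answer and 0 if it does not. The sketch below assumes the attribute names used by rows in the data dump's Posts.xml (PostTypeId, AcceptedAnswerId, Title, Body, Tags). The features are hypothetical illustrations only and are not the ones computed in features.ipynb.

```python
import re
from typing import Dict, Optional


def label_question(row: Dict[str, str]) -> Optional[int]:
    """Return 1 if a question has an accepted answer, 0 if not, None for non-questions."""
    # In the data dump, PostTypeId "1" marks questions and "2" marks answers.
    if row.get("PostTypeId") != "1":
        return None
    # AcceptedAnswerId is only present on questions whose asker accepted an answer.
    return 1 if row.get("AcceptedAnswerId") else 0


def simple_features(row: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical example features; not the project's actual feature set."""
    # Older dumps store tags as "<a><b>"; splitting on <, > and | also tolerates
    # a pipe-delimited "|a|b|" form (an assumption about newer dumps).
    tags = re.findall(r"[^<>|]+", row.get("Tags", ""))
    return {
        "title_len": len(row.get("Title", "")),
        "body_len": len(row.get("Body", "")),
        "num_tags": len(tags),
    }
```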

Data Source

The data is obtained from the Stack Exchange Data Dump hosted on archive.org. Our project uses the data from the TeX forum (tex.stackexchange.com).
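Each site in the dump ships its posts as one large XML file, with every post stored as a row element whose fields are attributes. The following is a minimal sketch, assuming a Posts.xml file in that layout, that streams questions with the standard library instead of loading the whole file into memory. The function name iter_questions is hypothetical and not part of the project.

```python
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Union


def iter_questions(source: Union[str, IO[bytes]]) -> Iterator[Dict[str, str]]:
    """Stream question rows from a Posts.xml path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Each post is a <row .../> element; its fields are XML attributes.
        if elem.tag == "row":
            if elem.get("PostTypeId") == "1":  # "1" = question
                yield dict(elem.attrib)  # copy before the element is cleared
            # Drop the processed row's contents to limit memory use.
            elem.clear()
```

For example, assuming the dump was extracted as described under Setup, `sum(1 for q in iter_questions("tex.stackexchange.com/Posts.xml") if q.get("AcceptedAnswerId"))` would count the questions that have an accepted answer.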

Setup

  1. Download the Stack Exchange Data Dump from archive.org.
  2. Extract the data dump into the tex.stackexchange.com folder.
  3. Install Python 3.8 or higher.
  4. Install Spark (3.5.0 recommended).
  5. Install the required dependencies with pip install -r requirements.txt
  6. Run jupyter notebook, then open analysis.ipynb, features.ipynb, or statistics.ipynb to see the results of our analysis.
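Before opening the notebooks, a quick script can confirm the setup steps. This is a minimal sketch under stated assumptions: it checks only for the pip-installed pyspark package (a standalone Spark install is not detected), and it assumes Posts.xml is the dump file the notebooks need. The function check_setup is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path
from typing import List, Sequence


def check_setup(dump_dir: str = "tex.stackexchange.com",
                required_files: Sequence[str] = ("Posts.xml",)) -> List[str]:
    """Return a list of setup problems; an empty list means all checks passed."""
    # Step 3: Python 3.8 or higher (importlib.metadata below also needs 3.8).
    if sys.version_info < (3, 8):
        return ["Python 3.8+ is required"]
    from importlib import metadata

    problems = []
    # Step 4: Spark, checked via the pyspark package only (an assumption).
    try:
        version = metadata.version("pyspark")
        if not version.startswith("3.5"):
            problems.append(f"pyspark {version} found; 3.5.0 is recommended")
    except metadata.PackageNotFoundError:
        problems.append("pyspark is not installed")
    # Step 2: the extracted dump folder must contain the assumed files.
    for name in required_files:
        path = Path(dump_dir) / name
        if not path.is_file():
            problems.append(f"missing {path}")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```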

Results

Our model achieved an accuracy of 74.24% in predicting whether a question receives an accepted answer.
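Accuracy is the share of questions whose predicted label matches the true label. Because questions with and without accepted answers need not be evenly split, a useful reference point is the accuracy of always predicting the more common outcome. The sketch below shows both calculations; the label lists are made-up toy data, not the project's test set or results.

```python
from collections import Counter
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true: Sequence[int]) -> Tuple[int, float]:
    """Most common label and the accuracy of always predicting it."""
    label, count = Counter(y_true).most_common(1)[0]
    return label, count / len(y_true)


# Toy example (not project data): 1 = accepted answer, 0 = none.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(accuracy(y_true, y_pred))   # 0.75
print(majority_baseline(y_true))  # (1, 0.625)
```

Comparing a model's accuracy with this baseline on the same test split shows how much it improves over always guessing the more frequent outcome.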

Contributors

  • Krzysztof Mizgała
  • Julia Czerniecka
  • Wiktoria Gałdusińska
  • Jerzy Grunwald
  • Maciej Kosierb

Feel free to contribute and improve our project!
