Forum Question Analyzer

This project uses big data techniques to analyze questions from a Stack Exchange forum. Using the Stack Exchange Data Dump from archive.org, it predicts whether a question will receive an accepted answer.

Data Source

The data comes from the Stack Exchange Data Dump, available on archive.org. Our project uses the data for the TeX forum (tex.stackexchange.com).

Setup

1. Download the Stack Exchange Data Dump from archive.org.
2. Extract the data dump into the tex.stackexchange.com folder.
3. Install Python 3.8 or higher.
4. Install Spark (3.5.0 recommended).
5. Install the required dependencies: pip install -r requirements.txt
6. Run jupyter notebook and open analysis.ipynb, features.ipynb, or statistics.ipynb to see the results of our analysis.
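
After extracting the dump, it can help to check that the data yields the label the project predicts. The sketch below is not taken from the notebooks: it uses Python's standard library instead of Spark so that it runs without any setup, and the function name label_questions is hypothetical. It relies on the dump's Posts.xml row format, where questions have PostTypeId="1" and carry an AcceptedAnswerId attribute only when an answer has been accepted. The sample rows are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def label_questions(source):
    """Yield (question_id, has_accepted_answer) for each question row.

    `source` is a path or a binary file-like object holding a
    Posts.xml-style document. (Hypothetical helper, not from the notebooks.)
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # PostTypeId "1" marks a question; "2" marks an answer.
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield int(elem.get("Id")), elem.get("AcceptedAnswerId") is not None
        elem.clear()  # drop the element's contents to keep memory usage low


# Tiny made-up sample in the dump's row format (not real data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" Title="Example A" />
  <row Id="2" PostTypeId="1" Title="Example B" />
  <row Id="3" PostTypeId="2" ParentId="1" />
</posts>"""

labels = list(label_questions(io.BytesIO(SAMPLE)))
print(labels)
```

To try it on the real data, pass a path instead of the sample, for example label_questions("tex.stackexchange.com/Posts.xml"), assuming the dump was extracted as described in the steps above.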

Results

Our model achieved an accuracy of 70.92% in predicting accepted answers.
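
For reference, accuracy is the fraction of questions for which the predicted label matches the true label. The README does not state which model or which data split produced the figure above, so the sketch below only illustrates the metric itself on made-up labels; the function name accuracy and the toy values are assumptions, not project code or results.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 1 = question has an accepted answer, 0 = it does not.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

toy_accuracy = accuracy(y_true, y_pred)
print(f"{toy_accuracy:.2%}")
```

When interpreting an accuracy figure, it is worth comparing it against the share of the majority class in the data, since a model that always predicts the more common label already reaches that level.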

Contributors

Krzysztof Mizgała
Julia Czerniecka
Wiktoria Gałdusińska
Jerzy Grunwald
Maciej Kosierb

Feel free to contribute and improve our project!

KMChris/bigdata
