This project uses Big Data techniques to analyze questions from a Stack Exchange forum. Using the Stack Exchange Data Dump from archive.org, it predicts whether a question will receive an accepted answer.
The data comes from the Stack Exchange Data Dump, available on archive.org. Our project uses the TeX forum data.
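As a rough illustration of how the prediction target could be derived from the dump, the sketch below streams a `Posts.xml`-style file and marks each question as having an accepted answer or not. It relies on the dump's row format, where questions have `PostTypeId="1"` and carry an `AcceptedAnswerId` attribute once an answer has been accepted. The function name `iter_question_labels` and the sample rows are hypothetical and are not taken from our notebooks, which use Spark rather than the standard library.

```python
import io
import xml.etree.ElementTree as ET


def iter_question_labels(source):
    """Yield (question_id, has_accepted_answer) for each question row.

    `source` is a path or a binary file-like object with Posts.xml-style content.
    This is a hypothetical helper, not part of the project's notebooks.
    """
    # iterparse streams the file, so a large dump need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            yield int(elem.get("Id")), elem.get("AcceptedAnswerId") is not None
        elem.clear()  # release the element's contents once it has been handled


# Tiny made-up sample that mimics the dump's row format (not real data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" Title="Example A" />
  <row Id="2" PostTypeId="1" Title="Example B" />
  <row Id="3" PostTypeId="2" ParentId="1" />
</posts>"""

labels = list(iter_question_labels(io.BytesIO(SAMPLE)))
print(labels)  # [(1, True), (2, False)]
```

The answer row (`PostTypeId="2"`) is skipped, so only questions end up in the labelled set.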
- Download the Stack Exchange Data Dump from archive.org.
- Extract the data dump into the `tex.stackexchange.com` folder.
- Install Python 3.8 or higher.
- Install Spark (3.5.0 recommended).
- Install the required dependencies: `pip install -r requirements.txt`.
- Run `jupyter notebook` and open `analysis.ipynb`, `features.ipynb` or `statistics.ipynb` to see the results of our analysis.
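Before opening the notebooks, it may help to confirm that the steps above took effect. The following is a minimal sketch of such a check, not something shipped with the project: `check_setup` is a hypothetical helper, and it assumes the extracted dump consists of `.xml` files sitting directly in the `tex.stackexchange.com` folder. It does not verify the Spark installation.

```python
import sys
from pathlib import Path


def check_setup(data_dir="tex.stackexchange.com", min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The setup steps ask for Python 3.8 or higher.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    folder = Path(data_dir)
    if not folder.is_dir():
        problems.append(f"data folder not found: {folder}")
    elif not any(folder.glob("*.xml")):
        # Assumption: the extracted dump places its .xml files directly in this folder.
        problems.append(f"no .xml files in {folder}")
    return problems


for problem in check_setup():
    print("WARNING:", problem)
```

Run from the repository root, the script prints nothing when the Python version is sufficient and the data folder contains the extracted files.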
Our model achieved an accuracy of 70.92% in predicting accepted answers.
- Krzysztof Mizgała
- Julia Czerniecka
- Wiktoria Gałdusińska
- Jerzy Grunwald
- Maciej Kosierb
Feel free to contribute and improve our project!