QQRank is licensed under ASL 2.0 and other lenient licenses, allowing its use for academic and commercial purposes without restrictions.
- Simply download the script `run_QQRank.py` from here.
- Copy the file to a destination with enough space for a few downloads and run the script as `python run_QQRank.py`.
- This downloads the required data, creates folders and runs QQRank with some training and test files from the Qatar Living data. To make your own changes, run on a different dataset or run the stacking features, follow the procedures below.
- Download the jar file of the project from here.
- Alternatively, download the zip of the java project and import it as a maven project in eclipse for experimentation.
- QQRank requires a training xml file, a test xml file and unannotated data to train models.
- The system was trained and tested on SemEval 2017 - Task 3: Community Question Answering Subtask B data.
- Create a directory named xml_files on your local machine.
- The training+dev data can be downloaded from here.
- The test data for 2017 can be downloaded from here.
- After unzipping this folder, move to `semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/train/`. The entire training data for Task 3 can be found here.
- Choose either `SemEval2016-Task3-CQA-QL-train-part1.xml` or `SemEval2016-Task3-CQA-QL-train-part2.xml` as the training file and copy it to the xml_files directory. Rename the training file train.xml.
- Alternatively, combine several training xml files into one file train.xml for larger training data. Make sure to preserve the XML tree structure while doing this.
- Similarly, choose a dev/test file in `semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/dev/` or `semeval2016_task3_tests/SemEval2016_task3_test/English/` as test data and rename it test.xml.
- The unannotated data can be downloaded from here.
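Combining several training files into one train.xml while preserving the tree structure can be sketched as below. This is a minimal illustration, not part of QQRank: the helper names `merge_training_files` and `copy_as` are hypothetical, and the code assumes that every input file has a single root element with the same tag and that each direct child of the root is one self-contained entry.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def merge_training_files(input_paths, output_path):
    """Merge XML files that share a root tag into one file (hypothetical helper).

    Assumption: each file has one root element, and the root's direct
    children are independent entries that can simply be concatenated.
    Returns the number of top-level entries in the merged file.
    """
    if not input_paths:
        raise ValueError("at least one input file is required")
    trees = [ET.parse(str(p)) for p in input_paths]
    merged_root = trees[0].getroot()
    for tree in trees[1:]:
        root = tree.getroot()
        if root.tag != merged_root.tag:
            raise ValueError(
                f"root tag mismatch: {root.tag!r} vs {merged_root.tag!r}"
            )
        # Append every top-level child of this file under the first root,
        # so the result still has exactly one root element.
        merged_root.extend(list(root))
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    trees[0].write(str(output_path), encoding="utf-8", xml_declaration=True)
    return len(merged_root)


def copy_as(source, xml_dir, name):
    """Copy a chosen file into xml_dir under the given name, e.g. test.xml."""
    target = Path(xml_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(str(source), str(target))
    return target
```

For example, `merge_training_files([part1, part2], "xml_files/train.xml")` followed by `copy_as(dev_file, "xml_files", "test.xml")` would leave the xml_files directory with the two file names the instructions above ask for. Parsing the merged file once more with `ET.parse` is a cheap way to confirm it is still well-formed.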
- Download the python scripts required to run the system from here.
- Unzip this `resources_QQRank` folder in a suitable place.
- The trained word embeddings on the large unannotated data can be found here.
- Running the QQRank jar can be done in two sequential steps:
- Run QQRank for the generic set of features
java -Xmx10g -jar QQRank.jar [absolute-path-to-xml_files-folder] [absolute-path-to-resources-folder]
- Subsequently, run QQRank along with the stacking features obtained from the scores of QARank and QCRank.
- For this, the jar file takes two additional arguments: the first is the path of the `result_files` folder of the QARank (subtask A) output, and the second is the path of the `result_files` folder of the QCRank (subtask C) output.
- With these additional arguments, run QQRank as
java -Xmx10g -jar QQRank.jar [absolute-path-to-xml_files-folder] [absolute-path-to-resources-folder] [absolute-path-to-result_files-QARank] [absolute-path-to-result_files-QCRank]
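The two invocations differ only in the optional pair of stacking folders, so a small wrapper can assemble either command. The sketch below is an illustration only: `build_qqrank_command` and `run_qqrank` are hypothetical names, and the only facts taken from the commands above are the `-Xmx10g` heap setting, the jar name and the order of the four path arguments.

```python
import subprocess
from pathlib import Path


def build_qqrank_command(jar, xml_dir, resources_dir,
                         qarank_results=None, qcrank_results=None,
                         heap="10g"):
    """Assemble the java command line as a list of arguments.

    The two stacking folders must be given together or not at all,
    mirroring the two-argument and four-argument forms shown above.
    """
    if (qarank_results is None) != (qcrank_results is None):
        raise ValueError("pass both stacking folders or neither")
    folders = [xml_dir, resources_dir]
    if qarank_results is not None:
        folders += [qarank_results, qcrank_results]
    # The documented commands use absolute paths, so resolve each folder.
    absolute = [str(Path(folder).resolve()) for folder in folders]
    return ["java", f"-Xmx{heap}", "-jar", str(jar)] + absolute


def run_qqrank(*args, **kwargs):
    """Run the assembled command and raise if java exits with an error."""
    command = build_qqrank_command(*args, **kwargs)
    return subprocess.run(command, check=True)
```

Calling `run_qqrank("QQRank.jar", "xml_files", "resources_QQRank")` would correspond to the first step. Adding the two `result_files` folders as the fourth and fifth arguments would correspond to the stacking step. Passing the command as a list avoids shell quoting problems when a path contains spaces.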
- The system will generate all folders and required files.
- The final MAP scores of the system and the SVM accuracy can be found in the result_files/final_scores.txt file.
- Users can run the system on a different dataset, provided the training and test files are in the same format as in SemEval 2017 - Task 3.
- The evaluation scripts used in the system can be looked up here.
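For readers unfamiliar with the MAP score reported above, the textbook definition of Mean Average Precision can be sketched as follows. This is not the evaluation script used by the system: the function names are hypothetical, and the official scripts may apply a rank cutoff or treat queries without relevant items differently.

```python
def average_precision(relevance):
    """Average precision for one query.

    relevance: list of booleans in ranked order, True where the item
    at that rank is relevant. Returns 0.0 when nothing is relevant
    (an assumption; scorers differ on this case).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank = relevant items so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

As a worked case, a ranking with relevant items at ranks 1 and 3 has average precision (1/1 + 2/3) / 2, and a ranking whose only relevant item is at rank 2 has average precision 1/2. MAP is the mean of these per-query values.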