QARank is licensed under ASL 2.0 and other permissive licenses that allow both academic and commercial use.
- Simply download the script run_QARank.py from here. The code runs in Python 2.
- Copy the file to a destination having enough space for a few downloads and run the script as
python run_QARank.py
- This downloads required data, creates folders and runs QARank with some training and test files from the Qatar Living Data. To make your own changes or run on a different dataset, follow the procedures below.
- Download the jar file of the project from here.
- Alternatively, download the zip of the Java project and import it as a Maven project in Eclipse for experimentation.
- QARank requires a training xml file, a test xml file and unannotated data to train models.
- The system was trained and tested on SemEval 2017 - Task 3: Community Question Answering Subtask A data.
- Create a directory named xml_files on your local machine.
- The training+dev data can be downloaded from here.
- The test data for 2017 can be downloaded from here.
- After unzipping this folder, move to semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/train/. The entire training data for Task 3 can be found here.
- Choose any of the subtask A train files for training and copy it to the xml_files directory. Rename the training file to train.xml.
- Alternatively, combine various training xml files into one file train.xml for larger training data. Make sure to preserve the XML tree structure while doing this.
- Similarly, choose one of the subtask A files in semeval2016-task3-cqa-ql-traindev-v3.2/v3.2/dev/ or semeval2016_task3_tests/SemEval2016_task3_test/English/ as test data and rename it to test.xml.
- The unannotated data can be downloaded from here.
- Download the python scripts required to run the system from here.
- Unzip this resources_QARank folder in a suitable place.
- The trained word embeddings on the large unannotated data can be found here.
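Combining several training files into one train.xml while preserving the XML tree structure could be sketched as follows. This is a minimal illustration, not part of QARank: it assumes every input file has a single root element of the same tag and that the question threads are the direct children of that root. The function name `merge_xml_files` is hypothetical, and the sketch is written for Python 3 (unlike run_QARank.py, which runs in Python 2).

```python
import xml.etree.ElementTree as ET


def merge_xml_files(input_paths, output_path):
    """Append the top-level children of each input file under a single root.

    Assumption: all files share the same root tag and keep their records
    as direct children of the root. Returns the number of merged children.
    """
    if not input_paths:
        raise ValueError("need at least one input file")

    # The first file supplies the root element of the merged document.
    merged_tree = ET.parse(input_paths[0])
    merged_root = merged_tree.getroot()

    for path in input_paths[1:]:
        root = ET.parse(path).getroot()
        if root.tag != merged_root.tag:
            raise ValueError(
                "root tag mismatch in %s: %s != %s"
                % (path, root.tag, merged_root.tag)
            )
        # Move every top-level child of this file under the merged root.
        merged_root.extend(list(root))

    merged_tree.write(output_path, encoding="utf-8", xml_declaration=True)
    return len(merged_root)
```

A call such as `merge_xml_files(["part1.xml", "part2.xml"], "xml_files/train.xml")` (file names are placeholders) would write the combined file. Note that the default ElementTree parser does not carry XML comments over to the output, so check the result against the originals before training.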
- QARank can be run with two formats of input XML files:
- If the training and test files are specific to subtask A (they do not contain Original Questions), run the jar with 0 as the flag:
java -Xmx10g -jar QARank.jar [absolute-path-to-xml_files-folder] [absolute-path-to-resources-folder] 0
- If the training and test files contain both Original and Related Questions, run the jar with 1 as the flag:
java -Xmx10g -jar QARank.jar [absolute-path-to-xml_files-folder] [absolute-path-to-resources-folder] 1
- The system will generate all folders and required files.
- The final MAP scores of the system and the SVM accuracy can be found in the result_files/final_scores.txt file.
- Users can run the system on a different dataset, provided the training and test files follow the SemEval 2017 - Task 3 format.
- The evaluation scripts used in the system can be found here.
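The two java commands shown earlier differ only in the final flag, so a small wrapper can build the command and check the inputs before launching the jar. The sketch below is an illustration only: the helper names `build_qarank_command` and `run_qarank` are hypothetical, it assumes QARank.jar sits in the current working directory and that `java` is on the PATH, and it is written for Python 3.

```python
import os
import subprocess


def build_qarank_command(xml_dir, resources_dir, has_original_questions,
                         jar_path="QARank.jar", heap="10g"):
    """Build the java command line; the last argument is the format flag.

    Flag 0: subtask A files without Original Questions.
    Flag 1: files containing both Original and Related Questions.
    """
    flag = "1" if has_original_questions else "0"
    return [
        "java", "-Xmx" + heap, "-jar", jar_path,
        os.path.abspath(xml_dir),        # QARank expects absolute paths
        os.path.abspath(resources_dir),
        flag,
    ]


def run_qarank(xml_dir, resources_dir, has_original_questions,
               jar_path="QARank.jar"):
    """Check that train.xml and test.xml exist, then launch the jar."""
    for name in ("train.xml", "test.xml"):
        if not os.path.isfile(os.path.join(xml_dir, name)):
            raise IOError("missing %s in %s" % (name, xml_dir))
    command = build_qarank_command(
        xml_dir, resources_dir, has_original_questions, jar_path=jar_path
    )
    # check_call raises CalledProcessError if java exits with an error.
    subprocess.check_call(command)
```

For subtask A files, `run_qarank("xml_files", "resources_QARank", False)` would correspond to the command ending in 0. The directory names here are the ones used in this guide and should be adjusted to wherever the folders were actually created.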