This repository contains code, data, and models for the NAACL 2022 findings paper POLITICS: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection.
ALERT: POLITICS is a pre-trained language model that specializes in comprehending news articles and understanding ideological content. However, it cannot be used out of the box for downstream tasks such as predicting ideological leanings or discerning stances expressed in text. To make predictions on downstream tasks, you should first fine-tune POLITICS on your own dataset.
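As a starting point, the sketch below fine-tunes POLITICS for sequence classification with Hugging Face `transformers`. The 3-way label scheme, hyperparameters, and inline example data are illustrative placeholders, not the configuration used in the paper:

```python
# Minimal fine-tuning sketch (illustrative; not the paper's exact recipe).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("launch/POLITICS")
model = AutoModelForSequenceClassification.from_pretrained(
    "launch/POLITICS", num_labels=3  # e.g., left / center / right
)

# Replace with your own (text, label) pairs.
texts = ["First example news paragraph.", "Second example news paragraph."]
labels = [0, 2]

enc = tokenizer(texts, padding=True, truncation=True, max_length=512,
                return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)),
    batch_size=8, shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # epoch count is illustrative
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```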
We are still refactoring and cleaning the downstream evaluation code; please stay tuned and check back later!
Please check out our "sibling project" on entity-level stance detection if you are interested in fine-grained stance detection or entity-interaction modeling.
- The continually pre-trained POLITICS model is available on Hugging Face, with a model card, at `launch/POLITICS` (a loading smoke test follows this list).
- The cleaned BIGNEWS, BIGNEWSBLN, and BIGNEWSALIGN datasets are available for download after you fill out this form.
- Code for continued pretraining. To retrain POLITICS, simply run `pretrain.sh`. You need to download the preprocessed data, which contains the indices for entities and sentiment words, from this form. After downloading, move the data files and the `lexicon` directory to the `DATA_DIR` defined in `pretrain.sh`.
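Once downloaded, the checkpoint can be smoke-tested in a few lines. This is a generic feature-extraction check, assuming only that the hub checkpoint loads with the standard `transformers` auto classes (POLITICS follows the RoBERTa-base architecture, so the hidden size should be 768):

```python
# Quick smoke test: load the checkpoint and encode one sentence.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("launch/POLITICS")
model = AutoModel.from_pretrained("launch/POLITICS")

inputs = tokenizer("The Senate passed the bill on Tuesday.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # expect (1, seq_len, 768)
```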
Macro F1 scores for all tasks are shown in the following table.
| Model | YT (cmt.) | CongS | HP | AllS | YT (user) | TW | Ideo. avg | SEval (seen) | SEval (unseen) | Basil (sent.) | VAST | Basil (art.) | Stan. avg | All avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT (base) | 64.64 | 65.88 | 48.42 | 60.88 | 65.24 | 44.20 | 58.21 | 65.07 | 40.39 | 62.81 | 70.53 | 45.61 | 56.88 | 57.61 |
| RoBERTa (base) | 66.72 | 67.25 | 60.43 | 74.75 | 67.98 | 48.90 | 64.34 | 70.15 | 63.08 | 68.16 | 76.25 | 41.36 | 63.80 | 64.09 |
| POLITICS (base) | 67.83 | 70.86 | 70.25 | 74.93 | 78.73 | 48.92 | 68.59 | 69.41 | 61.26 | 73.41 | 76.73 | 51.94 | 66.55 | 67.66 |
POLITICS is shared under the CC BY-NC-SA 4.0 license, which applies to both the data and the pre-trained models.
If you have any questions, please contact Yujian Liu <[email protected]> or Xinliang Frederick Zhang <[email protected]>, or create a GitHub issue.
Please cite our paper if you use our POLITICS model and/or the BIGNEWS dataset, or their derivatives from this repo:
@inproceedings{liu-etal-2022-politics,
title = "{POLITICS}: Pretraining with Same-story Article Comparison for Ideology Prediction and Stance Detection",
author = "Liu, Yujian and
Zhang, Xinliang Frederick and
Wegsman, David and
Beauchamp, Nicholas and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
pages = "1354--1374",
}