MITweet

Ideology Takes Multiple Looks: A High-Quality Dataset for Multifaceted Ideology Detection (EMNLP 2023)

Multifaceted Ideology Schema

The multifaceted ideology schema contains five domains that reflect different aspects of society. Under these five domains there are twelve facets, each with left-leaning and right-leaning ideological attributes.

Illustration of Multifaceted Ideology Schema

The MITweet Dataset

Based on the schema, we construct a new high-quality dataset, MITweet, for a new multifaceted ideology detection (MID) task. MITweet contains 12,594 English Twitter posts. For each facet, every post is manually annotated with a Relevance label and, if that label is “Related”, an Ideology label. MITweet covers 14 highly controversial topics from recent years (e.g., abortion, COVID-19, and the Russo-Ukrainian war).

Label Distribution

Label Distribution of MITweet

Baselines

We develop baselines for the new MID task based on three widely used PLMs (BERT, RoBERTa, BERTweet) under both in-topic and cross-topic settings. We split the multifaceted ideology detection procedure into two sub-tasks that run as a pipeline:

  1. Relevance Recognition
  2. Ideology Detection
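The two-stage pipeline can be sketched as follows. This is a minimal illustration, not the repository's implementation: `detect_ideology`, `relevance_fn`, and `ideology_fn` are hypothetical names, and the two callables stand in for trained models. The sketch assumes relevance is predicted once per facet and that ideology is only predicted for facets judged "Related".

```python
from typing import Callable, Dict, List, Optional

NUM_FACETS = 12  # the schema defines twelve facets
IDEOLOGY_NAMES = {0: "left-leaning", 1: "center", 2: "right-leaning"}


def detect_ideology(
    tweet: str,
    relevance_fn: Callable[[str], List[int]],
    ideology_fn: Callable[[str, int], int],
) -> Dict[int, Optional[str]]:
    """Run both sub-tasks in sequence for a single tweet.

    relevance_fn and ideology_fn are hypothetical stand-ins for trained models.
    Returns a mapping from facet number (1..12) to an ideology name,
    or None when the facet is unrelated to the tweet.
    """
    related = relevance_fn(tweet)  # stage 1: one 0/1 flag per facet
    if len(related) != NUM_FACETS:
        raise ValueError(f"expected {NUM_FACETS} relevance flags, got {len(related)}")

    result: Dict[int, Optional[str]] = {}
    for facet, flag in enumerate(related, start=1):
        if flag == 1:
            # stage 2: only related facets receive an ideology label
            result[facet] = IDEOLOGY_NAMES[ideology_fn(tweet, facet)]
        else:
            result[facet] = None
    return result


# Toy stand-ins: facet 3 is the only related facet and is always left-leaning.
def toy_relevance(tweet: str) -> List[int]:
    return [1 if facet == 3 else 0 for facet in range(1, NUM_FACETS + 1)]


def toy_ideology(tweet: str, facet: int) -> int:
    return 0


labels = detect_ideology("example tweet text", toy_relevance, toy_ideology)
print(labels[3], labels[1])
```

Because stage 2 only sees facets that stage 1 marks as related, relevance errors propagate into the ideology predictions, which is worth keeping in mind when reading end-to-end results.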

In-topic Setting

[Figure: baseline results under the in-topic setting]

Cross-topic Setting

[Figure: baseline results under the cross-topic setting]
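To make the difference between the two settings concrete, here is a rough sketch of how the splits might be built from rows that carry a `topic` field. It assumes that the cross-topic setting holds out entire topics for testing, while the in-topic setting splits the rows within every topic. The function names, the test fraction, and the seed are illustrative and are not taken from the repository.

```python
import random
from collections import defaultdict


def cross_topic_split(rows, held_out_topics):
    """Train on some topics and test on topics never seen during training."""
    train = [r for r in rows if r["topic"] not in held_out_topics]
    test = [r for r in rows if r["topic"] in held_out_topics]
    return train, test


def in_topic_split(rows, test_fraction=0.2, seed=0):
    """Split the rows of every topic, so each topic appears in train and test."""
    by_topic = defaultdict(list)
    for r in rows:
        by_topic[r["topic"]].append(r)

    rng = random.Random(seed)
    train, test = [], []
    for group in by_topic.values():
        group = group[:]  # copy so the caller's list is not shuffled
        rng.shuffle(group)
        n_test = max(1, int(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows: three topics with five tweets each.
rows = [
    {"topic": topic, "tweet": f"{topic} tweet {i}"}
    for topic in ("abortion", "covid-19", "Russo-Ukrainian war")
    for i in range(5)
]

cross_train, cross_test = cross_topic_split(rows, {"covid-19"})
in_train, in_test = in_topic_split(rows)
```

In the cross-topic split, the sets of topics in the training and test data are disjoint, so the model is evaluated on vocabulary and talking points it has not been trained on.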

Reproduce

We provide the dataset and code needed to reproduce our results.

In the data directory, MITweet.csv is the complete dataset.

Each .csv data file contains the following columns:

  • topic

  • tweet

  • tokenized tweet: the tweet tokenized with the tweet segmentation tool in nltk

  • R1 ~ R5: relevance labels for the 5 domains. 1 means "Related", 0 means "Unrelated"

  • R1-1-1 ~ R512-5-3 : relevance labels for the 12 facets. 1 means "Related", 0 means "Unrelated"

  • I1 ~ I12: ideology labels for the 12 facets. 0, 1, and 2 mean left-leaning, center, and right-leaning, respectively. -1 means "Unrelated", in which case there is no ideology label
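As an illustration of the label encoding above, the following sketch decodes the I1 ~ I12 columns of one row into readable names and skips unrelated facets. It uses only the standard library and a made-up in-memory row rather than the real file. It assumes the labels are stored as plain integers, and `related_ideologies` is a hypothetical helper, not part of the repository.

```python
import csv
import io

IDEOLOGY_NAMES = {0: "left-leaning", 1: "center", 2: "right-leaning"}
FACET_COLUMNS = [f"I{i}" for i in range(1, 13)]  # I1 ~ I12


def related_ideologies(row):
    """Return {column: ideology name} for facets whose label is not -1."""
    decoded = {}
    for col in FACET_COLUMNS:
        value = int(row[col])  # assumes integer labels stored as text
        if value != -1:  # -1 means "Unrelated", so there is no ideology label
            decoded[col] = IDEOLOGY_NAMES[value]
    return decoded


# Build a made-up one-row CSV with only the columns this sketch needs.
header = ["topic", "tweet"] + FACET_COLUMNS
sample = ["abortion", "example tweet text"] + ["-1"] * 12
sample[header.index("I4")] = "0"
sample[header.index("I9")] = "2"

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(sample)
buffer.seek(0)

# With the real dataset, the reader would wrap an open file handle instead.
for row in csv.DictReader(buffer):
    decoded = related_ideologies(row)
    print(row["topic"], decoded)
```

Filtering on -1 before mapping the values avoids treating "Unrelated" as a fourth ideology class.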

How to Run

  • Indicator Detection

    python log_odds_ratio.py
    
  • Relevance Recognition

    python train_relevance.py \
    	--train_data_path your_path \
    	--val_data_path your_path \
    	--test_data_path your_path
    
  • Ideology Detection

    python train_ideology.py \
    	--train_data_path your_path \
    	--val_data_path your_path \
    	--test_data_path your_path \
    	--indicator_file_path your_path