MarcosFP97/search-verify-simulation

SEARCH SIMULATION FOR VERIFYING NEWS ACCURACY

In this code, we simulate different query generation strategies to examine their impact on search result quality. Our starting point is Study 5 from Aslett et al.'s data [1].

INSTALLATION

pip install -r requirements.txt

REPOSITORY STRUCTURE

All the main files of this repository are under the src/ folder:

  • data: contains three CSV files: the corpus of articles used in the simulation, a list of NewsGuard scores for several domains, and the search data from Aslett et al.'s experiments.
  • sim.py: main file that models the entire simulation process.
  • figs/plot.py: script that generates Fig. 4 of the paper, evaluating NewsGuard scores at different rank positions. It also runs statistical tests at these rank positions.
  • tests: folder containing test scripts for using Llama.
  • sim_output: output of the simulations evaluated in the paper.
  • analysis/stats_tests.py: script for computing the statistical tests reported in the paper.
  • traditional_sim_methods: as described in Section 4.1, we tested traditional and newer simulation approaches, including methods that sample queries from classic language models and neural techniques based on docT5query and KeyBERT. However, these tended to produce queries that drifted away from the topic of the source article.

PREREQUISITES

Llama Model

We used an LLM, specifically a version of Llama 3, for query generation. To run the simulation, the ollama service needs to be up and running in the same environment:

ollama serve

We used the version llama3:8b-instruct-q4_0, so you need to pull it first:

ollama pull llama3:8b-instruct-q4_0
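The exact prompting code lives in sim.py; as a hedged illustration, a query-generation request to a locally running Ollama instance can be built as follows. The endpoint and JSON fields come from Ollama's HTTP API; the prompt wording and the helper function are our own assumptions, not the repository's actual prompt.

```python
# Sketch of a payload for Ollama's /api/generate endpoint.
# The model name matches the README; the prompt template is hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3:8b-instruct-q4_0"

def build_query_payload(headline: str) -> dict:
    """Build the JSON body for one query-generation request."""
    prompt = (
        "Write a short web search query a reader could use to verify "
        f"the accuracy of this news headline: {headline!r}"
    )
    # stream=False asks Ollama for a single JSON response instead of chunks.
    return {"model": MODEL, "prompt": prompt, "stream": False}

# Usage (requires `ollama serve` to be running):
#   resp = requests.post(OLLAMA_URL, json=build_query_payload("..."))
#   query = resp.json()["response"]
```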

Bing Search API v7

To run searches, we used the official Bing Search API. You first need to generate a search API key; see the Bing Search API v7 documentation.
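A Bing Web Search v7 call boils down to a GET request with a subscription-key header. The endpoint and header name below are from Microsoft's API documentation; the helper function itself is a sketch, not the repository's code.

```python
# Bing Web Search v7 endpoint and auth header, per Microsoft's API docs.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def build_bing_request(query: str, api_key: str, count: int = 10):
    """Return (headers, params) for a Bing Web Search v7 GET request."""
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    params = {"q": query, "count": count}
    return headers, params

# Usage (requires a valid key and the `requests` package):
#   headers, params = build_bing_request("example query", my_key)
#   data = requests.get(BING_ENDPOINT, headers=headers, params=params).json()
#   urls = [page["url"] for page in data["webPages"]["value"]]
```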

RUNNING SIMULATIONS

To run the simulation, the main file is sim.py. The query-generation strategies for the initial and subsequent search steps must be provided as arguments:

python sim.py gen1 gen2
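The README does not show how sim.py consumes these arguments; a minimal sketch of the expected handling, assuming a plain sys.argv read (the argument names are placeholders, not the repository's actual strategy names):

```python
import sys

def parse_strategies(argv: list[str]) -> tuple[str, str]:
    """Read the initial and subsequent query-generation strategies."""
    if len(argv) != 3:
        raise SystemExit("usage: python sim.py <initial_strategy> <subsequent_strategy>")
    return argv[1], argv[2]

# Inside sim.py this would be called as:
#   initial, subsequent = parse_strategies(sys.argv)
```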

LICENSE

This project is licensed under the GPL-v3 License. See the LICENSE file for details.

CITATION

Please cite our study.

REFERENCES

[1] Kevin Aslett, Zeve Sanderson, William Godel, Nathaniel Persily, Jonathan Nagler, and Joshua A. Tucker. 2024. Online searches to evaluate misinformation can increase its perceived veracity. Nature 625, 7995 (2024), 548–556.
