In this repository, we simulate different query generation strategies to examine their impact on search result quality. Our starting point is the data from Study 5 of Aslett et al. [1].
To install the dependencies, run:

pip install -r requirements.txt
All the main files of this repository are under the 'src/' folder:
- data: contains three CSV files: the corpus of articles used in the simulation, a list of NewsGuard scores for several domains, and the search data from Aslett et al.'s experiments.
- sim.py: main file that models the entire simulation process.
- figs/plot.py: script that generates Fig. 4 in the paper, evaluating NewsGuard scores at different rank positions. It also runs statistical tests at these rank positions.
- tests: folder that contains test scripts for using Llama.
- sim_output: output of the simulations evaluated in the paper.
- analysis/stats_tests.py: script for computing the statistical tests reported in the paper.
- traditional_sim_methods: as discussed in Section 4.1, we tested both traditional and newer simulation approaches, including methods that sample queries from classic language models and neural techniques based on docT5query and KeyBERT (see the sketch below). However, these methods tended to produce queries that drifted away from the topic of the source article.
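For reference, here is a minimal sketch of how a KeyBERT-style baseline can turn an article into a query. It assumes the keybert package is installed and is not the repository's exact code:

```python
# Hypothetical sketch: sampling a search query from an article with KeyBERT.
# Assumes `pip install keybert`; not the repository's exact implementation.
from keybert import KeyBERT

def keybert_query(article_text: str, top_n: int = 3) -> str:
    """Extract the top keyphrases and join them into a search query."""
    kw_model = KeyBERT()
    keyphrases = kw_model.extract_keywords(
        article_text,
        keyphrase_ngram_range=(1, 2),  # allow unigrams and bigrams
        stop_words="english",
        top_n=top_n,
    )
    # extract_keywords returns (phrase, score) pairs sorted by relevance
    return " ".join(phrase for phrase, _ in keyphrases)
```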
We used an LLM for query generation, specifically a version of Llama 3. To run the simulation, the ollama service needs to be up and running in the same environment:
ollama serve
We used the version llama3:8b-instruct-q4_0, so you need to pull it first:

ollama pull llama3:8b-instruct-q4_0
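For illustration, here is a minimal sketch of prompting the model through ollama's local HTTP API. The prompt wording and function name are hypothetical, not the ones used in sim.py:

```python
# Hypothetical sketch of generating a search query via ollama's local API.
# The prompt text is illustrative; sim.py's actual prompts may differ.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3:8b-instruct-q4_0"

def generate_query(article_text: str) -> str:
    prompt = (
        "Write a short web search query a reader might use to verify "
        f"the following article:\n\n{article_text}"
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()
```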
To run searches, we used the official Bing Search API. You need to generate your search key first; see Microsoft's Bing Web Search API documentation.
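As a sketch of what a search call looks like (assuming the key is stored in a BING_SEARCH_KEY environment variable; the function name is illustrative):

```python
# Hypothetical sketch of issuing a query to the Bing Web Search API (v7.0).
import os
import requests

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def bing_search(query: str, count: int = 10) -> list[dict]:
    headers = {"Ocp-Apim-Subscription-Key": os.environ["BING_SEARCH_KEY"]}
    params = {"q": query, "count": count}
    resp = requests.get(BING_ENDPOINT, headers=headers, params=params, timeout=30)
    resp.raise_for_status()
    # Each result in webPages.value has a name, url, and snippet, among other fields
    return resp.json().get("webPages", {}).get("value", [])
```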
To run the simulation, the main file is sim.py. We need to provide as arguments the query generation strategies for the initial and subsequent search steps:
python sim.py gen1 gen2
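Conceptually, the two arguments plug into a loop like the following hypothetical sketch, where the first strategy produces the initial query and the second produces follow-up queries from earlier results (names are illustrative, not sim.py's actual interface):

```python
# Hypothetical sketch of the two-step search loop implied by the gen1/gen2
# arguments; sim.py's actual control flow may differ.
def simulate(article: str, gen_initial, gen_followup, steps: int = 3) -> list[dict]:
    query = gen_initial(article)           # initial search step
    results = bing_search(query)           # see the Bing sketch above
    for _ in range(steps - 1):             # subsequent search steps
        query = gen_followup(article, results)
        results = bing_search(query)
    return results
```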
This project is licensed under the GPL-v3 License. See the LICENSE file for details.
If you use this code, please cite our study:
[1] Kevin Aslett, Zeve Sanderson, William Godel, Nathaniel Persily, Jonathan Nagler, and Joshua A Tucker. 2024. Online searches to evaluate misinformation can increase its perceived veracity. Nature 625, 7995 (2024), 548–556.