GitHub - Hellg0re/news-coverage-PUBLIC

This project is a work in progress

Demo

http://ai-news-hound.sundai.club/

Overview

From a "Journalism Hack" week. This project aims to visualize and compare topic data from various sources, including Reddit, Arxiv, Hacker News, and news articles. By scraping and processing data from these sources, the tool generates text embeddings using the SentenceTransformers library. These embeddings are then visualized using UMAP (Uniform Manifold Approximation and Projection - a way to make large embeddings just two-dimensions so they can be compared) to identify topic clusters and information 'deserts'—areas with sparse or minimal discussion.

The visualization shows color-coded dots to indicate the source of each topic and green hexagons to highlight topics that are not being actively reported on. This tool is designed to help users easily explore and identify gaps in information coverage across different media sources.

Visualization Details

When looking at the visualization:

Dot colors show the source (e.g., Reddit, Arxiv, news articles).
Hexagon color indicates if something is being reported on (green means it's not).

How It Works

We use SentenceTransformers (pip library documentation) to generate embeddings, specifically the all-MiniLM-L6-v2 model. These embeddings are created out of text for specific topics obtained from various APIs. The embeddings are then visualized in a UMAP using Bokeh and Streamlit.

Usage

Running the Visualization

To see the visualization, navigate to the streamlit folder and run:

streamlit run app.py

Make sure to install the necessary requirements first. It is recommended to use a Python virtual environment for this.

Create a virtual environment:

macOS/Linux:

python -m venv env
source env/bin/activate

Windows:

python -m venv env
.\env\Scripts\activate

Install the requirements

pip install -r requirements.txt

The interface will load at http://localhost:8501/.

Loading New Topics

To load your own data, look under the data_source_apis folder. You can run topic_embeddings.py to use the Reddit, Mediastack, and Arxiv APIs (each in their own files).

Set up developer keys for:

Reddit: Reddit Apps
Mediastack: Mediastack Documentation

In addition to the above, there's a hacker-news.js file you'll need to run separately. Follow these steps:

Install dependencies:

npm install express axios

Start the server:

node index.js

Load articles:

Use the /search endpoint with a query parameter to search for articles. For example:

http://localhost:3000/search?q=agents

This will return 10 results, each containing the title of the post, the full body of comments as text, source, link, publication date, and type, formatted as a JSON response.

Updating Embeddings in Visualization

After generating new JSON files with the APIs, update the file names loaded into the visualization by replacing the inputs array in the get_embeddings_from_file function in app.py:

inputs = ["embeddings_AIAgents.json", "embeddings_AIAssistedHealthcare.json", "embeddings_AIDrivenPortfolioManagement.json", "embeddings_AIPublicPolicies.json"]

Replace with the appropriate file names.

Experimental Notebooks

In the notebooks folder, you will find our explorations in Jupyter notebooks for different models and UMAP settings.

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
data_source_apis		data_source_apis
documents		documents
notebooks		notebooks
readme-assets		readme-assets
streamlit-app		streamlit-app
.gitignore		.gitignore
README.md		README.md
embeddings.json		embeddings.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Demo

Overview

Visualization Details

How It Works

Usage

Running the Visualization

Create a virtual environment:

Install the requirements

Loading New Topics

Install dependencies:

Start the server:

Load articles:

Updating Embeddings in Visualization

Experimental Notebooks

About

Releases

Packages

Languages

Hellg0re/news-coverage-PUBLIC

Folders and files

Latest commit

History

Repository files navigation

Demo

Overview

Visualization Details

How It Works

Usage

Running the Visualization

Create a virtual environment:

Install the requirements

Loading New Topics

Install dependencies:

Start the server:

Load articles:

Updating Embeddings in Visualization

Experimental Notebooks

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages