Read2Me

READ2ME Banner

Overview

Read2Me is a FastAPI application that fetches content from provided URLs, processes the text, converts it into speech using Microsoft Azure's Edge TTS or one of the local TTS models F5-TTS, StyleTTS2, or Piper TTS, and tags the resulting MP3 files with metadata. You can either turn the full text into audio or have an LLM convert the seed text into a podcast; currently, Ollama and any OpenAI-compatible API are supported. You can install the provided Chromium extension in any Chromium-based browser (e.g. Chrome or Microsoft Edge) to send the current URL or any text to the server, and to add sources and keywords for automatic fetching.

This is currently a beta version, but I plan to extend it to support other content types (e.g. epub) in the future and to provide more robust support for languages other than English. The default Azure Edge TTS already supports other languages and attempts to autodetect the language from the text, but quality may vary depending on the language.

Features

  • Fetches and processes content from HTML URLs and saves it as a Markdown file.
  • Converts text to speech using Microsoft Azure's Edge TTS (currently selecting randomly from the available multilingual voices to easily handle multiple languages).
  • Tags MP3 files with metadata, including the title, author, and publication date, if available (see the sketch after this list).
  • Adds a cover image with the current date to the MP3 files.
  • For Wikipedia URLs, uses the wikipedia Python library to extract article content.
  • Automatically retrieves new articles from specified sources at defined intervals (currently hard-coded to twice a day, at 5 AM and 5 PM local time). Sources and keywords can be specified via text files.
  • Turns any seed text (a URL or manually entered text) into a podcast (currently works with edge-tts and F5-TTS).
  • Chrome extension available on the Chrome Web Store: READ2ME Browser Companion. If you prefer installing the extension from source, it is available in this repository as well.
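
As a rough illustration of the tagging step, here is a minimal sketch using mutagen (one of the listed dependencies). This is not the repository's actual code, and the file name is hypothetical:

import os
from mutagen.easyid3 import EasyID3

# Tag an existing MP3 (hypothetical file name) with title, author, and date
audio = EasyID3("article.mp3")
audio["title"] = "Example Article Title"
audio["artist"] = "Jane Doe"
audio["date"] = "2025-01-01"
audio.save()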

Requirements

  • Python 3.10 or higher
  • Dependencies listed in requirements.txt for edge-tts; separate requirements files for F5-TTS and StyleTTS2.

Installation

Python Installation

  1. Clone the repository:

    git clone https://github.com/WismutHansen/READ2ME.git
    cd read2me
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate   # On Windows: .venv\Scripts\activate

    or, if you prefer to use uv for package management:

    uv venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt (or uv pip install -r requirements.txt)

    For the local StyleTTS2 text-to-speech model, please also install the additional dependencies:

    pip install -r requirements_stts2.txt (or uv pip install -r requirements_stts2.txt)

    For the F5-TTS model, please also install the additional dependencies:

    pip install -r requirements_F5.txt (or uv pip install -r requirements_F5.txt)

    Install Playwright:

    playwright install

    If using uv, please also install:

    uv pip install pip

For local Piper TTS support:

python3 -m TTS.piper_tts.instalpipertts (macOS and Linux) or python -m TTS.piper_tts.instalpipertts (Windows)

Note: ffmpeg is required when using either StyleTTS2 or Piper TTS to convert wav files into mp3. StyleTTS2 also requires espeak-ng to be installed on your system.
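
For reference, the wav-to-mp3 conversion that ffmpeg handles here boils down to a single subprocess call (a sketch with hypothetical file names, not the repository's code):

import subprocess

# Convert a wav file produced by a TTS engine into an mp3; requires ffmpeg on PATH
subprocess.run(
    ["ffmpeg", "-y", "-i", "speech.wav", "-b:a", "192k", "speech.mp3"],
    check=True,
)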

  4. Set up environment variables:

    Rename the .env.example file in the root directory to .env and edit the contents to your preference:

    OUTPUT_DIR=Output # Directory to store output files
    SOURCES_FILE=sources.json # File containing sources to retrieve articles from twice a day
    IMG_PATH=front.jpg # Path to image file to use as cover
    OLLAMA_BASE_URL=http://localhost:11434    # Standard Port for Ollama
    OPENAI_BASE_URL=http://localhost:11434/v1 # Example for Ollama Open AI compatible endpoint
    OPENAI_API_KEY=skxxxxxx                   # Your OpenAI API Key in case of using the official OpenAI API
    MODEL_NAME=llama3.2:latest
    LLM_ENGINE=Ollama # Valid options: Ollama, OpenAI

    You can use either Ollama or any OpenAI-compatible API for title and podcast script generation (a summary function is also coming soon).
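
    For illustration, a minimal sketch of how these variables can be read at runtime with python-dotenv (variable names follow the .env example above; the snippet is not taken from the repository):

    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current working directory

    output_dir = os.getenv("OUTPUT_DIR", "Output")
    llm_engine = os.getenv("LLM_ENGINE", "Ollama")  # "Ollama" or "OpenAI"
    print(f"Audio files go to {output_dir}; LLM engine: {llm_engine}")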

Docker Installation

  1. Clone the repository and switch into it:

    git clone https://github.com/WismutHansen/READ2ME.git && cd read2me
  2. Copy .env.example to .env and edit the contents. Important: when using a local LLM engine, e.g. Ollama, the URL needs to follow the format "host.docker.internal:11434" (for Ollama) or "host.docker.internal:1234" (for LM Studio) so that the container can reach the host.

  3. Build the docker container

     docker build -t read2me . 

    Note: the build takes a long time; be patient.

  4. Run the docker container

     docker run -p 7777:7777 -d read2me

    Note: the API will now be reachable at http://localhost:7777.

Usage

  1. Prepare the environment variables file (.env):

Copy and rename .env.example to .env. Edit the contents of this file as you wish, specifying the output directory, the task file, the image path to use for the MP3 file cover, and the sources and keywords file.

Run the FastAPI application:

uvicorn main:app --host 0.0.0.0 --port 7777

or, if you're connected to a Linux server, e.g. via SSH, and want to keep the app running after closing your session:

nohup uvicorn main:app --host 0.0.0.0 --port 7777 &

This will write all command-line output into a file called nohup.out in your current working directory.

  2. Add URLs for processing:

    Send a POST request to http://localhost:7777/v1/url/full with a JSON body containing the URL:

    {
      "url": "https://example.com/article"
    }

    You can use curl or any API client like Postman to send this request, for example:

    curl -X POST http://localhost:7777/v1/url/full \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com/article", "tts-engine": "edge"}'

    The repository also contains a working Chromium extension that you can install in any Chromium-based browser (e.g. Google Chrome) once developer mode is enabled.
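
    The same request from Python, as a minimal sketch using the requests library (not part of the repository):

    import requests

    # Submit an article URL for full text-to-speech conversion
    response = requests.post(
        "http://localhost:7777/v1/url/full",
        json={"url": "https://example.com/article", "tts-engine": "edge"},
        timeout=30,
    )
    print(response.status_code, response.json())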

  3. Processing URLs:

    The application periodically checks the tasks.json file for new jobs to process. It fetches the content for a given URL, extracts the text, converts it to speech, and saves the resulting MP3 files with appropriate metadata.

  4. Specify sources and keywords for automatic retrieval:

Create a file called sources.json in your current working directory with the URLs of websites you want to monitor for new articles. You can also set global keywords and per-source keywords to be used as filters for automatic retrieval. If you set "*" for a source, all of its new articles will be retrieved. Here is an example structure:

{
  "global_keywords": [
    "globalkeyword1",
    "globalkeyword2"
  ],
  "sources": [
    {
      "url": "https://example.com",
      "keywords": ["keyword1","keyword2"]
    },
    {
      "url": "https://example2.com",
      "keywords": ["*"]
    }
  ]
}

The location of both files is configurable in the .env file.
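
For illustration, here is a minimal sketch of how such a file could be loaded and matched against article titles. The matches_filters helper is hypothetical, not the repository's actual matching logic:

import json

def matches_filters(title: str, keywords: list[str]) -> bool:
    """Return True if an article title passes the keyword filter."""
    if "*" in keywords:  # wildcard: retrieve every new article
        return True
    return any(kw.lower() in title.lower() for kw in keywords)

with open("sources.json") as f:
    config = json.load(f)

for source in config["sources"]:
    keywords = source["keywords"] + config["global_keywords"]
    if matches_filters("Some article headline", keywords):
        print("Would retrieve articles from", source["url"])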

Frontend

To use the Next.js frontend, make sure you have Node.js installed on your system. Note: the frontend is currently at an early experimental stage, so expect bugs. First, switch into the frontend directory:

cd frontend

Then install the required Node dependencies:

npm install

Then start the frontend:

npm run dev

You can then access the frontend at http://localhost:3000.

API Endpoints

  • POST /v1/url/full

    Adds a URL to the processing list.

    Request Body:

    {
      "url": "https://example.com/article",
      "tts-engine": "edge"
    }

    Response:

    {
      "message": "URL added to the processing list"
    }
  • POST /v1/url/podcast

    Adds a URL whose content an LLM turns into a podcast script before audio conversion.

  • POST /v1/text/full

    Adds manually entered text to be converted to audio in full.

  • POST /v1/text/podcast

    Adds manually entered text to be turned into a podcast.
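
The exact request bodies for these endpoints aren't documented here; since the app is built with FastAPI, the interactive docs at http://localhost:7777/docs show the actual schemas. As a hedged sketch, assuming the text endpoints accept a "text" field analogous to the "url" field above:

import requests

# Assumption: the field name "text" and the "tts-engine" option mirror the
# /v1/url/full body shown above; verify against http://localhost:7777/docs.
resp = requests.post(
    "http://localhost:7777/v1/text/podcast",
    json={"text": "Some seed text to turn into a podcast", "tts-engine": "edge"},
    timeout=30,
)
print(resp.json())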

File Structure

  • main.py: The main FastAPI application file.
  • requirements.txt: List of dependencies.
  • .env: Environment variables file.
  • database/: Directory containing the sqlite database and all database-related code
  • TTS/: Directory containing the code for all of the TTS-engines
  • utils/: Directory with helper functions for task handling, text extraction etc.
  • Output/: Directory where the output files (MP3 and MD) are saved unless you specify a different directory in the .env file.

Dependencies

  • FastAPI: Web framework for building APIs.
  • Uvicorn: ASGI server implementation for serving FastAPI applications.
  • edge-tts: Microsoft Azure Edge Text-to-Speech library.
  • mutagen: Library for handling audio metadata.
  • Pillow: Python Imaging Library (PIL) for image processing.
  • trafilatura: Library for web scraping and text extraction.
  • requests: HTTP library for sending requests.
  • BeautifulSoup: Library for parsing HTML and XML documents.
  • pdfminer: Library for extracting text from PDF documents.
  • python-dotenv: Library for managing environment variables.
  • newspaper4k: Library for extracting articles from news websites.
  • wikipedia: Library for extracting information from Wikipedia articles.
  • schedule: Library for scheduling tasks. Used to schedule automatic news retrieval twice a day.
  • and several more, though I plan to reduce the number of dependencies by removing redundancies.

Contributing

  1. Fork the repository.

  2. Create a new branch:

    git checkout -b feature/your-feature-name
  3. Make your changes and commit them:

    git commit -m 'Add some feature'
  4. Push to the branch:

    git push origin feature/your-feature-name
  5. Submit a pull request.

License

This project is licensed under the Apache License, Version 2.0 (January 2004), except for the StyleTTS2 code, which is licensed under the MIT License. The F5-TTS and StyleTTS2 pre-trained models are under their own licenses.

StyleTTS2 Pre-Trained Models: Before using these pre-trained models, you agree to inform the listeners that the speech samples are synthesized by the pre-trained models, unless you have the permission to use the voice you synthesize. That is, you agree to only use voices whose speakers grant the permission to have their voice cloned, either directly or by license before making synthesized voices public, or you have to publicly announce that these voices are synthesized if you do not have the permission to use these voices.

Roadmap

  • Language detection and voice selection based on the detected language (currently only works for edge-tts).
  • Add support for handling PDF files.
  • Add support for local text-to-speech (TTS) engines like StyleTTS2.
  • Add support for LLM-based text processing (e.g. podcast transcripts) with local LLMs through Ollama or the OpenAI API.
  • Add support for F5-TTS.
  • Add support for automatic image captioning using local vision models or the OpenAI API.

Acknowledgements

I would like to thank the following repositories and authors for their inspiration and code:

  • F5-TTS - Currently the best open weights TTS model!
  • styletts2 - A great open-source TTS engine, and really fast when using NVIDIA/CUDA
  • piperTTS - Another good local TTS engine that also works on low spec systems
  • AlwaysReddy - Thanks to these guys, I got piper TTS working in my project
  • rvc-python - For improving generated speech
  • edge-tts - Best free online TTS engine
