Read2Me

READ2ME Banner

Overview

Read2Me is a FastAPI application that fetches content from provided URLs, processes the text, converts it into speech using Microsoft Azure's Edge TTS or one of the local TTS models F5-TTS, StyleTTS2, or Piper TTS, and tags the resulting MP3 files with metadata. You can either turn the full text into audio or have an LLM convert the seed text into a podcast; currently, Ollama and any OpenAI-compatible API are supported. You can install the provided Chromium extension in any Chromium-based browser (e.g. Chrome or Microsoft Edge) to send the current URL or any text to the server, and to add sources and keywords for automatic fetching.

This is currently a beta version, but I plan to extend it to support other content types (e.g. epub) in the future and to provide more robust support for languages other than English. The default Azure Edge TTS already supports other languages and attempts to autodetect the language from the text, but quality may vary depending on the language.

Features

  • Fetches and processes content from HTML URLs and saves it as a Markdown file.
  • Converts text to speech using Microsoft Azure's Edge TTS (currently selecting randomly from the available multilingual voices to easily handle multiple languages).
  • Tags MP3 files with metadata, including the title, author, and publication date, if available (see the sketch after this list).
  • Adds a cover image with the current date to the MP3 files.
  • For Wikipedia URLs, uses the wikipedia Python library to extract article content.
  • Automatically retrieves new articles from specified sources at defined intervals (currently hard-coded to twice a day, at 5 AM and 5 PM local time). Sources and keywords can be specified via text files.
  • Turns any seed text (a URL or manually entered text) into a podcast (currently works with edge-tts and F5-TTS).
  • Chrome extension available on the Chrome Web Store: READ2ME Browser Companion. If you prefer installing the extension from source, it is available in this repository as well.
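
As a rough illustration of the tagging step, here is a minimal sketch using mutagen (one of the listed dependencies). This is not the repository's actual code, and the file name is hypothetical:

import os
from mutagen.easyid3 import EasyID3

# Tag an existing MP3 (hypothetical file name) with title, author, and date
audio = EasyID3("article.mp3")
audio["title"] = "Example Article Title"
audio["artist"] = "Jane Doe"
audio["date"] = "2025-01-01"
audio.save()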

Requirements

  • Python 3.10 or higher
  • Dependencies listed in requirements.txt for edge-tts; separate requirements files for F5-TTS and StyleTTS2.

Installation

Python Installation

  1. Clone the repository:

    git clone https://github.com/WismutHansen/READ2ME.git
    cd read2me
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate   # On Windows: .venv\Scripts\activate

    or, if you prefer to use uv for package management:

    uv venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt (or uv pip install -r requirements.txt)

    For the local StyleTTS2 text-to-speech model, please also install the additional dependencies:

    pip install -r requirements_stts2.txt (or uv pip install -r requirements_stts2.txt)

    For the F5-TTS model, please also install the additional dependencies:

    pip install -r requirements_F5.txt (or uv pip install -r requirements_F5.txt)

    Install Playwright:

    playwright install

    If using uv, please also install:

    uv pip install pip

For local Piper TTS support:

python3 -m TTS.piper_tts.instalpipertts (macOS and Linux) or python -m TTS.piper_tts.instalpipertts (Windows)

Note: ffmpeg is required when using either StyleTTS2 or Piper TTS to convert wav files into mp3. StyleTTS2 also requires espeak-ng to be installed on your system.
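
For reference, the wav-to-mp3 conversion that ffmpeg handles here boils down to a single subprocess call (a sketch with hypothetical file names, not the repository's code):

import subprocess

# Convert a wav file produced by a TTS engine into an mp3; requires ffmpeg on PATH
subprocess.run(
    ["ffmpeg", "-y", "-i", "speech.wav", "-b:a", "192k", "speech.mp3"],
    check=True,
)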

  4. Set up environment variables:

    Rename the .env.example file in the root directory to .env and edit the contents to your preference:

    OUTPUT_DIR=Output # Directory to store output files
    SOURCES_FILE=sources.json # File containing sources to retrieve articles from twice a day
    IMG_PATH=front.jpg # Path to image file to use as cover
    OLLAMA_BASE_URL=http://localhost:11434    # Standard Port for Ollama
    OPENAI_BASE_URL=http://localhost:11434/v1 # Example for Ollama Open AI compatible endpoint
    OPENAI_API_KEY=skxxxxxx                   # Your OpenAI API Key in case of using the official OpenAI API
    MODEL_NAME=llama3.2:latest
    LLM_ENGINE=Ollama # Valid options: Ollama, OpenAI

    You can use either Ollama or any OpenAI-compatible API for title and podcast script generation (a summary function is also coming soon).
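
    For illustration, a minimal sketch of how these variables can be read at runtime with python-dotenv (variable names follow the .env example above; the snippet is not taken from the repository):

    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current working directory

    output_dir = os.getenv("OUTPUT_DIR", "Output")
    llm_engine = os.getenv("LLM_ENGINE", "Ollama")  # "Ollama" or "OpenAI"
    print(f"Audio files go to {output_dir}; LLM engine: {llm_engine}")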

Docker Installation

  1. Clone the repository and switch into it:

    git clone https://github.com/WismutHansen/READ2ME.git && cd read2me
  2. Copy .env.example to .env and edit the contents. Important: when using a local LLM engine, e.g. Ollama, the URL needs to follow the format "host.docker.internal:11434" (for Ollama) or "host.docker.internal:1234" (for LM Studio) so that the container can reach the host.

  3. Build the docker container

     docker build -t read2me . 

    Note: the build takes a long time; be patient.

  4. Run the docker container

     docker run -p 7777:7777 -d read2me

    Note: the API will now be reachable at http://localhost:7777.

Usage

  1. Prepare the environment variables file (.env):

Copy and rename .env.example to .env. Edit the contents of this file as you wish, specifying the output directory, the task file, the image path to use for the MP3 file cover, and the sources and keywords file.

Run the FastAPI application:

uvicorn main:app --host 0.0.0.0 --port 7777

or, if you're connected to a Linux server, e.g. via SSH, and want to keep the app running after closing your session:

nohup uvicorn main:app --host 0.0.0.0 --port 7777 &

This will write all command-line output into a file called nohup.out in your current working directory.

  2. Add URLs for processing:

    Send a POST request to http://localhost:7777/v1/url/full with a JSON body containing the URL:

    {
      "url": "https://example.com/article"
    }

    You can use curl or any API client like Postman to send this request, for example:

    curl -X POST http://localhost:7777/v1/url/full \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com/article", "tts-engine": "edge"}'

    The repository also contains a working Chromium extension that you can install in any Chromium-based browser (e.g. Google Chrome) once developer mode is enabled.
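
    The same request from Python, as a minimal sketch using the requests library (not part of the repository):

    import requests

    # Submit an article URL for full text-to-speech conversion
    response = requests.post(
        "http://localhost:7777/v1/url/full",
        json={"url": "https://example.com/article", "tts-engine": "edge"},
        timeout=30,
    )
    print(response.status_code, response.json())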

  3. Processing URLs:

    The application periodically checks the tasks.json file for new jobs to process. It fetches the content for a given URL, extracts the text, converts it to speech, and saves the resulting MP3 files with appropriate metadata.

  4. Specify sources and keywords for automatic retrieval:

Create a file called sources.json in your current working directory with the URLs of websites you want to monitor for new articles. You can also set global keywords and per-source keywords to be used as filters for automatic retrieval. If you set "*" for a source, all of its new articles will be retrieved. Here is an example structure:

{
  "global_keywords": [
    "globalkeyword1",
    "globalkeyword2"
  ],
  "sources": [
    {
      "url": "https://example.com",
      "keywords": ["keyword1","keyword2"]
    },
    {
      "url": "https://example2.com",
      "keywords": ["*"]
    }
  ]
}

The location of both files is configurable in the .env file.
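
For illustration, here is a minimal sketch of how such a file could be loaded and matched against article titles. The matches_filters helper is hypothetical, not the repository's actual matching logic:

import json

def matches_filters(title: str, keywords: list[str]) -> bool:
    """Return True if an article title passes the keyword filter."""
    if "*" in keywords:  # wildcard: retrieve every new article
        return True
    return any(kw.lower() in title.lower() for kw in keywords)

with open("sources.json") as f:
    config = json.load(f)

for source in config["sources"]:
    keywords = source["keywords"] + config["global_keywords"]
    if matches_filters("Some article headline", keywords):
        print("Would retrieve articles from", source["url"])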

Frontend

To use the Next.js frontend, make sure you have Node.js installed on your system. Note: the frontend is currently at an early experimental stage, so expect bugs. First, switch into the frontend directory:

cd frontend

Then install the required Node dependencies:

npm install

Then start the frontend:

npm run dev

You can then access the frontend at http://localhost:3000.

API Endpoints

  • POST /v1/url/full

    Adds a URL to the processing list.

    Request Body:

    {
      "url": "https://example.com/article",
      "tts-engine": "edge"
    }

    Response:

    {
      "message": "URL added to the processing list"
    }
  • POST /v1/url/podcast

    Adds a URL whose content an LLM turns into a podcast script before audio conversion.

  • POST /v1/text/full

    Adds manually entered text to be converted to audio in full.

  • POST /v1/text/podcast

    Adds manually entered text to be turned into a podcast.
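
The exact request bodies for these endpoints aren't documented here; since the app is built with FastAPI, the interactive docs at http://localhost:7777/docs show the actual schemas. As a hedged sketch, assuming the text endpoints accept a "text" field analogous to the "url" field above:

import requests

# Assumption: the field name "text" and the "tts-engine" option mirror the
# /v1/url/full body shown above; verify against http://localhost:7777/docs.
resp = requests.post(
    "http://localhost:7777/v1/text/podcast",
    json={"text": "Some seed text to turn into a podcast", "tts-engine": "edge"},
    timeout=30,
)
print(resp.json())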

File Structure

  • main.py: The main FastAPI application file.
  • requirements.txt: List of dependencies.
  • .env: Environment variables file.
  • database/: Directory containing the sqlite database and all database-related code
  • TTS/: Directory containing the code for all of the TTS-engines
  • utils/: Directory with helper functions for task handling, text extraction etc.
  • Output/: Directory where the output files (MP3 and MD) are saved unless you specify a different directory in the .env file.

Dependencies

  • FastAPI: Web framework for building APIs.
  • Uvicorn: ASGI server implementation for serving FastAPI applications.
  • edge-tts: Microsoft Azure Edge Text-to-Speech library.
  • mutagen: Library for handling audio metadata.
  • Pillow: Python Imaging Library (PIL) for image processing.
  • trafilatura: Library for web scraping and text extraction.
  • requests: HTTP library for sending requests.
  • BeautifulSoup: Library for parsing HTML and XML documents.
  • pdfminer: Library for extracting text from PDF documents.
  • python-dotenv: Library for managing environment variables.
  • newspaper4k: Library for extracting articles from news websites.
  • wikipedia: Library for extracting information from Wikipedia articles.
  • schedule: Library for scheduling tasks. Used to schedule automatic news retrieval twice a day.
  • and several more, though I plan to reduce the number of dependencies by removing redundancies.

Contributing

  1. Fork the repository.

  2. Create a new branch:

    git checkout -b feature/your-feature-name
  3. Make your changes and commit them:

    git commit -m 'Add some feature'
  4. Push to the branch:

    git push origin feature/your-feature-name
  5. Submit a pull request.

License

This project is licensed under the Apache License, Version 2.0 (January 2004), except for the StyleTTS2 code, which is licensed under the MIT License. The F5-TTS and StyleTTS2 pre-trained models are under their own licenses.

StyleTTS2 Pre-Trained Models: Before using these pre-trained models, you agree to inform the listeners that the speech samples are synthesized by the pre-trained models, unless you have the permission to use the voice you synthesize. That is, you agree to only use voices whose speakers grant the permission to have their voice cloned, either directly or by license before making synthesized voices public, or you have to publicly announce that these voices are synthesized if you do not have the permission to use these voices.

Roadmap

  • Language detection and voice selection based on the detected language (currently only works for edge-tts).
  • Add support for handling PDF files.
  • Add support for local text-to-speech (TTS) engines like StyleTTS2.
  • Add support for LLM-based text processing (e.g. podcast transcripts) with local LLMs through Ollama or the OpenAI API.
  • Add support for F5-TTS.
  • Add support for automatic image captioning using local vision models or the OpenAI API.

Acknowledgements

I would like to thank the following repositories and authors for their inspiration and code:

  • F5-TTS - Currently the best open weights TTS model!
  • styletts2 - A great open-source TTS engine, and really fast when using NVIDIA/CUDA
  • piperTTS - Another good local TTS engine that also works on low spec systems
  • AlwaysReddy - Thanks to these guys, I got piper TTS working in my project
  • rvc-python - For improving generated speech
  • edge-tts - Best free online TTS engine
