monofy-ai

Simple and multifaceted API for AI

What's in the box?

Python APIs that let you use large language models, text-to-speech, and Stable Diffusion in your projects through a consistent interface
HTML/JavaScript chat interface with image generation, PDF reading, code blocks, chat history, and more
Gradio interface for experimenting with various features

Requirements

Windows or Linux (WSL is supported, and even recommended)
12GB VRAM (RTX 3060 or RTX 4060 Ti recommended)
32GB RAM (64GB recommended)
Python 3.10 (may work on 3.11; please file an issue if you run into problems)
CUDA 12.3 Toolkit
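Before installing, it can help to confirm which interpreter you are running. The sketch below only encodes the Python line of the list above (3.10 as the tested target, 3.11 as "may work"); it does not check VRAM, RAM, or the CUDA Toolkit, and the function name `check_python` is hypothetical, not part of monofy-ai.

```python
import sys

# Encodes the Python requirement above: 3.10 is the target, 3.11 "may work".
SUPPORTED = (3, 10)
POSSIBLY_WORKS = {(3, 11)}


def check_python(version=None):
    """Classify a version tuple as 'supported', 'untested', or 'unsupported'.

    Defaults to the running interpreter when no version is given.
    """
    major_minor = tuple(version[:2]) if version is not None else sys.version_info[:2]
    if major_minor == SUPPORTED:
        return "supported"
    if major_minor in POSSIBLY_WORKS:
        return "untested"
    return "unsupported"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {check_python()}")
```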

Will it run on less than 12GB VRAM?

Your mileage may vary. If you have plenty of system RAM, many features will still work, though more slowly and/or at lower resolution.

What is included?

Large language model using ExLlamaV2 (Llama 3.1 8B by default, other options available)
Vision: YOLOS, Moondream, Owl, LLaVA, Depth Anything, MiDaS, Canny, and more
Speech dictation using Whisper
Image generation: SD1.5, SDXL, SD3, Turbo, Lightning, Cascade, IC Relight, Flux, and more
Video: Stable Video Diffusion XT, LivePortrait, AnimateLCM with multiple modes available
Audio: MusicGen, AudioGen, MMAudio
Text-to-speech: XTTS with instant voice cloning from 6 to 20 second samples; an Edge TTS API is also included
Canny and depth detection with IP-Adapter support for text-to-image
3D model generation: Shap-E, TripoSR, LGM Mini
Endpoints that combine features to automate workflows
Easy plugin system that Copilot understands (write plugins for new Hugging Face models in minutes or even seconds)
... and much more!

Are all of these features available out of the box?

Yes! Models and other resources are downloaded automatically. This project aims to fully utilize the Hugging Face cache system.
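Since downloads go through the Hugging Face cache, the usual Hugging Face environment variables decide where models end up on disk. The sketch below mirrors the default lookup order used by the `huggingface_hub` library (`HF_HUB_CACHE`, then the legacy `HUGGINGFACE_HUB_CACHE`, then `$HF_HOME/hub`, where `HF_HOME` falls back to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`). It assumes monofy-ai does not override these paths itself, which this README does not state; `hf_hub_cache_dir` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def hf_hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Hugging Face Hub cache directory using huggingface_hub's default precedence.

    Assumption: monofy-ai leaves these variables and defaults untouched.
    """
    env = os.environ if env is None else env
    default_home = os.path.join(os.path.expanduser("~"), ".cache")
    # HF_HOME, else $XDG_CACHE_HOME/huggingface, else ~/.cache/huggingface
    hf_home = os.path.expanduser(
        env.get("HF_HOME", os.path.join(env.get("XDG_CACHE_HOME", default_home), "huggingface"))
    )
    # Legacy variable first, then the newer HF_HUB_CACHE takes priority over both.
    legacy = env.get("HUGGINGFACE_HUB_CACHE", os.path.join(hf_home, "hub"))
    return env.get("HF_HUB_CACHE", legacy)


if __name__ == "__main__":
    print(hf_hub_cache_dir())
```

Setting `HF_HOME` (or `HF_HUB_CACHE`) before launching is the standard way to move the cache to a larger drive, provided the project relies on these defaults.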

Why did you make this?

I just wanted a unified Python API for LLM/TTS and possibly even generating simple images. Too many projects require complicated setups, Docker, etc. Many have also become stale or obsolete as Hugging Face has generously provided improved APIs and examples. Mainly, I wanted something simple enough to modify for my exact needs in any scenario without a huge learning curve. I tried to leave everything accessible enough for you to do the same.

This project has 3 main goals in mind.

Do what I personally need for my projects (I hope it serves you too!)
No complicated installation steps
Something ready to use, fine-tuned right out of the box

Startup:

(Note: Some of this is temporary until I decide on a proper way of handling settings.)

Working run.bat and run.sh scripts are included for reference, but feel free to use your environment of choice (conda, WSL, etc.).

The following API endpoints are available (please note that this is not a complete list as new features are being added constantly):

Image Processing

/img/canny
/img/depth
/img/depth/midas
/img/rembg
/vid2densepose

Image Generation

/txt2img
/img2img
/inpaint
/txt2img/flux
/txt2img/canny
/txt2img/depth
/txt2img/openpose
/txt2img/relight
/txt2img/instantid
/txt2img/cascade
/txt2img/controlnet

3D Model Generation

/txt2model/shape
/img2model/lgm
/img2model/tsr

Video Generation

/img2vid/xt
/txt2vid/animate
/txt2vid/zero
/txt2vid/zeroscope
/img2vid/liveportrait

Computer Vision

/detect/yolos
/vision

Image-to-Text

/img2txt/llava

Audio

/txt2wav/musicgen
/mmaudio
/piano2midi

Text Generation

/chat/completions
/chat/stream
/txt/summary
/txt/profile

YouTube Tools

/youtube/download
/youtube/captions
/youtube/grid
/youtube/frames

Reddit Tools

/reddit/download

Text-to-Speech (TTS)

/tts

Other

/google/trends

Adding additional TTS voices

Add WAV files containing samples of the voices you want to use to the voices/ folder. A single example, female1.wav, is included. The voice parameter of the /tts endpoint expects the file name without the .wav extension (for example, female1). No training is required!
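The naming rule above (file name minus .wav) can be checked locally before calling the TTS endpoint. This is a sketch under stated assumptions: only the `voice` parameter comes from this README, while the `text` key in the request body is a guess, and `list_voices` and `tts_payload` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List


def list_voices(voices_dir: str = "voices") -> List[str]:
    """Return usable voice names: each .wav file in voices_dir without its extension."""
    return sorted(p.stem for p in Path(voices_dir).glob("*.wav"))


def tts_payload(text: str, voice: str, voices_dir: str = "voices") -> Dict[str, str]:
    """Build a hypothetical /tts request body after checking the voice sample exists.

    Only `voice` is documented in the README; the `text` key is an assumption.
    """
    if voice.lower().endswith(".wav"):
        voice = voice[:-4]  # the API expects the name without .wav
    available = list_voices(voices_dir)
    if voice not in available:
        raise ValueError(f"unknown voice {voice!r}; available: {available}")
    return {"text": text, "voice": voice}
```

With the bundled sample, `tts_payload("Hello there", "female1")` yields a body whose `voice` is `female1`; passing `female1.wav` is normalized to the same name.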

Thanks for trying this project! Please file issues for feature requests (additional API parameters, etc.)!
