Stars
Shapley Interactions and Shapley Values for Machine Learning
High-resolution models for human tasks.
A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!
A simple screen parsing tool towards a pure-vision-based GUI agent
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP
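This series aims to let readers understand and implement every component of a large language model from scratch, along with the training and fine-tuning code, using only basic PyTorch and no other off-the-shelf external libraries. Readers need only Python, PyTorch, and the most basic deep learning background.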
Instant voice cloning by MIT and MyShell. Audio foundation model.
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
https://textbehindimage.rexanwong.xyz - create text behind image designs easily
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
PDF2zh for Zotero | Zotero plugin for translating PDFs into Chinese
Object Detection Metrics. 14 object detection metrics: mean Average Precision (mAP), Average Recall (AR), Spatio-Temporal Tube Average Precision (STT-AP). This project supports different bounding b…
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, and more.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
"A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe
Termora is a terminal emulator and SSH client for Windows, macOS and Linux.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Fully open reproduction of DeepSeek-R1
Janus-Series: Unified Multimodal Understanding and Generation Models
FireFlyer Record file format, writer and reader for DL training samples.
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2E, F5-TTS, CosyVoice), with Whisper audio processing, RVC voice changer, YouTube downlo…
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant refe…