Stars
Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python
A simple screen-parsing tool toward a purely vision-based GUI agent
Douyin batch download tool: removes watermarks and supports videos, image sets, collections, and music (original sound). Free! Free! Free!
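🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls and online batch parsing and downloading.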
Wan: Open and Advanced Large-Scale Video Generative Models
A family of diffusion models for text-to-audio generation.
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
A fork to add multimodal model training to open-r1
Solve Visual Understanding with Reinforced VLMs
The open-source implementation (reproduction) of DeepSeek-R1.
Witness the aha moment of VLM with less than $3.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
A request wrapper built on the Xiaohongshu web client. https://reajason.github.io/xhs/
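Xiaohongshu (XiaoHongShu, RedNote) link extraction and post collection tool: extracts links to an account's published, favorited, liked, and album posts; extracts post and user links from search results; collects Xiaohongshu post information; extracts Xiaohongshu post download URLs; and downloads Xiaohongshu post files without watermarks.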
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
A GUI agent application based on UI-TARS (Vision-Language Model) that lets you control your computer using natural language.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Rembg is a tool to remove image backgrounds.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
GUI for a Vocal Remover that uses Deep Neural Networks.
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/