CarolineTong

2 followers · 47 following

Starred repositories

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source, high-quality data extraction tool.

Python · 23,031 stars · 1,668 forks · Updated Jan 3, 2025

lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python · 13,196 stars · 1,054 forks · Updated Dec 5, 2024

deepseek-ai / DeepSeek-V3

Python · 14,432 stars · 1,003 forks · Updated Jan 3, 2025

openai / prm800k

800,000 step-level correctness labels on LLM solutions to MATH problems

Python · 1,780 stars · 104 forks · Updated Jun 1, 2023

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python · 21,380 stars · 1,683 forks · Updated Jan 3, 2025

modelscope / ms-swift

Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…

Python · 4,882 stars · 428 forks · Updated Jan 3, 2025

QwenLM / ProcessBench

Python · 103 stars · 3 forks · Updated Dec 17, 2024

KwaiVGI / Koala-36M

Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content".

Python · 85 stars · 3 forks · Updated Nov 8, 2024

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python · 6,469 stars · 566 forks · Updated Dec 31, 2024

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python · 5,270 stars · 400 forks · Updated Aug 7, 2024

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 38,265 stars · 6,155 forks · Updated Dec 9, 2024

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python · 37,184 stars · 4,584 forks · Updated Jan 3, 2025

QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell · 11,447 stars · 694 forks · Updated Dec 24, 2024

microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python · 11,040 stars · 698 forks · Updated Dec 17, 2024

ociubotaru / transcripts

421 stars · 186 forks · Updated Sep 11, 2024

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,983 stars · 195 forks · Updated Sep 25, 2024

mlfoundations / dclm

DataComp for Language Models

HTML · 1,194 stars · 108 forks · Updated Dec 11, 2024

salesforce / CodeGen

CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.

Python · 4,971 stars · 382 forks · Updated Mar 17, 2024

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Python · 9,322 stars · 618 forks · Updated May 21, 2024

wenbochang888 / house

A complete PDF version is available for download.

Java · 3,222 stars · 403 forks · Updated Nov 28, 2024

minyoungg / platonic-rep

Python · 485 stars · 32 forks · Updated Jul 29, 2024

jondurbin / airoboros

Customizable implementation of the self-instruct paper.

Python · 1,034 stars · 71 forks · Updated Mar 7, 2024

ur-whitelab / chemcrow-public

Chemcrow

Python · 663 stars · 101 forks · Updated Dec 19, 2024

e2b-dev / awesome-ai-agents

A list of AI autonomous agents

12,600 stars · 938 forks · Updated Jan 2, 2025

microsoft / JARVIS

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Python · 23,849 stars · 1,982 forks · Updated Sep 26, 2024

AntonOsika / gpt-engineer

Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app

Python · 52,784 stars · 6,870 forks · Updated Nov 17, 2024

apache / camel

Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

Java · 5,663 stars · 4,974 forks · Updated Jan 3, 2025

THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs (open-source multilingual multimodal chat models)

Python · 5,658 stars · 473 forks · Updated Dec 31, 2024

THUDM / CodeGeeX

CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

Python · 8,325 stars · 610 forks · Updated Aug 13, 2024

OpenBMB / ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Python · 26,073 stars · 3,263 forks · Updated Dec 30, 2024

Starred topics

Natural language processing

named-entity-recognition