Stars
🔥Curated Chinese prompts🔥 — a ChatGPT usage guide that improves ChatGPT's versatility and usability!🚀
A Chinese guide to prompting ChatGPT, covering usage across various scenarios and teaching you how to make it follow your instructions.
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
A generative world for general-purpose robotics & embodied AI learning.
A script for downloading papers from the ACL Anthology (https://aclweb.org/anthology/)
Complete downloads of papers from various top conferences
This web app aims to help scientists with their literature review using metadata from OpenAlex (OA), Semantic Scholar (S2) and Crossref (CR) in local citation networks.
Unofficial Python client library for Semantic Scholar APIs.
A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
✨✨Latest Advances on Multimodal Large Language Models
Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.
This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs.
A collection of resources and papers on AI Scientist / Robot Scientist
The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
[NeurIPS 2024] Official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".
NeurIPS 2024 paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training data, instruction fine-tuning data, and In-Context learning …