Stars
🔥Curated Chinese prompts🔥 — a ChatGPT usage guide that improves ChatGPT's versatility and usability!🚀
A Chinese guide to prompting ChatGPT, covering usage across various scenarios and teaching you how to make it follow your instructions.
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
A generative world for general-purpose robotics & embodied AI learning.
A script for downloading papers from the ACL Anthology (https://aclweb.org/anthology/)
Complete downloads of papers from various top conferences
This web app aims to help scientists with their literature review using metadata from OpenAlex (OA), Semantic Scholar (S2) and Crossref (CR) in local citation networks.
Unofficial Python client library for Semantic Scholar APIs.
A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
✨✨Latest Advances on Multimodal Large Language Models
Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.
This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs.
A collection of resources and papers on AI Scientist / Robot Scientist
The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
[NeurIPS 2024] Official code release for our paper "Revisiting the Integration of Convolution and Attention for Vision Backbone".
NeurIPS 2024 paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training data, instruction fine-tuning data, and In-Context learning …