Stars
Curated list of awesome computer networking resources
A Curated List of Multiplayer Game Network Programming Resources
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Implementation of ViViT: A Video Vision Transformer
Video Summarization Dataset, Papers, Codes
Discourse Processing in Videos https://arxiv.org/abs/1903.02252
DSPy: The framework for programming—not prompting—language models
Papers and Datasets on Instruction Tuning and Following. ✨✨✨
Modeling, training, eval, and inference code for OLMo
EDUVSUM is a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features to identify important temporal segments in educational videos.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-contex…
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
[Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
A beautiful, simple, clean, and responsive Jekyll theme for academics
Visualize PyTorch tensors with a single line of code.
A state-of-the-art-level open visual language model | multimodal pre-trained model
[NeurIPS 2022] PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓