Starred repositories
A curated list of foundation models for vision and language tasks
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
ImageBind One Embedding Space to Bind Them All
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[ICCV2023] VLPart: Going Denser with Open-Vocabulary Part Segmentation
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Collects papers on transformers for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Use Florence 2 to auto-label data for use in training fine-tuned object detection models.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
A coding-free framework built on PyTorch for reproducible deep learning studies. PyTorch Ecosystem. 🏆 25 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented.
Official repository of the first-ranking solution for the UPAR2024 Challenge - Track 1.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A deep learning library for video understanding research.
Code2Prompt is a powerful command-line tool that simplifies the process of providing context to Large Language Models (LLMs) by generating a comprehensive Markdown file containing the content of your codebase.
A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.
Measures and metrics for image2image tasks. PyTorch.
Fully open reproduction of DeepSeek-R1
Python tool for converting files and office documents to Markdown.