The official Meta Llama 3 GitHub site

Python 28,263 3,268 Updated Jan 26, 2025

[NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

Python 59 1 Updated Dec 31, 2024

[TPAMI reviewing] Towards Visual Grounding: A Survey

Shell 76 9 Updated Feb 10, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 3,435 1,442 Updated Feb 9, 2025
Python 60 8 Updated Dec 30, 2024

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python 311 30 Updated Jan 8, 2024

Evaluation code for Ref-L4, a new REC benchmark in the LMM era

Python 26 Updated Dec 28, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,477 417 Updated Aug 19, 2024

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"

Python 1,255 114 Updated Dec 20, 2023

[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Python 745 38 Updated Aug 13, 2024

A General-purpose Person Re-identification Task with Instructions

Python 136 6 Updated Apr 1, 2024

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (CVPR 2022)

Python 891 87 Updated Apr 24, 2024

Multimodal chatbot with computer vision capabilities integrated

Python 100 9 Updated May 17, 2024

Ultralytics YOLO11 🚀

Python 36,393 7,016 Updated Feb 11, 2025

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 288 19 Updated Feb 6, 2025

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Python 14,271 2,091 Updated Jul 24, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 20,714 2,592 Updated Feb 6, 2025

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Python 112 7 Updated Mar 20, 2024

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Python 4,847 481 Updated Aug 6, 2024

Source code for classification models built on various backbone networks; it can be used to train your own classification model.

Python 416 81 Updated Nov 6, 2022

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 33,120 4,841 Updated Jan 31, 2025

A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization

Python 21 4 Updated Feb 1, 2025

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,842 126 Updated Oct 30, 2024

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Python 1,007 68 Updated Oct 6, 2024

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training

Python 1,111 133 Updated Dec 29, 2024

COCO API - Dataset @ http://cocodataset.org/

Jupyter Notebook 6,166 3,765 Updated Apr 17, 2024

EVA Series: Visual Representation Fantasies from BAAI

Python 2,410 176 Updated Aug 1, 2024