Solve Visual Understanding with Reinforced VLMs

Python 3,553 210 Updated Feb 27, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 10,963 1,092 Updated Feb 27, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,114 573 Updated Feb 26, 2025

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Python 241 15 Updated Jan 14, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 11,794 765 Updated Feb 27, 2025

Vision agent

Python 3,032 349 Updated Feb 24, 2025
Python 27 4 Updated Jan 20, 2025
Python 59 7 Updated Jan 7, 2025

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Python 155 8 Updated Jan 24, 2025
Python 201 29 Updated Nov 14, 2024

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 3,639 310 Updated Feb 20, 2025

Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.

Python 152 16 Updated Feb 25, 2025

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 762 46 Updated Feb 26, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 24,095 2,078 Updated Feb 27, 2025

Mouse and keyboard recording and automation, similar to 按键精灵; simulates clicks and typing | automate mouse clicks and keyboard input

Python 7,968 1,126 Updated Aug 31, 2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,199 1,598 Updated Feb 26, 2025

Memory-Guided Diffusion for Expressive Talking Video Generation

Python 153 8 Updated Dec 18, 2024

[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence

Python 318 26 Updated Feb 27, 2025

[ICCV 2023] BoxSnake official repository.

Python 60 6 Updated May 28, 2024

[ICLR'23 Spotlight & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

Python 1,217 187 Updated Oct 28, 2024

[CVPR 2025] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving

Python 458 23 Updated Feb 27, 2025

A new TensorRT integration. Easy to integrate many tasks

Cuda 409 82 Updated Apr 2, 2023

[ACM MM 2022] Official Rail-DB and Rail-Net

Python 53 6 Updated Aug 17, 2023

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,728 440 Updated Jan 12, 2025

Dataset for German Railway Signals

Python 15 Updated Feb 29, 2024

A C++ framework for programming real-time applications

C++ 239 29 Updated Nov 25, 2024

Awesome Incremental Learning

3,950 586 Updated Jan 2, 2025

Python scripts for the Segment Anything 2 (SAM2) model in ONNX

Python 220 14 Updated Aug 29, 2024

Automate browser-based workflows with LLMs and Computer Vision

Python 12,379 921 Updated Feb 27, 2025