Stars
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Code for "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"
Janus-Series: Unified Multimodal Understanding and Generation Models
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a TensorFlow implementation of goal-driven navigation agents solving the task published in “L…
Multimodal Large Language Models for Remote Sensing (RS-MLLMs): A Survey
[AAAI 2025] This repo contains evaluation code for the paper “UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios”
Awesome-Remote-Sensing-Vision-Language-Models
The official PyTorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting [arXiv]
The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
✨✨Latest Advances on Multimodal Large Language Models
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.
Awesome lists about framework figures in papers
[ECCV 2024] The official implementation of the paper “Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network”.
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
Official code for CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications"
[ECCV-2020 (spotlight)] Self-supervising Fine-grained Region Similarities for Large-scale Image Localization. 🌏 PyTorch open-source toolbox for image-based localization (place recognition).
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
Papers related to remote sensing in CVPR 2024
[CVPR 2024] 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
A professional list on Multi-modal Data Fusion Models and Key Datasets for Urban Computing.
An Awesome Collection of Urban Foundation Models (UFMs).
[CVPR 2024, Highlight] The official implementation of the paper "SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation".
Open-Sora: Democratizing Efficient Video Production for All
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"