Stars
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Generative Models by Stability AI
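Image registration algorithms, including SIFT, ORB, SURF, AKAZE, BRIEF, and matchTemplate.
A repository for pretraining from scratch plus SFT of a small-parameter Chinese LLaMa2; it runs on a single 24 GB GPU and yields a chat-llama2 with basic Chinese question-answering ability.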
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
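This is the PyTorch implementation of our paper "RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model"
Llama Chinese community. Online demo and fine-tuned models for Llama3 are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3. Building the best Chinese Llama large model, fully open source and commercially usable.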
An open source implementation of CLIP.
Research Code for Multimodal-Cognition Team in Ant Group
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
SD-Trainer. LoRA & Dreambooth training scripts and GUI for diffusion models, using kohya-ss's trainer.
PyTorch package for the discrete VAE used for DALL·E.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
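A complete introduction to FastAPI for getting started quickly, explaining the use of each core module alongside the interactive API docs. Video course link: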
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
A Vue 3 Component Library. Fairly Complete. Theme Customizable. Uses TypeScript. Fast.
⭐️ A modern and lightweight management platform based on FastAPI, Vue3, and Naive UI.
End-to-End Object Detection with Transformers
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.