This repo is the official implementation of the [**ECCV2024**](https://eccv.ecva.net/) paper *GiT: Towards Generalist Vision Transformer through Universal Language Interface*.
> [Haiyang Wang*](https://scholar.google.com/citations?user=R3Av3IkAAAAJ&hl=en&oi=ao), [Hao Tang*](https://scholar.google.com/citations?user=MyarrsEAAAAJ&hl=en), [Li Jiang](https://scholar.google.com/citations?user=5cIodxsAAAAJ&hl=en) $^\dagger$, [Shaoshuai Shi](https://scholar.google.com/citations?user=DC9wzBgAAAAJ&hl=en&oi=ao), [Muhammad Ferjad Naeem](https://scholar.google.com/citations?user=PR2DwYYAAAAJ&hl=en), [Hongsheng Li](https://scholar.google.com/citations?user=BN2Ze-QAAAAJ&hl=en&oi=ao), [Bernt Schiele](https://scholar.google.com/citations?user=z76PBfYAAAAJ&hl=en), [Liwei Wang](https://scholar.google.com/citations?user=VZHxoh8AAAAJ&hl=en) $^\dagger$
> - Primary contact: Haiyang Wang ( [email protected] ), Hao Tang ( [email protected] )

## Overview
- [💫 What we want to do](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-what-we-want-to-do)
- [🤔 Introduction](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-introduction)
- [👀 Todo](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-todo)
- [🚀 Main Results](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-main-results)
- [🛠️ Quick Start](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#%EF%B8%8F-quick-start)
- [👍 Acknowledgments](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-acknowledgement)
- [📘 Citation](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-citation)

<div align="center">
<img src="assets/Figure1.png" width="800"/>
</div>
## 📣 News
- [24-7-01] 🤗 GiT was accepted by [ECCV2024](https://eccv.ecva.net/).
- [24-3-15] 🚀 Training and inference code is released.
- [24-3-15] 👀 GiT is released on [arXiv](https://arxiv.org/abs/2403.09394).
## 💫 What we want to do
### Model architectures across various AI domains are converging towards <font color=Red>*Multi-Layer Plain Transformers*</font>.
We aim to unify the model architecture of vision and language through a plain transformer, **reducing human biases** such as modality-specific encoders and task-specific heads. A key advancement in deep learning is the shift from hand-crafted to autonomously learned features, which inspires us to likewise reduce the human-designed aspects of the architecture. Moreover, benefiting from the flexibility of plain transformers, our framework can extend to more modalities, such as [point clouds](https://github.com/Haiyang-W/UniTR) and graphs.
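To make this concrete, here is a minimal sketch of the idea (not the project's actual implementation): a single stack of plain transformer layers consumes image patches and text tokens in one shared sequence, with no modality-specific encoder and no task-specific head. All hyperparameters and names below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PlainMultiModalTransformer(nn.Module):
    """Sketch: one plain transformer shared by vision and language.

    Images and text are turned into tokens by two lightweight embeddings,
    concatenated into a single sequence, and processed by the same stack
    of plain transformer layers; a single vocabulary head produces the
    output for every task.
    """

    def __init__(self, vocab_size=32000, d_model=768, n_layers=12,
                 n_heads=12, patch_size=16, in_chans=3):
        super().__init__()
        # Linear patch projection (a ViT-style patch embedding).
        self.patch_embed = nn.Conv2d(in_chans, d_model,
                                     kernel_size=patch_size, stride=patch_size)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        # One language-modeling head; no detection or segmentation heads.
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, text_ids):
        patches = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, d)
        words = self.token_embed(text_ids)                            # (B, T, d)
        seq = torch.cat([patches, words], dim=1)   # one shared token sequence
        return self.lm_head(self.blocks(seq))      # logits over the vocabulary
```

Positional embeddings and the causal masking needed for autoregressive decoding are omitted here for brevity.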
## 🤔 Introduction
Building a universal computation model that works across all tasks is a cornerstone of artificial intelligence, as it reduces the need for task-specific designs. In this project, we introduce GiT (**G**eneralist V**i**sion **T**ransformer). GiT has the following characteristics:
- 😮 **Minimalist architecture design similar to LLMs**: GiT consists solely of a single transformer, without additional vision encoders or adapters.
- 🚀 **Covering all types of visual understanding tasks**: GiT addresses a spectrum of visual tasks, including object-level tasks (e.g., object detection), pixel-level tasks (e.g., semantic segmentation), and vision-language tasks (e.g., image captioning).
- 🤗 **Multi-task ability through a unified language interface**: Similar to LLMs, GiT exhibits task synergy in multi-task training: tasks mutually enhance one another, yielding significant improvements over isolated training (a toy serialization example follows this list).
- 🔥 **Strong performance on zero-shot and few-shot benchmarks**: GiT scales well with model size and data, demonstrating remarkable generalizability across diverse scenarios after training on 27 datasets.
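As a toy illustration of what a unified language interface can look like, the sketch below serializes heterogeneous targets into one flat token stream; the `<bin_*>` location vocabulary and the templates are assumptions for illustration, not the paper's exact serialization scheme.

```python
# Toy sketch of a unified language interface: every task's target becomes
# one flat token sequence, so a single next-token objective covers all tasks.
# The <bin_*> vocabulary and templates are assumptions for illustration only.

def serialize_detection(boxes, labels, num_bins=1000):
    """Quantize normalized box coordinates into discrete location tokens."""
    tokens = []
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        tokens += [f"<bin_{int(c * (num_bins - 1))}>" for c in (x1, y1, x2, y2)]
        tokens.append(label)
    return tokens

def serialize_caption(caption):
    """Captions are already language; whitespace-tokenize for this demo."""
    return caption.split()

print(serialize_detection([(0.1, 0.2, 0.5, 0.8)], ["dog"]))
# ['<bin_99>', '<bin_199>', '<bin_499>', '<bin_799>', 'dog']
print(serialize_caption("a dog runs on grass"))
# ['a', 'dog', 'runs', 'on', 'grass']
```

Because both targets live in the same discrete output space, one shared decoder can be trained on all of them, which is what makes the multi-task synergy described above possible.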
## 👀 Todo

- [x] Release the [arXiv](https://arxiv.org/abs/2403.09394) version.