This repo is the official implementation of our [**ECCV2024**](https://eccv.ecva.net/) paper, GiT.
> [Haiyang Wang*](https://scholar.google.com/citations?user=R3Av3IkAAAAJ&hl=en&oi=ao), [Hao Tang*](https://scholar.google.com/citations?user=MyarrsEAAAAJ&hl=en), [Li Jiang](https://scholar.google.com/citations?user=5cIodxsAAAAJ&hl=en) $^\dagger$, [Shaoshuai Shi](https://scholar.google.com/citations?user=DC9wzBgAAAAJ&hl=en&oi=ao), [Muhammad Ferjad Naeem](https://scholar.google.com/citations?user=PR2DwYYAAAAJ&hl=en), [Hongsheng Li](https://scholar.google.com/citations?user=BN2Ze-QAAAAJ&hl=en&oi=ao), [Bernt Schiele](https://scholar.google.com/citations?user=z76PBfYAAAAJ&hl=en), [Liwei Wang](https://scholar.google.com/citations?user=VZHxoh8AAAAJ&hl=en) $^\dagger$
> - Primary contact: Haiyang Wang ( [email protected] ), Hao Tang ( [email protected] )
## Overview
- [💫 What we want to do](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-what-we-want-to-do)
- [🤔 Introduction](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-introduction)
- [👀 Todo](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-todo)
- [🚀 Main Results](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-main-results)
- [🛠️ Quick Start](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#%EF%B8%8F-quick-start)
- [👍 Acknowledgments](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-acknowledgement)
- [📘 Citation](https://github.com/Haiyang-W/GiT?tab=readme-ov-file#-citation)
<div align="center">
<img src="assets/Figure1.png" width="800"/>
</div>

## 📣 News
- [24-7-01] 🤗 Our GiT was accepted to [ECCV2024](https://eccv.ecva.net/).
- [24-3-15] 🚀 Training and inference code is released.
- [24-3-15] 👀 GiT is released on [arXiv](https://arxiv.org/abs/2403.09394).

## 💫 What we want to do
### The Model Architectures across various AI domains are converging towards <font color=Red>*Multi-Layer Plain Transformers*</font>.

We aim to unify the model architecture of vision and language through a plain transformer, **reducing human biases** such as modality-specific encoders and task-specific heads. A key advancement in deep learning is the shift from hand-crafted to autonomously learned features, inspiring us to reduce human-designed aspects in architecture. Moreover, benefiting from the flexibility of plain transformers, our framework can extend to more modalities like [point clouds](https://github.com/Haiyang-W/UniTR) and graphs.
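
To make the idea concrete, below is a minimal, hypothetical sketch (not the repository's actual code) of what such a plain multi-layer transformer looks like: image patches and text tokens share one sequence, with only a lightweight patch projection and a token embedding in front of the backbone. Positional embeddings and task-specific details are omitted for brevity, and all names and hyperparameters here are illustrative assumptions.

```python
# Hypothetical sketch: one plain multi-layer transformer over a shared
# image-patch + text-token sequence (not the repository's actual code).
import torch
import torch.nn as nn

class PlainMultimodalTransformer(nn.Module):
    def __init__(self, vocab_size=32000, d_model=768, n_layers=12, n_heads=12, patch=16):
        super().__init__()
        self.patch = patch
        self.patch_proj = nn.Linear(3 * patch * patch, d_model)  # flattened RGB patch -> token
        self.text_embed = nn.Embedding(vocab_size, d_model)      # text / instruction word -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)   # the single plain transformer
        self.lm_head = nn.Linear(d_model, vocab_size)            # every task predicts tokens

    def forward(self, image, text_ids):
        b, c, h, w = image.shape
        p = self.patch
        # Cut the image into non-overlapping p x p patches and flatten each one.
        patches = image.unfold(2, p, p).unfold(3, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        # One shared sequence: image tokens followed by text/instruction tokens.
        tokens = torch.cat([self.patch_proj(patches), self.text_embed(text_ids)], dim=1)
        return self.lm_head(self.backbone(tokens))                # per-token vocabulary logits

logits = PlainMultimodalTransformer()(torch.randn(1, 3, 224, 224),
                                       torch.randint(0, 32000, (1, 8)))
```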


## 🤔 Introduction
Building a universal computation model across all tasks stands as the cornerstone of artificial intelligence, reducing the need for task-specific designs. In this project, we introduce GiT (**G**eneralist V**i**sion **T**ransformer). GiT has the following characteristics:
- 😮 **Minimalist architecture design similar to LLM**: GiT consists solely of a single transformer, without additional vision encoders or adapters.
- 🚀 **Covering all types of visual understanding tasks**: GiT addresses a spectrum of visual tasks, including object-level tasks (e.g., object detection), pixel-level tasks (e.g., semantic segmentation), and vision-language tasks (e.g., image captioning).
- 🤗 **Achieving multi-task ability through a unified language interface**: Like LLMs, GiT exhibits a task-synergy effect in multi-task training, where tasks mutually enhance one another and yield significant improvements over isolated training (a toy sketch of this interface follows this list).
- 🔥 **Strong performance on zero-shot and few-shot benchmarks**: GiT scales well with model size and data, demonstrating remarkable generalizability across diverse scenarios after training on 27 datasets.
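
As a purely illustrative sketch of what such a unified language interface can look like, the toy serializer below turns different task annotations into one flat token sequence. The token names and formats are assumptions for illustration, not GiT's actual prompt or target design.

```python
# Purely illustrative: express different visual tasks as one flat token
# sequence so a single language-style decoder can predict them uniformly.
def serialize_target(task, target):
    if task == "detection":
        # Each box becomes discretized coordinate tokens followed by its class word.
        return [tok for box, cls in target
                for tok in (f"<x{box[0]}>", f"<y{box[1]}>", f"<x{box[2]}>", f"<y{box[3]}>", cls)]
    if task == "captioning":
        return target.split()                          # plain caption words
    if task == "segmentation":
        return [f"<c{c}>" for c in target]             # one class token per predicted position
    raise ValueError(f"unknown task: {task}")

print(serialize_target("detection", [((12, 40, 380, 410), "dog")]))
print(serialize_target("captioning", "a dog runs on the grass"))
```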



## 👀 Todo

- [x] Release the [arXiv](https://arxiv.org/abs/2403.09394) version.
