add caption inference
JustinLin610 committed Feb 10, 2022
1 parent fa60b70 commit 0c7f4c9
Showing 1,061 changed files with 279,467 additions and 3 deletions.
87 changes: 87 additions & 0 deletions .idea/workspace.xml

32 changes: 29 additions & 3 deletions README.md
@@ -9,13 +9,39 @@ OFA is a unified multimodal pretrained model that unifies modalities (i.e., cros
(e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.)
to a simple sequence-to-sequence learning framework. For more information, please refer to our paper: [Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework](http://arxiv.org/abs/2202.03052).

We plan to release the code and colab notebooks soon (Feb. 2022).


# Approach
![approach](examples/approach.jpg)

# Examples

# Requirements
* python 3.7.4
* pytorch 1.8.1
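
The versions above can be checked before installing anything else. The snippet below is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it only reports mismatches against the two versions listed here. Whether other versions also work is not stated in this README.

```python
import re
import sys

# Versions listed under "Requirements" in this README.
EXPECTED_PYTHON = (3, 7, 4)
EXPECTED_TORCH = (1, 8, 1)


def parse_version(text):
    """Extract the leading 'major.minor.patch' as a tuple of ints.

    Suffixes such as '+cu111' are ignored. Returns None if the text does
    not start with three dot-separated numbers.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check_environment():
    """Return a list of mismatch messages; an empty list means both match."""
    problems = []
    found_python = tuple(sys.version_info[:3])
    if found_python != EXPECTED_PYTHON:
        problems.append(
            "python %s found, README lists %s"
            % (".".join(map(str, found_python)), ".".join(map(str, EXPECTED_PYTHON)))
        )
    try:
        import torch  # only needed for the version string
    except ImportError:
        problems.append("pytorch is not installed")
    else:
        if parse_version(torch.__version__) != EXPECTED_TORCH:
            problems.append(
                "pytorch %s found, README lists %s"
                % (torch.__version__, ".".join(map(str, EXPECTED_TORCH)))
            )
    return problems
```

Calling `check_environment()` and printing each returned message gives a quick report; the function never raises on a mismatch, so it is safe to run in any environment.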

# Installation
```bash
git clone https://github.com/OFA-Sys/OFA
# Enter the cloned repository so that requirements.txt can be found
cd OFA
pip install -r requirements.txt
```

# Datasets and Checkpoints
See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).

# Pretraining
To be released soon :)

# Finetuning & Inference
Below we provide methods for finetuning and inference on different downstream tasks. At the moment we provide only the inference scripts; the finetuning scripts will be released soon.
## Caption
1. Download the data and files and put them in the correct directory.
2. Run the commands below:

```bash
cd run_scripts/caption
sh evaluate_caption.sh
```
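
Step 1 can be partly scripted for the caption checkpoint. The snippet below is a rough sketch under stated assumptions: the URL is the one listed in checkpoints.md, but the `checkpoints` target directory and the helper names `checkpoint_destination` and `download_checkpoint` are hypothetical. This README does not say where the file must be placed, so check the evaluation script for the path it expects.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URL taken from checkpoints.md (finetuned checkpoint for Caption on COCO).
CAPTION_CHECKPOINT_URL = (
    "https://zheluo-mm.oss-cn-beijing.aliyuncs.com/ofa/checkpoints/"
    "caption_large_best.pt"
)


def checkpoint_destination(url, root="checkpoints"):
    """Build the local path for a checkpoint URL.

    `root` is an assumed directory name, not one confirmed by the repository.
    """
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(root, filename)


def download_checkpoint(url, root="checkpoints"):
    """Download the checkpoint unless a file with that name already exists."""
    destination = checkpoint_destination(url, root)
    if os.path.exists(destination):
        return destination  # skip the download if the file is already there
    os.makedirs(root, exist_ok=True)
    urllib.request.urlretrieve(url, destination)
    return destination
```

Calling `download_checkpoint(CAPTION_CHECKPOINT_URL)` fetches the file once and returns its local path; later calls return the existing path without downloading again.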

# Gallery
Below we provide examples of OFA in text-to-image generation and open-ended VQA. We also demonstrate its performance on an unseen task (Grounded QA) as well as an unseen domain (Visual Grounding on images from unseen domains).

## Text-to-Image Generation (normal query)
3 changes: 3 additions & 0 deletions checkpoints.md
@@ -0,0 +1,3 @@
We provide links for you to download our checkpoints. We will release all the checkpoints including pretrained and finetuned models on different tasks.

* <a href="https://zheluo-mm.oss-cn-beijing.aliyuncs.com/ofa/checkpoints/caption_large_best.pt"> Finetuned checkpoint for Caption on COCO </a>
2 changes: 2 additions & 0 deletions criterions/__init__.py
@@ -0,0 +1,2 @@
from .scst_loss import ScstRewardCriterion
from .label_smoothed_cross_entropy import AjustLabelSmoothedCrossEntropyCriterion