This directory contains several notebooks that illustrate how to use NAVER AI Lab's ViLT, both for fine-tuning on custom data and for inference. It currently includes the following notebooks:
- fine-tuning ViLT for visual question answering (VQA) (based on the VQAv2 dataset)
- performing inference with ViLT to illustrate visual question answering (VQA)
- masked language modeling (MLM) with a pre-trained ViLT model
- performing inference with ViLT for image-text retrieval
- performing inference with ViLT to illustrate natural language for visual reasoning (based on the NLVR2 dataset)
All pre-trained and fine-tuned ViLT checkpoints can be found on the HuggingFace Hub.
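As a quick taste of the inference workflow covered in the VQA notebook, here is a minimal sketch using the `dandelin/vilt-b32-finetuned-vqa` checkpoint from the Hub; the image URL and question are placeholders you would swap for your own data:

```python
import requests
import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Load a sample image (a COCO validation image is used here as a placeholder)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# The processor prepares both the image (pixel values) and the text (input ids)
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

encoding = processor(image, question, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# VQA is treated as classification over a fixed answer vocabulary
idx = outputs.logits.argmax(-1).item()
answer = model.config.id2label[idx]
print("Predicted answer:", answer)
```

The fine-tuned VQA head treats answering as classification over a fixed answer vocabulary, which is why the prediction is a simple `argmax` over the logits rather than generated text.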