IEIT-AGI/MIX-Kalman

MIX·Kalman

Introduction

Multimodal Artificial Intelligence Framework (MIX·Kalman) is an open-source toolbox for building multi-modal models. It follows an out-of-the-box design concept and is compatible with a wide range of multi-modal tasks, models and datasets. It is scalable, easy to use and high-performance.

The master branch is built on PyTorch.

demo_image

License

This project is released under the Apache 2.0 license.

Changelog

MIX·Kalman v0.1 supports mainstream multi-modal datasets and models, mixed precision training, and distributed training across multiple GPUs and multiple nodes.

Subsequent versions of MIX·Kalman will optimize the framework further. We will add more dual-stream and single-stream pre-training models, add more data processing methods such as masking, back translation and unsupervised data augmentation, and support launching multiple training jobs on a single machine simultaneously.
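
As a rough illustration of what a masking step in a data processing pipeline can look like, here is a minimal sketch in plain Python. It is not taken from MIX·Kalman: the function name `mask_tokens`, the `[MASK]` placeholder and the `mask_prob` parameter are assumptions made for this example only.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder token, not taken from MIX·Kalman


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly replace tokens with MASK_TOKEN.

    Returns the masked sequence and a list of labels that holds the
    original token at each masked position and None elsewhere.
    """
    rng = random.Random(seed)  # local generator so the global state is untouched
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)  # the model would be trained to recover this token
        else:
            masked.append(token)
            labels.append(None)  # unmasked positions carry no prediction target
    return masked, labels


if __name__ == "__main__":
    caption = "a man riding a horse on the beach".split()
    masked, labels = mask_tokens(caption, mask_prob=0.3, seed=0)
    print(masked)
    print(labels)
```

Passing a fixed `seed` makes the masking reproducible, which can help when debugging a data pipeline. A real pre-training setup would typically operate on token ids rather than strings.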

Benchmark and model zoo

Results and models are available in the model zoo.

All supported models and tasks are listed below.

Supported backbones:

LXMERT, UNITER, ViLBERT, DeVLBert, Oscar, VinVL, MCAN, LCGN, HGL, R2C, VisDial-BERT

Supported tasks:

VQA
GQA
NLVR
VQA_large
NLVR_large
GussWhatPointing
VisualEntailment
GussWhat
VCR_QAR
VCR_QA
Visual7w
RetrivalFlickr30k
GenomeQA
Retrivalcoco
refcocog
refcoco
refcoco+
VisDial

Installation

Please refer to get_started.md for installation.

Getting Started

Please see quickrun for the basic usage of MIX·Kalman and the visual interface for inference. We provide a basic introduction to MIX·Kalman's core module engine, full guidance for configuration, and all the results and models. There are also tutorials for finetuning models, adding new datasets, customizing models, customizing runtime settings, and useful tools.

Contributing

We appreciate all contributions to improve MIX·Kalman. Please refer to CONTRIBUTING.md for the contributing guidelines.

Acknowledgement

MIX·Kalman is an open-source project contributed by researchers and engineers from IEIT. We appreciate all the contributors who implement their methods or add new features, as well as the users who give valuable feedback. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit for reimplementing existing methods and developing new models.

Citation

If you use this toolbox or benchmark in your research, please cite this project.

@misc{fan2021mixkalman,
  author =       {Baoyu Fan and Liang Jin and Runze Zhang and Xiaochuan Li and Cong Xu and Hongzhi Shi and Jian Zhao and Yinyin Chao and Yingjie Zhang and Binqiang Wang and Zhenhua Guo and Yaqian Zhao and Rengang Li},
  title =        {MIX·Kalman: A multimodal framework for vision and language research},
  howpublished = {\url{https://github.com/IEIT-AGI/MIX-Kalman}},
  year =         {2021}
}

