
ScreenAI

Implementation of the ScreenAI model from the paper "ScreenAI: A Vision-Language Model for UI and Infographics Understanding" (https://arxiv.org/abs/2402.04615). The data flow is: image + text -> patchify -> ViT -> embed + concat -> self-attention + FFN -> cross-attention + FFN + self-attention -> output projection.
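
For intuition, here is a minimal PyTorch sketch of that flow. It illustrates the stages listed above, not the code in this package; the class and module names (ScreenAIFlowSketch, patch_embed, and so on) are hypothetical placeholders.

import torch
from torch import nn

class ScreenAIFlowSketch(nn.Module):
    """Hypothetical illustration of the image + text -> ... -> out flow."""

    def __init__(self, dim=512, patch_size=16, image_size=224, heads=8):
        super().__init__()
        # patchify + ViT-style embedding: project each image patch to dim
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # self-attention + FFN over the fused image/text sequence
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        # cross-attention + FFN + self-attention, attending to the encoder output
        self.decoder = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        # final output projection
        self.to_out = nn.Linear(dim, dim)

    def forward(self, text, image):
        # ViT: (B, 3, H, W) -> (B, num_patches, dim)
        img_tokens = self.patch_embed(image).flatten(2).transpose(1, 2)
        # embed + concat: join image tokens with text embeddings
        tokens = torch.cat((img_tokens, text), dim=1)
        # self-attention + FFN
        encoded = self.encoder(tokens)
        # cross-attention + FFN + self-attention
        decoded = self.decoder(text, encoded)
        # output projection
        return self.to_out(decoded)

sketch = ScreenAIFlowSketch()
out = sketch(torch.randn(1, 1, 512), torch.rand(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1, 512])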

Install

pip3 install screenai

Usage

import torch
from screenai.main import ScreenAI

# Dummy image tensor: (batch, channels, height, width)
image = torch.rand(1, 3, 224, 224)

# Dummy text embeddings: (batch, sequence length, dim); the last
# dimension must match the dim argument passed to ScreenAI below
text = torch.randn(1, 1, 512)

# Model
model = ScreenAI(
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)


# Forward pass (note the argument order: text first, then image)
out = model(text, image)

# Print the output shape
print(out.shape)
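
Going by the parameter names, vit_depth appears to set the number of layers in the vision encoder, multi_modal_encoder_depth and llm_decoder_depth the number of layers in the multimodal encoder and LLM decoder, and mm_encoder_ff_mult the feed-forward expansion factor in the multimodal encoder; check the source in screenai/main.py to confirm.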

License

MIT

Citation

@misc{baechler2024screenai,
    title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding}, 
    author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
    year={2024},
    eprint={2402.04615},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
