InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write

Blagoj Mitrevski†, Arina Rak†, Julian Schnitzler†, Chengkun Li†, Andrii Maksai‡, Jesse Berent, Claudiu Musat

First authors (random order)   |   Corresponding author: [email protected]

Google Research Blog | Read the Paper | Try Demo on Hugging Face | Project Page | Hugging Face Dataset | Example Colab


[Animated teaser]

Overview

InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink using a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without specialized equipment and handle diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. In this repository, we provide the Small-p model weights, the dataset, and example inference code (listed in the Releases section).

[InkSight system diagram]

Releases

⚠️ Notice: Please use TensorFlow and tensorflow-text versions between 2.15.0 and 2.17.0. Versions later than 2.17.0 may lead to unexpected behavior; we are currently investigating these issues.

We provide open resources for the public version of the InkSight model. Choose the options that best fit your needs.

News

GPU Inference Environment Setup with Conda

To set up the environment and run model inference locally on a GPU, use the following steps:

# Clone the repository
git clone https://github.com/google-research/inksight.git
cd inksight

# Create and activate conda environment
conda env create -f environment.yml
conda activate inksight
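
With the environment active, the snippet below is a minimal inference sketch, not the official example: it assumes the released Small-p weights are a TensorFlow SavedModel, and the local path, signature name, and input keyword shown are placeholders. The example colab linked above documents the actual interface.

# Minimal inference sketch (illustrative; names below are assumptions)
import tensorflow as tf
import tensorflow_text  # registers the text ops used by the mT5 decoder

# Load the downloaded Small-p weights (hypothetical local path)
model = tf.saved_model.load("./small-p")
infer = model.signatures["serving_default"]

# Read a photo of handwritten text as raw bytes
image_bytes = tf.io.read_file("handwritten_word.png")

# The keyword "image" is a placeholder; inspect
# infer.structured_input_signature for the real input names
outputs = infer(image=image_bytes)
print(outputs)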

If you encounter any issues during setup or running the model, please open an issue with details about your environment and the error message.

Run Gradio 🤗 Playground Locally

To set up and run the Gradio Playground locally, use the following steps:

# Clone the huggingface space
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground

# Install the dependencies
cd Model-Output-Playground
pip install -r requirements.txt

Then you can run the following command to interact with the playground:

# Run the Gradio Playground
python app.py
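
Once app.py starts, Gradio prints a local URL in the terminal (typically http://127.0.0.1:7860); open it in a browser to upload images and inspect the model's output ink.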

Licenses

Code License: The code in this repository is released under the Apache 2.0 license.

Disclaimer

Please note: This is not an officially supported Google product.

Citation

If you find our code or dataset useful for your research and applications, please cite using BibTeX:

@article{mitrevski2024inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author={Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal={arXiv preprint arXiv:2402.05804},
  year={2024}
}