Blagoj Mitrevski† • Arina Rak† • Julian Schnitzler† • Chengkun Li† • Andrii Maksai‡ • Jesse Berent • Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink using a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without requiring specialized equipment and handle diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. In this repository, we provide the Small-p model weights, our dataset, and example inference code (listed in the releases section below).
*InkSight system diagram*
⚠️ Notice: Please use TensorFlow and tensorflow-text versions between 2.15.0 and 2.17.0. Versions later than 2.17.0 may lead to unexpected behavior; we are currently investigating these issues.
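If you are unsure which versions are installed, a quick check along these lines can catch mismatches early (a minimal sketch; the version bounds mirror the notice above, and `packaging` is the only extra dependency):

```python
# Minimal sketch: check the installed TensorFlow / tensorflow-text versions
# against the supported 2.15.0-2.17.0 range from the notice above.
from packaging import version

import tensorflow as tf
import tensorflow_text as tf_text

SUPPORTED_MIN, SUPPORTED_MAX = version.parse("2.15.0"), version.parse("2.17.0")

for name, installed in [("tensorflow", tf.__version__), ("tensorflow-text", tf_text.__version__)]:
    if not (SUPPORTED_MIN <= version.parse(installed) <= SUPPORTED_MAX):
        raise RuntimeError(f"{name} {installed} is outside the supported 2.15.0-2.17.0 range")
print("TensorFlow and tensorflow-text versions look compatible.")
```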
We provide open resources for the public version of the InkSight model. Choose the options that best fit your needs:
- Model weights:
  - Public version Small-p model for CPU/GPU inference (a minimal loading sketch follows this list)
  - Public version Small-p model for TPU inference
- A dataset containing subsets of:
  - Model-generated samples in the universal `InkML` format
  - Human expert digital ink traces in `npy` format
- Example inference code: Demonstrates both word-level and full-page text inference using free, open-source alternatives to the Google Cloud Vision Handwriting Text Detection API. The implementation supports docTR and Tesseract OCR.
- Samples of model outputs.
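To put the pieces above together, the sketch below shows one minimal way to fetch the Small-p weights from Hugging Face and inspect the exported SavedModel signatures. The repository id and the use of `huggingface_hub` here are assumptions for illustration; the released example inference code is the authoritative reference for calling the model on word crops and full pages.

```python
# Minimal sketch: download the public Small-p weights and load them as a
# TensorFlow SavedModel. The repo id below is an assumption for illustration;
# use the identifier from the Hugging Face release linked in this README.
import tensorflow as tf
import tensorflow_text  # noqa: F401  # registers text ops the SavedModel may depend on
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="Derendering/InkSight-Small-p")  # assumed repo id
model = tf.saved_model.load(model_dir)

# List the exported signatures instead of guessing input/output keys; the
# example inference code shows how to call them on word crops and full pages.
print(list(model.signatures.keys()))
```

For full-page conversion, the example inference code first detects word boxes (with docTR or Tesseract OCR as free alternatives to the Google Cloud Vision API) and then derenders each crop with the model.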
- October 2024: We release the Small-p model weights and our dataset on Hugging Face.
- October 2024: Our work is now featured on the Google Research Blog!
- February 2024: The InkSight demo on Hugging Face is live!
To set up the environment and run model inference locally on a GPU, use the following steps:
```bash
# Clone the repository
git clone https://github.com/google-research/inksight.git
cd inksight

# Create and activate conda environment
conda env create -f environment.yml
conda activate inksight
```
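Once the environment is active, a quick check like the one below confirms that TensorFlow can see your GPU (the CUDA/cuDNN setup itself depends on your system):

```python
import tensorflow as tf

# An empty list means TensorFlow will fall back to CPU-only inference.
print(tf.config.list_physical_devices("GPU"))
```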
If you encounter any issues during setup or running the model, please open an issue with details about your environment and the error message.
To set up and run the Gradio Playground locally, you can use the following steps:
```bash
# Clone the Hugging Face Space
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground

# Install the dependencies
cd Model-Output-Playground
pip install -r requirements.txt
```
Then you can run the following command to interact with the playground:
```bash
# Run the Gradio Playground
python app.py
```
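Gradio prints a local URL on startup (typically http://127.0.0.1:7860); open it in your browser to interact with the playground.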
The code in this repository is released under the Apache 2.0 license.
Please note: This is not an officially supported Google product.
If you find our code or dataset useful for your research and applications, please cite using BibTeX:
```bibtex
@article{mitrevski2024inksight,
  title   = {InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author  = {Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal = {arXiv preprint arXiv:2402.05804},
  year    = {2024}
}
```