Blagoj Mitrevski† • Arina Rak† • Julian Schnitzler† • Chengkun Li† • Andrii Maksai‡ • Jesse Berent • Claudiu Musat
† First authors (random order) | ‡ Corresponding author: [email protected]
InkSight is an offline-to-online handwriting conversion system that transforms photos of handwritten text into digital ink using a Vision Transformer (ViT) and mT5 encoder-decoder architecture. By combining reading and writing priors in a multi-task training framework, our models process handwritten content without requiring specialized equipment and handle diverse writing styles and backgrounds. The system supports both word-level and full-page conversion, enabling practical digitization of physical notes into searchable, editable digital formats. In this repository, we provide the Small-p model weights, our dataset, and example inference code (listed in the releases section below).
*InkSight system diagram*
⚠️ Notice: Please use TensorFlow and tensorflow-text versions between 2.15.0 and 2.17.0. Versions later than 2.17.0 may lead to unexpected behavior; we are currently investigating these issues.
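If you are unsure which versions are installed, a quick check along these lines can catch mismatches early (a minimal sketch; the version bounds mirror the notice above, and `packaging` is the only extra dependency):

```python
# Minimal sketch: check the installed TensorFlow / tensorflow-text versions
# against the supported 2.15.0-2.17.0 range from the notice above.
from packaging import version

import tensorflow as tf
import tensorflow_text as tf_text

SUPPORTED_MIN, SUPPORTED_MAX = version.parse("2.15.0"), version.parse("2.17.0")

for name, installed in [("tensorflow", tf.__version__), ("tensorflow-text", tf_text.__version__)]:
    if not (SUPPORTED_MIN <= version.parse(installed) <= SUPPORTED_MAX):
        raise RuntimeError(f"{name} {installed} is outside the supported 2.15.0-2.17.0 range")
print("TensorFlow and tensorflow-text versions look compatible.")
```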
We provide open resources for the public version of the InkSight model. Choose the options that best fit your needs:
- Model weights:
  - Public version Small-p model for CPU/GPU inference (a minimal loading sketch follows this list)
  - Public version Small-p model for TPU inference
- A dataset containing subsets of:
  - Model-generated samples in the universal `InkML` format
  - Human expert digital ink traces in `npy` format
- Example inference code: Demonstrates both word-level and full-page text inference using free, open-source alternatives to the Google Cloud Vision Handwriting Text Detection API. The implementation supports docTR and Tesseract OCR.
- Samples of model outputs.
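To put the pieces above together, the sketch below shows one minimal way to fetch the Small-p weights from Hugging Face and inspect the exported SavedModel signatures. The repository id and the use of `huggingface_hub` here are assumptions for illustration; the released example inference code is the authoritative reference for calling the model on word crops and full pages.

```python
# Minimal sketch: download the public Small-p weights and load them as a
# TensorFlow SavedModel. The repo id below is an assumption for illustration;
# use the identifier from the Hugging Face release linked in this README.
import tensorflow as tf
import tensorflow_text  # noqa: F401  # registers text ops the SavedModel may depend on
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="Derendering/InkSight-Small-p")  # assumed repo id
model = tf.saved_model.load(model_dir)

# List the exported signatures instead of guessing input/output keys; the
# example inference code shows how to call them on word crops and full pages.
print(list(model.signatures.keys()))
```

For full-page conversion, the example inference code first detects word boxes (with docTR or Tesseract OCR as free alternatives to the Google Cloud Vision API) and then derenders each crop with the model.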
- October 2024: We release the Small-p model weights and our dataset on Hugging Face.
- October 2024: Our work is now featured on the Google Research Blog!
- February 2024: The InkSight demo on Hugging Face is live!
To set up the environment and run model inference locally on a GPU, use the following steps:
```bash
# Clone the repository
git clone https://github.com/google-research/inksight.git
cd inksight

# Create and activate conda environment
conda env create -f environment.yml
conda activate inksight
```
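Once the environment is active, a quick check like the one below confirms that TensorFlow can see your GPU (the CUDA/cuDNN setup itself depends on your system):

```python
import tensorflow as tf

# An empty list means TensorFlow will fall back to CPU-only inference.
print(tf.config.list_physical_devices("GPU"))
```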
If you encounter any issues during setup or running the model, please open an issue with details about your environment and the error message.
To set up and run the Gradio Playground locally, you can use the following steps:
```bash
# Clone the Hugging Face Space
git clone https://huggingface.co/spaces/Derendering/Model-Output-Playground

# Install the dependencies
cd Model-Output-Playground
pip install -r requirements.txt
```
Then you can run the following command to interact with the playground:
```bash
# Run the Gradio Playground
python app.py
```
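Gradio prints a local URL on startup (typically http://127.0.0.1:7860); open it in your browser to interact with the playground.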
The code in this repository is released under the Apache 2.0 license.
Please note: This is not an officially supported Google product.
If you find our code or dataset useful for your research and applications, please cite using BibTeX:
```bibtex
@article{mitrevski2024inksight,
  title   = {InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author  = {Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal = {arXiv preprint arXiv:2402.05804},
  year    = {2024}
}
```