@LazarusNLP

Lazarus NLP

Lazarus NLP is a collective initiative to revive the dying languages of Indonesia through speech and language technology.

Projects

IndoT5: T5 Language Models for the Indonesian Language

IndoT5 is a T5-based language model trained specifically for Indonesian. With just 8 hours of training on a limited budget, we developed a sequence-to-sequence, encoder-decoder model that can be fine-tuned for tasks such as summarization, chit-chat, and question answering. Despite these training constraints, the model is competitive on the IndoNLG (text generation) benchmark.

Indonesian Sentence Embedding Models

We trained open-source sentence embedding models for Indonesian, enabling applications such as information retrieval (useful for retrieval-augmented generation!), semantic text similarity, and zero-shot text classification. We build on existing pre-trained Indonesian language models like IndoBERT, apply state-of-the-art unsupervised techniques, and evaluate on established sentence embedding benchmarks.
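
As a rough illustration of how sentence embeddings support information retrieval, the sketch below ranks documents by cosine similarity to a query. The vectors are hand-written toy values, not outputs of any Lazarus NLP model, and the names `cosine_similarity` and `rank_documents` are hypothetical helpers introduced only for this example. In practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy corpus with made-up 3-dimensional embeddings (assumption: a real
# model would produce vectors with hundreds of dimensions).
corpus = [
    "Ibu kota Indonesia adalah Jakarta.",
    "Saya suka makan nasi goreng.",
    "Gunung Bromo terletak di Jawa Timur.",
]
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.3],
    [0.2, 0.1, 0.9],
]
query_vec = [0.8, 0.2, 0.1]  # stands in for an embedded query about the capital

ranking = rank_documents(query_vec, doc_vecs)
for index, score in ranking:
    print(f"{score:.3f}  {corpus[index]}")
```

The same ranking step applies to semantic text similarity (score a single pair) and to retrieval-augmented generation (pass the top-ranked documents to a generator).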

Indonesian Natural Language Inference Models

Open-source, lightweight NLI models that are competitive with larger models on the IndoNLI benchmark while using significantly fewer parameters. We applied knowledge distillation methods to small existing pre-trained language models like IndoBERT Lite. These models offer efficient solutions for tasks that require natural language inference, such as cross-encoder-based semantic search, while minimizing computational resources.
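
The paragraph above does not say which distillation method was used. As a general illustration only, the sketch below computes the classic soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The three logits stand in for the usual NLI labels (entailment, neutral, contradiction), and the function names and numbers are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Hypothetical logits for one premise-hypothesis pair.
teacher_logits = [4.0, 1.0, -2.0]  # large model, confident in "entailment"
student_logits = [2.0, 1.5, -1.0]  # small model, less confident

loss = distillation_loss(student_logits, teacher_logits)
print(f"distillation loss: {loss:.4f}")
```

In a typical training setup this term is combined with the ordinary cross-entropy loss on the gold labels, with a weighting chosen by the practitioner.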

Many-to-Many Multilingual Translation Models

By adapting mT5 to 45 languages of Indonesia, we developed a robust baseline model for multilingual translation. This facilitates further fine-tuning for niche domains and low-resource languages, contributing to greater linguistic inclusivity. Our models are competitive with existing multilingual translation models on the NusaX benchmark.

Pinned

  1. indonesian-sentence-embeddings Public

    Embedding Representation for Indonesian Sentences!

    Jupyter Notebook 13 2

  2. machine-translation Public

    Many-to-Many Multilingual Translation Model for Languages of Indonesia

    Python 1

  3. IndoT5 Public

    T5 Language Models for the Indonesian Language!

    Python 6

  4. NusaBERT Public

    NusaBERT: Teaching IndoBERT to be multilingual and multicultural!

    Python

Repositories

Showing 10 of 15 repositories
  • indonesian-sentence-embeddings Public

    Embedding Representation for Indonesian Sentences!

    Jupyter Notebook 13 Apache-2.0 2 0 0 Updated Aug 14, 2024
  • NusaBERT Public

    NusaBERT: Teaching IndoBERT to be multilingual and multicultural!

    Python 0 Apache-2.0 0 0 0 Updated Aug 7, 2024
  • mteb Public Forked from embeddings-benchmark/mteb

    MTEB: Massive Text Embedding Benchmark

    Python 0 Apache-2.0 291 0 0 Updated May 16, 2024
  • EasyDeL Public Forked from erfanzar/EasyDeL

    EasyDeL is an OpenSource Library to make your training faster and more Optimized With cool Options for training and serving Both in Python And Mojo🔥

    Python 0 Apache-2.0 24 0 0 Updated Mar 22, 2024
  • FJFormer Public Forked from erfanzar/FJFormer

    Embark on a journey of paralleled/unparalleled computational prowess with FJFormer - an arsenal of custom Jax Flax Functions and Utils that elevate your AI endeavors to new heights!

    Python 0 Apache-2.0 4 0 0 Updated Mar 18, 2024
  • IndoT5 Public

    T5 Language Models for the Indonesian Language!

    Python 6 Apache-2.0 0 1 0 Updated Mar 6, 2024
  • lazarusnlp.github.io Public

    Lazarus NLP is a collective initiative to revive the dying languages of Indonesia through speech and language technology.

    0 0 0 0 Updated Mar 6, 2024
  • indorobusta
    Jupyter Notebook 0 1 0 0 Updated Mar 1, 2024
  • .github Public
    0 0 0 0 Updated Feb 13, 2024
  • nanoT5 Public Forked from PiotrNawrot/nanoT5

    Fast & Simple repository for pre-training and fine-tuning T5-style models

    Python 0 Apache-2.0 75 0 0 Updated Feb 6, 2024
