- Pacific Northwest National Laboratory
- Seattle
- http://samtube405.github.io/_profile/
- @samtube405
Stars
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
Data processing with ML, LLM and Vision LLM
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
potato: portable text annotation tool
This is a repo with links to everything you'd ever want to learn about data engineering
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models
Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
AgentSearch is a framework for powering search agents and enabling customizable local search.
library supporting NLP and CV research on scientific papers
Implementation of Nougat Neural Optical Understanding for Academic Documents
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨✨Latest Advances on Multimodal Large Language Models
A curation of awesome tools, documents and projects about LLM Security.
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Reading list of Instruction-tuning. A trend starts from Natural-Instruction (ACL 2022), FLAN (ICLR 2022) and T0 (ICLR 2022).
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
Reimplementation of the task generation part from the Alpaca paper
Fast & Simple repository for pre-training and fine-tuning T5-style models
Tools for understanding how transformer predictions are built layer-by-layer