tinyllm is a lightweight framework for developing, debugging and monitoring LLM and Agent powered applications at scale. The main goal of the library is to keep code as simple and readable as possible while allowing users to create complex agents or LLM workflows in production.
Function and its streaming equivalent FunctionStream are the core classes in tinyllm. They are designed to standardize and control LLM, ToolStore and any other relevant calls for scalable production use, in streaming mode and otherwise.
Function provides a structured approach to the various aspects of function execution, including input/output validation, output processing, error handling and evaluation, all while keeping code readable. You can create a chain with its own prompt, LLM model and evaluators in a single file: no need to jump through many class definitions, no spaghetti code. Agents and chains from other libraries (LangChain, LlamaIndex, ...) can also be seamlessly imported as tinyllm Functions.
pip install tinyllm
- LiteLLM integration: 20+ model providers available (OpenAI, Hugging Face, etc.)
- Langfuse integration: Monitor, trace and debug LLMs, Agents, Tools, RAG pipelines and more in structured run trees
- Agents: An agent is an LLM with Memory, a Toolkit and an ExampleManager
- ToolStore and Toolkits: Let your Agent run Python functions using a ToolStore
- Example manager: Constant examples plus variable examples chosen by an example selector with similarity search
- Memory: Conversation history
- Retrieval Augmented Generation: RAG tools to search and generate answers
- Evaluation: Evaluators can be defined to evaluate and log the quality of the function's output in real-time
- PGVector store: PostgreSQL DB with the pgvector extension for vector storage.
- Prompt engineering tools: utility modules for prompt engineering, optimization and string formatting
- Layered validation: 3 validations happen during the Function lifecycle: input, output and output processing.
- IO Standardization: Maintains consistent response patterns and failure handling across different function implementations.
- Observability: Integrates with Langfuse for tracing chains, functions and agents
- Logging: Records detailed logs for debugging and auditing purposes.
- Finite State Machine design: Manages the function's lifecycle through defined states, ensuring controlled and predictable execution.
```python
from pydantic import BaseModel
# tiny_function is provided by the tinyllm package; import it from the installed version


class RiskScoreOutput(BaseModel):
    risk_score: float


@tiny_function(output_model=RiskScoreOutput)
async def calculate_risk_score(bank_account_history: str, employment_history: str):
    """
    <system>
    Extract a Risk Score between 0 and 1 for a Credit Card application based on bank account and employment history.
    </system>
    <prompt>
    Given the bank account history: {bank_account_history}
    And the employment history: {employment_history}
    Calculate the risk score for a credit card application.
    </prompt>
    """
    pass
```
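For illustration, here is a sketch of how the decorated function might be invoked; the inputs are made up, and the exact structure of the returned object depends on the decorator, so treat the print as a placeholder:

```python
import asyncio

async def main():
    # Hypothetical inputs; the decorated coroutine is awaited like any other.
    result = await calculate_risk_score(
        bank_account_history="No overdrafts in 24 months, stable balance around $4,000.",
        employment_history="Full-time software engineer for the past 5 years.",
    )
    print(result)  # expected to carry a RiskScoreOutput-shaped payload

asyncio.run(main())
```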
Many of the LLM libraries today (LangChain, LlamaIndex, DeepPavlov...) have made serious software design commitments which I believe were too early to make given the infancy of the industry. The goals of tinyllm are:
- Solve pain points of current libraries: lack of composability (within and between libraries), complex software designs, poor code readability, and difficult debugging and logging.
- High-level, robust abstractions: tinyllm is designed to be as simple as possible to use and to integrate with existing, living codebases.
- Human- and machine-readable code to enable AI-powered and autonomous chain development
LLM Functions are designed to behave like a web API: every Function always returns a dictionary response, even when it fails.
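As a rough illustration of that contract (the success shape matches the trace output at the bottom of this page; the failure shape is an assumption about the general pattern, not the library's exact schema):

```python
# Success: a status flag plus the function's output payload.
success_response = {"status": "success", "output": {"response": "..."}}

# Failure (assumed shape): the call still returns a dict rather than raising,
# with a non-success status and the error details.
failure_response = {"status": "error", "message": "<exception details>"}
```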
Validations are defined through Pydantic models and are provided to a Function via the input_validator, output_validator and output_processing_validator arguments.
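A minimal sketch of the idea, assuming the validators are plain Pydantic models passed under those argument names (the model names here are made up for illustration):

```python
from pydantic import BaseModel


class GreetingInput(BaseModel):
    user_name: str


class GreetingOutput(BaseModel):
    greeting: str

# These models would then be handed to a Function as input_validator=GreetingInput
# and output_validator=GreetingOutput; the rest of the Function's configuration
# (run logic, LLM, evaluators) is omitted here.
```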
tinyllm is integrated with Langfuse for tracing chains, functions and agents.
Configs are managed through a tinyllm.yaml file. It gets picked up at runtime by tinyllm.init and can be placed in any of /Documents, your root folder, or the current working directory. An empty tinyllm.yaml file is included in the repo source to get you set up.
These tend to be confusing across the board. Here's a quick explanation:
- Concurrency: More than one input/output request at a time. Just like you can download 10 files concurrently in your web browser, you can call 10 APIs concurrently.
- Chaining: An ordered list of Functions where a Function's output is the input of the next Function in the chain.
- Parallelism: Compute/calculations performed on more than one process/CPU core on the same machine. This is what model providers like OpenAI do using large GPU clusters (Nvidia, AMD...), and it is used for "CPU-bound" tasks.
tinyllm does not care about parallelism: it is implemented by LLM providers at the GPU/CPU level and should be abstracted away behind an LLM microservice. tinyllm only cares about concurrency, chaining and organizing IO-bound tasks.
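As a quick sketch of the concurrency side (plain asyncio, nothing tinyllm-specific), several IO-bound calls can be awaited at once and finish in roughly the time of one:

```python
import asyncio

async def call_api(name: str) -> str:
    # Stand-in for an IO-bound call: an LLM request, a web API, a DB query...
    await asyncio.sleep(1)
    return f"{name}: done"

async def main():
    # The event loop interleaves the 10 requests while each waits on IO,
    # so the whole batch takes about 1 second instead of 10.
    results = await asyncio.gather(*(call_api(f"request-{i}") for i in range(10)))
    print(results)

asyncio.run(main())
```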
Functions run on a finite state machine with predictable and controlled state transitions, which makes chains/compute graphs easy to debug.
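For intuition, the states visible in the trace below could be sketched as an enum like this (only the states appearing in this particular trace are listed; the full lifecycle also covers the input/output validation steps mentioned above):

```python
from enum import Enum

class States(Enum):
    # Subset of lifecycle states observed in the trace below; illustrative only.
    INIT = "INIT"
    PROCESSING_OUTPUT = "PROCESSING_OUTPUT"
    PROCESSED_OUTPUT_VALIDATION = "PROCESSED_OUTPUT_VALIDATION"
    COMPLETE = "COMPLETE"
```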
Below is the start and end of the trace produced by asking an Agent with a get_weather Tool "What is the weather in Puerto Rico?".
INFO | tinyllm.function | 2023-12-25 19:37:10,617 : [Standard example selector] transition to: States.INIT
INFO | tinyllm.function | 2023-12-25 19:37:12,720 : [BufferMemory] transition to: States.INIT
INFO | tinyllm.function | 2023-12-25 19:37:12,729 : [get_weather] transition to: States.INIT
INFO | tinyllm.function | 2023-12-25 19:37:12,729 : [Toolkit] transition to: States.INIT
INFO | tinyllm.function | 2023-12-25 19:37:12,731 : [LiteLLM] transition to: States.INIT
...
...
INFO | tinyllm.function | 2023-12-25 19:37:17,150 : [AnswerCorrectnessEvaluator] transition to: States.PROCESSING_OUTPUT
INFO | tinyllm.function | 2023-12-25 19:37:17,151 : [AnswerCorrectnessEvaluator] transition to: States.PROCESSED_OUTPUT_VALIDATION
INFO | tinyllm.function | 2023-12-25 19:37:17,151 : [AnswerCorrectnessEvaluator] transition to: States.COMPLETE
INFO | tinyllm.function | 2023-12-25 19:37:17,846 : [Agent] transition to: States.PROCESSING_OUTPUT
INFO | tinyllm.function | 2023-12-25 19:37:17,847 : [Agent] transition to: States.PROCESSED_OUTPUT_VALIDATION
INFO | tinyllm.function | 2023-12-25 19:37:17,847 : [Agent] transition to: States.COMPLETE
{'status': 'success', 'output': {'response': {'id': 'chatcmpl-8ZpjY0QmXbDiMIcSRwKuCUny4sxul', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "It is 25 degrees celsius in Puerto Rico", 'role': 'assistant'}}], 'created': 1703551035, 'model': 'gpt-3.5-turbo-0613', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'completion_tokens': 12, 'prompt_tokens': 138, 'total_tokens': 150}, '_response_ms': 785.606}}}