This repo contains companion code walkthroughs for the video explanations on my YouTube channel @johnnycode. If the code and videos helped you, please consider:
ChromaDB is a user-friendly vector database that lets you quickly start testing semantic searches locally and for free, with no cloud account or LangChain knowledge required. I'll guide you through installing ChromaDB, creating a collection, adding data, and querying the database using semantic search.
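For reference, a minimal sketch of that workflow, assuming `pip install chromadb`; the collection name and documents are placeholders:

```python
import chromadb

# In-memory client; use chromadb.PersistentClient(path="db") to keep data on disk.
client = chromadb.Client()
collection = client.create_collection(name="docs")

# Documents are embedded automatically with ChromaDB's default model.
collection.add(
    documents=["ChromaDB is a vector database.", "Bananas are yellow."],
    ids=["doc1", "doc2"],
)

# Semantic search: returns the stored document closest in meaning to the query.
results = collection.query(query_texts=["what stores embeddings?"], n_results=1)
print(results["documents"])
```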
Do you use YouTube for learning? I'll show you how to generate high-quality notes from YouTube videos using just a bit of Python. We'll extract transcripts from videos and use Google's Gemini Flash Large Language Model (LLM) to convert them into concise notes. Then, we'll save these notes in a vector database (ChromaDB) and show you how to use the LLM to ask questions about your saved notes.
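A minimal sketch of the transcript-to-notes step, assuming `pip install youtube-transcript-api google-generativeai`, the classic `get_transcript` interface, and a `GEMINI_API_KEY` environment variable; the video ID is a placeholder:

```python
import os

import google.generativeai as genai
from youtube_transcript_api import YouTubeTranscriptApi

# Fetch the transcript segments and join them into one string.
video_id = "VIDEO_ID"  # placeholder: the part after v= in a YouTube URL
transcript = " ".join(seg["text"] for seg in YouTubeTranscriptApi.get_transcript(video_id))

# Ask Gemini Flash to condense the transcript into notes.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(f"Convert this transcript into concise notes:\n{transcript}")
print(response.text)
```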
I'll guide you through how to set up a ChromaDB instance using Docker Compose, including configuring authentication methods like token-based and role-based access control. We'll start by getting ChromaDB up and running quickly in a Docker container, accessible via an HTTP client without authentication. Then, we'll add token-based authentication for a single admin user, followed by role-based token authentication to support multiple users with different permissions. Finally, we'll build a custom Docker image to resolve the `ModuleNotFoundError: No module named 'hypothesis'` error in ChromaDB version 0.5.2 and to install additional packages into the container.
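For the token-auth step, here's a minimal sketch of the client side, assuming the ChromaDB 0.5.x provider path and setting names; the host, port, and token are placeholders:

```python
import chromadb
from chromadb.config import Settings

# Connect to a ChromaDB server that requires a bearer token.
client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(
        chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
        chroma_client_auth_credentials="my-secret-token",  # must match the server's token
    ),
)
print(client.heartbeat())  # succeeds only if the token is accepted
```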
I'll show you how to build a multimodal vector database using Python and the ChromaDB library. We'll start by setting up an Anaconda environment, installing the necessary packages, creating a vector database, and adding images to it. I'll guide you through querying the database with text to retrieve matching images and demonstrate how to use the 'where' metadata filter to refine your search results.
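A minimal sketch of the image collection and text query, assuming `pip install chromadb open-clip-torch pillow`; the image paths and metadata are placeholders:

```python
import chromadb
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

client = chromadb.PersistentClient(path="multimodal_db")

# OpenCLIP embeds text and images into the same vector space;
# ImageLoader reads images from the URIs when they are added or queried.
collection = client.get_or_create_collection(
    name="images",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

collection.add(
    ids=["img1", "img2"],
    uris=["photos/soup.jpg", "photos/bridge.jpg"],  # placeholder paths
    metadatas=[{"category": "food"}, {"category": "travel"}],
)

# Text query, restricted by the 'where' metadata filter.
results = collection.query(
    query_texts=["a bowl of soup"],
    n_results=1,
    where={"category": "food"},
    include=["uris"],
)
print(results["uris"])
```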
I'll show you how I generated 33,000 embeddings in about 3 minutes using Python's multiprocessing module and my GPU (CUDA). The key is to split the work into two processes: a producer that reads data and puts it into a queue, and a consumer that pulls data from the queue and vectorizes it using a local model. I tested this on the entire Game of Thrones script, and the results show that using a GPU significantly speeds up the process compared to using the CPU. Give it a try and let me know how it goes for you!
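A minimal sketch of that producer/consumer pattern, assuming `pip install sentence-transformers` and a CUDA-capable GPU; the model name, batch size, and sample data are placeholders:

```python
from multiprocessing import Process, Queue

from sentence_transformers import SentenceTransformer

def producer(queue, lines, batch_size=256):
    # Read the source text and feed the queue in batches.
    for i in range(0, len(lines), batch_size):
        queue.put(lines[i:i + batch_size])
    queue.put(None)  # sentinel: tells the consumer there is no more work

def consumer(queue):
    # Load the model once, then embed batches on the GPU as they arrive.
    model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
    while (batch := queue.get()) is not None:
        embeddings = model.encode(batch)
        print(f"embedded a batch of {len(embeddings)}")

if __name__ == "__main__":
    lines = ["a line of dialogue"] * 33_000  # placeholder for the script text
    q = Queue()
    Process(target=producer, args=(q, lines)).start()
    c = Process(target=consumer, args=(q,))
    c.start()
    c.join()
```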
I’ll show you how to easily upgrade your semantic searches by swapping out the default ChromaDB model for the Gemini Pro embedding model. With just a few lines of code, you can enhance your search results using one of the best language models available.
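A minimal sketch of the swap, assuming ChromaDB's built-in `GoogleGenerativeAiEmbeddingFunction` (which requires `pip install google-generativeai`) and a `GEMINI_API_KEY` environment variable:

```python
import os

import chromadb
from chromadb.utils import embedding_functions

# Replace the default embedding model with Google's embedding API.
gemini_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(
    api_key=os.environ["GEMINI_API_KEY"]
)

client = chromadb.Client()
collection = client.create_collection(name="docs", embedding_function=gemini_ef)

# Documents and queries are now embedded with the Gemini embedding model.
collection.add(documents=["ChromaDB is a vector database."], ids=["doc1"])
print(collection.query(query_texts=["what stores embeddings?"], n_results=1))
```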
I'll show you how to build a cooking recipe database using ChromaDB and persist the vector database in Google Colab and Google Drive. I first extracted recipes from YouTube cooking videos using Gemini Pro and then stored them in ChromaDB. You can then search for recipes and find the ones that are most relevant to your query! This is part of my Recipe Database tutorial series at RecipeDB Repo.
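A minimal sketch of the persistence setup in Colab; the Drive path and collection name are placeholders:

```python
import chromadb
from google.colab import drive

# Mount Google Drive so the database files survive Colab runtime resets.
drive.mount("/content/drive")

client = chromadb.PersistentClient(path="/content/drive/MyDrive/recipe_db")
collection = client.get_or_create_collection(name="recipes")

# Semantic search over the stored recipes.
results = collection.query(query_texts=["spicy chicken noodle soup"], n_results=3)
print(results["documents"])
```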