Repo for AI Republic's AI Engineering Course - Winter 2024
Start Date: October 12, 2024
Schedule: Every Saturday (except November 2, a holiday)
Instructors:
- Carlo Almendral
- Doc Ligot
- Xavier Puspus
- Danielle Meer
- Xy De Mesa
- Amber Teng
Gain a foundational understanding of Large Language Models (LLMs) and Natural Language Processing (NLP).
Slides for today: https://docs.google.com/presentation/d/1C3Hx8F_cJKvGfEPX9B1UnEJX2FUlAB34rTjO7KBjyIY/edit?usp=sharing
Day 1 Course Notes and Blog Post
Topic | Time |
---|---|
Introduction | 9:00 AM - 9:30 AM |
Ethics in AI | 9:30 AM - 10:30 AM |
Introduction to NLP (Slides) | 10:30 AM - 11:00 AM |
Introduction to NLP (Notebook) | 11:00 AM - 11:30 AM |
Text Processing Activity | 11:30 AM - 12:00 PM |
Lunch | 12:00 PM - 1:00 PM |
Introduction to LLMs | 1:00 PM - 1:30 PM |
Open-Source vs Closed-Source LLMs | 1:30 PM - 2:00 PM |
Try Your Own LLM Activity | 2:00 PM - 2:30 PM |
Environment Setup | 2:30 PM - 3:00 PM |
Hands-on Activity: Sentiment Analysis on IMDB | 3:00 PM - 4:30 PM |
Introduction to Capstone Project | 4:30 PM - 5:00 PM |
- AI & Ethics
- Fundamentals of NLP and its practical applications
- Introduction to LLMs: Core concepts and operational mechanics
- Differences between open-source and closed-source LLMs
- Setting up your development environment: Google Colab, Anaconda, Terminal
- Intro to Streamlit for quick app deployment
- Capstone Intro
Construct a simple NLP pipeline, including text preprocessing, tokenization, and basic text analysis.
Participants will grasp the essentials of NLP and LLMs, and build a basic NLP pipeline.
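The pipeline stages named above (text preprocessing, tokenization, basic text analysis) can be sketched with the Python standard library alone. This is a minimal illustration, not the course notebook: the function names, the tiny stopword list, and the sample review are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the course's list).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "this"}


def preprocess(text):
    """Lowercase the text and replace runs of non-alphanumeric characters with a space."""
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).strip()


def tokenize(text):
    """Split preprocessed text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, top_n=3):
    """Run the full pipeline and return the most frequent remaining tokens."""
    tokens = remove_stopwords(tokenize(preprocess(text)))
    return Counter(tokens).most_common(top_n)


if __name__ == "__main__":
    review = "The movie was great. The acting was great, and the plot was solid!"
    print(analyze(review))
```

Real pipelines usually swap the regular expression for a library tokenizer (for example from NLTK, spaCy, or Hugging Face), but the stage boundaries stay the same.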
Dive into the Hugging Face ecosystem to work with open-source LLMs.
Day 2 Course Notes and Blog Post
Slides for today:
- Introduction to the Hugging Face library and its tools
- Loading and working with pre-trained models and tokenizers
- Exploring few-shot learning with practical examples
- Fine-tuning models on custom datasets using Colab Pro
- Evaluating LLM performance with appropriate metrics
Fine-tune a pre-trained Hugging Face model for a text classification task.
Participants will learn to fine-tune, evaluate, and implement Hugging Face models for specific tasks, gaining insight into few-shot learning.
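Few-shot learning, as listed in the topics above, means placing a handful of labeled examples in the prompt so the model can infer the task without any weight updates. The sketch below only builds the prompt string. The helper name `build_few_shot_prompt`, the `Review:`/`Sentiment:` layout, and the sample reviews are hypothetical choices for illustration, and sending the prompt to a model is left out.

```python
def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Assemble an instruction, labeled examples, and an unlabeled query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item has no label; the model is expected to complete it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [
        ("I loved every minute of it.", "positive"),
        ("A dull, lifeless script.", "negative"),
    ]
    print(build_few_shot_prompt(shots, "The soundtrack was wonderful."))
```

The resulting string can be passed to any text-generation model. Varying the number of examples is a simple way to compare zero-shot, one-shot, and few-shot behavior.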
Explore and utilize the advanced features of Gemma 2B, an open-source LLM.
- Introduction to Gemma 2B and its architectural design
- Setting up and using Gemma 2B for various tasks
- Training custom LLMs with Low-Rank Adaptation (LoRA)
- Comparing Gemma 2B with other open-source models
Fine-tune a Gemma 2B model on a selected dataset (e.g., sentiment analysis, text generation) and train a custom model using LoRA.
Participants will gain practical experience with Gemma 2B, including training custom models and understanding LoRA techniques.
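To see why LoRA makes fine-tuning cheaper: instead of updating a full weight matrix of shape d_out x d_in, it trains two small matrices B (d_out x r) and A (r x d_in) whose product is added to the frozen weights. The sketch below only counts trainable parameters. The layer size and rank are illustrative assumptions, not values taken from the course materials or from Gemma's published configuration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora): trainable parameters for full fine-tuning vs. a LoRA adapter."""
    full = d_out * d_in            # every entry of the weight matrix W
    lora = rank * (d_out + d_in)   # entries of B (d_out x r) plus A (r x d_in)
    return full, lora


if __name__ == "__main__":
    # Illustrative square projection layer with a small rank (assumed values).
    full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=8)
    print(f"full fine-tuning: {full:,} parameters")
    print(f"LoRA adapter:     {lora:,} parameters")
    print(f"fraction trained: {lora / full:.2%}")
```

In practice a library such as Hugging Face PEFT applies these adapters to selected layers. The count here only shows why a low rank keeps the trainable footprint small.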
Learn how to effectively utilize OpenAI GPT models for various applications.
- Overview of the OpenAI GPT series and their capabilities
- Accessing and using the OpenAI API
- Building applications with OpenAI GPT (e.g., chatbots, text summarization)
- Ethical considerations and best practices in AI
Develop a chatbot using the OpenAI API (GPT-3 or GPT-4) and implement a text summarization tool.
Participants will be equipped to integrate OpenAI GPT models into their projects while considering ethical implications.
Delve into proprietary LLMs, with a focus on Anthropic and other leading models.
- Introduction to Anthropic and its LLM offerings
- Comparing Anthropic's models with other proprietary models (e.g., Cohere, Gemini)
- Accessing and using proprietary models via APIs
- Real-world applications and case studies
Build a text generation application using Anthropic’s models and compare its performance with other proprietary LLMs.
Participants will gain familiarity with proprietary LLMs and learn to implement applications using their APIs.
Master advanced LLM techniques, including LangChain and Retrieval-Augmented Generation (RAG).
- Introduction to LangChain and its use in complex NLP tasks
- Building LangChain chains for multi-step processes
- Understanding and implementing Retrieval-Augmented Generation (RAG)
- Introduction to vector databases (Pinecone) and CrewAI
Develop a LangChain chain for a complex NLP task (e.g., document processing and summarization) and create a RAG system integrating retrieval with LLMs.
Participants will learn to build sophisticated NLP pipelines using LangChain and enhance text generation with RAG techniques.
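The retrieval half of RAG can be shown without external services: score each document against the query, keep the top matches, and place them in the prompt as context. The sketch below is a toy under stated assumptions. It uses bag-of-words counts and cosine similarity where a real system would use learned embeddings and a vector database such as Pinecone, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, documents, k=2):
    """Augment the query with retrieved context; the result would be sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "Pinecone is a managed vector database.",
        "Streamlit turns Python scripts into web apps.",
        "LoRA adapts large models with low-rank updates.",
    ]
    print(build_prompt("What is a vector database?", docs, k=1))
```

The generation step is deliberately omitted. Any of the LLM APIs covered in the course could consume the prompt this produces.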
Complete a capstone project and undergo a certification assessment.
- Review of key concepts and techniques from the bootcamp
- Guidelines for the capstone project
Work on a capstone project (e.g., comprehensive NLP application, chatbot, text classification system) and present it to the group for feedback and assessment.
Participants will finalize a capstone project, demonstrating their skills and understanding of the bootcamp content.
Focus on the fundamentals of testing, deployment, and real-world applications of LLMs, followed by a comprehensive certification assessment.
- Develop and deploy chatbots for customer service.
- Create automated content generation tools for marketing.
- Implement sentiment analysis for social media monitoring.
- Build text summarization tools for news aggregation.
- Fine-tune models for specialized industry applications (e.g., legal, medical).
- Develop language translation applications.
- Create personalized recommendation systems based on user text data.
- Implement intelligent virtual assistants for business processes.
- Develop automated code generation and documentation tools.
- Build educational tools for language learning and tutoring.
Focus on advanced techniques for deploying LLMs in production environments and integrating them into industry-specific applications.
- Advanced deployment strategies for LLMs (e.g., containerization, orchestration)
- Continuous Integration/Continuous Deployment (CI/CD) pipelines for AI projects
- Monitoring and scaling LLMs in production
- Industry-specific case studies (e.g., healthcare, finance, legal)
- Ethical AI deployment: ensuring fairness, transparency, and accountability
Set up a CI/CD pipeline for deploying an LLM-based application and implement monitoring and scaling strategies.
Participants will learn how to deploy LLMs in a production environment, integrate them into specific industries, and maintain ethical AI practices.
Explore how to apply the knowledge and skills from the bootcamp to solve real-world problems, followed by a graduation ceremony.
- Building AI solutions for real-world problems
- Collaborating with cross-functional teams (e.g., product, design, engineering)
- AI ethics and compliance in real-world deployments
- Career paths and opportunities in AI engineering
- Post-bootcamp resources and learning pathways
- Project Implementation (40%)
  - Completeness and functionality of the project
  - Effective use of LLMs and integration of techniques learned
  - Code quality and documentation
- Presentation (20%)
  - Clarity and organization of the project presentation
  - Explanation of key concepts and design choices
  - Ability to answer questions and justify decisions
- Written Exam (30%)
  - Multiple-choice and short-answer questions covering workshop topics
  - Problem-solving questions requiring code snippets or explanations
- Participation and Engagement (10%)
  - Active participation in hands-on activities
  - Contribution to discussions and group work
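As a worked example of how the weights above could combine, the sketch below assumes each component is scored from 0 to 100 and that the final mark is a plain weighted sum. Both are assumptions, since the rubric lists only the percentages. The dictionary keys and sample scores are hypothetical.

```python
# Component weights in percent, taken from the assessment breakdown above.
WEIGHTS = {
    "project": 40,
    "presentation": 20,
    "exam": 30,
    "participation": 10,
}


def final_score(scores):
    """Combine component scores (0-100 each) into a weighted final score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    sample = {"project": 90, "presentation": 80, "exam": 70, "participation": 100}
    print(final_score(sample))  # 36 + 16 + 21 + 10 = 83.0
```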
Work in teams to develop a proposal for an AI solution to a real-world problem, integrating the skills learned throughout the bootcamp.
Participants will apply their learning to design AI solutions for practical challenges, preparing them for real-world AI roles. The day will conclude with a graduation ceremony to celebrate the completion of the bootcamp.