Innovative AI/ML Engineer and dynamic Data Scientist providing a diverse range of services, including project development, teaching, workshops, technical writing, and career coaching. My skill set includes, but is not limited to:
- Programming Languages: Python, Java, Rust, JavaScript, C++
- AI/ML Frameworks: PyTorch, TensorFlow, JAX, Keras, XGBoost, sktime, LightGBM, tinygrad, micrograd, CatBoost, LangChain, LlamaIndex, Haystack, LangGraph, AutoGen, CrewAI, Agentic Transformer
- ML Architecture: Feature/Training/Inference pipeline architecture, online real-time/asynchronous/batch ML serving architecture, 4-stage architecture, two-tower model (flexible neural network design)
- Key Domains: Regression, Classification, NLP, LLM, RAG, Computer Vision, Neural Networks, Ensemble Methods, Clustering, Dimensionality Reduction
- Data Engineering: dbt, Terraform, SQL, BigQuery, PySpark, Databricks
- MLOps: AWS (SageMaker, Fargate, Lambda, S3, SQS, VPN), GCP (Vertex AI, GCR, GKE, GCS, Pub/Sub), Azure (Data Factory, SQL Data Warehouse, Synapse), Cosmos DB, W&B, MLflow, Comet ML, Qwak, Databricks, Apache Spark, Kafka, Bytewax
- LLM: OpenAI, Anthropic, Azure, Llama-3, Mistral, Multi-modal LLM (TTS/STT/VST/AST), SDXL, Gemini, Perplexity, RAG, TAG, KAG
- APIs: Flask, FastAPI
- Apps: Streamlit, Gradio
- Cloud Platforms: GCP, AWS, Azure
- CI/CD: Git, GitLab, Jenkins, Docker, Kubernetes, CircleCI, Terraform, CDK, Pulumi
- Orchestrator: Docker Swarm, ECS, K8s, Airflow, Kubeflow, ZenML, PipeDream
- Streaming: Apache Kafka, Bytewax, CDC pattern, RabbitMQ, GCP Pub/Sub
My end-to-end projects can be found in these repositories. Feel free to click ⭐ if you like them.
| Project Name | Main Libraries | Cloud Service | App | DevOps Best Practice |
|---|---|---|---|---|
| ML/MLOps | | | | |
| MLOps Credit Default | Scikit-learn LightGBM MLflow | AWS/Databricks | | Experiment Tracking Model Registry Model/Data Monitoring Data Validation Linting Formatting Testing Error Handling Pre-Commit IaC CI/CD |
| Medical Insurance Costs Prediction | Scikit-learn TensorFlow SageMaker Comet ML Flask | AWS | | Experiment Tracking Model Registry Model/Data Monitoring Model/Data Linting Formatting Testing Error Handling Coverage IaC CI/CD |
| Stroke Prediction | Scikit-learn XGBoost SageMaker Comet ML Flask Docker | AWS | | Experiment Tracking Model Registry Model Monitoring Containerization Testing Error Handling |
| Car Price Prediction | Scikit-learn TensorFlow MLflow Prefect Flask Docker Grafana Terraform | AWS | | Experiment Tracking Model Registry Model Monitoring Orchestration Containerization Linting Formatting Testing Error Handling IaC CI/CD |
| Taxi Rides Prediction | Scikit-learn TensorFlow MLflow Prefect FastAPI Docker | GCP | | Experiment Tracking Model Registry Model Monitoring Orchestration Containerization Error Handling |
| Music Clustering | Scikit-learn FastAPI Docker | GCP | Streamlit | Experiment Tracking Model Registry Model Monitoring Orchestration Containerization Error Handling |
| Birds Classification | PyTorch | | Gradio | |
| Food Prediction | Scikit-learn TensorFlow OpenCV FastAPI Docker | GCP | Streamlit | Containerization |
| LLM, RAG and Fine-tuning | | | | |
| RAG Hybrid Search and Semantic Caching | Qdrant FastEmbed SPLADE Hugging Face Transformers | | | Error Handling Linting Formatting |
| Multimodal Bill Scan System | AWS Bedrock AWS DynamoDB AWS SQS/SNS AWS CDK Claude 3 Sonnet | AWS | | Error Handling Linting Formatting IaC |
| IaC in RAG Applications with Terraform | AWS Bedrock LangChain AWS OpenSearch Terraform Titan | AWS | | Testing Error Handling Linting Formatting IaC |
| Scalable RAG in AWS with Fargate | OpenAI LlamaIndex Qdrant AWS CDK/Fargate FastAPI | AWS | | Testing Error Handling |
| RAG Deployment with Azure Functions | OpenAI LangChain Qdrant Azure Functions App | Azure | | Linting Formatting Testing Error Handling |
| Scalable RAG with Kubernetes | OpenAI LlamaIndex Qdrant Docker FastAPI GKE | GCP | Streamlit | Containerization Linting Formatting Testing Error Handling CI/CD |
| Research Papers Semantic Search | OpenAI LangChain Qdrant Docker AWS API Gateway | AWS | Streamlit | Containerization Linting Formatting Testing Error Handling |
| Video Summarization | Hugging Face Transformers Whisper LangChain ChromaDB | | Streamlit | Error Handling |
| Multimodal RAG with Video Frames | Gemini LlamaIndex Qdrant | | | |
| Books Reranking Semantic Search | OpenAI LlamaIndex Deep Lake | | | |
| RAG Evaluation with Ragas | OpenAI Hugging Face Transformers Faiss LangChain Ragas | | | |
| PII RAG LlamaIndex Milvus | OpenAI Presidio LlamaIndex Milvus | | | |
| Multimodal RAG with PyMuPDF | OpenAI Qdrant LlamaIndex PyMuPDF | | | |
| Agentic RAG LlamaIndex Milvus | OpenAI Claude LlamaIndex Milvus | | | |
| Agentic RAG with LangChain | OpenAI Groq LangChain Pinecone | | | |
| Agentic RAG with CrewAI | OpenAI LangChain Qdrant CrewAI Agents | | | |
| Fine Tuning Gemma 2B | Hugging Face Transformers PEFT (LoRA/QLoRA) | HuggingFace | | |
| Data Analysis + Modeling | | | | |
| News Classification | Scikit-learn (Multinomial Naive Bayes) TensorFlow (CNN, RNN, feedforward) | | Streamlit | |
| Breast Cancer Classification | Scikit-learn Spark | IBM | | |
| Bank Churn Classification | Scikit-learn LightGBM XGBoost CatBoost | | | |
| Data Engineering | | | | |
| Hotel Reviews | Prefect Spark SQL BigQuery dbt Terraform Looker | GCP | | Orchestration Linting Formatting Error Handling Pre-Commit IaC CI/CD |
| Air Quality Switzerland | Mage dbt SQL BigQuery Docker Terraform Looker | GCP | | Orchestration IaC Containerization CI/CD |
| Miscellaneous | | | | |
| Justicio Web Scraping | Beautiful Soup MySQL | | | Error Handling |
Visual Studio Code, HTML5, CSS3, Jupyter Notebook, MySQL, SQLite, Tableau, PowerBI, Looker Studio, Python, Pandas, NumPy, Plotly, Matplotlib, Databricks, Spark, scikit-learn, TensorFlow, OpenAI, FastAPI, Flask, Docker, Kubernetes, Anaconda, Linux, Ubuntu, Google Cloud, AWS, Terraform, Prefect, dbt, MLflow, GitHub Actions, Git, Streamlit