- Chicago, IL
- 08:51 (UTC -05:00)
- in/jesus-cantu217
- https://medium.com/@jesus.cantu217
Stars
A simple screen parsing tool towards pure vision based GUI agent
The repo for "Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator"
ICML 2024 "From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation"
We write your reusable computer vision tools. 💜
Behavioral probing of language acquisition models at the lexical and syntactic level
PyTorch code for our CVPR 2018 paper "Neural Baby Talk"
Scrape data from Sephora and do product analysis
A web scraper that gets product names, brands, formatted ingredients, images, and available sizes from Sephora's makeup category, and inserts them into a relational DB with several foreign key rela…
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source …
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…
Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep lear…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Code for paper, 'Extracting Entities of Interest from Comparative Product Reviews', CIKM'17
A curated list of awesome LLVM (including Clang, etc) related resources.
Introduction to predictive modeling in Spark with applications in pharmaceutical bioinformatics
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR2023) and "Visual Context-driven Audio Feature Enhan…
Visual Speech Recognition for Multiple Languages
[CVPR2020] "Detecting Attended Visual Targets in Video"
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
TimeGPT-1: production ready pre-trained Time Series Foundation Model for forecasting and anomaly detection. Generative pretrained transformer for time series trained on over 100B data points. It's …
Lightning ⚡️ fast forecasting with statistical and econometric models.
👕 Open-source course on architecting, building and deploying a real-time personalized recommender for H&M fashion articles.