Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Updated for PyTorch 1.0. Supports CPU testing and demo. (Use detectron2, it's a masterpiece)
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Tutorials on how to implement a few key architectures for image classification using PyTorch and TorchVision.
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Official code for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"
Neural module network on the GQA dataset