Starred repositories
Notes on the design and implementation of Apache Spark
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Elasticsearch in Action Book
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
A fast, scalable, multi-language and extensible build system
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
4th Place solution for the Kaggle CommonLit Readability Prize
A Python package for manipulating 2-dimensional tabular data structures
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Annotations of the interesting ML papers I read
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
This repo contains annotated research papers that I found really good and useful
Docker images and test runners that replicate the live AWS Lambda environment
LaTeX code for making neural network diagrams
A cross-platform unofficial Google Assistant Client for Desktop (powered by Google Assistant SDK)
A package that efficiently applies any function to a pandas DataFrame or Series in the fastest available manner
This repository contains demos I made with the Transformers library by HuggingFace.
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
GUI for annotation of images in darknet format
Research papers with annotations, illustrations and explanations
A learning rate range test implementation in PyTorch