Awesome Machine Learning Engineer

For more awesomeness, check out .

What is this and how do I use it?

This is a curated list of delightful resources for everything you need to develop Machine Learning solutions.
All resources are structured as follows: [Content level] [Page title] - [Description] ([Reading time]).
- There are three content levels:
  1. 🐥 Essential reading for all ML engineers
  2. 🐍 Advanced reading for professional ML engineers
  3. 🦄 Expert material for expert ML engineers
- Descriptions are written so that they complete the sentence "After reading this article you will have learned to ...".

Communication

🐥 BLUF: The Military Standard That Can Make Your Writing More Powerful - Make your communication more powerful. (5 min)
🐥 The XY Problem - Focus on explaining your end goal when asking for help. (5 min)
🐍 Understanding MECE - Write structured lists in your documents and communication. (10 min)
🦄 Nonviolent communication - Deliver constructive feedback in difficult situations. (10 min)
🐍 SMART criteria - Define goals in a structured way. (5 min)
🐥 Somebody else's problem - Don't make it SEP. (1 min)
🦄 The Halo effect - Account for the cognitive bias that might influence the way you view others. (10 min)
🐍 SCQA: What is it, how does it work, and how can it help me? - Structure your presentation, proposals, and sales outlines. (5 min)
🐥 E-mail like a boss - Write better e-mails. (1 min)
🦄 Mythical Man Month - The Cliff Notes - Understand the relationship between person-days and throughput time in a project. (5 min)
🐥 Bike-shedding: how mature are you as an engineer? - Call out and avoid bike-shedding. (5 min)
🐥 Presentation Rules - Make your slides satisfy essential best practices. (5 min)
🦄 Four-sides model - Carefully consider how you communicate to optimize its result. (15 min)
🐥 How to write in plain English - Write in plain English. (15 min)
🐍 No More Misunderstandings - Avoid misunderstandings by paraphrasing. (15 min)

Software Engineering

API design

🐍 FastAPI docs - Build RESTful APIs that correspond one-to-one with the OpenAPI spec. (1-X hours)
🦄 Zalando's RESTful API guidelines - Design sustainable REST APIs. (X hours)
🦄 Microsoft's REST API guidelines - Design sustainable REST APIs. (X hours)
🦄 gRPC compared to REST - Compare the two leading solutions for communication between services. (5 min)
🦄 HTTP response headers for the responsible developer - Optimize your APIs with HTTP headers. (2 min)
🐍 Falsehoods programmers believe about time - Avoid most of the assumptions made about time. (2 min)
🐍 Falsehoods programmers believe about names - Avoid most of the assumptions made about personal names. (5 min)
🐥 Semantic versioning - Assign and increment version numbers of your software. (10 min)
🦄 Keep a changelog - Keep a well-maintained changelog in your software. (10 min)
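
As a rough illustration of the Semantic versioning entry above, the sketch below increments a MAJOR.MINOR.PATCH version string. The helper name `bump` and its `part` argument are hypothetical, and pre-release or build metadata is deliberately not handled.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with the given part incremented (hypothetical helper)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible functionality: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

In practice a tool such as bump2version (listed under Curated Python packages) takes care of this, including updating the version in your files.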

Version control

🐍 Learn Git Branching - Work on your version control skills at beginner or advanced level. (1 hour)
🐥 The seven rules of a great Git commit message - Write concise and consistent Git commit messages. (5 min)
🐍 Trunk Based Development - Use a simple branching approach that scales well to teams. (10 min)

Code review

🐍 Google's "How to do a code review" - Review code in a way your colleagues will love (Note: Change List ~= Pull Request). (30 min)
🐍 Code Health: Respectful Reviews == Useful Reviews - Resolve code review comments respectfully. (5 min)

Python patterns

🐥 The Definitive Guide to Python import Statements - Resolve common importing problems. (15 min)
🐍 PEP8 style guide, and why is it important? - What PEPs are and what PEP8 is. (5 min)
🐍 PEP20 "The Zen of Python" - Get to know the guiding principles for Python's design. (1 min)
🦄 Python Design Patterns - Apply high-level software engineering architecture patterns in Python. (30 min)
🦄 SOLID - Apply high-level software engineering architecture principles. (30 min)
🦄 The Little Book of Python Anti-Patterns - Recognize and avoid low-level Python anti-patterns. (45 min)
🐍 Understanding Python's logging module - Use the logging module effectively. (10 min)
🐍 Please fix your decorators - Why you should probably use wrapt to write your decorators. (10 min)
🦄 Effective Python - Apply 59 ways to write better Python. (X hours)

Typing

🐥 Python Type Hints - Apply both basic and advanced type hints. (5 min)
🐥 The state of type hints in Python - Understand why you should be using type hints. (5 min)
🐥 Leveraging type system to avoid mistakes - Use the type system to catch mistakes early. (5 min)
🦄 Mypy protocols - Use advanced concepts such as Protocols. (15 min)
🐍 Pydantic overview - Stop writing Dict[str, Any] type hints and instead use BaseModels. (10 min - 1 hour)
🐍 Enums - Stop writing magic strs and instead use Enums. (5 min)
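
To make the Enums entry above concrete, here is a minimal sketch using the standard library's `enum` module. The `ModelStage` class and the `describe` function are hypothetical names chosen for illustration only.

```python
from enum import Enum


class ModelStage(str, Enum):
    """Hypothetical set of allowed stages, replacing free-form strings."""

    TRAINING = "training"
    STAGING = "staging"
    PRODUCTION = "production"


def describe(stage: ModelStage) -> str:
    # Comparing Enum members avoids typos that a magic string would hide.
    if stage is ModelStage.PRODUCTION:
        return "serving live traffic"
    return f"not live yet ({stage.value})"


# Parsing an external string fails loudly (ValueError) on unknown values.
stage = ModelStage("staging")
```

A misspelled member such as `ModelStage.PRODUCTON` raises an `AttributeError` immediately, whereas a misspelled string literal would silently fail to match.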

Curated Python packages

🐥 Black: The Uncompromising Code Formatter - Use Black to end all formatting discussions. (5 min)
🐍 bump2version - Release new versions of your packages with a single command. (15 min)
🐍 coloredlogs: Colored terminal output for Python's logging module - Scan logs more easily by coloring them. (1 min)
🐍 hvPlot: A high-level plotting API - Use a pandas-style plotting API to create plots with HoloViews, rendered by Bokeh. (30 min)
🐥 Flake8 - Use Flake8: pyflakes for common errors, pycodestyle for PEP8 compliance, and mccabe for code complexity. (10 min)
🐍 Portray: Your Project with Great Documentation - Generate documentation for your projects with no configuration. (30 min)
🐍 pydocstyle - Use pydocstyle to check compliance with Python docstring conventions. (5 min)
🐍 birdseye - Graphically debug your Python code. (10 min)
🦄 hypothesis-auto - Write fully automatic unit tests based on type annotations. (30 min)
🐍 scalene: a high-performance CPU and memory profiler for Python - Profile CPU and memory usage by line in Python. (10 min)
🐍 SnakeViz - Use an interactive profiler in Jupyter Lab to identify bottlenecks. (10 min)
🐍 tqdm - Easily add progress bars to long-running jobs. (5 min)

Machine Learning

Practical theory

While, in theory, you can just download TensorFlow and start making deep neural networks, it doesn’t hurt to know some of the theory and philosophy that lie behind the algorithms that so many of us know and love today.

Learning Machine Learning: An online comic from Google AI - Understand the basics of supervised and unsupervised learning. (15 min)
Rules of Machine Learning by Google
Bias and variance - Distinguish between different types of prediction error. (5 min)
Bias and variance and the .632 rule - Balance bias and variance when bootstrapping. (10 min)
Generalization performance & model selection, nested cross-validation - Use best practices for cross-validation. (10 min)
Stacking strategies with and without leaks - Choose the right cross-validation strategy when stacking. (15 min)
Backpropagation is the chain rule to compute the gradient - Make the connection between backpropagation and the chain rule. (20 min)
Backprop is not just the chain rule - Make the connection between backpropagation and Lagrange multipliers. (20 min)
You're all calculating churn rates wrong - Correctly define what churn is. (15 min)
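
The connection between backpropagation and the chain rule mentioned above can be sketched on a single neuron. This is an illustrative toy, not taken from the linked articles: the model (a sigmoid unit with squared error) and the function names are assumptions.

```python
import math


def forward_backward(w: float, b: float, x: float, y: float):
    """Loss and gradients for a one-neuron model, derived with the chain rule."""
    # Forward pass, keeping intermediate values for the backward pass.
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
    loss = (a - y) ** 2

    # Backward pass: multiply local derivatives from the loss back to the inputs.
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)
    dloss_dz = dloss_da * da_dz
    grad_w = dloss_dz * x  # dz/dw = x
    grad_b = dloss_dz      # dz/db = 1
    return loss, grad_w, grad_b


def numerical_grad_w(w: float, b: float, x: float, y: float, eps: float = 1e-6) -> float:
    """Central finite difference, used only to check the analytic gradient."""
    loss_plus, _, _ = forward_backward(w + eps, b, x, y)
    loss_minus, _, _ = forward_backward(w - eps, b, x, y)
    return (loss_plus - loss_minus) / (2.0 * eps)
```

Comparing `grad_w` against `numerical_grad_w` for a few inputs is a cheap sanity check that the hand-derived gradient is correct.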

Pandas

🐍 Modern Pandas series (Part 1 - 7) - Write idiomatic pandas. (1 hour)

Sklearn

Custom Estimators - Create your own custom estimator. (20 min)
Pipelines - Combine transformers and estimators into pipelines. (15 min)
Pipelines and custom Estimators
Tuning hyperparameters - Implement grid search and randomized search for parameter optimization. (10 min)
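
The difference between grid search and randomized search can be sketched in plain Python. This is not scikit-learn's API (there you would reach for `GridSearchCV` and `RandomizedSearchCV`); the `score` function is a hypothetical stand-in for training and validating a model, and all names and parameter ranges are assumptions.

```python
import itertools
import random


def score(params: dict) -> float:
    """Hypothetical validation score; stands in for training and evaluating a model."""
    return -((params["alpha"] - 0.3) ** 2) - 0.1 * abs(params["depth"] - 4)


def grid_search(grid: dict) -> dict:
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=score)


def random_search(space: dict, n_iter: int, seed: int = 0) -> dict:
    """Evaluate n_iter candidates, sampling each parameter independently."""
    rng = random.Random(seed)
    candidates = [
        {name: sample(rng) for name, sample in space.items()} for _ in range(n_iter)
    ]
    return max(candidates, key=score)


# Grid search only ever tries the listed values.
grid = {"alpha": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
# Randomized search draws from distributions, so it can land between grid points.
space = {
    "alpha": lambda rng: rng.uniform(0.0, 1.0),
    "depth": lambda rng: rng.randint(2, 8),
}
best_grid = grid_search(grid)
best_random = random_search(space, n_iter=20)
```

The grid evaluates 9 fixed combinations, while the random search evaluates 20 sampled ones; the cost of grid search grows multiplicatively with every added parameter, whereas the budget of randomized search is set directly by `n_iter`.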

DevOps

CI/CD

🐍 invoke - Implement common tasks you run on your projects as a CLI. (30 min)

Package management

Understanding Conda and Pip - Know the advantages of Conda over Pip. (5 min)
Conda tutorial - Manage packages and reproducible environments using one tool. (15 min)
Conda package index - Search for packages in Anaconda Cloud. (1 min)
Conda myths - Debunk some common myths and misconceptions about Conda. (5 min)
Conda in-depth

Containerization

Docker getting started
Dockerfile best practices - Build efficient images. (30 min)
Dockerizing Python is hard
Multi-stage builds #3: Why your build is surprisingly slow, and how to speed it up
BuildKit Features You Might Want to Know About

Shell

Terraform

Terraform best practices

Infrastructure

Related awesome lists

The Missing Semester of Your CS Education - A collection of skills that are often expected to be self-taught.
A Survey of Deep Learning for Scientific Discovery - An overview of Deep Learning tasks and approaches.
Flake8 extensions - An overview of Flake8 extensions.

To add

TODO: Mypy strict mode.
TODO: Raymond Hettinger
TODO: Gridsearch vs random search vs Bayesian hyperparam optimization (gaussian processes)
TODO: Comparison of bayesian hyperparam optimizers (PyGPGO)
TODO: conda vs virtualenv, pyenv, pipenv.
TODO: explain how conda-forge works.
TODO: explain registries (Docker Hub, ECR, GitLab)
TODO: explain environment.yml + interactions with Docker.
TODO: S3, DynamoDB, MongoDB
TODO: CVE scans (frontend and backend)
TODO: OSS license scan
TODO: mutual TLS, IP whitelisting, (VPN)
TODO: Kinesis streams
TODO: Linting built-in to Terraform with -check.
TODO: Tech in our pure cookiecutter scaffolding.
TODO: Cherry picking?
TODO: MLOps
TODO: KISS, DRY
TODO: mamba
TODO: pre-commit
TODO: selected Flake8 extensions
TODO: selected Pytest extensions
TODO: cookiecutter & cruft (as a standalone repo?)
TODO: https://pypi.org/project/snoop/

Curated by Radix

Radix is a Belgium-based Machine Learning company.

We invent, design and develop AI-powered software. Together with our clients, we identify which problems within organizations can be solved with AI, demonstrating the value of Artificial Intelligence for each problem.

Our team is constantly looking for novel and better-performing solutions and we challenge each other to come up with the best ideas for our clients and our company.

Here are some examples of what we do with Machine Learning, the technology behind AI:

Help job seekers find great jobs that match their expectations. On the Belgian Public Employment Service website, you can find our job recommendations based on your CV alone.
Help hospitals save time. We extract diagnoses from patient discharge letters.
Help publishers estimate their impact by detecting copycat articles.

We work hard and we have fun together. We foster a culture of collaboration, where each team member feels supported when taking on a challenge, and trusted when taking on responsibility.
