Data Science

LIFE CYCLE

When I used to do consulting, I’d always seek to understand an organization’s context for developing data projects, based on these considerations:

Strategy: What is the organization trying to do (objective) and what can it change to do it better (levers)?
Data: Is the organization capturing necessary data and making it available?
Analytics: What kinds of insights would be useful to the organization?
Implementation: What organizational capabilities does it have?
Maintenance: What systems are in place to track changes in the operational environment?
Constraints: What constraints need to be considered in each of the above areas?

WORKFLOWS

Kaggle

PLATFORMS

Uber, Google, Netflix, Airbnb, etc.

STACK

Medium on canonical stack

Data Science OKR KPI

OKR vs KPI, strategic vs tactical
Difference between KPI targets and goals
Comet ML on Medium

For the data-driven manager (not DS)
Measuring DS business value
Best KPIs for DS - the best advice is what not to do

Being a DS / Researcher

A day in a life
Advice for a DS: business KPIs are not research KPIs, etc.
Review of deep learning papers and co-authorship
Full stack DS Uri Weiss

Team Building / Group Cohesion

DS vs DA vs MLE - the most intensive diagram post ever. This is the mother lode of figure references.

References:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Why data science needs generalists not specialists

(good advice) Building a DS function (team)

Agile for data-science-research

How to manage a data science research team using agile methodology, not scrum and not kanban
Workflow for data science research projects
Tips for data science research management
IMO a really bad implementation of agile for data-science-projects

SOTA AND CURRENT TRENDS SUMMARIES

ICLR 2019
Medium
State of AI, a yearly report

Building Data/DS teams

(great) The data team: a short story, by Erik Bernhardsson

YOUTUBE COURSES

DEEPNET.TV YOUTUBE (excellent)
Mitchell ML lectures (too long)
Quoc Le (Google) wrote DNN tutorials and a 3-hour video (not intuitive)
KDnuggets: NumPy, pandas, and scikit-learn tutorials.
Deep learning online book (too wordy)
Genetic algorithms - search the hyperparameter grid better than brute force, obviously
CNN tutorial
Introduction to programming in scikit
SVM in scikit python
Sklearn scipy PCA tutorial
RNN
Matrix Multiplication - linear algebra

Deep learning Course

Kadenze - deep learning with TensorFlow. Histograms of (image distribution - mean distribution) / std dev look quite good.
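The normalization mentioned above, (image - mean) / std dev, can be sketched in plain Python. This is only an illustration, not the course's code: it assumes a single global mean and std dev over all pixels (a per-pixel mean image would also be reasonable), and the tiny `images` list and the `standardize` and `histogram` helpers are made up for the example.

```python
import math
from statistics import mean, pstdev

def standardize(images):
    """Subtract the global pixel mean and divide by the global std dev."""
    pixels = [p for img in images for p in img]
    mu, sigma = mean(pixels), pstdev(pixels)
    if sigma == 0:
        raise ValueError("all pixels are identical; std dev is zero")
    return [[(p - mu) / sigma for p in img] for img in images]

def histogram(values, bin_width=0.5):
    """Count values per bin, keyed by the left edge of each bin."""
    counts = {}
    for v in values:
        left_edge = math.floor(v / bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data: two flattened 2x2 grayscale "images".
images = [[0, 64, 128, 255], [32, 96, 160, 224]]
normalized = standardize(images)
flat = [p for img in normalized for p in img]

for left_edge, count in histogram(flat).items():
    print(f"{left_edge:+.1f}: {'#' * count}")
```

After this step the pixel values have mean 0 and std dev 1, so the histogram is centered on zero regardless of the original brightness range.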

Machine Learning Course

Recommended: Udacity includes ML and DL
Week 1: Introduction. Lesson 4: supervised, unsupervised.
Lesson 6: model regression, cost function
Lesson 71: optimization objective, large margin classification
PCA at coursera #1
PCA at coursera #2
PCA #3
SVM at coursera #1 - simplified

Predictive Analytics Course

Syllabus

Week 2: Lesson 29: supervised learning

Lesson 36: From rules to trees

Lesson 43: overfitting, then validation, then accuracy

Lesson 46: bootstrap, bagging, boosting, random forests.

Lesson 52: NN

Lesson 55: Gradient Descent

Lesson 59: Logistic regression, SVM, Regularization, Lasso, Ridge regression

Lesson 64: gradient descent, stochastic, parallel, batch.

Unsupervised: Lesson X: K-means, DBSCAN
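As a rough companion to Lessons 55 and 64, here is a minimal sketch of how batch and stochastic gradient descent differ only in how many examples feed each update. It is not taken from the course: the `fit_line` helper, the learning rate, the epoch count, and the noiseless toy data are all assumptions chosen for illustration.

```python
import random

def fit_line(xs, ys, batch_size, lr=0.1, epochs=2000, seed=0):
    """Fit y = w*x + b by gradient descent on mean squared error.

    batch_size == len(xs) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between is mini-batch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noiseless data generated from y = 3x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [3 * x + 1 for x in xs]

w_batch, b_batch = fit_line(xs, ys, batch_size=len(xs))
w_sgd, b_sgd = fit_line(xs, ys, batch_size=1)
print("batch:", w_batch, b_batch)
print("stochastic:", w_sgd, b_sgd)
```

On this toy problem both variants should approach w = 3 and b = 1. On noisy data the stochastic version would keep jittering around the optimum unless the learning rate is decayed.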

BOOKS & NOTEBOOKS

Machine learning design patterns, git notebooks!, medium
1. DP1 - transform: Moving an ML model to production is much easier if you keep inputs, features, and transforms separate.
2. DP2 - checkpoints: Saving the intermediate weights of your model during training provides resilience, generalization, and tunability.
3. DP3 - virtual epochs: Base machine learning model training and evaluation on total number of examples, not on epochs or steps.
4. DP4 - keyed predictions: Export your model so that it passes through client keys.
5. DP5 - repeatable sampling: Use the hash of a well distributed column to split your data into training, validation, and testing.
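DP5 is the easiest of these to show in a few lines. The sketch below is an illustration under assumptions, not the book's code: the `assign_split` helper, the 80/10/10 proportions, the `flight_date` column, and the use of MD5 as the hash are all hypothetical choices.

```python
import hashlib

def assign_split(key: str, train_pct: int = 80, valid_pct: int = 10) -> str:
    """Deterministically map a key to 'train', 'valid', or 'test'."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "valid"
    return "test"

# Hypothetical rows keyed by a made-up "flight_date" column.
rows = [{"flight_date": f"2020-01-{day:02d}", "delay": day % 7} for day in range(1, 31)]

splits = {"train": [], "valid": [], "test": []}
for row in rows:
    splits[assign_split(row["flight_date"])].append(row)

print({name: len(members) for name, members in splits.items()})
```

Because the split depends only on the key, rerunning the job or appending new rows never moves an existing row to a different split, and rows that share a key always land together. The proportions are only approximate on small data sets.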
Gensim notebooks - from word2vec and doc2vec to NMF, LDA, PCA, the sklearn API, cosine similarity, topic modeling, t-SNE, etc.
Deep Learning with Python - François Chollet, deep learning & vision git notebooks!, official notebooks.
Yandex school, NLP notebooks
Machine learning engineering book (i.e., data science)
Interpretable Machine Learning book

COST

GPT-2/3

Patents

Method Patent Exceptionalism

General Advice

(really good) Practical advice for analysis of large, complex data sets - distributions, outliers, examples, slices, metric significance, consistency over time, validation, description, evaluation, robustness in measurement, reproducibility, etc.
