When I used to do consulting, I’d always seek to understand an organization’s context for developing data projects, based on these considerations:
- Strategy: What is the organization trying to do (objective) and what can it change to do it better (levers)?
- Data: Is the organization capturing necessary data and making it available?
- Analytics: What kinds of insights would be useful to the organization?
- Implementation: What organizational capabilities does it have?
- Maintenance: What systems are in place to track changes in the operational environment?
- Constraints: What constraints need to be considered in each of the above areas?
- OKR vs KPI, strategic vs tactical
- Difference between KPI targets and goals
- Comet.ml on Medium
- For the data-driven manager (not the DS)
- Measuring DS business value
- Best KPIs for DS - the best advice is what not to do
- A day in the life
- Advice for a DS: business KPIs are not research KPIs, etc.
- Review of deep learning papers and co-authorship
- Full-stack DS - Uri Weiss
- DS vs DA vs MLE - the most intensive diagram post ever; the mother lode of figure references.
References:
- Why data science needs generalists, not specialists
- (good advice) Building a DS function (team)
- How to manage a data science research team using Agile methodology - not Scrum and not Kanban
- Workflow for data science research projects
- Tips for data science research management
- IMO, a really bad implementation of Agile for data science projects
- DEEPNET.TV on YouTube (excellent)
- Mitchell's ML lectures (too long)
- Quoc Le (Google) wrote DNN tutorials and a 3-hour video (not intuitive)
- KDnuggets: NumPy, pandas, scikit-learn tutorials.
- Deep learning online book (too wordy)
- Genetic algorithms - searching hyperparameters this way beats brute-force grid search, obviously (a minimal sketch follows).
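A minimal sketch of the idea, assuming scikit-learn and a toy SVC problem; the GA operators, parameter ranges, and dataset are illustrative choices, not taken from the linked resource:

```python
# Toy genetic-algorithm search over SVC hyperparameters (illustrative only).
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = random.Random(0)

def fitness(individual):
    # individual = (log10(C), log10(gamma)); fitness = mean 3-fold CV accuracy.
    C, gamma = 10 ** individual[0], 10 ** individual[1]
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

def crossover(a, b):
    # Pick each gene from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(individual, scale=0.5):
    # Add Gaussian noise to every gene.
    return tuple(v + rng.gauss(0, scale) for v in individual)

# Random initial population in log-space: log10(C) in [-2, 3], log10(gamma) in [-4, 1].
population = [(rng.uniform(-2, 3), rng.uniform(-4, 1)) for _ in range(10)]

for _ in range(5):  # a few generations
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:4]  # selection: keep the fittest
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best log10(C), log10(gamma):", best, "cv accuracy:", round(fitness(best), 3))
```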
- Introduction to programming in scikit-learn
- SVM in scikit-learn (Python)
- sklearn/SciPy PCA tutorial
- RNN
- Matrix Multiplication - linear algebra
- Kadenze - deep learning with TensorFlow - histograms of the normalized images, (image distribution - mean distribution) / std dev, look quite good (sketch below).
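A minimal sketch of that normalization, assuming a NumPy batch of grayscale images; random data stands in for a real dataset, and the per-pixel mean with a global std is my own simplification:

```python
# Normalize a batch of images by (image - mean) / std and inspect the histogram.
import numpy as np
import matplotlib.pyplot as plt

images = np.random.rand(100, 32, 32) * 255.0   # stand-in for a real image batch
mean_img = images.mean(axis=0)                  # per-pixel mean image
std = images.std()                              # global standard deviation

normalized = (images - mean_img) / std          # (image - mean) / std dev

# The normalized histogram should be roughly centered at zero with unit-ish spread.
plt.hist(images.ravel(), bins=50, alpha=0.5, label="raw")
plt.hist(normalized.ravel(), bins=50, alpha=0.5, label="normalized")
plt.legend()
plt.show()
```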
- Recommended: Udacity includes ML and DL
- Week 1: Introduction. Lesson 4: supervised, unsupervised.
- Lesson 6: model regression, cost function
- Lesson 71: optimization objective, large margin classification
- PCA at Coursera #1
- PCA at Coursera #2
- PCA #3
- SVM at Coursera #1 - simplified
- Week 2: Lesson 29: supervised learning
- Lesson 36: from rules to trees
- Lesson 43: overfitting, then validation, then accuracy
- Lesson 46: bootstrap, bagging, boosting, random forests.
- Lesson 59: logistic regression, SVM, regularization, Lasso, Ridge regression
- Lesson 64: gradient descent, stochastic, parallel, batch.
- Unsupervised: Lesson X: K-means, DBSCAN (see the K-means vs. DBSCAN sketch below)
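To go with the unsupervised lesson above, a minimal scikit-learn sketch contrasting K-means (needs k, assumes convex clusters) with DBSCAN (density-based, labels noise as -1); the dataset and parameter values are illustrative choices, not the course's:

```python
# K-means vs. DBSCAN on a non-convex toy dataset.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means must be told the number of clusters and assumes convex blobs.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the cluster count from density and handles non-convex shapes;
# points labeled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print("k-means clusters:", set(kmeans_labels))
print("dbscan clusters (incl. -1 = noise):", set(dbscan_labels))
```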
- Machine Learning Design Patterns - git notebooks!, Medium
- DP1 - Transform: moving an ML model to production is much easier if you keep inputs, features, and transforms separate.
- DP2 - Checkpoints: saving the intermediate weights of your model during training provides resilience, generalization, and tunability.
- DP3 - Virtual epochs: base machine learning model training and evaluation on the total number of examples seen, not on epochs or steps.
- DP4 - Keyed predictions: export your model so that it passes through client keys.
- DP5 - Repeatable sampling: use the hash of a well-distributed column to split your data into training, validation, and testing (see the sampling sketch after this list).
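A minimal sketch of the repeatable-sampling idea (DP5), assuming a well-distributed record_id column; Python's hashlib stands in here for whatever hash function your data warehouse provides:

```python
# Deterministic train/valid/test split by hashing an id column.
import hashlib

def split_bucket(record_id, train=0.8, valid=0.1):
    """Assign a record to train/valid/test based only on the hash of its id."""
    digest = hashlib.md5(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100 / 100.0   # stable value in [0, 1)
    if bucket < train:
        return "train"
    if bucket < train + valid:
        return "valid"
    return "test"

rows = [{"record_id": i, "value": i * i} for i in range(10)]
for row in rows:
    print(row["record_id"], split_bucket(row["record_id"]))
# Re-running the script (or appending new rows) never moves an existing row to a
# different split, because the assignment depends only on the hash of its id.
```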
- Gensim notebooks - from word2vec and doc2vec to NMF, LDA, PCA, the sklearn API, cosine similarity, topic modeling, t-SNE, etc. (minimal word2vec sketch below)
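A minimal word2vec sketch using the gensim 4.x API (vector_size replaced the older size argument); the toy corpus is illustrative only:

```python
# Train a tiny word2vec model and query it.
from gensim.models import Word2Vec

corpus = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "is", "deep"],
    ["gensim", "trains", "word", "vectors"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, seed=0)

vector = model.wv["learning"]                      # the learned embedding
similar = model.wv.most_similar("learning", topn=2)
print(vector.shape, similar)
```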
- Deep Learning with Python - François Chollet; deep learning & vision git notebooks!, official notebooks.
- Yandex school NLP notebooks
- Machine learning engineering book (i.e., data science)
- Interpretable Machine Learning book
(really good) Practical advice for analysis of large, complex data sets - distributions, outliers, examples, slices, metric significance, consistency over time, validation, description, evaluation, robustness in measurement, reproducibility, etc.