This 4-day workshop introduces participants to the Python language. It is designed to provide the solid foundation needed to conduct data analysis and visualization for data science. While no previous experience is required, some basic programming or data science experience is helpful.
I will lean heavily on the book Python for Data Analysis (as well as the Python Data Science Handbook).
The first day will focus on the fundamentals of data types and control structures, while the ultimate goal of the course is to introduce you to statistical thinking, data literacy, and modeling.
DataCamp is a good resource for learning coding and data analysis skills. If you complete the DataCamp courses listed below before the workshop, we can significantly shorten the time we spend on basics and open up more space for data science concepts.
If you have extra time:
And much more advanced and totally optional:
- Python Data Science Toolbox (Part 1)
- Python Data Science Toolbox (Part 2)
- Statistical Thinking in Python (Part 1)
- Statistical Thinking in Python (Part 2)
The most convenient environment for you to code in might be Google Colab, for which you need a Google account (a Gmail account works). It does not hurt to look at the 2-minute intro video. If you prefer a real IDE, I would recommend Visual Studio or PyCharm. (I will not be able to help much with the latter, though.)
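If you work outside Colab, it may help to confirm that the libraries used in the workshop are installed before the first day. The following is a minimal sketch, not an official setup script: the helper name `check_packages` is hypothetical, and the package list is an assumption based on the topics listed below (matplotlib in particular is a guess, since the outline only mentions "basic plots").

```python
from importlib.util import find_spec

# Assumed package list: numpy, pandas and sklearn are named in the topic
# outline; matplotlib is a guess for the "basic plots" item.
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    # find_spec returns None when a top-level package is not installed.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_packages(PACKAGES).items():
        print(f"{name:12s} {'ok' if available else 'MISSING'}")
```

Any package reported as missing can usually be installed with `pip install <name>` (the scikit-learn package is imported as `sklearn` but installed as `scikit-learn`).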
- basic data types: lists, tuples, dictionaries, strings
- control structures (for, if/else, while)
- functions
- numpy arrays: slicing and subsetting, axis
- Probabilistic Simulations
- basic plots
- pandas Data Frames: slicing and subsetting
- Counting and Summary Statistics
- Handling Files
- Grouped Operations
- plotting with pandas
- Contingency Tables as models
- A/B Testing and sampling distributions
- Hypothesis Testing
    - parametric
    - permutation
    - the bootstrap
- regression
    - simple and multiple
    - logistic
    - categorical variables and interactions
    - regularization
- Basic ML tools
    - Cross Validation
    - sklearn
- Data Cleaning
- Classification and Regression Trees
- Random Forests and Boosting
- Explainable ML
    - Partial dependence plots
    - SHAP values
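To give a flavor of how the simulation, A/B testing, and permutation topics above fit together, here is a minimal sketch of a two-sided permutation test that uses only the standard library. It is not taken from the course materials: the function name `permutation_test` is hypothetical, and the conversion data are made up purely for illustration.

```python
import random


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(b) - mean(a) and the share of
    label shufflings whose difference is at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_a = len(a)
    observed = sum(b) / len(b) - sum(a) / n_a
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled data simulates "group labels do not matter".
        rng.shuffle(pooled)
        group_a, group_b = pooled[:n_a], pooled[n_a:]
        diff = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm


# Made-up A/B data: 1 = visitor converted, 0 = did not convert.
variant_a = [1] * 20 + [0] * 180  # 20 of 200 converted
variant_b = [1] * 36 + [0] * 164  # 36 of 200 converted

diff, p_value = permutation_test(variant_a, variant_b)
print(f"observed difference: {diff:.3f}, p-value: {p_value:.4f}")
```

The same loop written with numpy arrays would be considerably faster; the plain-Python version is used here only because it needs nothing beyond the first day's material.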
Professor for Mathematics and Statistics
Berlin School of Economics and Law
https://www.linkedin.com/in/loecher/