Having complete and accurate data is a critical first step to being able to learn from it, but part of the complexity of data science is narrowing down what part of the data is important. In this introductory workshop to Machine Learning you will begin to understand how to narrow down the feature scope of your data so that the predictions are based on causation and not just correlation.
You do not need any prior experience with data science to attend this workshop. You are likely someone who is interested in data science, has 1-2 years of experience coding in Python or another programming language, and feels comfortable enough with Python to code in it during the workshop. You are interested in learning how to prepare your data for advanced machine learning models using Python and specific Python libraries.
You should:
- Bring your own laptop (Windows or Mac) with an Internet browser.
- Have coded in Python previously.
- Have done data cleansing, manipulation, and preparation to run data through machine learning models.
You will be using Azure Notebooks, a cloud-based Jupyter Notebooks instance. All you will need is a Microsoft Account, which only requires an email address and which you can sign up for at the event.
This workshop is meant to be highly interactive. The instructor will lead you in two interactive teaching styles:
- Interactive Lecturing: The majority of content for this workshop is in a Notebook. Though the content will be introduced via PowerPoint, the rest of the workshop will consist of the instructor walking you through the Azure Notebooks. During this time, the instructor will employ an interactive lecture style, where you will be asked to participate by asking questions and offering up ideas.
- Think, Pair, Share: For some of the more complex topics, the instructor will use the "Think, Pair, Share" method. You will be asked a question and given about 45 seconds to think quietly to yourself. During this time it is imperative that you do not discuss the question with others yet. Then, you will have an opportunity to discuss it with the 1-2 people next to you. Make sure you share not just your answer, but why you think it is the answer. Finally, the instructor will ask a few people to share what they discussed with their neighbors.
Notice: Various interactive cues are called out in the Notebooks. These are suggestions and are used at the instructor's discretion.
The primary source of content will be relatively bare Azure Notebooks, in which the instructor will guide you through discovering the different features of Pandas, general data cleaning and manipulation, and a few more advanced machine learning techniques such as PCA, ROC curves, K-Means, and Naive Bayes.
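To give a feel for the kind of code the Notebooks build toward, here is a minimal sketch of PCA on a Pandas DataFrame. It is not taken from the workshop Notebooks: the toy data, the column names, and the choice of scikit-learn (`StandardScaler`, `PCA`) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (not from the workshop): three columns,
# two of which carry almost the same information.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
df = pd.DataFrame({
    "height_cm": height,
    "arm_span_cm": height + rng.normal(0, 2, size=200),  # nearly redundant with height
    "shoe_size": rng.normal(42, 2, size=200),            # generated independently of height
})

# Standardize so every column contributes on the same scale.
scaled = StandardScaler().fit_transform(df)

# Reduce three features to two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
reduced = pd.DataFrame(components, columns=["pc1", "pc2"], index=df.index)

print(reduced.shape)
# Share of the total variance captured by each kept component.
print(pca.explained_variance_ratio_)
```

Because two of the three columns are nearly redundant, two components should retain almost all of the variance here. Real datasets rarely reduce this cleanly.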
Timing | Topic
---|---
45 minutes | Introduction to Data Science Keynote
75 minutes | Joining Datasets
30 minutes | Lunch
75 minutes | Principal Component Analysis (PCA)
75 minutes | Machine Learning Accuracy
45 minutes | Wrap Up and Next Steps
Azure Notebooks is still in Preview, which means it will occasionally fail. Here are some tips to avoid losing your work:
- Ensure your work is being saved. In the Jupyter Notebook there is always one of two messages to the right of the title of the notebook: (autosaved) or (unsaved changes). Check that your work is being saved every 10 minutes or so.
- Sometimes Notebooks get into a state where the kernel cannot be started. Restarting the kernel sometimes works, but often you will have to sign out of Azure Notebooks completely and then sign back in.
If you need a refresher on how to code in Python or work with NumPy or Pandas, we recommend you check out the materials from our other Reactor Workshops: