In this capstone course, you will apply the machine learning knowledge and skills you gained in the previous courses to solve real-world industrial challenges.
Assume you are a new machine learning engineer at a Massive Open Online Course (MOOC) startup called AI Training Room. In AI Training Room, learners across the world can learn leading technologies such as Machine Learning, AI, Data Science, Cloud, App development, etc. Your company has grown rapidly and reached millions of learners in a very short period.
The learning topics of AI Training Room can be summarized in the following word cloud:
Starting this year, your machine learning engineering team has been working hard on a recommender system project. The main goal of this project is to improve the learning experience by helping learners quickly find new courses that interest them and plan better learning paths. Meanwhile, as more learners interact with more courses through your recommender systems, your company's revenue may also increase.
This project is currently in the Proof of Concept (PoC) phase, so your main focus at this moment is to explore and compare various machine learning models and find the one with the best performance in offline evaluations.
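As a rough illustration of what comparing models in an offline evaluation can look like, the sketch below scores two candidate models against held-out ratings using root mean squared error (RMSE) and keeps the one with the lower error. The ratings, the model names `model_a` and `model_b`, and the choice of RMSE as the metric are all assumptions made for this example; the labs may use different data and metrics.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists of ratings."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out ratings that the models never saw during training.
held_out = [3.0, 2.0, 3.0, 2.0]

# Hypothetical predictions from two candidate models for the same items.
predictions = {
    "model_a": [3.0, 3.0, 3.0, 2.0],
    "model_b": [2.0, 3.0, 2.0, 3.0],
}

# Score every model on the same held-out data, then keep the lowest error.
scores = {name: rmse(held_out, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)

print(scores)
print("Best model:", best_model)
```

Evaluating every model on the same held-out data keeps the comparison fair, since no model is scored on examples it was trained on.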
Your tasks in this project are summarized in the following workflow, and you will be guided through them in hands-on labs.
More specifically, you will undertake the tasks of:
- Collecting and understanding data
- Performing exploratory data analysis on online course enrollments datasets
- Extracting Bag of Words (BoW) features from course textual content
- Calculating course similarity using BoW features
- Building content-based recommender systems using various unsupervised learning algorithms, such as:
  - Distance/Similarity measurements, K-means, Principal Component Analysis (PCA), etc.
- Building collaborative-filtering recommender systems using various supervised learning algorithms, such as:
  - K Nearest Neighbors, Non-negative Matrix Factorization (NMF), Neural Networks, Linear Regression, Logistic Regression, Random Forest, etc.
- Creating an insightful and informative slideshow and presenting it to your peers
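To make the Bag of Words and course similarity tasks above more concrete, here is a minimal sketch that uses only the Python standard library. The course IDs and descriptions are hypothetical, and the whitespace tokenizer and cosine similarity measure are assumptions chosen for illustration; the labs may use different tooling and similarity measures.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens in a text."""
    return Counter(text.lower().split())


def cosine_similarity(bow_a, bow_b):
    """Cosine similarity between two token-count vectors."""
    # A Counter returns 0 for tokens it has not seen, so missing tokens add nothing.
    dot = sum(count * bow_b[token] for token, count in bow_a.items())
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical course catalog: course ID -> short textual description.
courses = {
    "ML101": "machine learning with python",
    "DL201": "deep learning with python",
    "CC301": "cloud app development",
}
bows = {course_id: bag_of_words(text) for course_id, text in courses.items()}


def most_similar(course_id):
    """Return the ID of the other course with the highest cosine similarity."""
    others = [c for c in bows if c != course_id]
    return max(others, key=lambda c: cosine_similarity(bows[course_id], bows[c]))


print(most_similar("ML101"))
```

A content-based recommender can build on this idea by suggesting the courses most similar to those a learner has already enrolled in.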
If you have extra bandwidth, you can also deploy and demonstrate your models via a web app built with Streamlit. Streamlit is an open-source app framework that lets machine learning and data science practitioners quickly demonstrate their work.
Your course recommender app, where you select different recommendation models and generate recommendations, may look like the following screenshot:
This project is a great opportunity to showcase your machine learning skills and demonstrate your proficiency to potential employers.
- Graded Quizzes: 30 pts
- Final presentation, peer-review: 70 pts
In this project, you have at least three development environments you may choose from:
Skills Network Labs is a virtual lab environment reserved for the exclusive use of learners on IBM Developer Skills Network portals and its partners.
If you experience any issues with the above two cloud environments, you may install Python and Jupyter Notebook / JupyterLab on your own machine, such as a desktop or laptop computer. All the notebooks and data used in the capstone can be downloaded and run locally.
For this project, you will use Watson Studio as your main development environment. Watson Studio, a component of IBM Cloud Pak for Data, is a suite of tools and a collaborative environment for data scientists, data analysts, AI and machine learning engineers, and domain experts to develop and deploy their projects.
Now you should have a basic understanding of this capstone project.
In the next step of your project, you will start with collecting and exploring the datasets.
Date | Version | Changed by | Change Description |
---|---|---|---|
2022-03-18 | 1.0 | | Initial version created |