Sparkify is a music streaming service similar to Spotify or Pandora. Its users fall into one of two categories: the free tier, where they stream music with advertisements between songs, or a paid subscription, where they pay a monthly fee to use the service without advertisements. Users can downgrade or cancel their subscriptions at any time. If we can identify users who are likely to cancel, we may be able to reduce future cancellations, for example by offering them discounts on their subscriptions. The method I used to identify these potential cancellations is logistic regression.
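As a rough illustration of what logistic regression does in this setting, the sketch below fits a model by batch gradient descent on log loss and returns a churn probability for a user. It uses only the Python standard library so it runs anywhere. The feature values, learning rate, epoch count, and the function names `fit_logistic` and `predict_proba` are hypothetical and are not taken from the project. At Sparkify scale a library estimator such as `pyspark.ml.classification.LogisticRegression` would typically be used instead.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    # Split into two branches so math.exp never overflows for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wk] by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # derivative of log loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the average gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that a user with features x cancels."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical, already-scaled features per user:
# [thumbs_down_rate, adverts_per_session]
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]  # 1 = cancelled

weights = fit_logistic(X, y)
print(round(predict_proba(weights, [0.85, 0.85]), 2))
```

Users whose predicted probability exceeds a chosen threshold (0.5 by default) would be the candidates for a retention offer such as a discount.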
Packages used: PySpark, Pandas, Matplotlib
Methodology:
- Load and clean the dataset
- Define customer churn and perform exploratory data analysis
- Feature engineering
- Modeling
- Evaluation
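The churn definition, feature engineering, and evaluation steps can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the project's code: it assumes the event log is a list of records with `userId` and `page` fields, that an empty `userId` marks a logged-out event, and that reaching a `"Cancellation Confirmation"` page marks a churned user. The page names used as features and the helper names are hypothetical, and F1 is shown only as one example metric for an imbalanced churn label. In PySpark the same per-user logic would usually be written with DataFrame `groupBy` and `agg`.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence

CHURN_PAGE = "Cancellation Confirmation"  # assumed churn event


def label_churn(events: List[Mapping[str, str]]) -> Dict[str, int]:
    """Return {userId: 1 if the user ever reached the churn page, else 0}."""
    labels: Dict[str, int] = {}
    for e in events:
        uid = e["userId"]
        if not uid:  # skip events without a user id (assumed logged-out)
            continue
        labels[uid] = max(labels.get(uid, 0), int(e["page"] == CHURN_PAGE))
    return labels


def user_features(events: List[Mapping[str, str]],
                  pages: Sequence[str] = ("NextSong", "Thumbs Down", "Roll Advert"),
                  ) -> Dict[str, List[int]]:
    """Count visits per user to each page in `pages` (hypothetical feature set)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {p: 0 for p in pages})
    for e in events:
        uid = e["userId"]
        if not uid:
            continue
        row = counts[uid]  # creates an all-zero row the first time a user appears
        if e["page"] in row:
            row[e["page"]] += 1
    return {uid: [c[p] for p in pages] for uid, c in counts.items()}


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (churn) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy event log (made up for illustration)
events = [
    {"userId": "1", "page": "NextSong"},
    {"userId": "1", "page": "Thumbs Down"},
    {"userId": "1", "page": CHURN_PAGE},
    {"userId": "2", "page": "NextSong"},
    {"userId": "2", "page": "NextSong"},
    {"userId": "", "page": "Home"},
]
labels = label_churn(events)      # {"1": 1, "2": 0}
features = user_features(events)  # {"1": [1, 1, 0], "2": [2, 0, 0]}
```

The churn page itself is deliberately left out of the feature set, since counting it would leak the label into the model's inputs.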
Please refer to the blog for a description of the analysis: https://medium.com/@nafisabulsara/identify-customer-churn-on-sparkify-a-music-streaming-application-6f4b0a7da96e?sk=72cb7fd577f012935842cf5ab9494630