Data-Visualization

Collect and analyze products on the Shopee e-commerce platform


E-commerce platforms in Vietnam have grown rapidly in recent years. In this project I collect and analyze product data to explore the distribution of product prices, the distribution of quantities sold, and product ratings.

Tools used

  • selenium, BeautifulSoup, and chromedriver for crawling data from the website
  • numpy, pandas, seaborn, and matplotlib for the EDA process

Flow

  1. Collecting
  • Collect product links from Shopee’s search pages
  • Visit each link in the list to extract the necessary data
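
The link-collection step can be sketched as follows. This is a minimal illustration, not the project's actual crawler: it parses a static HTML string with the standard-library `html.parser`, standing in for BeautifulSoup and for the page source that selenium would return. The helper names, the sample HTML, the base URL, and the `-i.` marker used to recognize product links are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags that look like product links."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # assumed substring that identifies product URLs
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep page order, drop duplicates
                self.links.append(url)


def collect_product_links(page_source, base_url="https://shopee.vn", marker="-i."):
    """Return absolute product URLs found in one search page's HTML."""
    parser = LinkCollector(base_url, marker)
    parser.feed(page_source)
    return parser.links


# Hypothetical search-page fragment, for illustration only.
sample_html = """
<div>
  <a href="/ao-thun-i.111.222">Shirt</a>
  <a href="/ao-thun-i.111.222">Shirt (duplicate)</a>
  <a href="/help">Help</a>
  <a href="/tai-nghe-i.333.444">Headphones</a>
</div>
"""
links = collect_product_links(sample_html)
print(links)
```

In a real crawl, the HTML string would come from the browser session after the search results have loaded, and each collected URL would then be visited in turn to read the product fields.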
  1. Cleaning
  • Fill missing values
  • Clean the data in some fields
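
As a rough sketch of what the cleaning step might look like with pandas: the column names (`price`, `sold`, `rating`), the raw string formats, and the fill strategies (zero for a missing sold count, the median for a missing rating) are assumptions for illustration, not necessarily the choices made in this project.

```python
import pandas as pd


def parse_price(text):
    """Keep only the digits of a price string such as '₫120.000'."""
    digits = "".join(ch for ch in str(text) if ch.isdigit())
    return int(digits) if digits else float("nan")


def parse_sold(text):
    """Convert strings such as '1,2k' or '350' to an integer count."""
    if pd.isna(text):
        return 0  # assumption: a missing value means nothing sold yet
    text = str(text).strip().lower().replace(",", ".")
    if text.endswith("k"):
        return int(round(float(text[:-1]) * 1000))
    return int(float(text))


# Hypothetical raw rows, shaped like scraped text fields.
raw = pd.DataFrame({
    "price": ["₫120.000", "₫35.500", "₫1.250.000"],
    "sold": ["1,2k", None, "350"],
    "rating": [4.8, None, 4.2],
})

clean = raw.copy()
clean["price"] = clean["price"].map(parse_price)
clean["sold"] = clean["sold"].map(parse_sold)
# Fill missing ratings with the median of the observed ratings.
clean["rating"] = clean["rating"].fillna(clean["rating"].median())
print(clean)
```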
  1. Exploratory Data Analysis
  • Ratio of categories

alt Ratio of categories

  • Rating of categories

alt Rating of categories

  • Price of categories

alt Price of categories

  • Number of products sold by category

alt Number of products sold by category

  • Correlation Heatmap

alt Correlation Heatmap
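
The numbers behind charts like these can be computed with pandas before plotting. The sketch below uses a small made-up table; the column names and values are hypothetical, and the use of the median as the per-category summary is an assumption.

```python
import pandas as pd

# Hypothetical cleaned data, for illustration only.
df = pd.DataFrame({
    "category": ["fashion", "fashion", "electronics", "home", "electronics", "fashion"],
    "price": [120000, 90000, 1500000, 300000, 2100000, 150000],
    "sold": [1200, 800, 150, 400, 90, 950],
    "rating": [4.8, 4.6, 4.2, 4.5, 4.0, 4.7],
})

# Share of products in each category (the basis for a ratio chart).
category_ratio = df["category"].value_counts(normalize=True)

# Typical price, units sold, and rating per category.
per_category = df.groupby("category")[["price", "sold", "rating"]].median()

# Pairwise Pearson correlations between the numeric columns.
corr = df[["price", "sold", "rating"]].corr()

print(category_ratio)
print(per_category)
print(corr)
```

The `corr` table can be passed to `seaborn.heatmap(corr, annot=True)` to draw a correlation heatmap.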

  1. Modeling
  • Relationship between Total rating and product sold

alt Relationship between Total rating and product sold

  • Apply 3 machine learning algorithms: Linear Regression, Decision Tree, and XGBoost

    • Evaluate each model using 5-fold cross-validation

alt Modeling
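
A minimal sketch of 5-fold cross-validation with scikit-learn is shown below. It runs on synthetic data rather than the scraped dataset, and the feature (`total_rating`), the target (`sold`), the R² metric, and the tree depth are assumptions for illustration. XGBoost is left out to keep the dependencies light; `xgboost.XGBRegressor` follows the scikit-learn estimator interface and could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: products sold grows roughly linearly with total ratings.
rng = np.random.default_rng(0)
total_rating = rng.uniform(0, 1000, size=200)
sold = 3.0 * total_rating + rng.normal(0, 50, size=200)

X = total_rating.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = sold

# Five folds, shuffled once with a fixed seed so results are repeatable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    # One R^2 score per fold; the model is refit on each training split.
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    scores[name] = fold_scores
    print(f"{name}: mean R^2 = {fold_scores.mean():.3f}")
```

Reusing the same `KFold` object for every model means each one is scored on identical splits, which makes the mean scores directly comparable.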

Conclusion

  • Through this project, I practiced data collection and processing, which helped me understand the data better and draw insights from it.