hahoanglc97/Data-Visualization

Data-Visualization

Collect and analyze products on Shopee e-commerce platform


E-commerce platforms in Vietnam have grown rapidly in recent years. This project collects product data from Shopee and analyzes it to explore the distribution of product prices, quantities sold, and product ratings.

Tools used

  • selenium, BeautifulSoup, and chromedriver for crawling data from the website
  • numpy, pandas, seaborn, and matplotlib for the EDA process

Flow

  1. Collecting
  • Collect product links from Shopee’s search pages
  • Visit each link in the list to extract the necessary data
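
The link-collection step could be sketched as below. This is a minimal illustration rather than the project's actual crawler: the `data-role="product"` attribute, the sample HTML, and the `extract_product_links` helper are hypothetical, and Shopee's real markup will differ. In a real crawl the HTML string would typically come from selenium's `driver.page_source` after the search page has rendered.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://shopee.vn"


def extract_product_links(page_html, base_url=BASE_URL):
    """Return de-duplicated absolute product URLs found in a search page."""
    soup = BeautifulSoup(page_html, "html.parser")
    links = []
    seen = set()
    # Assumption: product anchors carry a hypothetical data-role="product"
    # attribute. Replace this filter with whatever the real page uses.
    for anchor in soup.find_all("a", attrs={"data-role": "product"}):
        href = anchor.get("href")
        if not href:
            continue
        url = urljoin(base_url, href)  # turn relative paths into full URLs
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Stand-in for a rendered search page (hypothetical markup).
SAMPLE_HTML = """
<div>
  <a data-role="product" href="/item-a">A</a>
  <a data-role="product" href="/item-b">B</a>
  <a data-role="product" href="/item-a">A again</a>
  <a href="/help">Help</a>
</div>
"""

links = extract_product_links(SAMPLE_HTML)
print(links)
```

Keeping the parsing in a function that takes an HTML string makes it possible to test the extraction logic without launching a browser.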
  2. Cleaning
  • Fill missing values
  • Clean the data in some fields
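
A cleaning pass along these lines might look like the sketch below. The column names (`category`, `price`, `rating`, `sold`), the sample rows, the `parse_sold` helper, and the choice of fill strategy (per-category median for price, overall mean for rating, zero for sold) are all assumptions made for illustration, as is the "1,2k" format for sold counts.

```python
import numpy as np
import pandas as pd


def parse_sold(value):
    """Convert strings such as '1,2k' or '350' to a float; NaN if unparseable."""
    if pd.isna(value):
        return np.nan
    # Assumption: a comma is used as the decimal separator and 'k' means thousand.
    text = str(value).strip().lower().replace(",", ".")
    multiplier = 1.0
    if text.endswith("k"):
        multiplier = 1000.0
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return np.nan


# Hypothetical raw data with gaps, standing in for the crawled records.
raw = pd.DataFrame({
    "category": ["phone", "phone", "laptop", "laptop"],
    "price": [100.0, None, 900.0, 1100.0],
    "rating": [4.5, 4.0, None, 5.0],
    "sold": ["1,2k", "350", None, "2k"],
})

clean = raw.copy()
# Clean the text field first, then treat a missing count as zero sales.
clean["sold"] = clean["sold"].map(parse_sold).fillna(0)
# Fill a missing price with the median price of the same category.
category_median = clean.groupby("category")["price"].transform("median")
clean["price"] = clean["price"].fillna(category_median)
# Fill a missing rating with the overall mean rating.
clean["rating"] = clean["rating"].fillna(clean["rating"].mean())

print(clean)
```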
  3. Exploratory Data Analysis
  • Ratio of categories

[Figure: Ratio of categories]

  • Rating by category

[Figure: Rating by category]

  • Price by category

[Figure: Price by category]

  • Number of products sold by category

[Figure: Number of products sold by category]

  • Correlation heatmap

[Figure: Correlation heatmap]
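
The numbers behind these plots can be computed with a few pandas calls. The sketch below uses a small made-up table with hypothetical column names, so the values are illustrative only and are not the project's results.

```python
import pandas as pd

# Hypothetical cleaned data; the real dataset comes from the crawl.
products = pd.DataFrame({
    "category": ["phone", "phone", "laptop", "laptop", "laptop"],
    "price": [100.0, 120.0, 900.0, 1100.0, 1000.0],
    "rating": [4.5, 4.0, 4.8, 5.0, 4.6],
    "sold": [1200, 350, 40, 25, 60],
})

# Share of products in each category (the "ratio of categories" chart).
category_ratio = products["category"].value_counts(normalize=True)

# Mean price, rating, and units sold per category.
per_category = products.groupby("category")[["price", "rating", "sold"]].mean()

# Pairwise Pearson correlation between the numeric columns.
correlation = products[["price", "rating", "sold"]].corr()

print(category_ratio)
print(per_category)
print(correlation)
```

Passing `correlation` to `seaborn.heatmap(correlation, annot=True)` would draw a heatmap of the matrix.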

  4. Modeling
  • Relationship between total rating and products sold

[Figure: Relationship between total rating and products sold]

  • Apply three machine learning algorithms: Linear Regression, Decision Tree, and XGBoost

    • Evaluate the models with 5-fold cross-validation

[Figure: Modeling]
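
The evaluation setup can be sketched with scikit-learn as below. The data here is synthetic (a noisy linear relationship invented for the example), the R² metric and the tree depth are assumptions, and XGBoost is left out so the sketch runs with scikit-learn alone; `xgboost.XGBRegressor` follows the scikit-learn estimator interface, so it could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real features: total rating vs. products sold.
rng = np.random.default_rng(0)
total_rating = rng.integers(0, 500, size=200).astype(float)
sold = 3.0 * total_rating + rng.normal(0.0, 20.0, size=200)

X = total_rating.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = sold

# 5-fold cross-validation with shuffling and a fixed seed for repeatability.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

scores = {}
for name, model in models.items():
    # One R^2 score per fold; the model is refit on each training split.
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    scores[name] = fold_scores
    print(f"{name}: mean R^2 = {fold_scores.mean():.3f}")
```

Using the same `KFold` object for every model means each one is scored on identical splits, which keeps the comparison fair.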

Conclusion

  • This project gave me practice in collecting and processing data, and a better understanding of how to explore a dataset and draw insights from it.
