This tutorial explains some basics of using RDDs to analyse data in PySpark.

dattatrayshinde/PySpark-2.AuctionDataAnalysisWithRDD

Auction Data Analysis with RDD

The auctiondata.csv file provided here contains data on the auctions of a few items. We use Spark to answer the following questions:

  1. What is the first record of the RDD?
  2. What are the first 5 records of the RDD?
  3. What is the total number of bids?
  4. What is the total number of distinct items that were auctioned?
  5. What is the total number of item types that were auctioned?
  6. What is the total number of bids per item type?
  7. What is the total number of bids per auction?
  8. Across all auctioned items, what is the maximum number of bids?
  9. Across all auctioned items, what is the minimum number of bids?
  10. What is the average bid?
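
The questions above map fairly directly onto standard RDD operations (`first`, `take`, `count`, `distinct`, `reduceByKey`, `max`, `min`, `mean`). The following is a minimal sketch, not the notebook's actual code. It rests on several assumptions: the column positions `AUCTION_ID` and `ITEM_TYPE` are guesses about the layout of auctiondata.csv, the file is assumed to have no header row, each auction is assumed to correspond to one item, and question 10 is read as the average number of bids per auction. The function name `answer_questions` is hypothetical.

```python
# ASSUMED column positions in auctiondata.csv; adjust them to the real file.
AUCTION_ID = 0   # assumption: auction identifier (one auction per item)
ITEM_TYPE = 7    # assumption: type of the auctioned item


def parse(lines):
    """Split each raw CSV line into a list of string fields."""
    return lines.map(lambda line: line.split(","))


def answer_questions(lines):
    """Answer questions 1-10 for an RDD of raw CSV lines (one bid per line)."""
    rows = parse(lines)

    # (key, 1) pairs summed per key give the number of bids per key.
    bids_per_auction = (rows.map(lambda r: (r[AUCTION_ID], 1))
                            .reduceByKey(lambda a, b: a + b))
    bids_per_type = (rows.map(lambda r: (r[ITEM_TYPE], 1))
                         .reduceByKey(lambda a, b: a + b))
    counts = bids_per_auction.values()  # bid counts only, keys dropped

    return {
        "first_record": rows.first(),                                   # Q1
        "first_five": rows.take(5),                                     # Q2
        "total_bids": rows.count(),                                     # Q3
        "distinct_items": rows.map(lambda r: r[AUCTION_ID])
                              .distinct().count(),                      # Q4
        "item_types": rows.map(lambda r: r[ITEM_TYPE])
                          .distinct().count(),                          # Q5
        "bids_per_item_type": dict(bids_per_type.collect()),            # Q6
        "bids_per_auction": dict(bids_per_auction.collect()),           # Q7
        "max_bids": counts.max(),                                       # Q8
        "min_bids": counts.min(),                                       # Q9
        "avg_bids": counts.mean(),                                      # Q10 (as interpreted)
    }


# Example usage with a local Spark installation:
#   from pyspark import SparkContext
#   sc = SparkContext("local[*]", "AuctionDataAnalysis")
#   answers = answer_questions(sc.textFile("auctiondata.csv"))
#   print(answers["total_bids"])
```

Collecting the per-key counts into a dictionary is reasonable only when the number of auctions and item types is small enough to fit on the driver; for larger data the pair RDDs could be kept distributed instead.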

The full analysis is in the accompanying IPython notebook.
