Skip to content

This repository is dedicated to solutions for LeetCode SQL questions implemented in PySpark.

License

Notifications You must be signed in to change notification settings

bitoollearner/leetcode-pyspark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

LeetCode SQL Questions - PySpark Solutions-(BI Learner)

This repository is dedicated to unsolved LeetCode SQL questions implemented in PySpark.

This notebook contains a collection of SQL-related challenges sourced from LeetCode. The primary goal is to enhance SQL skills while leveraging PySpark, a powerful framework for large-scale data processing.

For each question, you will find:

  • A brief problem description
  • The relevant dataset or schema context
  • A step-by-step Unsolved implemented using PySpark DataFrames and SQL functions

Question Breakdown by Difficulty:

Difficulty Count
Easy 111
Medium 130
Hard 60
Total 301

All Unsolveds in this notebook are implemented using PySpark to ensure consistency with big data processing practices. The approach utilizes PySpark’s Python API, DataFrame API, and built-in functions to solve problems efficiently.

Disclaimer:
All questions and problem statements belong to LeetCode. This notebook is solely for educational purposes, and all credit and ownership of the questions remain with LeetCode. If you have any questions, please reach out to [email protected].

Prerequisites

  • Databricks Account:
    • You need access to a Databricks account. Databricks provides a collaborative environment for big data analytics, including support for PySpark, which is crucial for running your Unsolveds.
  • Basic Knowledge of PySpark:
    • Familiarity with PySpark is essential. Users should understand how to write PySpark code to manipulate dataframes, perform transformations, and execute actions.
  • SQL Knowledge:
    • Basic knowledge of SQL is required to understand and write SQL queries within the PySpark framework. This includes understanding SQL syntax, querying databases, and manipulating data.
  • GitHub Repository Access:
    • Access to your GitHub repository containing unsolved LeetCode SQL questions. Users should be able to clone the repository, review the Unsolveds, and potentially contribute if allowed.

1. Setting up Databricks Premium (Paid Version)

Databricks Premium is a paid plan that offers advanced features such as higher compute power, security options, and integrations.

Step 1: Sign Up for Databricks

  1. Go to Databricks website.
  2. Click "Start your free trial" (for a trial) or go to "Sign In" if you have an account.
  3. Choose "AWS", "Azure", or "GCP" as your cloud provider.
  4. Follow the registration process, providing details like your email, company, and cloud provider credentials.

Step 2: Create a Databricks Workspace

  1. In the cloud provider console (AWS, Azure, or GCP), create a Databricks workspace.
  2. Select the Premium plan during setup.
  3. Configure networking and security settings as required.
  4. Once created, launch the workspace from the cloud console.

Step 3: Create a Cluster

  1. Inside the Databricks workspace, go to Compute.
  2. Click Create Cluster.
  3. Choose a cluster name and select a runtime version (latest recommended).
  4. Select the number of workers (scale as needed).
  5. Click Create Cluster.

Step 4: Create a Notebook

  1. Navigate to Workspace > Users > Your Name.
  2. Click Create > Notebook.
  3. Name the notebook and select Python as the language.
  4. Attach it to your running cluster.

2. Setting up Databricks Community Edition (Free Version)

Databricks Community Edition is a free, limited version ideal for learning PySpark.

Step 1: Sign Up for Community Edition

  1. Go to Databricks Community Edition Signup.
  2. Enter your email and complete the registration.
  3. Check your email for the verification link and activate your account.
  4. Log in to your Databricks Community workspace.

Step 2: Create a Cluster

  1. Click on Compute in the left panel.
  2. Click Create Cluster.
  3. Name your cluster.
  4. Choose the latest runtime version.
  5. Click Create Cluster (Community Edition supports only small clusters).

Step 3: Create a Notebook

  1. Go to Workspace > Users > Your Name.
  2. Click Create > Notebook.
  3. Name the notebook and select Python.
  4. Attach it to the running cluster.

Key Differences Between Premium and Community Edition

Feature Databricks Premium Databricks Community Edition
Price Paid Free
Cloud Providers AWS, Azure, GCP Databricks Cloud
Cluster Scaling Scalable Limited (Single Node)
Security Features Advanced Basic
Collaboration Multi-user Single-user

3. Step-by-Step Guide to Importing LeetCode SQL Questions Notebook into Jupyter Notebook

  1. Clone Your GitHub Repository:

    • First, ensure you have Git installed on your local machine. If not, download and install it from Git's official website.
    • Open your terminal (command prompt) and navigate to the directory where you want to clone your repository.
    • Clone your GitHub repository using the command:
      git clone <repository_url>
      
    • Replace <repository_url> with the URL of your GitHub repository. This will download your repository to your local machine.
  2. Install Required Dependencies:

    • Make sure you have Python installed on your machine. It's recommended to use Anaconda or Miniconda to manage your Python environments.
    • Install Jupyter Notebook and PySpark dependencies if you haven't already:
      pip install jupyter pyspark
      
  3. Launch Jupyter Notebook:

    • Navigate to the directory where your Jupyter Notebook files are located. Typically, this would be the root directory of your cloned repository.
    • Start Jupyter Notebook by running the command:
      jupyter notebook
      
    • This command will open a new tab in your web browser with the Jupyter Notebook interface.
  4. Open and Run Your Notebook:

    • In the Jupyter Notebook interface, navigate to the directory where your notebook file (*.ipynb) is located.
    • Click on the notebook file to open it.
    • Once the notebook is open, you can run each cell by pressing Shift + Enter or using the "Run" button in the toolbar.
    • Ensure that Spark is correctly initialized and configured in your notebook. You may need to import necessary libraries and set up the Spark session if it's not done automatically.
  5. Verify Spark Installation and Configuration:

    • Check if Spark is installed and configured correctly by running a basic Spark operation in one of the notebook cells. For example:
      from pyspark.sql import SparkSession
      
      # Initialize Spark session
      spark = SparkSession.builder \
                          .appName("MyApp") \
                          .getOrCreate()
      
      # Verify Spark session
      spark
    • If Spark is configured correctly, you should see the Spark session information printed without any errors.
  6. Execute and Test Your Notebook:

    • Execute each cell in your notebook to ensure that all code runs as expected.
    • Validate the results of the LeetCode SQL questions solutions to ensure correctness and functionality with PySpark.
  7. Save Your Work:

    • Once you have verified that everything is working correctly, save your notebook with any changes you have made.

Additional Tips:

  • Environment Management: Consider using virtual environments or conda environments to manage dependencies and avoid conflicts between different projects.
  • Documentation: It's helpful to include documentation within your notebook, such as explanations of the SQL solutions and any specific configurations required for Spark.
  • Version Control: Regularly commit your changes to Git and push them to your GitHub repository to keep a versioned history of your work.

By following these steps, you should be able to successfully import and run your LeetCode SQL questions notebook using PySpark in Jupyter Notebook on your local machine.

Note:
The Databricks Community Edition will be more than adequate for this activity.

LeetCode SQL Questions

S.No Questions Difficulty Unsolved Solved
1 175. Combine Two Tables Easy
2 176. Second Highest Salary Medium
3 177. Nth Highest Salary Medium
4 178. Rank Scores Medium
5 180. Consecutive Numbers Medium
6 181. Employees Earning More Than Their Managers Easy
7 182. Duplicate Emails Easy
8 183. Customers Who Never Order Easy
9 184. Department Highest Salary Medium
10 185. Department Top Three Salaries Hard
11 196. Delete Duplicate Emails Easy
12 197. Rising Temperature Easy
13 262. Trips and Users Hard
14 511. Game Play Analysis I Easy
15 512. Game Play Analysis II Easy
16 534. Game Play Analysis III Medium
17 550. Game Play Analysis IV Medium
18 569. Median Employee Salary Hard
19 570. Managers with at Least 5 Direct Reports Medium
20 571. Find Median Given Frequency of Numbers Hard
21 574. Winning Candidate Medium
22 577. Employee Bonus Easy
23 578. Get Highest Answer Rate Question Medium
24 579. Find Cumulative Salary of an Employee Hard
25 580. Count Student Number in Departments Medium
26 584. Find Customer Referee Easy
27 585. Investments in 2016 Medium
28 586. Customer Placing the Largest Number of Orders Easy
29 595. Big Countries Easy
30 596. Classes More Than 5 Students Easy
31 597. Friend Requests I: Overall Acceptance Rate Easy
32 601. Human Traffic of Stadium Hard
33 602. Friend Requests II: Who Has the Most Friends Medium
34 603. Consecutive Available Seats Easy
35 607. Sales Person Easy
36 608. Tree Node Medium
37 610. Triangle Judgement Easy
38 612. Shortest Distance in a Plane Medium
39 613. Shortest Distance in a Line Easy
40 614. Second Degree Follower Medium
41 615. Average Salary: Departments VS Company Hard
42 618. Students Report By Geography Hard
43 619. Biggest Single Number Easy
44 620. Not Boring Movies Easy
45 626. Exchange Seats Medium
46 627. Swap Salary Easy
47 1045. Customers Who Bought All Products Medium
48 1050. Actors and Directors Who Cooperated At Least Three Times Easy
49 1068. Product Sales Analysis I Easy
50 1069. Product Sales Analysis II Easy
51 1070. Product Sales Analysis III Medium
52 1075. Project Employees I Easy
53 1076. Project Employees II Easy
54 1077. Project Employees III Medium
55 1082. Sales Analysis I Easy
56 1083. Sales Analysis II Easy
57 1084. Sales Analysis III Easy
58 1097. Game Play Analysis V Hard
59 1098. Unpopular Books Medium
60 1107. New Users Daily Count Medium
61 1112. Highest Grade For Each Student Medium
62 1113. Reported Posts Easy
63 1126. Active Businesses Medium
64 1127. User Purchase Platform Hard
65 1132. Reported Posts II Medium
66 1141. User Activity for the Past 30 Days I Easy
67 1142. User Activity for the Past 30 Days II Easy
68 1148. Article Views I Easy
69 1149. Article Views II Medium
70 1158. Market Analysis I Medium
71 1159. Market Analysis II Hard
72 1164. Product Price at a Given Date Medium
73 1173. Immediate Food Delivery I Easy
74 1174. Immediate Food Delivery II Medium
75 1179. Reformat Department Table Easy
76 1193. Monthly Transactions I Medium
77 1194. Tournament Winners Hard
78 1204. Last Person to Fit in the Bus Medium
79 1205. Monthly Transactions II Medium
80 1211. Queries Quality and Percentage Easy
81 1212. Team Scores in Football Tournament Medium
82 1225. Report Contiguous Dates Hard
83 1241. Number of Comments per Post Easy
84 1251. Average Selling Price Easy
85 1264. Page Recommendations Medium
86 1270. All People Report to the Given Manager Medium
87 1280. Students and Examinations Easy
88 1285. Find the Start and End Number of Continuous Ranges Medium
89 1294. Weather Type in Each Country Easy
90 1303. Find the Team Size Easy
91 1308. Running Total for Different Genders Medium
92 1321. Restaurant Growth Medium
93 1322. Ads Performance Easy
94 1327. List the Products Ordered in a Period Easy
95 1336. Number of Transactions per Visit Hard
96 1341. Movie Rating Medium
97 1350. Students With Invalid Departments Easy
98 1355. Activity Participants Medium
99 1364. Number of Trusted Contacts of a Customer Medium
100 1369. Get the Second Most Recent Activity Hard
101 1378. Replace Employee ID With The Unique Identifier Easy
102 1384. Total Sales Amount by Year Hard
103 1393. Capital Gain/Loss Medium
104 1398. Customers Who Bought Products A and B but Not C Medium
105 1407. Top Travellers Easy
106 1412. Find the Quiet Students in All Exams Hard
107 1421. NPV Queries Easy
108 1435. Create a Session Bar Chart Easy
109 1440. Evaluate Boolean Expression Medium
110 1445. Apples & Oranges Medium
111 1454. Active Users Medium
112 1459. Rectangles Area Medium
113 1468. Calculate Salaries Medium
114 1479. Sales by Day of the Week Hard
115 1484. Group Sold Products By The Date Easy
116 1495. Friendly Movies Streamed Last Month Easy
117 1501. Countries You Can Safely Invest In Medium
118 1511. Customer Order Frequency Easy
119 1517. Find Users With Valid E-Mails Easy
120 1527. Patients With a Condition Easy
121 1532. The Most Recent Three Orders Medium
122 1543. Fix Product Name Format Easy
123 1549. The Most Recent Orders for Each Product Medium
124 1555. Bank Account Summary Medium
125 1565. Unique Orders and Customers Per Month Easy
126 1571. Warehouse Manager Easy
127 1581. Customer Who Visited but Did Not Make Any Transactions Easy
128 1587. Bank Account Summary II Easy
129 1596. The Most Frequently Ordered Products for Each Customer Medium
130 1607. Sellers With No Sales Easy
131 1613. Find the Missing IDs Medium
132 1623. All Valid Triplets That Can Represent a Country Easy
133 1633. Percentage of Users Attended a Contest Easy
134 1635. Hopper Company Queries I Hard
135 1645. Hopper Company Queries II Hard
136 1651. Hopper Company Queries III Hard
137 1661. Average Time of Process per Machine Easy
138 1667. Fix Names in a Table Easy
139 1677. Product's Worth Over Invoices Easy
140 1683. Invalid Tweets Easy
141 1693. Daily Leads and Partners Easy
142 1699. Number of Calls Between Two Persons Medium
143 1709. Biggest Window Between Visits Medium
144 1715. Count Apples and Oranges Medium
145 1729. Find Followers Count Easy
146 1731. The Number of Employees Which Report to Each Employee Easy
147 1741. Find Total Time Spent by Each Employee Easy
148 1747. Leetflex Banned Accounts Medium
149 1757. Recyclable and Low Fat Products Easy
150 1767. Find the Subtasks That Did Not Execute Hard
151 1777. Product's Price for Each Store Easy
152 1783. Grand Slam Titles Medium
153 1789. Primary Department for Each Employee Easy
154 1795. Rearrange Products Table Easy
155 1809. Ad-Free Sessions Easy
156 1811. Find Interview Candidates Medium
157 1821. Find Customers With Positive Revenue this Year Easy
158 1831. Maximum Transaction Each Day Medium
159 1841. League Statistics Medium
160 1843. Suspicious Bank Accounts Medium
161 1853. Convert Date Format Easy
162 1867. Orders With Maximum Quantity Above Average Medium
163 1873. Calculate Special Bonus Easy
164 1875. Group Employees of the Same Salary Medium
165 1890. The Latest Login in 2020 Easy
166 1892. Page Recommendations II Hard
167 1907. Count Salary Categories Medium
168 1917. Leetcodify Friends Recommendations Hard
169 1919. Leetcodify Similar Friends Hard
170 1934. Confirmation Rate Medium
171 1939. Users That Actively Request Confirmation Messages Easy
172 1949. Strong Friendship Medium
173 1951. All the Pairs With the Maximum Number of Common Followers Medium
174 1965. Employees With Missing Information Easy
175 1972. First and Last Call On the Same Day Hard
176 1978. Employees Whose Manager Left the Company Easy
177 1988. Find Cutoff Score for Each School Medium
178 1990. Count the Number of Experiments Medium
179 2004. The Number of Seniors and Juniors to Join the Company Hard
180 2010. The Number of Seniors and Juniors to Join the Company II Hard
181 2020. Number of Accounts That Did Not Stream Medium
182 2026. Low-Quality Problems Easy
183 2041. Accepted Candidates From the Interviews Medium
184 2051. The Category of Each Member in the Store Medium
185 2066. Account Balance Medium
186 2072. The Winner University Easy
187 2082. The Number of Rich Customers Easy
188 2084. Drop Type 1 Orders for Customers With Type 0 Orders Medium
189 2112. The Airport With the Most Traffic Medium
190 2118. Build the Equation Hard
191 2142. The Number of Passengers in Each Bus I Medium
192 2153. The Number of Passengers in Each Bus II Hard
193 2159. Order Two Columns Independently Medium
194 2173. Longest Winning Streak Hard
195 2175. The Change in Global Rankings Medium
196 2199. Finding the Topic of Each Post Hard
197 2205. The Number of Users That Are Eligible for Discount Easy
198 2228. Users With Two Purchases Within Seven Days Medium
199 2230. The Users That Are Eligible for Discount Easy
200 2238. Number of Times a Driver Was a Passenger Medium
201 2252. Dynamic Pivoting of a Table Hard
202 2253. Dynamic Unpivoting of a Table Hard
203 2292. Products With Three or More Orders in Two Consecutive Years Medium
204 2298. Tasks Count in the Weekend Medium
205 2308. Arrange Table by Gender Medium
206 2314. The First Day of the Maximum Recorded Degree in Each City Medium
207 2324. Product Sales Analysis IV Medium
208 2329. Product Sales Analysis V Easy
209 2339. All the Matches of the League Easy
210 2346. Compute the Rank as a Percentage Medium
211 2356. Number of Unique Subjects Taught by Each Teacher Easy
212 2362. Generate the Invoice Hard
213 2372. Calculate the Influence of Each Salesperson Medium
214 2377. Sort the Olympic Table Easy
215 2388. Change Null Values in a Table to the Previous Value Medium
216 2394. Employees With Deductions Medium
217 2474. Customers With Strictly Increasing Purchases Hard
218 2480. Form a Chemical Bond Easy
219 2494. Merge Overlapping Events in the Same Hall Hard
220 2504. Concatenate the Name and the Profession Easy
221 2668. Find Latest Salaries Easy
222 2669. Count Artist Occurrences On Spotify Ranking List Easy
223 2686. Immediate Food Delivery III Medium
224 2687. Bikes Last Time Used Easy
225 2688. Find Active Users Medium
226 2701. Consecutive Transactions with Increasing Amounts Hard
227 2720. Popularity Percentage Hard
228 2738. Count Occurrences in Text Medium
229 2752. Customers with Maximum Number of Transactions on Consecutive Days Hard
230 2783. Flight Occupancy and Waitlist Analysis Medium
231 2837. Total Traveled Distance Easy
232 2853. Highest Salaries Difference Easy
233 2854. Rolling Average Steps Medium
234 2893. Calculate Orders Within Each Interval Medium
235 2922. Market Analysis III Medium
236 2978. Symmetric Coordinates Medium
237 2984. Find Peak Calling Hours for Each City Medium
238 2985. Calculate Compressed Mean Easy
239 2986. Find Third Transaction Medium
240 2987. Find Expensive Cities Easy
241 2988. Manager of the Largest Department Medium
242 2989. Class Performance Medium
243 2990. Loan Types Easy
244 2991. Top Three Wineries Hard
245 2993. Friday Purchases I Medium
246 2994. Friday Purchases II Hard
247 2995. Viewers Turned Streamers Hard
248 3050. Pizza Toppings Cost Analysis Medium
249 3051. Find Candidates for Data Scientist Position Easy
250 3052. Maximize Items Hard
251 3053. Classifying Triangles by Lengths Easy
252 3054. Binary Tree Nodes Medium
253 3055. Top Percentile Fraud Medium
254 3056. Snaps Analysis Medium
255 3057. Employees Project Allocation Hard
256 3058. Friends With No Mutual Friends Medium
257 3059. Find All Unique Email Domains Easy
258 3060. User Activities within Time Bounds Hard
259 3061. Calculate Trapping Rain Water Hard
260 3087. Find Trending Hashtags Medium
261 3089. Find Bursty Behavior Medium
262 3103. Find Trending Hashtags II Hard
263 3118. Friday Purchase III Medium
264 3124. Find Longest Calls Medium
265 3126. Server Utilization Time Medium
266 3140. Consecutive Available Seats II Medium
267 3150. Invalid Tweets II Easy
268 3156. Employee Task Duration and Concurrent Tasks Hard
269 3166. Calculate Parking Fees and Duration Medium
270 3172. Second Day Verification Easy
271 3182. Find Top Scoring Students Medium
272 3188. Find Top Scoring Students II Hard
273 3198. Find Cities in Each State Easy
274 3204. Bitwise User Permissions Analysis Medium
275 3214. Year on Year Growth Rate Hard
276 3220. Odd and Even Transactions Medium
277 3230. Customer Purchasing Behavior Analysis Medium
278 3236. CEO Subordinate Hierarchy Hard
279 3246. Premier League Table Ranking Easy
280 3252. Premier League Table Ranking II Medium
281 3262. Find Overlapping Shifts Medium
282 3268. Find Overlapping Shifts II Hard
283 3278. Find Candidates for Data Scientist Position II Medium
284 3293. Calculate Product Final Price Medium
285 3308. Find Top Performing Driver Medium
286 3322. Premier League Table Ranking III Medium
287 3328. Find Cities in Each State II Medium
288 3338. Second Highest Salary II Medium
289 3358. Books with NULL Ratings Easy
290 3368. First Letter Capitalization Hard
291 3374. First Letter Capitalization II Hard
292 3384. Team Dominance by Pass Success Hard
293 3390. Longest Team Pass Streak Hard
294 3401. Find Circular Gift Exchange Chains Hard
295 3415. Find Products with Three Consecutive Digits Easy
296 3421. Find Students Who Improved Medium
297 3436. Find Valid Emails Easy
298 3451. Find Invalid IP Addresses Hard
299 3465. Find Products with Valid Serial Numbers Easy
300 3475. DNA Pattern Recognition Medium
301 3482. Analyze Organization Hierarchy Hard

🚀 Support This Project! ⭐

Hey everyone! 👋

We've created this GitHub repository to help the community solve LeetCode SQL questions using PySpark. If you find this project helpful, please consider giving it a star ⭐ and sharing it with other learners who might benefit from it!

Your support will help grow this resource and make it even better for everyone. Let’s learn and improve together! 🚀

🔗 LeetCode SQL Questions-(PySpark Unsolved)

Happy coding! 💻🔥

About

This repository is dedicated to solutions for LeetCode SQL questions implemented in PySpark.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published