yasvanthkohli/Mastering-Data-Analysis-with-SQL

This repository is a diverse collection of projects and challenges centered on SQL and data analysis, curated to foster learning, provide practical experience, and unearth valuable insights from data.

Built with

SQL Databases:

  • MySQL
  • PostgreSQL

Tools:

  • MySQL Workbench
  • pgAdmin

Table of Contents

  • Connecting to the Database
  • Retrieving Data
  • Filtering Data
  • Sorting Data
  • Aggregating Data
  • Joining Tables
  • Grouping and Summarizing Data
  • Subqueries
  • Working with Dates and Times
  • Analytical Functions
  • Advanced Analysis Techniques
  • Exporting Data

World Happiness Report Data Exploration

Project Description:

In this project, I will explore the World Happiness Report dataset using basic SQL querying and data exploration techniques. The World Happiness Report is an annual publication that ranks countries based on their happiness levels and factors contributing to happiness.

The dataset contains information about various countries and their happiness scores along with several socio-economic and demographic indicators. The goal of this project is to gain insights from the dataset by formulating SQL queries and performing data exploration.

Tasks:

Database Setup: Create a database to store the World Happiness Report data.
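
A minimal sketch of this step is shown below; the database name, table name, and column types are assumptions based on the dataset's typical structure, not the project's actual schema.

    -- Create the database and a table for the report data
    -- (column names are illustrative; adjust them to the CSV headers)
    CREATE DATABASE happiness_db;
    USE happiness_db;  -- MySQL; in PostgreSQL, reconnect to happiness_db instead

    CREATE TABLE happiness_report (
        country         VARCHAR(100) NOT NULL,
        year            INT          NOT NULL,
        happiness_score DECIMAL(5, 3),
        gdp_per_capita  DECIMAL(6, 3),
        social_support  DECIMAL(6, 3),
        life_expectancy DECIMAL(6, 3),
        PRIMARY KEY (country, year)
    );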

Data Import: Import the dataset into appropriate tables within the database.
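
The import might look like the following on either engine; the file path and CSV layout are placeholders.

    -- PostgreSQL: server-side COPY (psql's \copy is the client-side equivalent)
    COPY happiness_report
    FROM '/path/to/world_happiness.csv'
    WITH (FORMAT csv, HEADER true);

    -- MySQL: LOAD DATA, skipping the header row
    LOAD DATA LOCAL INFILE '/path/to/world_happiness.csv'
    INTO TABLE happiness_report
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    IGNORE 1 LINES;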

Basic SQL Queries: Write SQL queries to gain insights into the factors influencing happiness levels across countries.
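
For example, a first query might rank countries by score for a single year (column names assumed as in the sketch above):

    -- Ten happiest countries in 2019, with one contributing factor
    SELECT country, happiness_score, gdp_per_capita
    FROM happiness_report
    WHERE year = 2019
    ORDER BY happiness_score DESC
    LIMIT 10;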

Data Exploration: Perform exploratory analysis by writing SQL queries to answer specific questions and gain insights from the dataset.
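
One exploratory question could be answered with a scalar subquery, e.g. which countries score above the global average (again using the assumed schema):

    -- Countries above the global average happiness score for 2019
    SELECT country, happiness_score
    FROM happiness_report
    WHERE year = 2019
      AND happiness_score > (SELECT AVG(happiness_score)
                             FROM happiness_report
                             WHERE year = 2019)
    ORDER BY happiness_score DESC;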

Supermarket Sales Analysis

  1. Introduction

    This project aims to perform an analysis of supermarket sales data using SQL. The dataset contains information about various sales transactions, including invoice details, branch information, customer demographics, product details, pricing, taxes, payment information, and customer ratings. The project will focus on data cleaning and preprocessing, as well as data filtering and aggregation techniques to derive meaningful insights from the dataset.

  2. Dataset Overview

    The dataset consists of the following columns; one possible table definition is sketched after the list.

  • Invoice_ID: Unique identifier for each sales transaction.
  • Branch: Branch code where the sale occurred.
  • City: City where the sale occurred.
  • Customer_type: Type of customer (e.g., Member, Normal).
  • Gender: Gender of the customer.
  • Product_line: Category of the product sold.
  • Unit_price: Price of a single unit of the product.
  • Quantity: Number of units sold.
  • Tax: Tax amount applied to the sale.
  • Total: Total amount of the sale including tax.
  • Date: Date of the sale.
  • Time: Time of the sale.
  • Payment: Payment method used (e.g., Cash, Credit Card).
  • cogs: Cost of goods sold.
  • gross_margin_percentage: Gross margin percentage.
  • gross_income: Gross income generated from the sale.
  • Rating: Customer satisfaction rating for the sale.
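
    A possible table definition for these columns is sketched below; the data types are assumptions, and Date and Time are renamed to avoid clashing with the SQL type keywords of the same names.

        CREATE TABLE supermarket_sales (
            Invoice_ID              VARCHAR(20) PRIMARY KEY,
            Branch                  CHAR(1),
            City                    VARCHAR(50),
            Customer_type           VARCHAR(10),
            Gender                  VARCHAR(10),
            Product_line            VARCHAR(50),
            Unit_price              DECIMAL(8, 2),
            Quantity                INT,
            Tax                     DECIMAL(10, 4),
            Total                   DECIMAL(10, 4),
            Sale_date               DATE,
            Sale_time               TIME,
            Payment                 VARCHAR(20),
            cogs                    DECIMAL(10, 2),
            gross_margin_percentage DECIMAL(12, 8),
            gross_income            DECIMAL(10, 4),
            Rating                  DECIMAL(3, 1)
        );
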
  3. Data Cleaning and Preprocessing

    Data cleaning and preprocessing are essential steps to ensure the quality and consistency of the dataset. In this project, the following techniques were applied; a few representative queries are sketched after the list.

  • Handling missing values: Missing values in any of the columns were identified and appropriate actions, such as imputation or removal, were taken.
  • Removing duplicates: Duplicate records, if any, were identified and removed to avoid redundant data.
  • Data type conversion: The data types of certain columns were adjusted to match the nature of the data they represent, such as converting date and time columns into the appropriate data types.
  • Data normalization: Certain columns were normalized to ensure consistency and standardization across the dataset, such as converting the city names to a standardized format.
  • Handling outliers: Outliers in numeric columns were identified and treated using suitable techniques, such as capping or excluding them from analysis.
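
    A few representative cleaning queries, assuming the supermarket_sales table sketched above:

        -- Spot duplicate invoices before deciding how to remove them
        SELECT Invoice_ID, COUNT(*) AS copies
        FROM supermarket_sales
        GROUP BY Invoice_ID
        HAVING COUNT(*) > 1;

        -- Count missing values per column (Rating shown; repeat as needed)
        SELECT COUNT(*) AS missing_ratings
        FROM supermarket_sales
        WHERE Rating IS NULL;

        -- Standardize city names: trim stray whitespace, uppercase consistently
        UPDATE supermarket_sales
        SET City = UPPER(TRIM(City));
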
  4. Data Filtering and Aggregation

    Data filtering and aggregation techniques are used to extract specific subsets of the data and derive meaningful insights from them. In this project, the following techniques were applied; an example query is sketched after the list.

  • Filtering by date: The dataset was filtered based on a specific date range to focus on sales patterns during certain periods.
  • Filtering by branch or city: The dataset was filtered to analyze sales performance at specific branches or cities.
  • Aggregating sales data: Various aggregation functions, such as SUM, COUNT, AVG, and MAX, were applied to calculate key metrics like total sales, average unit price, total tax, and gross income.
  • Grouping and sorting: The dataset was grouped by specific columns (e.g., product line, customer type) and sorted based on certain criteria to identify trends and patterns.
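
    For instance, a single query can combine a date filter with grouping and aggregation (the date range is illustrative):

        -- Sales by product line for January 2019
        SELECT Product_line,
               COUNT(*)          AS transactions,
               SUM(Total)        AS total_sales,
               AVG(Unit_price)   AS avg_unit_price,
               SUM(gross_income) AS total_gross_income
        FROM supermarket_sales
        WHERE Sale_date BETWEEN '2019-01-01' AND '2019-01-31'
        GROUP BY Product_line
        ORDER BY total_sales DESC;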
