DWDM-CSRGFF/dataPreprocessing at master · NJnisarg/DWDM-CSRGFF

History

Name		Name	Last commit message	Last commit date
parent directory ..
Indian_pines_corrected.mat		Indian_pines_corrected.mat
Indian_pines_gt.mat		Indian_pines_gt.mat
PCA.png		PCA.png
README.md		README.md
class_distribution.png		class_distribution.png
image_bands.png		image_bands.png
load_data.py		load_data.py
normalization.png		normalization.png
pre_process.py		pre_process.py
principal_comp_analysis.py		principal_comp_analysis.py
train-test-split.py		train-test-split.py

README.md

Introduction

Hyperspectral Images (HSI) are formed by imaging the same spatial region using multiple wavelengths of light capturing hundreds of spectral bands. These wavelengths range beyond the visible region (400 - 800 nm), which human eyes cannot perceive and range in between 400 - 1100 nm. Thus a hyperspectral image has several bands often hundreds unlike typical color images that have 3 bands like RGB(Red, Green, Blue). Thus hyperspectral images carry muchmore information about the surface, materials that are present in a region. Such images are used extensively in imaging agricultural regions using satellite imagery.

Data loading and Preprocessing

Indian Pine Hyperspectral Image dataset has been used to train and test the model. The process of acquiring the Indian Pine dataset and preprocessing steps to make it fit for further processing and evaluation has been described below:

The Data Preprocessing has 4 major steps:

1.) Data Loading:

Indian pine hyperspectral image dataset has been been acquired in a .mat format. There are two datasets, one contains the ground truth or the classified image, while the other contains the corrected unclassified image dataset. The dataset has been loaded into python using scipy.io python module and has been converted into numpy array for further processing.

2.) Data Scaling:

The dataset is of size 145x145x200 where 145x145 is the size of the image and there are 200 bands in the hyperspectral image. Since each pixel in the 145x145 image consits of 200 positive bands. For further analysis and processing the image is further normalized and scaled to occupy negative values using:

where s and x' are the standard deviation and mean of 200 values of the given pixel.

3.) Train & Test Splitting:

Since, testing and training are performed on the same hyperspectral image, the dataset needs to be smartly classified into the train and test dataset. There are 16 classes in the ground truth image spanning across the 145x145 image pixels. For training 15 sample pixels have been selected at random from each of the classes and rest of the pixels selected for testing and classifying.

4.) Principal Component Analysis:

Principal Component Analysis (PCA) is a dimensionality reduction mechanism, used for transforming a large data set of variables into a smaller one that still contains most of the information in the large set. Inorder to perform Entropy rate Superpixel regularization to create image segmentation, the 200 bands in the data sets needs to be reduced to 3 bands. The dataset is transformed to fit 3 bands which roughly represents the RGB representation of the dataset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataPreprocessing

dataPreprocessing

README.md

Introduction

Data loading and Preprocessing

1.) Data Loading:

2.) Data Scaling:

3.) Train & Test Splitting:

4.) Principal Component Analysis:

Files

dataPreprocessing

Directory actions

More options

Directory actions

More options

Latest commit

History

dataPreprocessing

Folders and files

parent directory

README.md

Introduction

Data loading and Preprocessing

1.) Data Loading:

2.) Data Scaling:

3.) Train & Test Splitting:

4.) Principal Component Analysis: