Update 6.1
addtt committed Sep 25, 2022
1 parent d82a27e commit e826666
Showing 1 changed file with 24 additions and 29 deletions.
53 changes: 24 additions & 29 deletions 6_Mini_Project/6.1-EXE-Kaggle-Leaf-Challenge.ipynb
@@ -4,10 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Credits\n",
"\n",
"Originally created for a previous version of the [02456-deep-learning](https://github.com/DeepLearningDTU/02456-deep-learning) course material, but [converted to PyTorch](https://github.com/pytorch/tutorials).\n",
"See repos for credits."
"# Kaggle Leaf Classification Challenge\n"
]
},
{
@@ -52,26 +49,24 @@
"source": [
"# Tying everything together\n",
"\n",
"Now that you have learned about the three most used network architectures: FFNs, CNNs and RNNs. It is time to combine these network types into a more advanced model. \n",
"It often happens that you have a combination of data that cannot fully be modeled by any one of these three types of network. Knowing how to divide the data into the right subsets, and then build a network that handles each subset efficiently can mean the difference between a great model and an unusable one. \n",
"Now that you have learned about the most common network architectures, it is time to combine them into a more advanced model. \n",
"It often happens that you have a combination of data that cannot easily be modeled by any single one of these types of network. Knowing how to divide the data into the right subsets, and then build a network that handles each subset efficiently can mean the difference between a great model and an unusable one. \n",
"\n",
"In this notebook we will work on the **Kaggle Leaf Classification Challenge**, a data science competition from [`kaggle.com`](kaggle.com) that contains several different kinds of data.\n",
"First we will download the data and visualize it, and then we will train a network to classify the data.\n",
"A simple network with poor performance is provided for you as a starting point, but it is up to you use the things you have learnt to improve the results.\n",
"In this notebook, we will work on the **Kaggle Leaf Classification Challenge**, a data science competition from [`kaggle.com`](https://www.kaggle.com/) that contains several different kinds of data.\n",
"We will download the data, visualize it, and train a classifier.\n",
"A simple network with poor performance is provided for you as a starting point, but it is up to you use what you have learnt to improve the results.\n",
"\n",
"\n",
"## Kaggle challenge\n",
"Kaggle is a website to participate in real life challenges.\n",
"Early 2017 it was bought by Google, who wanted access to the global community of data scientists it has created over the last 7 years.\n",
"Since then Google has sponsored its expansion and now the prizes of the competitions and the amount of public datasets are bigger than ever. \n",
"\n",
"Kaggle is a website to participate in real-world challenges.\n",
"Most competitions on Kaggle have a dataset, an accuracy metric and a leaderboard to compare submissions.\n",
"You can read more about Kaggle public datasets [here](https://www.kaggle.com/datasets).\n",
"\n",
"The challenge we will pursue is the [_Leaf Classification_](https://www.kaggle.com/c/leaf-classification) challenge.\n",
"The dataset consists approximately 1,584 images of leaf specimens which have been converted to binary black leaves against white backgrounds. \n",
"Three sets of features are also provided per image: a shape contiguous descriptor, an interior texture histogram, and a fine-scale margin histogram. For each feature, a 64-attribute vector is given per leaf sample. We will primarily look into the type of neural network best suited for handling this type of data. \n",
"We will undertake the [_Leaf Classification_](https://www.kaggle.com/c/leaf-classification) challenge. We report here the description of the dataset:\n",
"\n",
"Lastly, we will train the model and put the outputs in a submission file that we can submit to Kaggle."
"> The dataset consists of approximately 1,584 images of leaf specimens which have been converted to binary black leaves against white backgrounds. \n",
"Three sets of features are also provided per image: a shape contiguous descriptor, an interior texture histogram, and a fine-scale margin histogram. For each feature, a 64-attribute vector is given per leaf sample.\n"
]
},
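As a rough illustration of what "a network per data subset" could look like for this challenge, here is a minimal PyTorch sketch, not the notebook's actual model: a CNN branch for the leaf image and an FFN branch for the 3 × 64 pre-extracted features. The layer sizes and the 1 × 64 × 64 image input are assumptions; the 99-class output matches the challenge's species count.

```python
import torch
import torch.nn as nn

class LeafNet(nn.Module):
    """Two branches: a CNN for the leaf image, an FFN for the tabular features."""
    def __init__(self, num_classes=99):  # 99 species in the challenge
        super().__init__()
        self.cnn = nn.Sequential(  # image branch, assuming 1 x 64 x 64 inputs
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),  # -> 32 * 16 * 16 = 8192 values
        )
        self.ffn = nn.Sequential(  # feature branch: 3 x 64 = 192 attributes
            nn.Linear(192, 128), nn.ReLU(),
        )
        self.head = nn.Linear(8192 + 128, num_classes)

    def forward(self, image, features):
        # concatenate both branch outputs, then classify
        z = torch.cat([self.cnn(image), self.ffn(features)], dim=1)
        return self.head(z)
```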
{
@@ -165,7 +160,7 @@
"metadata": {},
"outputs": [],
"source": [
"image_paths = glob.glob(\"drive/My Drive/images/*.jpg\") # if your path to the \n",
"image_paths = glob.glob(\"drive/My Drive/images/*.jpg\")\n",
"print(\"Total Observations:\\t\", len(image_paths))\n",
"\n",
"# now loading the train.csv to find features for each training point\n",
@@ -184,7 +179,7 @@
"\n",
"1.1) How many samples do we have for training and test? Do we have the same information for training and test data? How many samples do we have for each species?\n",
"\n",
"**Hint** You might want to use .shape, .columns, pd.unique() and .symmetric_difference().\n"
"**Hint**: You might want to use .shape, .columns, pd.unique() and .symmetric_difference().\n"
]
},
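One possible way to combine the hinted calls, as a sketch assuming `train` and `test` are the DataFrames loaded above:

```python
import pandas as pd

print(train.shape, test.shape)  # samples x columns for each split

# columns present in one split but not the other (e.g. labels are train-only)
print(set(train.columns).symmetric_difference(test.columns))

species = pd.unique(train["species"])
print(len(species))                            # number of distinct species
print(train["species"].value_counts().head())  # samples per species
```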
{
Expand All @@ -193,7 +188,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Your code to produce answers here:\n"
"# Your code here:\n"
]
},
{
@@ -251,26 +246,26 @@
"metadata": {},
"outputs": [],
"source": [
"# Now plot 1 image from each category:\n"
"# Now plot 1 image from each category\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, classifying leaves is actually a very tough problem.\n",
"What makes it even worse, is that we cannot use all the image data we have available.\n",
"In order to decrease the amount of computations needed, we need to reduce the size of the images as much as possible.\n",
"On top of that our neural network usually only accepts fixed size input tensors.\n",
"What makes it even worse is that we cannot use all the image data we have available.\n",
"In order to decrease the amount of computation needed, we need to reduce the size of the images as much as possible.\n",
"On top of that, our neural network usually only accepts fixed-size input tensors.\n",
"This means we will have to change the shape of the images so that they all have the same sizes.\n",
"\n",
"\n",
"Resizing is problematic because it alters the shape of the leaves, and for some of them, this is their most distinctive feature. Take a look at `Salix_Intergra` in the bottom left corner.\n",
"Describing this leaf without taking its' shape into account seems extremely difficult.\n",
"Describing this leaf without taking its shape into account seems extremely difficult.\n",
"\n",
"Therefore we will \n",
"- 1) first pad all the images into squares, and\n",
"- 2) then resize them, as visualized below:"
"1. first pad all the images into squares, and\n",
"2. then resize them."
]
},
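A minimal sketch of the pad-then-resize step described above, using PIL; the notebook's own implementation may differ, and the 64-pixel target size and black padding colour are assumptions:

```python
from PIL import Image

def pad_to_square_and_resize(path, size=64):
    """Pad a leaf image to a square canvas, then resize it to size x size."""
    img = Image.open(path).convert("L")       # binary leaf image as grayscale
    side = max(img.size)
    canvas = Image.new("L", (side, side), 0)  # assumed background colour
    # paste the original image centered on the square canvas
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas.resize((size, size))
```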
{
@@ -408,7 +403,7 @@
"# Managing the data\n",
"\n",
"The details of the code in this section isn't that important.\n",
"It simply manages the data in a nice way - so it is a good place to come back and look for inspiration when you going to work on your own projects.\n",
"It simply manages the data in a nice way - so it is a good place to come back and look for inspiration when you will work on your own projects.\n",
"\n",
"\n",
"## Defining the data loader"
@@ -983,7 +978,7 @@
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -997,7 +992,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.8.12"
}
},
"nbformat": 4,