From c6b25b9c886df6691f767303f08cff3862b2da8f Mon Sep 17 00:00:00 2001
From: bettermorn
Date: Sat, 1 Jan 2022 21:21:04 +0800
Subject: [PATCH] Add files via upload

---
 Code/MLSpringHW03/HW03.ipynb | 684 +++++++++++++++++++++++++++++++++++
 1 file changed, 684 insertions(+)
 create mode 100644 Code/MLSpringHW03/HW03.ipynb

diff --git a/Code/MLSpringHW03/HW03.ipynb b/Code/MLSpringHW03/HW03.ipynb
new file mode 100644
index 0000000..669813c
--- /dev/null
+++ b/Code/MLSpringHW03/HW03.ipynb
@@ -0,0 +1,684 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "D_a2USyd4giE"
+   },
+   "source": [
+    "# **Homework 3 - Convolutional Neural Network**\n",
+    "\n",
+    "This is the example code for Homework 3 of the machine learning course by Prof. Hung-yi Lee.\n",
+    "\n",
+    "In this homework, you are required to build a convolutional neural network for image classification, possibly with some advanced training tips.\n",
+    "\n",
+    "\n",
+    "There are three levels here:\n",
+    "\n",
+    "**Easy**: Build a simple convolutional neural network as the baseline. (2 pts)\n",
+    "\n",
+    "**Medium**: Design a better architecture or adopt different data augmentations to improve the performance. (2 pts)\n",
+    "\n",
+    "**Hard**: Utilize the provided unlabeled data to obtain better results. (2 pts)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "VHpJocsDr6iA"
+   },
+   "source": [
+    "## **About the Dataset**\n",
+    "\n",
+    "The dataset used here is food-11, a collection of food images in 11 classes.\n",
+    "\n",
+    "To meet the requirements of this homework, the TAs slightly modified the data.\n",
+    "Please DO NOT access the original fully-labeled training data or the testing labels.\n",
+    "\n",
+    "Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "id": "zhzdomRTOKoJ"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "zsh:1: command not found: gdown\n",
+      "unzip: cannot find or open food-11.zip, food-11.zip.zip or food-11.zip.ZIP.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Download the dataset.\n",
+    "# You may choose where to download the data from.\n",
+    "\n",
+    "# Google Drive\n",
+    "!gdown --id '1awF7pZ9Dz7X1jn1_QAiKN-_v56veCEKy' --output food-11.zip\n",
+    "\n",
+    "# Dropbox\n",
+    "# !wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip\n",
+    "\n",
+    "# MEGA\n",
+    "# !sudo apt install megatools\n",
+    "# !megadl \"https://mega.nz/#!zt1TTIhK!ZuMbg5ZjGWzWX1I6nEUbfjMZgCmAgeqJlwDkqdIryfg\"\n",
+    "\n",
+    "# Unzip the dataset.\n",
+    "# This may take some time.\n",
+    "!unzip -q food-11.zip"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "BBVSCWWhp6uq"
+   },
+   "source": [
+    "## **Import Packages**\n",
+    "\n",
+    "First, we need to import the packages that will be used later.\n",
+    "\n",
+    "In this homework, we rely heavily on **torchvision**, a computer-vision library for PyTorch."
+   ]
+  },
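+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The failed `gdown` call in the download cell above suggests the package was missing from the environment in which the notebook was last run. A minimal setup cell like the following is an addition to the original notebook, assuming a pip-based environment (e.g., Colab); skip it if these packages are already installed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install the packages this notebook depends on.\n",
+    "# This cell is not part of the original assignment; run it only if\n",
+    "# your environment is missing any of these packages.\n",
+    "!pip install gdown tqdm\n",
+    "!pip install torch torchvision"
+   ]
+  },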
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "id": "9sVrKci4PUFW"
+   },
+   "outputs": [],
+   "source": [
+    "# Import necessary packages.\n",
+    "import numpy as np\n",
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import torchvision.transforms as transforms\n",
+    "from PIL import Image\n",
+    "# \"ConcatDataset\" and \"Subset\" are possibly useful when doing semi-supervised learning.\n",
+    "from torch.utils.data import ConcatDataset, DataLoader, Subset, TensorDataset\n",
+    "from torchvision.datasets import DatasetFolder\n",
+    "\n",
+    "# This is for the progress bar.\n",
+    "from tqdm.auto import tqdm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "F0i9ZCPrOVN_"
+   },
+   "source": [
+    "## **Dataset, Data Loader, and Transforms**\n",
+    "\n",
+    "Torchvision provides many useful utilities for image preprocessing, dataset wrapping, and data augmentation.\n",
+    "\n",
+    "Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** to wrap the data without much effort.\n",
+    "\n",
+    "Please refer to the [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about the different transforms."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "id": "gKd2abixQghI"
+   },
+   "outputs": [],
+   "source": [
+    "# It is important to do data augmentation in training.\n",
+    "# However, not every augmentation is useful.\n",
+    "# Please think about what kind of augmentation is helpful for food recognition.\n",
+    "train_tfm = transforms.Compose([\n",
+    "    # Resize the image into a fixed shape (height = width = 128).\n",
+    "    transforms.Resize((128, 128)),\n",
+    "    # You may add some transforms here (see the augmentation example below).\n",
+    "    # ToTensor() should be the last one of the transforms.\n",
+    "    transforms.ToTensor(),\n",
+    "])\n",
+    "\n",
+    "# We don't need augmentations in testing and validation.\n",
+    "# All we need here is to resize the PIL image and transform it into a Tensor.\n",
+    "test_tfm = transforms.Compose([\n",
+    "    transforms.Resize((128, 128)),\n",
+    "    transforms.ToTensor(),\n",
+    "])\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "id": "qz6jeMnkQl0_"
+   },
+   "outputs": [],
+   "source": [
+    "# Batch size for training, validation, and testing.\n",
+    "# A larger batch size usually gives more stable gradient estimates.\n",
+    "# But GPU memory is limited, so please adjust it carefully.\n",
+    "batch_size = 128\n",
+    "\n",
+    "# Construct datasets.\n",
+    "# The argument \"loader\" tells torchvision how to read the data.\n",
+    "train_set = DatasetFolder(\"food-11/training/labeled\", loader=lambda x: Image.open(x), extensions=\"jpg\", transform=train_tfm)\n",
+    "valid_set = DatasetFolder(\"food-11/validation\", loader=lambda x: Image.open(x), extensions=\"jpg\", transform=test_tfm)\n",
+    "unlabeled_set = DatasetFolder(\"food-11/training/unlabeled\", loader=lambda x: Image.open(x), extensions=\"jpg\", transform=train_tfm)\n",
+    "test_set = DatasetFolder(\"food-11/testing\", loader=lambda x: Image.open(x), extensions=\"jpg\", transform=test_tfm)\n",
+    "\n",
+    "# Construct data loaders.\n",
+    "train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)\n",
+    "valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)\n",
+    "test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)"
+   ]
+  },
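+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a starting point for the Medium baseline, the cell below sketches a richer training transform. It is not part of the original notebook, and the specific transforms and parameters are illustrative choices only; think about which augmentations actually make sense for food photos (e.g., vertical flips rarely do)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# A possible augmented training transform (a sketch, not the official solution).\n",
+    "# Replace train_tfm with this (and rebuild train_set) to try it out.\n",
+    "train_tfm_augmented = transforms.Compose([\n",
+    "    transforms.Resize((128, 128)),\n",
+    "    # Food photos are taken from arbitrary horizontal viewpoints.\n",
+    "    transforms.RandomHorizontalFlip(p=0.5),\n",
+    "    # Small rotations simulate slightly tilted cameras.\n",
+    "    transforms.RandomRotation(15),\n",
+    "    # Mild color jitter simulates different lighting conditions.\n",
+    "    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),\n",
+    "    # ToTensor() should still be the last transform.\n",
+    "    transforms.ToTensor(),\n",
+    "])"
+   ]
+  },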
"j9YhZo7POPYG" + }, + "source": [ + "## **Model**\n", + "\n", + "The basic model here is simply a stack of convolutional layers followed by some fully-connected layers.\n", + "\n", + "Since there are three channels for a color image (RGB), the input channels of the network must be three.\n", + "In each convolutional layer, typically the channels of inputs grow, while the height and width shrink (or remain unchanged, according to some hyperparameters like stride and padding).\n", + "\n", + "Before fed into fully-connected layers, the feature map must be flattened into a single one-dimensional vector (for each image).\n", + "These features are then transformed by the fully-connected layers, and finally, we obtain the \"logits\" for each class.\n", + "\n", + "### **WARNING -- You Must Know**\n", + "You are free to modify the model architecture here for further improvement.\n", + "However, if you want to use some well-known architectures such as ResNet50, please make sure **NOT** to load the pre-trained weights.\n", + "Using such pre-trained models is considered cheating and therefore you will be punished.\n", + "Similarly, it is your responsibility to make sure no pre-trained weights are used if you use **torch.hub** to load any modules.\n", + "\n", + "For example, if you use ResNet-18 as your model:\n", + "\n", + "model = torchvision.models.resnet18(pretrained=**False**) → This is fine.\n", + "\n", + "model = torchvision.models.resnet18(pretrained=**True**) → This is **NOT** allowed." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "Y1c-GwrMQqMl" + }, + "outputs": [], + "source": [ + "class Classifier(nn.Module):\n", + " def __init__(self):\n", + " super(Classifier, self).__init__()\n", + " # The arguments for commonly used modules:\n", + " # torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)\n", + " # torch.nn.MaxPool2d(kernel_size, stride, padding)\n", + "\n", + " # input image size: [3, 128, 128]\n", + " self.cnn_layers = nn.Sequential(\n", + " nn.Conv2d(3, 64, 3, 1, 1),\n", + " nn.BatchNorm2d(64),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, 2, 0),\n", + "\n", + " nn.Conv2d(64, 128, 3, 1, 1),\n", + " nn.BatchNorm2d(128),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, 2, 0),\n", + "\n", + " nn.Conv2d(128, 256, 3, 1, 1),\n", + " nn.BatchNorm2d(256),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(4, 4, 0),\n", + " )\n", + " self.fc_layers = nn.Sequential(\n", + " nn.Linear(256 * 8 * 8, 256),\n", + " nn.ReLU(),\n", + " nn.Linear(256, 256),\n", + " nn.ReLU(),\n", + " nn.Linear(256, 11)\n", + " )\n", + "\n", + " def forward(self, x):\n", + " # input (x): [batch_size, 3, 128, 128]\n", + " # output: [batch_size, 11]\n", + "\n", + " # Extract features by convolutional layers.\n", + " x = self.cnn_layers(x)\n", + "\n", + " # The extracted feature map must be flatten before going to fully-connected layers.\n", + " x = x.flatten(1)\n", + "\n", + " # The features are transformed by fully-connected layers to obtain the final logits.\n", + " x = self.fc_layers(x)\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aEnGbriXORN3" + }, + "source": [ + "## **Training**\n", + "\n", + "You can finish supervised learning by simply running the provided code without any modification.\n", + "\n", + "The function \"get_pseudo_labels\" is used for semi-supervised learning.\n", + "It is expected to get better performance if you use unlabeled data for semi-supervised learning.\n", + "However, you have to implement the function on your own 
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "aEnGbriXORN3"
+   },
+   "source": [
+    "## **Training**\n",
+    "\n",
+    "You can finish supervised learning by simply running the provided code without any modification.\n",
+    "\n",
+    "The function \"get_pseudo_labels\" is used for semi-supervised learning.\n",
+    "You can expect better performance if you use the unlabeled data for semi-supervised learning.\n",
+    "However, you have to implement the function on your own and adjust several hyperparameters manually.\n",
+    "\n",
+    "For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).\n",
+    "\n",
+    "Again, please note that utilizing external data (or a pre-trained model) for training is **prohibited**."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "id": "swlf5EwA-hxA"
+   },
+   "outputs": [],
+   "source": [
+    "def get_pseudo_labels(dataset, model, threshold=0.65):\n",
+    "    # This function generates pseudo-labels for a dataset using the given model.\n",
+    "    # It returns a TensorDataset containing only the images whose prediction\n",
+    "    # confidence exceeds the given threshold.\n",
+    "    # You are NOT allowed to use any models trained on external data for pseudo-labeling.\n",
+    "    device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+    "\n",
+    "    # Construct a data loader.\n",
+    "    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)\n",
+    "\n",
+    "    # Make sure the model is in eval mode.\n",
+    "    model.eval()\n",
+    "    # Define the softmax function.\n",
+    "    softmax = nn.Softmax(dim=-1)\n",
+    "\n",
+    "    # Iterate over the dataset by batches.\n",
+    "    images = []\n",
+    "    targets = []\n",
+    "    for batch in tqdm(data_loader):\n",
+    "        img, _ = batch\n",
+    "\n",
+    "        # Forward the data.\n",
+    "        # Using torch.no_grad() accelerates the forward process.\n",
+    "        with torch.no_grad():\n",
+    "            logits = model(img.to(device))\n",
+    "\n",
+    "        # Obtain the probability distributions by applying softmax on logits.\n",
+    "        probs = softmax(logits)\n",
+    "\n",
+    "        # ---------- TODO ----------\n",
+    "        # Filter the data and construct a new dataset.\n",
+    "        # Keep a sample only if its highest class probability exceeds the threshold.\n",
+    "        for idx, prob in enumerate(probs):\n",
+    "            c = torch.argmax(prob)\n",
+    "            if prob[c] > threshold:\n",
+    "                images.append(img[idx])  # Select the corresponding image by index.\n",
+    "                targets.append(c.cpu())  # Use the index of the maximum as the class.\n",
+    "\n",
+    "    # Stack the kept samples into a TensorDataset.\n",
+    "    dataset = TensorDataset(torch.stack(images), torch.stack(targets).long())\n",
+    "\n",
+    "    # Turn off the eval mode.\n",
+    "    model.train()\n",
+    "    return dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "PHaFE-8oQtkC"
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "6f34f76a3d4441f9ad917f74b46237d4",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/54 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+    "\n",
+    "# Initialize a model, and put it on the device specified.\n",
+    "model = Classifier().to(device)\n",
+    "model.device = device\n",
+    "\n",
+    "# For the classification task, we use cross-entropy as the measurement of performance.\n",
+    "criterion = nn.CrossEntropyLoss()\n",
+    "\n",
+    "# Initialize the optimizer. You may fine-tune hyperparameters such as the learning rate on your own.\n",
+    "optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5)\n",
+    "\n",
+    "# The number of training epochs.\n",
+    "n_epochs = 80\n",
+    "\n",
+    "# Whether to do semi-supervised learning.\n",
+    "do_semi = True\n",
+    "\n",
+    "# Track the best validation accuracy and record the learning curves.\n",
+    "model_path = \"model.ckpt\"\n",
+    "best_acc = 0.0\n",
+    "train_loss_record = []\n",
+    "valid_loss_record = []\n",
+    "train_acc_record = []\n",
+    "valid_acc_record = []\n",
+    "\n",
+    "for epoch in range(n_epochs):\n",
+    "    # In each epoch, relabel the unlabeled dataset for semi-supervised learning.\n",
+    "    # Then combine the labeled dataset and the pseudo-labeled dataset for training.\n",
+    "    if do_semi:\n",
+    "        # Obtain pseudo-labels for the unlabeled data using the trained model.\n",
+    "        pseudo_set = get_pseudo_labels(unlabeled_set, model)\n",
+    "\n",
+    "        # Construct a new dataset and a data loader for training.\n",
+    "        # This is used in semi-supervised learning only.\n",
+    "        concat_dataset = ConcatDataset([train_set, pseudo_set])\n",
+    "        train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)\n",
+    "\n",
+    "    # ---------- Training ----------\n",
+    "    # Make sure the model is in train mode before training.\n",
+    "    model.train()\n",
+    "\n",
+    "    # These are used to record information in training.\n",
+    "    train_loss = []\n",
+    "    train_accs = []\n",
+    "\n",
+    "    # Iterate the training set by batches.\n",
+    "    for batch in tqdm(train_loader):\n",
+    "        # A batch consists of image data and corresponding labels.\n",
+    "        imgs, labels = batch\n",
+    "\n",
+    "        # Forward the data. (Make sure data and model are on the same device.)\n",
+    "        logits = model(imgs.to(device))\n",
+    "\n",
+    "        # Calculate the cross-entropy loss.\n",
+    "        # We don't need to apply softmax before computing the loss; it is done internally.\n",
+    "        loss = criterion(logits, labels.to(device))\n",
+    "\n",
+    "        # Gradients stored in the parameters in the previous step should be cleared out first.\n",
+    "        optimizer.zero_grad()\n",
+    "\n",
+    "        # Compute the gradients for the parameters.\n",
+    "        loss.backward()\n",
+    "\n",
+    "        # Clip the gradient norms for stable training.\n",
+    "        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)\n",
+    "\n",
+    "        # Update the parameters with the computed gradients.\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        # Compute the accuracy for the current batch.\n",
+    "        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean().item()\n",
+    "\n",
+    "        # Record the loss and accuracy.\n",
+    "        train_loss.append(loss.item())\n",
+    "        train_accs.append(acc)\n",
+    "\n",
+    "    train_loss = sum(train_loss) / len(train_loss)\n",
+    "    train_acc = sum(train_accs) / len(train_accs)\n",
+    "\n",
+    "    # Print the information.\n",
+    "    print(f\"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}\")\n",
+    "\n",
+    "    # ---------- Validation ----------\n",
+    "    # Make sure the model is in eval mode so that modules like dropout and batchnorm behave correctly.\n",
+    "    model.eval()\n",
+    "\n",
+    "    # These are used to record information in validation.\n",
+    "    valid_loss = []\n",
+    "    valid_accs = []\n",
+    "\n",
+    "    # Iterate the validation set by batches.\n",
+    "    for batch in tqdm(valid_loader):\n",
+    "        imgs, labels = batch\n",
+    "\n",
+    "        # We don't need gradients in validation.\n",
+    "        # Using torch.no_grad() accelerates the forward process.\n",
+    "        with torch.no_grad():\n",
+    "            logits = model(imgs.to(device))\n",
+    "\n",
+    "        # We can still compute the loss (but not the gradients).\n",
+    "        loss = criterion(logits, labels.to(device))\n",
+    "\n",
+    "        # Compute the accuracy for the current batch.\n",
+    "        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean().item()\n",
+    "\n",
+    "        # Record the loss and accuracy.\n",
+    "        valid_loss.append(loss.item())\n",
+    "        valid_accs.append(acc)\n",
+    "\n",
+    "    valid_loss = sum(valid_loss) / len(valid_loss)\n",
+    "    valid_acc = sum(valid_accs) / len(valid_accs)\n",
+    "\n",
+    "    # Print the information.\n",
+    "    print(f\"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}\")\n",
+    "\n",
+    "    # Save the best model so far and record the learning curves.\n",
+    "    if valid_acc > best_acc:\n",
+    "        best_acc = valid_acc\n",
+    "        torch.save(model.state_dict(), model_path)\n",
+    "    train_loss_record.append(train_loss)\n",
+    "    valid_loss_record.append(valid_loss)\n",
+    "    train_acc_record.append(train_acc)\n",
+    "    valid_acc_record.append(valid_acc)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## **Visualize Result**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Plot the accuracy curves of training and validation.\n",
+    "x = np.arange(len(train_acc_record))\n",
+    "plt.plot(x, train_acc_record, color=\"blue\", label=\"Train\")\n",
+    "plt.plot(x, valid_acc_record, color=\"red\", label=\"Valid\")\n",
+    "plt.legend(loc=\"upper right\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Plot the loss curves of training and validation.\n",
+    "x = np.arange(len(train_loss_record))\n",
+    "plt.plot(x, train_loss_record, color=\"blue\", label=\"Train\")\n",
"plt.plot(x, valid_loss_record, color=\"red\", label=\"Valid\")\n", + "plt.legend(loc=\"upper right\") \n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2o1oCMXy61_3" + }, + "source": [ + "## **Testing**\n", + "\n", + "For inference, we need to make sure the model is in eval mode, and the order of the dataset should not be shuffled (\"shuffle=False\" in test_loader).\n", + "\n", + "Last but not least, don't forget to save the predictions into a single CSV file.\n", + "The format of CSV file should follow the rules mentioned in the slides.\n", + "\n", + "### **WARNING -- Keep in Mind**\n", + "\n", + "Cheating includes but not limited to:\n", + "1. using testing labels,\n", + "2. submitting results to previous Kaggle competitions,\n", + "3. sharing predictions with others,\n", + "4. copying codes from any creatures on Earth,\n", + "5. asking other people to do it for you.\n", + "\n", + "Any violations bring you punishments from getting a discount on the final grade to failing the course.\n", + "\n", + "It is your responsibility to check whether your code violates the rules.\n", + "When citing codes from the Internet, you should know what these codes exactly do.\n", + "You will **NOT** be tolerated if you break the rule and claim you don't know what these codes do.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4HznI9_-ocrq" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9568bd7c179b4c668f84ba235b87c6ad", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/27 [00:00