# Project: Can you recognize the emotion from an image of a face?

### Data folder

The data directory contains the data used in the analysis. It is treated as read-only; in particular, the R/Python files are never allowed to write to the files in here. Depending on the project, these might be CSV files or a database, and the directory itself may have subdirectories.

`Projects_StarterCodes/Project3-FacialEmotionRecognition/doc/Main.Rmd`
---
title: "Main"
author: "Chengliang Tang, Yujie Wang, Tian Zheng"
output:
  html_document:
    df_print: paged
---
In your final repo, there should be an R Markdown file that organizes **all computational steps** for evaluating your proposed Facial Expression Recognition framework.

This file is currently a template for running evaluation experiments. You should update it to match your code while keeping exactly the same structure.

```{r message=FALSE}
if(!require("EBImage")){
  source("https://bioconductor.org/biocLite.R")
  biocLite("EBImage")
}
if(!require("R.matlab")){
  install.packages("R.matlab")
}
if(!require("readxl")){
  install.packages("readxl")
}
if(!require("dplyr")){
  install.packages("dplyr")
}
if(!require("ggplot2")){
  install.packages("ggplot2")
}
if(!require("caret")){
  install.packages("caret")
}

library(R.matlab)
library(readxl)
library(dplyr)
library(EBImage)
library(ggplot2)
library(caret)
```

### Step 0: set work directories and extract paths
```{r wkdir, eval=FALSE}
setwd("~/Project3-FacialEmotionRecognition/doc")
# Replace this with your own path, or set the working directory manually in RStudio
# to where this .Rmd file is located. Use relative paths for reproducibility.
```

Provide directories for the training data. Training images and training fiducial points are stored in separate subfolders.
```{r}
train_dir <- "../data/train_set/" # This will be modified for different data sets.
train_image_dir <- paste(train_dir, "images/", sep="")
train_pt_dir <- paste(train_dir, "points/", sep="")
train_label_path <- paste(train_dir, "label.csv", sep="")
```


### Step 1: set up controls for evaluation experiments.

In this chunk, we have a set of controls for the evaluation experiments.

+ (T/F) cross-validation on the training set
+ (number) K, the number of CV folds
+ (T/F) process features for training set
+ (T/F) run evaluation on an independent test set
+ (T/F) process features for test set

```{r exp_setup}
run.cv <- TRUE            # run cross-validation on the training set
K <- 5                    # number of CV folds
run.feature.train <- TRUE # process features for training set
run.test <- TRUE          # run evaluation on an independent test set
run.feature.test <- TRUE  # process features for test set
```
Using cross-validation or independent test set evaluation, we compare the performance of models with different specifications. In this Starter Code, we tune the parameter k (the number of neighbours) for KNN.

```{r model_setup}
k <- c(5, 11, 21, 31, 41, 51)
model_labels <- paste("KNN with K =", k)
```

### Step 2: import data and train-test split
```{r}
# train-test split
set.seed(0) # seed set here (rather than in the unevaluated Step 0 chunk) so the split is reproducible
info <- read.csv(train_label_path)
n <- nrow(info)
n_train <- round(n*(4/5), 0)
train_idx <- sample(info$Index, n_train, replace = FALSE)
test_idx <- setdiff(info$Index, train_idx)
```

If you choose to extract features from the images themselves, for example with a Gabor filter, R will run out of memory if all images are read in at once. The solution is to read and process the images in smaller batches (e.g., 100 at a time).
```{r}
n_files <- length(list.files(train_image_dir))

# as an illustration, read the first batch of 100 images
image_list <- list()
for(i in 1:100){
  image_list[[i]] <- readImage(paste0(train_image_dir, sprintf("%04d", i), ".jpg"))
}
```
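
The chunk above reads only the first batch. A minimal sketch of fully batched reading is below; it assumes the images are named `0001.jpg`, `0002.jpg`, ... as in this data set, and that each batch is processed and discarded before the next one is read, so memory use stays bounded.

```{r, eval=FALSE}
# A sketch (not run): process images 100 at a time so that R never holds
# more than one batch in memory.
batch_size <- 100
for(start in seq(1, n_files, by = batch_size)){
  idx <- start:min(start + batch_size - 1, n_files)
  batch <- lapply(idx, function(i){
    readImage(paste0(train_image_dir, sprintf("%04d", i), ".jpg"))
  })
  # ... extract features from `batch` here; it is overwritten on the next pass
}
```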

Fiducial points are stored in MATLAB (.mat) format. In this step, we read them and store them in a list.
```{r read fiducial points}
# function to read fiducial points
# input: index
# output: matrix of fiducial points corresponding to the index
readMat.matrix <- function(index){
  return(round(readMat(paste0(train_pt_dir, sprintf("%04d", index), ".mat"))[[1]], 0))
}

# load fiducial points
fiducial_pt_list <- lapply(1:n_files, readMat.matrix)
save(fiducial_pt_list, file="../output/fiducial_pt_list.RData")
```

### Step 3: construct features and responses

+ The following plots show how pairwise distances between fiducial points can work as features for facial emotion recognition.

  + In the first column, the 78 fiducial points of each emotion are marked in order.
  + In the second column, distributions of the vertical distance between the right pupil (1) and the right brow peak (21) are shown as histograms. For example, this distance tends to be shorter for an angry face than for a surprised face.
  + The third column shows distributions of the vertical distance between the right mouth corner (50) and the midpoint of the upper lip (52). For example, this distance tends to be shorter for a happy face than for other expressions.

![Figure1](../figs/feature_visualization.jpg)

`feature.R` should be the wrapper for all your feature engineering functions and options. The function `feature()` should have options that correspond to different scenarios for your project and produce an R object containing the features and responses required by all the models you will evaluate later. A sketch of one possible implementation is given after the list below.

+ `feature.R`
  + Input: list of images or fiducial points
  + Output: an RData file that contains the extracted features and corresponding responses
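
As a reference point only, here is a minimal sketch of what a pairwise-distance `feature()` could look like; the actual implementation lives in `../lib/feature.R` and may differ. It assumes `emotion_idx` comes from `info` (read from `label.csv`) and that the rows of `info` are ordered by `Index`.

```{r, eval=FALSE}
# A sketch (not run): features are the pairwise distances among the x-coordinates
# and among the y-coordinates of the 78 fiducial points (2 * choose(78, 2) = 6006 features).
feature <- function(fiducial_pt_list, index){
  pairwise_dist <- function(mat){
    c(as.vector(dist(mat[, 1])), as.vector(dist(mat[, 2])))
  }
  feat <- t(sapply(fiducial_pt_list[index], pairwise_dist))
  colnames(feat) <- paste0("feature", seq_len(ncol(feat)))
  # assumes rows of info are ordered by Index so info$emotion_idx[index] lines up
  data.frame(feat, emotion_idx = as.factor(info$emotion_idx[index]))
}
```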

```{r feature}
source("../lib/feature.R")

tm_feature_train <- NA
if(run.feature.train){
  tm_feature_train <- system.time(dat_train <- feature(fiducial_pt_list, train_idx))
}

tm_feature_test <- NA
if(run.feature.test){ # note: this flag (not run.feature.train) controls the test features
  tm_feature_test <- system.time(dat_test <- feature(fiducial_pt_list, test_idx))
}

save(dat_train, file="../output/feature_train.RData")
save(dat_test, file="../output/feature_test.RData")
```

### Step 4: Train a classification model with training features and responses
Call the training and testing wrappers from the `lib` folder.

`train.R` and `test.R` should be wrappers for all your model training steps and your classification/prediction steps.

+ `train.R`
  + Input: a data frame containing features and labels, and a parameter list
  + Output: a trained model
+ `test.R`
  + Input: an R object that contains a trained classifier, and processed features from the test images
  + Output: predicted labels for the test images

+ In this Starter Code, we use KNN for classification.

```{r loadlib}
# source("../lib/train.R") # commented out: KNN does not require a separate training step
source("../lib/test_knn.R")
```
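
For orientation, here is a minimal sketch of what `test()` in `test_knn.R` might look like, built on `class::knn`; the actual file may differ. It assumes the feature data frames keep `emotion_idx` in the last column and that `dat_train` is available in the calling environment.

```{r, eval=FALSE}
# A sketch (not run): for KNN the "model" is just the training data plus k,
# so prediction amounts to a single call to class::knn.
library(class)
test <- function(k_best, dat.test){
  knn(train = dat_train[, -ncol(dat_train)],
      test  = dat.test[, -ncol(dat.test)],
      cl    = dat_train$emotion_idx,
      k     = k_best)
}
```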

#### Model selection with cross-validation
* Perform model selection by choosing among different values of the training model parameter.
```{r runcv, eval=F}
source("../lib/cross_validation_knn.R")
if(run.cv){
  err_cv <- matrix(0, nrow = length(k), ncol = 2)
  for(i in 1:length(k)){
    cat("k =", k[i], "\n")
    err_cv[i,] <- cv.function(dat_train, K, k[i])
    save(err_cv, file="../output/err_cv.RData") # save incrementally after each value of k
  }
}
```
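
For reference, a minimal sketch of what `cv.function()` could do; the actual `../lib/cross_validation_knn.R` may differ. It again assumes `emotion_idx` is the last column of the feature data frame.

```{r, eval=FALSE}
# A sketch (not run): K-fold CV returning the mean and sd of the fold error rates
# for a given number of neighbours k_nn.
cv.function <- function(dat, K, k_nn){
  fold_id <- sample(rep(1:K, length.out = nrow(dat)))
  cv_error <- rep(NA, K)
  for(j in 1:K){
    train_fold <- dat[fold_id != j, ]
    test_fold  <- dat[fold_id == j, ]
    pred <- class::knn(train = train_fold[, -ncol(train_fold)],
                       test  = test_fold[, -ncol(test_fold)],
                       cl    = train_fold$emotion_idx, k = k_nn)
    cv_error[j] <- mean(pred != test_fold$emotion_idx)
  }
  return(c(mean(cv_error), sd(cv_error)))
}
```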

Visualize cross-validation results.
```{r cv_vis}
if(run.cv){
  load("../output/err_cv.RData")
  err_cv <- as.data.frame(err_cv)
  colnames(err_cv) <- c("mean_error", "sd_error")
  err_cv$k <- as.factor(k)
  err_cv %>%
    ggplot(aes(x = k, y = mean_error,
               ymin = mean_error - sd_error, ymax = mean_error + sd_error)) +
    geom_crossbar() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
}
```


* Choose the "best" parameter value
```{r best_model}
if(run.cv){
  model_best <- k[which.min(err_cv[,1])]
}
par_best <- list(k = model_best)
```

* Train the model on the entire training set using the parameter selected via cross-validation.
```{r final_train}
# Training is skipped in this Starter Code: KNN has no separate training step.
# tm_train <- NA
# tm_train <- system.time(fit_train <- train(dat_train, par_best))
# save(fit_train, file="../output/fit_train.RData")
```

### Step 5: Run test on test images
```{r test}
tm_test <- NA
if(run.test){
  # load(file="../output/fit_train.RData") # skipped: no fitted model is saved above for KNN
  tm_test <- system.time(pred <- test(model_best, dat_test))
}
```

* evaluation
```{r}
accu <- mean(dat_test$emotion_idx == pred)
cat("The accuracy of model:", model_labels[which.min(err_cv[,1])], "is", accu*100, "%.\n")
confusionMatrix(pred, dat_test$emotion_idx)
```

Note that the accuracy is not high, but it is better than random guessing (1/22 ≈ 4.5%).

### Summarize Running Time
Prediction performance matters, but so do the running times for constructing features and for training the model, especially when computational resources are limited.
```{r running_time}
cat("Time for constructing training features=", tm_feature_train[1], "s \n")
cat("Time for constructing testing features=", tm_feature_test[1], "s \n")
#cat("Time for training model=", tm_train[1], "s \n")
cat("Time for testing model=", tm_test[1], "s \n")
```

### Reference
- Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454-E1462.
# Project: Can you recognize the emotion from an image of a face?

### Doc folder

The doc directory contains the report or presentation files. It can have subfolders.
### A Note on Contributions

Whenever we have team projects, there are always concerns about unequal contributions from members of a project team. In the ideal world, we are all here to put in our best efforts and learn together. Even in that ideal world, we have different skill sets and preparations, and we will still contribute differently to a project.

Therefore, you are required to post a *contribution statement* in the root README.md of your GitHub repo. Please be aware that your GitHub repo will become public and remain public after the due date of the projects.

Post your title, team members, project abstract, and a contribution statement in the README.md file. This is a common practice for scientific research journals.

Below is an example. If no contribution statement is provided, we will insert a default statement that goes "**All team members contributed equally in all stages of this project. All team members approve our work presented in this GitHub repository including this contributions statement**. "

---
Sample project README statement.

Project xxx

Team members: Avi Bond, Comma Deed, En Funn, Gem Hon

Summary: In this project, we developed a new method and improved the classification accuracy for images of cats and dogs to 99.9%.

[Contribution Statement] AB, CD, EF and GH designed the study. AB and CD developed baseline classification model for evaluation. EF and GH explored feature engineering for improving the baseline model. AB, EF and GH discussed and designed the model evaluation protocol. CD carried out the computation for model evaluation. All team members contributed to the GitHub repository and prepared the presentation. All team members approve our work presented in our GitHub repository including this contribution statement.
---
title: "Functions to Generate Figure 1"
output:
  html_document:
    df_print: paged
  html_notebook: default
  pdf_document: default
  word_document: default
---

```{r}
info <- read.csv("../data/train_set/label.csv")
load("../output/fiducial_pt_list.RData")

# A function to plot the histogram of the pairwise-distance distribution for one emotion
# input: emotion index, first point, second point, vertical or horizontal distance
# output: histogram of the distance distribution
dist_histogram_by_emotion <- function(emotion_idx, pt1, pt2, vertical = TRUE){
  indices <- info[info$emotion_idx == emotion_idx, 'Index']
  emotion <- as.character(info[info$emotion_idx == emotion_idx, 'emotion_cat'])[1]
  ver.dist <- function(idx){
    if(vertical){
      ver.loc <- fiducial_pt_list[[idx]][,2]
    } else {
      ver.loc <- fiducial_pt_list[[idx]][,1]
    }
    pairwise_dist <- as.matrix(dist(ver.loc))
    return(pairwise_dist[pt1, pt2])
  }
  dist_distribution <- sapply(indices, ver.dist)
  # jpeg(paste(c('../output/hist', emotion_idx, "_", pt1, "_", pt2, '.jpg'), collapse = ''),
  #      width = 500, height = 375)
  hist(dist_distribution, xlab = "distance",
       main = paste(c(emotion, "distance between", pt1, "and", pt2), collapse = " "))
  abline(v = mean(dist_distribution))
  # dev.off()
}

# A function to generalize "dist_histogram_by_emotion" to all 22 emotions
dist_histogram <- function(pt1, pt2, vertical = TRUE){
  lapply(1:22, dist_histogram_by_emotion, pt1, pt2, vertical)
}

# examples
dist_histogram(50, 52)
dist_histogram(1, 21)
```

```{r}
library(EBImage)
train_image_dir <- "../data/train_set/images/" # defined here since this notebook does not source Main.Rmd

# A function to display fiducial points on images
# input: identity index
# output: all images associated with the identity, with fiducial points marked
display_fid_pt <- function(identity){
  indices <- info[info$identity == identity, 'Index']
  emotions <- as.character(info[info$identity == identity, 'emotion_cat'])
  image.path_sub <- paste0(train_image_dir, sprintf("%04d", indices), ".jpg")
  Image_list_sub <- lapply(image.path_sub, EBImage::readImage)
  fiducial_pt_list_sub <- fiducial_pt_list[indices]
  display_single <- function(j){
    # jpeg(paste(c('../output/', identity, '_', j, '.jpg'), collapse = ''), width = 500, height = 375)
    display(Image(Image_list_sub[[j]], colormode = 'Color'), method = "raster")
    text(x = 170, y = 50, label = emotions[j], cex = 1.5)
    add_point <- function(n){
      text(x = fiducial_pt_list_sub[[j]][n,1],
           y = fiducial_pt_list_sub[[j]][n,2],
           label = as.character(n), col = "white", cex = 0.8)
    }
    lapply(1:78, add_point)
    # dev.off()
  }
  lapply(1:length(indices), display_single)
}

display_fid_pt(110)
```
