Commit: added metrics

graviraja committed Aug 7, 2020
1 parent 47e7e86 commit 61ae290
Showing 4 changed files with 18 additions and 1 deletion.

7 changes: 7 additions & 0 deletions README.md
@@ -470,6 +470,13 @@ Therefore, our sequence tagging model uses both

![ner](assets/images/applications/classification/char_bilstm_ner.png)

### Day 83: Evaluation metrics for NER tagging

Micro- and macro-averages (of any metric) compute slightly different things, so their interpretation differs. A macro-average computes the metric independently for each class and then takes the unweighted mean, treating all classes equally; a micro-average pools the contributions of all classes and computes a single aggregate metric. In a multi-class classification setup, the micro-average is preferable if you suspect class imbalance (i.e., many more examples of one class than of the others).
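A minimal, stdlib-only sketch of the two averages (the function name and the toy tags are illustrative, not taken from this repository):

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    """Compute micro- and macro-averaged F1 from parallel tag lists."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the truth but was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: per-class F1, then an unweighted mean (all classes count equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first (frequent classes dominate the result).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# The frequent "O" tag dominates the micro score,
# while the single missed "PER" drags the macro score down.
micro, macro = micro_macro_f1(
    ["O", "O", "O", "O", "PER"],
    ["O", "O", "O", "O", "O"],
)
```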

![ner](assets/images/applications/classification/bilstm_crf_res.png)

![ner](assets/images/applications/classification/char_bilstm_crf_res.png)

Check out the code in the `applications/classification` folder

12 changes: 11 additions & 1 deletion applications/classification/ner_tagging/README.md
@@ -55,13 +55,20 @@ Since we're using CRFs, we're not so much predicting the right label at each wor

![ner](../../../assets/images/applications/classification/viterbi.png)
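The decoding step pictured above can be sketched in a few lines of NumPy; this is a generic Viterbi decoder under the usual BiLSTM-CRF scoring (per-token emission scores plus tag-to-tag transition scores), not the repository's exact implementation:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence for one sentence.

    emissions:   (seq_len, num_tags) per-token tag scores from the BiLSTM.
    transitions: (num_tags, num_tags) CRF scores; transitions[i, j] is the
                 score of moving from tag i to tag j.
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + emissions[t, j], maximised over i.
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    # Walk the backpointers from the best final tag to recover the path.
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return path[::-1]
```

With a strongly negative transition score between two tags, the decoder prefers a globally consistent path even when per-token emissions disagree, which is exactly why CRF decoding beats independent per-word argmax.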

Results:

Micro- and macro-averages (of any metric) compute slightly different things, so their interpretation differs. A macro-average computes the metric independently for each class and then takes the unweighted mean, treating all classes equally; a micro-average pools the contributions of all classes and computes a single aggregate metric. In a multi-class classification setup, the micro-average is preferable if you suspect class imbalance (i.e., many more examples of one class than of the others).

![ner](../../../assets/images/applications/classification/bilstm_crf_res.png)
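For a quick sanity check of the two averages, scikit-learn's `f1_score` supports both directly (assuming scikit-learn is installed; the toy tags below are illustrative, not the model's actual output):

```python
from sklearn.metrics import f1_score

# Toy token-level tags; "O" dominates, as it does in real NER data.
y_true = ["O", "O", "O", "O", "O", "O", "PER", "LOC"]
y_pred = ["O", "O", "O", "O", "O", "LOC", "PER", "O"]

micro = f1_score(y_true, y_pred, average="micro")  # pooled over all tokens
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over tags
# micro (0.75) exceeds macro (~0.61): the errors on the rare LOC tag
# barely dent the micro score but zero out one term of the macro mean.
```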

#### Resources

- [Medium Blog post on CRF (Must read)](https://towardsdatascience.com/implementing-a-linear-chain-conditional-random-field-crf-in-pytorch-16b0b9c4b4ea)
- [BiLSTM - CRF model paper](https://arxiv.org/pdf/1508.01991.pdf)
- [CRF Video Explanation](https://www.youtube.com/watch?v=GF3iSJkgPbA)
- [code reference](https://github.com/Gxzzz/BiLSTM-CRF)
- [Viterbi decoding](https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Sequence-Labeling#viterbi-decoding)
- [Metric explanation](https://datascience.stackexchange.com/questions/15989/micro-average-vs-macro-average-performance-in-a-multiclass-classification-settin)

## NER tagging with Char-BiLSTM-CRF.ipynb

@@ -72,5 +79,8 @@ Therefore, our sequence tagging model uses both
- `word-level` information in the form of word embeddings.
- `character-level` information up to and including each word in both directions.

![ner](../../../assets/images/applications/classification/char_bilstm_ner.png)
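The two bullet points above can be sketched as a small PyTorch module that concatenates word embeddings with the final forward and backward states of a character-level BiLSTM (class name and sizes are illustrative, not the repository's actual ones):

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Concatenate word embeddings with per-word char-BiLSTM features."""

    def __init__(self, word_vocab, char_vocab,
                 word_dim=100, char_dim=25, char_hidden=25):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (seq_len,)  char_ids: (seq_len, max_word_len)
        words = self.word_emb(word_ids)        # (seq_len, word_dim)
        chars = self.char_emb(char_ids)        # (seq_len, max_word_len, char_dim)
        _, (h, _) = self.char_lstm(chars)      # h: (2, seq_len, char_hidden)
        # Final forward and backward hidden states summarise each word's
        # characters in both directions.
        char_feats = torch.cat([h[0], h[1]], dim=-1)        # (seq_len, 2*char_hidden)
        return torch.cat([words, char_feats], dim=-1)       # (seq_len, word_dim + 2*char_hidden)
```

The concatenated vectors then feed the word-level BiLSTM-CRF exactly as plain word embeddings would.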

Results:

![ner](../../../assets/images/applications/classification/char_bilstm_crf_res.png)
