We address two tasks in Vietnamese Aspect-based Sentiment Analysis (ABSA): Aspect Category Detection (ACD) and Sentiment Polarity Classification (SPC). We also propose end-to-end models that handle both tasks simultaneously on the Hotel domain of the VLSP 2018 ABSA dataset, using PhoBERT as the pre-trained language model for Vietnamese, in four ways:
- Multi-task
- Multi-task with Multi-branch approach
- CNN
- Bi-LSTM
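To make "handling both tasks simultaneously" concrete, here is a minimal sketch of one common target encoding: each aspect category gets a 4-way label in which class 0 means "not mentioned", so detection and polarity share a single label. A multi-branch model would then predict one class per aspect, while a single-head model could predict the concatenated one-hot vectors. This is an assumption for illustration, not a description of the models in this repository; `ASPECTS` is a small made-up subset of category names and every function name is hypothetical.

```python
# Illustrative subset of aspect categories (assumption); the real Hotel label set is larger.
ASPECTS = ["HOTEL#GENERAL", "ROOMS#CLEANLINESS", "SERVICE#GENERAL"]
# Class 0 means "aspect not mentioned", so ACD and SPC share one label per aspect.
POLARITIES = ["none", "positive", "negative", "neutral"]


def encode_branches(pairs):
    """Map an {aspect: polarity} dict to one class index per aspect (multi-branch targets)."""
    return [POLARITIES.index(pairs.get(aspect, "none")) for aspect in ASPECTS]


def encode_flat(pairs):
    """Concatenate the per-aspect one-hot vectors into one flat target (single-head variant)."""
    flat = []
    for idx in encode_branches(pairs):
        flat.extend(1 if k == idx else 0 for k in range(len(POLARITIES)))
    return flat


def decode(branch_indices):
    """Recover the detected aspects (ACD) and their polarities (SPC) from class indices."""
    return {a: POLARITIES[i] for a, i in zip(ASPECTS, branch_indices) if i != 0}


review = {"ROOMS#CLEANLINESS": "positive", "SERVICE#GENERAL": "negative"}
branches = encode_branches(review)   # one class index per aspect
flat = encode_flat(review)           # 3 aspects x 4 classes = 12 values
```

Decoding the branch indices returns the original `{aspect: polarity}` pairs, so a single prediction answers both "which aspects are mentioned" and "with what polarity".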
- The VLSP 2018 Aspect-based Sentiment Analysis dataset:
| Domain | Dataset | Reviews | Aspects | AvgLength | VocabSize | DiffVocab |
|---|---|---|---|---|---|---|
| Hotel | Training | 3,000 | 13,948 | 47 | 3,908 | - |
| Hotel | Dev | 2,000 | 7,111 | 23 | 2,745 | 1,059 |
| Hotel | Test | 600 | 2,584 | 30 | 1,631 | 346 |
- Preprocessing:
Remove HTML → Standardize Unicode → Normalize Acronym → Word Segmentation → Remove unnecessary characters
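The five preprocessing steps above can be sketched as a chain of small functions. This is a simplified illustration using only the Python standard library, not the project's actual code: the `ACRONYMS` map and the `COMPOUNDS` lexicon are tiny hypothetical stand-ins, and the toy `segment_words` only joins known two-syllable words. A real pipeline would use a dedicated Vietnamese word segmenter, since PhoBERT expects word-segmented input in which the syllables of a word are joined by underscores.

```python
import html
import re
import unicodedata


def nfc(text):
    """Compose diacritics so each Vietnamese letter has a single canonical form."""
    return unicodedata.normalize("NFC", text)


# Hypothetical acronym map (assumption); a real one would be much larger.
ACRONYMS = {"ks": nfc("khách sạn"), "nv": nfc("nhân viên"), "ko": nfc("không")}

# Hypothetical two-syllable lexicon standing in for a real word segmenter.
COMPOUNDS = {
    tuple(nfc(w) for w in pair)
    for pair in [("khách", "sạn"), ("nhân", "viên"), ("thân", "thiện")]
}


def remove_html(text):
    """Drop tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def normalize_acronyms(text):
    """Expand known acronyms token by token (case-insensitive lookup)."""
    return " ".join(ACRONYMS.get(tok.lower(), tok) for tok in text.split())


def segment_words(text):
    """Join known two-syllable words with an underscore, scanning left to right."""
    tokens, out, i = text.split(), [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if len(pair) == 2 and pair in COMPOUNDS:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def remove_unnecessary_characters(text):
    """Keep letters, digits, underscores and spaces; collapse repeated whitespace."""
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Apply the five steps in the order shown in the flowchart."""
    steps = (remove_html, nfc, normalize_acronyms,
             segment_words, remove_unnecessary_characters)
    for step in steps:
        text = step(text)
    return text


cleaned = preprocess("<p>KS đẹp, nv thân thiện</p>")
```

Unicode standardization runs before the later steps on purpose: in decomposed (NFD) text the combining marks are not matched by `\w`, so the final cleanup would otherwise strip the diacritics.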
- Results on the Hotel domain:
| Task | Method | Precision | Recall | F1-score |
|---|---|---|---|---|
| Aspect Detection | Bi-LSTM | 98.80 | 5.70 | 5.00 |
| Aspect Detection | CNN | 77.70 | 44.30 | 48.00 |
| Aspect Detection | Multi-task | 82.70 | 50.00 | 55.00 |
| Aspect Detection | Multi-task Multi-branch | 95.20 | 47.70 | 52.30 |
| Aspect + Polarity | Bi-LSTM | 96.70 | 28.10 | 26.20 |
| Aspect + Polarity | CNN | 78.80 | 45.80 | 46.60 |
| Aspect + Polarity | Multi-task | 82.30 | 50.09 | 52.20 |
| Aspect + Polarity | Multi-task Multi-branch | 82.20 | 48.70 | 50.10 |
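
The F1 column is not the harmonic mean of the Precision and Recall columns (for Multi-task aspect detection, 2 · 82.70 · 50.00 / (82.70 + 50.00) ≈ 62.3 rather than 55.00). That would be consistent with scores averaged per class (macro averaging), but the averaging scheme is not stated here, so the sketch below is an assumption rather than the evaluation actually used. It also shows how the two task rows can differ: "Aspect Detection" scores the aspect alone, while "Aspect + Polarity" counts a prediction as correct only if both match. The data and function names are made up for illustration.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 for one label; undefined ratios count as 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_prf(gold, pred):
    """Average per-label scores; gold and pred are lists of label sets, one per review."""
    labels = sorted(set().union(*gold, *pred))
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        scores.append(prf(tp, fp, fn))
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Made-up example: two reviews with (aspect, polarity) labels.
gold = [{("ROOMS", "positive")},
        {("ROOMS", "negative"), ("SERVICE", "positive")}]
pred = [{("ROOMS", "positive")},
        {("ROOMS", "positive"), ("SERVICE", "positive")}]


def aspects_only(reviews):
    """Drop the polarity so only aspect detection is scored."""
    return [{aspect for aspect, _ in labels} for labels in reviews]


detection = macro_prf(aspects_only(gold), aspects_only(pred))  # every aspect is found
joint = macro_prf(gold, pred)  # the wrong polarity for ROOMS is penalized
```

In this toy case detection is perfect, while the joint scores drop to roughly 0.50 precision, 0.67 recall and 0.56 F1. The macro F1 is again slightly below the harmonic mean of the macro precision and recall (about 0.57), the same pattern seen in the table.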