trunghieu41003/Big-data-project

Overview

We address two tasks in Vietnamese Aspect-based Sentiment Analysis: Aspect Category Detection (ACD) and Sentiment Polarity Classification (SPC). We also propose end-to-end models that handle both tasks simultaneously for one domain (Hotel) of the VLSP 2018 ABSA dataset, using PhoBERT as the pre-trained language model for Vietnamese, in four ways:

  • Multi-task

  • Multi-task with Multi-branch approach

  • CNN

  • Bi-LSTM
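
One way a single model can cover ACD and SPC at once is to give every aspect category its own small classification target, where class 0 means "aspect not mentioned" and the remaining classes are the polarities. The sketch below illustrates that encoding in plain Python. It is an assumption, not necessarily the encoding used in this repository, and the `ASPECTS` list is only an illustrative subset of the Hotel categories.

```python
from typing import Dict, List

# Illustrative subset of Hotel aspect categories (assumption: the real label set is larger).
ASPECTS: List[str] = ["HOTEL#GENERAL", "ROOMS#CLEANLINESS", "SERVICE#GENERAL"]

# Index 0 marks an aspect that is absent from the review;
# the other three are the polarity labels of VLSP 2018.
POLARITIES: List[str] = ["none", "positive", "negative", "neutral"]


def encode(review_labels: Dict[str, str]) -> List[int]:
    """Map {aspect: polarity} to one class index per aspect, in ASPECTS order."""
    return [POLARITIES.index(review_labels.get(aspect, "none")) for aspect in ASPECTS]


def decode(encoded: List[int]) -> Dict[str, str]:
    """Invert encode(), dropping aspects whose class is 'none'."""
    return {
        aspect: POLARITIES[index]
        for aspect, index in zip(ASPECTS, encoded)
        if index != 0
    }
```

With this framing, aspect detection reduces to asking which positions are non-zero, and polarity classification to reading the value at those positions.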

Dataset

  • The VLSP 2018 Aspect-based Sentiment Analysis dataset:
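
| Domain | Dataset  | Reviews | Aspects | AvgLength | VocabSize | DiffVocab |
|--------|----------|---------|---------|-----------|-----------|-----------|
| Hotel  | Training | 3,000   | 13,948  | 47        | 3,908     | -         |
| Hotel  | Dev      | 2,000   | 7,111   | 23        | 2,745     | 1,059     |
| Hotel  | Test     | 600     | 2,584   | 30        | 1,631     | 346       |
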
  • Preprocessing:
Remove HTML → Standardize Unicode → Normalize acronyms → Word segmentation → Remove unnecessary characters
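
The five preprocessing steps can be sketched with the Python standard library as follows. This is a minimal illustration under several assumptions: the acronym table `ACRONYMS` is hypothetical, Unicode standardization is taken to mean NFC normalization, and word segmentation is left as a pluggable `segmenter` callable (defaulting to whitespace splitting) because the source does not name the segmentation tool it uses.

```python
import html
import re
import unicodedata
from typing import Callable, Dict, List, Optional

# Hypothetical acronym table; a real one would be far larger.
ACRONYMS: Dict[str, str] = {"ks": "khách sạn", "nv": "nhân viên"}


def remove_html(text: str) -> str:
    """Strip tags, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def standardize_unicode(text: str) -> str:
    """Compose base letters and combining marks into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)


def normalize_acronyms(text: str, table: Dict[str, str] = ACRONYMS) -> str:
    """Expand whitespace-delimited tokens that appear in the acronym table."""
    return " ".join(table.get(token.lower(), token) for token in text.split())


def segment_words(text: str, segmenter: Optional[Callable[[str], List[str]]] = None) -> str:
    """Join tokens from an external segmenter; fall back to whitespace splitting."""
    tokens = segmenter(text) if segmenter else text.split()
    return " ".join(tokens)


def remove_unnecessary_characters(text: str) -> str:
    """Keep word characters (including '_'), whitespace and basic punctuation."""
    text = re.sub(r"[^\w\s.,!?]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str, segmenter: Optional[Callable[[str], List[str]]] = None) -> str:
    """Apply the five steps in the order shown in the flow above."""
    text = remove_html(text)
    text = standardize_unicode(text)
    text = normalize_acronyms(text)
    text = segment_words(text, segmenter)
    return remove_unnecessary_characters(text)
```

The underscore is kept in the final step because Vietnamese word segmenters commonly join the syllables of one word with `_`, and that marker needs to survive until tokenization.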

Results

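| Task              | Method                  | Precision (Hotel) | Recall (Hotel) | F1-score (Hotel) |
|-------------------|-------------------------|-------------------|----------------|------------------|
| Aspect Detection  | Bi-LSTM                 | 98.80             | 5.70           | 5.00             |
| Aspect Detection  | CNN                     | 77.70             | 44.30          | 48.00            |
| Aspect Detection  | Multi-task              | 82.70             | 50.00          | 55.00            |
| Aspect Detection  | Multi-task Multi-branch | 95.20             | 47.70          | 52.30            |
| Aspect + Polarity | Bi-LSTM                 | 96.70             | 28.10          | 26.20            |
| Aspect + Polarity | CNN                     | 78.80             | 45.80          | 46.60            |
| Aspect + Polarity | Multi-task              | 82.30             | 50.09          | 52.20            |
| Aspect + Polarity | Multi-task Multi-branch | 82.20             | 48.70          | 50.10            |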
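
For readers who want to reproduce this kind of score, the sketch below computes micro-averaged precision, recall and F1 over sets of (aspect, polarity) pairs per review. The averaging scheme is an assumption: the source does not state how its figures were aggregated, and the name `micro_prf` is hypothetical.

```python
from typing import List, Set, Tuple

Label = Tuple[str, str]  # (aspect category, polarity)


def micro_prf(gold: List[Set[Label]], pred: List[Set[Label]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-review label sets."""
    # A prediction counts as correct only if the exact pair appears in the gold set.
    true_positives = sum(len(g & p) for g, p in zip(gold, pred))
    n_predicted = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)

    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

For the Aspect Detection task, the same function applies if each label set holds only the aspect (for example with an empty polarity string). An F1 computed from a single precision/recall pair always lies between the two, so rows where F1 is below recall suggest the reported scores were averaged in another way, possibly per class.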
