GitHub - BabaSanfour/Text-Classification--Classic-models-vs-BERT: Test classic models vs BERT performance on the IMDB movie reviews + a sneak peek at BERT attention weights

This notebook compares four classic ML models from different families (logistic regression with L2 and L1 regularization, RandomForest, and XGBoost) with a transformer: Bidirectional Encoder Representations from Transformers (BERT). The comparison uses accuracy and ROC AUC, two popular metrics in machine learning.
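To make the two metrics concrete, here is a minimal from-scratch sketch in plain Python. It is not taken from the notebook, which may well use library implementations; the function names `accuracy` and `roc_auc` and the toy labels and scores are assumptions made up for illustration. Accuracy needs hard predictions (here obtained with a 0.5 threshold), while ROC AUC is computed directly from the scores as the probability that a random positive example is ranked above a random negative one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = positive review) and made-up classifier scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

# Accuracy depends on the chosen threshold; ROC AUC does not.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}  roc_auc={auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, so it is only meant to show what the metric measures; on the full dataset a library routine would be the practical choice.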

For the classic models, we used TF-IDF to turn the reviews into feature vectors and trained the models on them.
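As a rough sketch of this step, the snippet below builds a TF-IDF plus logistic regression pipeline with scikit-learn and fits it once with an L1 and once with an L2 penalty. The tiny corpus, the helper name `build_model`, and the hyperparameters (`C=10.0`, the `liblinear` solver) are assumptions for illustration and are not taken from the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus standing in for the IMDB reviews (1 = positive).
texts = [
    "a wonderful film with great acting",
    "great story and wonderful music",
    "I loved every minute of this movie",
    "a terrible film with awful acting",
    "boring story and awful music",
    "I hated every minute of this movie",
]
labels = [1, 1, 1, 0, 0, 0]


def build_model(penalty):
    """TF-IDF features followed by logistic regression (hypothetical helper)."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        # liblinear supports both the L1 and the L2 penalty.
        LogisticRegression(penalty=penalty, solver="liblinear", C=10.0),
    )


# One fitted pipeline per regularization type.
models = {p: build_model(p).fit(texts, labels) for p in ("l1", "l2")}

# Sparse document-term matrix: one row per review, one column per term.
tfidf_matrix = models["l2"].named_steps["tfidfvectorizer"].transform(texts)

new_review = ["great acting and a wonderful story"]
predictions = {p: int(m.predict(new_review)[0]) for p, m in models.items()}
print(tfidf_matrix.shape, predictions)
```

Other estimators, such as a random forest or a gradient-boosted model, could be swapped into the same pipeline in place of `LogisticRegression`.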

For BERT, we used the pretrained model provided by Hugging Face: BertForSequenceClassification.
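The sketch below shows one way to obtain a two-class `BertForSequenceClassification` model and run a forward pass. The helper `load_classifier` is hypothetical, and the checkpoint name `"bert-base-uncased"` in its docstring is an assumption, since the text does not say which checkpoint was used. When no name is passed, a tiny randomly initialized model is built instead so the example runs without downloading weights; its predictions are meaningless and only the shapes are of interest.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def load_classifier(pretrained_name=None):
    """Return a two-class BERT classifier (hypothetical helper).

    With a checkpoint name (for example "bert-base-uncased", an assumption)
    the pretrained weights are downloaded. With None, a tiny randomly
    initialized model is built so the sketch runs offline.
    """
    if pretrained_name is not None:
        return BertForSequenceClassification.from_pretrained(
            pretrained_name, num_labels=2
        )
    config = BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        num_labels=2,
    )
    return BertForSequenceClassification(config)


model = load_classifier()
model.eval()  # disable dropout for inference

# Fake token ids standing in for tokenizer output: 3 reviews, 8 tokens each.
input_ids = torch.randint(1, 100, (3, 8))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

# One row per review, one column per class (negative, positive).
probs = torch.softmax(logits, dim=-1)
print(logits.shape, probs)
```

With real data, the token ids would come from the tokenizer that matches the chosen checkpoint, and the model would be fine-tuned on the labeled reviews before its outputs are evaluated.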

The task was binary classification of positive vs. negative reviews on the well-known IMDB movie reviews dataset. The IMDB reviews data can be downloaded from http://ai.stanford.edu/~amaas/data/sentiment/.

At the end, we also visualize some of the attention weights of the BERT heads using heatmaps.
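A minimal sketch of how per-head attention weights can be pulled out of a Hugging Face BERT model and drawn as a heatmap is shown below. It is not the notebook's code: the tiny randomly initialized configuration, the fake token ids, the helper name `plot_head`, and the use of matplotlib are all assumptions chosen so the example runs offline. The `attn_implementation="eager"` setting is also an assumption, made so that attention weights are returned across library versions.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random model (assumption) so nothing needs to be downloaded.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
    num_labels=2,
    attn_implementation="eager",
)
model = BertForSequenceClassification(config)
model.eval()

# Fake token ids standing in for one tokenized review of 6 tokens.
input_ids = torch.randint(1, 100, (1, 6))

with torch.no_grad():
    outputs = model(input_ids=input_ids, output_attentions=True)

# Tuple with one tensor per layer, each shaped
# (batch, num_heads, seq_len, seq_len).
attentions = outputs.attentions


def plot_head(attentions, layer, head, path):
    """Save a heatmap of one attention head (hypothetical helper)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    weights = attentions[layer][0, head].numpy()
    fig, ax = plt.subplots()
    image = ax.imshow(weights, cmap="viridis")
    ax.set_xlabel("attended-to token position")
    ax.set_ylabel("query token position")
    ax.set_title(f"layer {layer}, head {head}")
    fig.colorbar(image, ax=ax)
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_head(attentions, layer=0, head=0, path="head.png")` would write a single heatmap to disk. Each row of the matrix sums to 1, so a bright cell means that the query token in that row puts most of its attention on the token in that column.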

Files:
.gitattributes
.gitignore
LICENSE
README.md
Text_classification_classic_models_vs_bert.ipynb
