This project identifies which of a given set of pre-trained text-classification models performs best across several evaluation metrics. To rank the models, we use TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), which scores each alternative by how close it is to an ideal best and how far it is from an ideal worst across all criteria.
The following four text-classification models are compared, referred to as Model 1 to Model 4 in the order listed:
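As a rough illustration of how TOPSIS ranks alternatives, here is a minimal NumPy sketch. It is not taken from the notebook: the function name `topsis`, the choice of criteria (accuracy, F1, inference time), the equal weights, and all numbers in the example matrix are placeholder assumptions used only to show the steps.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Score and rank alternatives (rows) over criteria (columns) with TOPSIS.

    matrix  : one row per model, one column per metric
    weights : relative importance of each metric (normalised to sum to 1)
    benefit : True where higher is better, False where lower is better
    Assumes no column is all zeros and the rows are not all identical.
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)

    # Step 1: vector-normalise each column, then apply the weights.
    V = X / np.sqrt((X ** 2).sum(axis=0)) * w

    # Step 2: ideal best and ideal worst value for each criterion.
    ideal_best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    ideal_worst = np.where(benefit, V.min(axis=0), V.max(axis=0))

    # Step 3: Euclidean distance of every alternative to both ideals.
    d_best = np.sqrt(((V - ideal_best) ** 2).sum(axis=1))
    d_worst = np.sqrt(((V - ideal_worst) ** 2).sum(axis=1))

    # Step 4: closeness score in [0, 1]; higher means closer to the ideal best.
    scores = d_worst / (d_best + d_worst)

    # Rank 1 is the best alternative.
    ranks = scores.argsort()[::-1].argsort() + 1
    return scores, ranks


# Placeholder values only (NOT results from this project).
# Columns: accuracy, F1 score, inference time in ms (lower is better).
example_matrix = [
    [0.90, 0.89, 12.0],
    [0.85, 0.84, 15.0],
    [0.88, 0.87, 20.0],
    [0.95, 0.94, 40.0],
]
example_scores, example_ranks = topsis(
    example_matrix,
    weights=[1, 1, 1],
    benefit=[True, True, False],
)
```

The row with rank 1 in `example_ranks` is the preferred alternative under the chosen weights. Changing the weights or which criteria count as benefits can change the ranking, so those choices should be stated alongside any result.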
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
- lxyuan/distilbert-base-multilingual-cased-sentiments-student
- cardiffnlp/twitter-roberta-base-sentiment-latest
- siebert/sentiment-roberta-large-english
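One possible way to load these models is through the Hugging Face `transformers` `pipeline` API, sketched below. This is an assumption about the setup rather than a copy of the notebook code; the names `MODEL_NAMES` and `load_classifiers` are hypothetical.

```python
# Model identifiers exactly as listed above (Model 1 to Model 4, in order).
MODEL_NAMES = [
    "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "siebert/sentiment-roberta-large-english",
]


def load_classifiers(model_names=MODEL_NAMES):
    """Return a dict mapping each model name to a text-classification pipeline.

    Calling this downloads the model weights on first use, so it needs
    network access and the `transformers` package installed.
    """
    # Imported lazily so the list above is usable without transformers.
    from transformers import pipeline

    return {
        name: pipeline("text-classification", model=name)
        for name in model_names
    }
```

A call such as `load_classifiers()[MODEL_NAMES[0]]("Great lecture!")` returns a list of dictionaries with `label` and `score` keys. Label names may differ between models, so they may need to be mapped to a common scheme before metrics are computed.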
We created a sample dataset for each of four domains (Education, Sports, Politics, and Finance) to evaluate the models on different metrics.
Applying TOPSIS to the metrics for each domain gives the following results:
Domain | Best Model | Model Name |
---|---|---|
Education | Model 2 | lxyuan/distilbert-base-multilingual-cased-sentiments-student |
Sports | Model 4 | siebert/sentiment-roberta-large-english |
Politics | Model 4 | siebert/sentiment-roberta-large-english |
Finance | Model 4 | siebert/sentiment-roberta-large-english |
The datasets are available in the "Education.csv", "Sports.csv", "Politics.csv" and "Finance.csv" files.
The Python code is available in the "Models.ipynb" Jupyter notebook.