Discriminate between the Moldavian and the Romanian dialects across different text genres (news versus tweets)

DenisaElena99/romanian-sub-dialect-identification

Romanian-sub-dialect-identification

Machine Learning Project

Overview

In the Romanian Dialect Identification (RDI) shared task, participants train a model on tweets. The task is an in-genre binary classification by dialect: the model must discriminate between the Moldavian (MD) and the Romanian (RO) dialects.

Task

Participants train a model on tweets for in-genre binary classification by dialect. For each sample, the model must predict label 0 (Moldavian) or label 1 (Romanian).
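As an illustration of what a binary dialect classifier could look like, here is a minimal sketch of a character trigram Naive Bayes model written with the Python standard library only. It is not the method used in this repository; the class name `NgramNaiveBayes`, the helper `char_ngrams`, and the choice of trigrams are assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical baseline: multinomial Naive Bayes over character n-grams."""

    def fit(self, samples, labels):
        # One n-gram counter per label (0 = Moldavian, 1 = Romanian).
        self.counts = {label: Counter() for label in set(labels)}
        self.docs = Counter(labels)
        for text, label in zip(samples, labels):
            self.counts[label].update(char_ngrams(text))
        # Vocabulary = every n-gram seen in any class.
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Add-one smoothing keeps unseen n-grams from zeroing the score.
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Calling `fit` with the training samples and labels, then `predict` on each validation sample, gives predictions that can be compared against the validation labels.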

File Descriptions

  • train_samples.txt - the training data samples (one sample per row)
  • train_labels.txt - the training labels (one label per row)
  • validation_samples.txt - the validation data samples (one sample per row)
  • validation_labels.txt - the validation labels (one label per row)
  • test_samples.txt - the test data samples (one sample per row)
  • sample_submission.txt - a sample submission file in the correct format (the final submission file must be in .csv format)
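The files above could be loaded along the following lines. This is a sketch under stated assumptions: the files are UTF-8 text, and a row may optionally start with an ID followed by a tab, which is stripped if present. The README only states "one sample per row", so the ID handling and the function names `read_rows`, `strip_id`, and `load_split` are hypothetical.

```python
from pathlib import Path


def read_rows(path):
    """Return the non-empty lines of a UTF-8 text file, one entry per row."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def strip_id(row):
    """Drop a leading 'id<TAB>' prefix if one is present (assumed layout)."""
    _, sep, rest = row.partition("\t")
    return rest if sep else row


def load_split(samples_path, labels_path=None):
    """Load samples and, when given, their integer labels (0 = MD, 1 = RO)."""
    samples = [strip_id(row) for row in read_rows(samples_path)]
    if labels_path is None:
        # The test split ships without labels.
        return samples, None
    labels = [int(strip_id(row)) for row in read_rows(labels_path)]
    if len(samples) != len(labels):
        raise ValueError("samples and labels have different lengths")
    return samples, labels
```

For example, `load_split("train_samples.txt", "train_labels.txt")` would return the training texts and labels, while `load_split("test_samples.txt")` returns the test texts and `None`.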
