In the Romanian Dialect Identification (RDI) shared task, participants have to train a model on tweets. The task is therefore an in-genre binary classification by dialect, in which the model must discriminate between the Moldavian (MD, label 0) and the Romanian (RO, label 1) dialects.
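To make the binary setup concrete, one possible baseline is a multinomial naive Bayes classifier over character n-grams. This is our own choice for illustration: the task does not prescribe any model, and the class name `CharNgramNaiveBayes`, the n-gram size, and the smoothing constant are hypothetical. The sketch uses only the standard library and maps predictions to the task's labels (0 = MD, 1 = RO).

```python
import math
from collections import Counter

LABELS = (0, 1)  # 0 = Moldavian (MD), 1 = Romanian (RO)


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lowercased text, space-padded."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative baseline)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, samples, labels):
        # Count n-grams separately for each dialect label
        self.counts = {c: Counter() for c in LABELS}
        docs = Counter(labels)
        for text, label in zip(samples, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        # +1 reserves one extra vocabulary slot for n-grams never seen in training
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1])) + 1
        self.totals = {c: sum(self.counts[c].values()) for c in LABELS}
        n_docs = len(labels)
        # Smoothed class priors, so a class absent from training never hits log(0)
        self.log_prior = {
            c: math.log((docs[c] + 1) / (n_docs + len(LABELS))) for c in LABELS
        }
        return self

    def _score(self, text, c):
        """Log-probability of `text` under class `c` (up to a shared constant)."""
        denom = self.totals[c] + self.alpha * self.vocab_size
        score = self.log_prior[c]
        for gram in char_ngrams(text, self.n):
            score += math.log((self.counts[c][gram] + self.alpha) / denom)
        return score

    def predict(self, samples):
        """Return the higher-scoring label (0 or 1) for each sample."""
        return [max(LABELS, key=lambda c: self._score(text, c)) for text in samples]
```

Character n-grams are a common starting point for dialect identification because they capture spelling and morphology without a tokenizer; in practice the model would be fitted on the training split and checked on the validation split before predicting the test samples.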
- train_samples.txt - the training data samples (one sample per row)
- train_labels.txt - the training labels (one label per row)
- validation_samples.txt - the validation data samples (one sample per row)
- validation_labels.txt - the validation labels (one label per row)
- test_samples.txt - the test data samples (one sample per row)
- sample_submission.txt - a sample submission file in the correct format (note that the final submission file must be in .csv format)
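Because every file stores one item per row, a split can be loaded by reading the sample and label files line by line and checking that they align. The following is a minimal sketch, assuming UTF-8 text and the file names listed above; the helpers `read_samples`, `read_labels`, and `load_split` are hypothetical. The exact row layout of the label files is also an assumption: the sketch takes the last whitespace-separated token of each row as the label, so both a bare `1` and an ID-prefixed row such as `<id> 1` are accepted.

```python
from pathlib import Path


def read_samples(path):
    """Read one sample per row, dropping the trailing newline and empty rows."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def read_labels(path):
    """Read one label per row; the last token of the row is assumed to be the label."""
    with open(path, encoding="utf-8") as f:
        return [int(line.split()[-1]) for line in f if line.strip()]


def load_split(data_dir, split):
    """Load <split>_samples.txt and <split>_labels.txt and check that they align."""
    data_dir = Path(data_dir)
    samples = read_samples(data_dir / f"{split}_samples.txt")
    labels = read_labels(data_dir / f"{split}_labels.txt")
    if len(samples) != len(labels):
        raise ValueError(f"{split}: {len(samples)} samples but {len(labels)} labels")
    if not set(labels) <= {0, 1}:
        raise ValueError(f"{split}: labels must be 0 (MD) or 1 (RO)")
    return samples, labels
```

For example, with the files in a hypothetical `data` directory, `train_x, train_y = load_split("data", "train")` loads the training split, and `read_samples("data/test_samples.txt")` loads the unlabeled test samples. If the sample rows also carry an ID, it would need to be split off before training.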