This folder contains the following resources:
- AMTS : The Amazigh Tag Set (28 tags). The 29-tag corpus splits the preposition tag into S and S_PP.
- data_lab28 : An annotated corpus of about 20K words. The labeled data also includes lexical n-gram features.
- Amazigh_Corpus : An unlabeled Amazigh corpus.
- data_29tags : An annotated corpus of about 20K words. This corpus uses 29 tags: S and S_PP are separated to distinguish a plain preposition (S) from a preposition followed by a personal pronoun (S_PP).
- labeledData.29.tags.2col_WORD-POS : 21K words of labeled data in two columns: the token and its part of speech.
- Unlabeled data, collected from various books and websites:
  - UnlabeledData.sent : contains raw text, restricted to sentences with more than 2 tokens.
  - UnlabeledData.225K.TOK : contains 225240 tokens.
  - UnlabeledData.1col : contains 224620 tokens.
This folder also contains some useful Perl scripts.
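
As a rough illustration of how the two-column labeled file (token, then part of speech) might be read, here is a minimal Python sketch. It is not one of the scripts shipped in this folder. It assumes the two columns are separated by whitespace and that blank lines can be skipped; the function names `read_word_pos` and `tag_counts` and the sample tokens are hypothetical placeholders, and only the tags S and S_PP come from the description above.

```python
from collections import Counter


def read_word_pos(lines):
    """Yield (token, tag) pairs from two-column lines.

    Assumption: columns are whitespace-separated. Lines that do not
    have exactly two fields (for example blank lines) are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        yield parts[0], parts[1]


def tag_counts(pairs):
    """Count how often each tag occurs in an iterable of (token, tag) pairs."""
    return Counter(tag for _, tag in pairs)


# Placeholder lines, not real corpus data.
sample = ["tok1\tS", "tok2\tS_PP", "", "tok3\tS"]
pairs = list(read_word_pos(sample))
counts = tag_counts(pairs)
```

To try this on the real file, the same functions can be given an open file handle, for example `open("labeledData.29.tags.2col_WORD-POS", encoding="utf-8")`; the UTF-8 encoding is also an assumption and should be checked against the actual data.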