AmazighCorpora

This folder contains the following resources:

AMTS: the Amazigh tagset (28 tags). In the 29-tag corpus, prepositions are split into S and S_PP.
data_lab28: an annotated corpus of about 20K words. It also serves as labeled data with lexical n-gram features.
Amazigh_Corpus: an unlabeled Amazigh data corpus.
data_29tags: an annotated corpus of about 20K words that uses 29 tags. S and S_PP are kept separate to distinguish a plain preposition (S) from a preposition followed by a personal pronoun (S_PP).
labeledData.29.tags.2col_WORD-POS: 21K words of labeled data in 2 columns, the token and its part of speech.
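A minimal Python sketch for reading a two-column WORD-POS file and, optionally, collapsing the 29-tag labels back to 28 by merging S_PP into S. It assumes that each line holds a token and its tag separated by whitespace and that blank lines (if present) separate sentences; the helper names and the placeholder tag TAG_A are hypothetical and not part of this repository.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_two_col(lines: Iterable[str]) -> List[Sentence]:
    """Parse TOKEN/TAG lines into sentences (assumed layout, see lead-in)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        fields = line.split()
        if not fields:
            # Assumption: a blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) < 2:
            continue  # skip malformed lines that lack a tag column
        # First field = token, last field = POS tag.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def to_28_tags(tag: str) -> str:
    # Assumption: merging S_PP back into S yields the 28-tag AMTS labels.
    return "S" if tag == "S_PP" else tag


def tag_counts(sentences: List[Sentence], collapse: bool = False) -> Counter:
    """Count POS tags; with collapse=True, S_PP is counted as S."""
    return Counter(
        to_28_tags(tag) if collapse else tag
        for sentence in sentences
        for _, tag in sentence
    )


if __name__ == "__main__":
    # Placeholder tokens; TAG_A is a made-up tag, not an AMTS label.
    sample = ["tok1 S", "tok2 S_PP", "", "tok3 TAG_A", "tok4 S"]
    sents = read_two_col(sample)
    print(tag_counts(sents))
    print(tag_counts(sents, collapse=True))
```

On the real data you would pass an open file handle instead of the in-memory sample, e.g. `read_two_col(open("labeledData.29.tags.2col_WORD-POS", encoding="utf-8"))`; the UTF-8 encoding is likewise an assumption.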
Unlabeled data, collected from various books and websites:

UnlabeledData.sent: raw text. It contains only sentences with more than 2 tokens.
UnlabeledData.225K.TOK: contains 225240 tokens.
UnlabeledData.1col: contains 224620 tokens.
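The sentence filter described for UnlabeledData.sent can be sketched in Python as follows. This assumes one sentence per line and plain whitespace tokenization; the original tokenization rules are not documented here, so a whitespace count is not guaranteed to reproduce the token totals listed above. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def keep_long_sentences(lines: Iterable[str], min_exclusive: int = 2) -> Iterator[str]:
    """Yield sentences with MORE than `min_exclusive` whitespace-separated tokens.

    Assumption: one sentence per line; extra whitespace is normalized to single spaces.
    """
    for line in lines:
        tokens = line.split()
        if len(tokens) > min_exclusive:
            yield " ".join(tokens)


def count_tokens(lines: Iterable[str]) -> int:
    """Count whitespace-separated tokens across all lines."""
    return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    sample = ["one two", "one two three", "", "a  b   c d"]
    kept = list(keep_long_sentences(sample))
    print(kept)
    print(count_tokens(kept))
```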

This folder also contains some useful Perl scripts.

Repository files:
AMTS.pdf
Makefile
README.md
UnlabeledData.1col
UnlabeledData.225K.TOK
UnlabeledData.sent
baseline.pl
conlleval.pl
convRaw2Yam4POS.pl
data_Lab28_extract
dataset29.tags.lf.10col
dataset_21K_TOK.10col
dictio
labeledData.29.tags.2col_WORD-POS
lex8k
splitCV.pl
template
word_bondary.pl
