Code-Mixed Tweet Identifier

It classifies tweets into three main categories: English, Hindi, and code-mixed. It also identifies code-mixed sentences that mix grammatical constructions from both languages.
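The README does not say how a tweet-level label relates to the words in the tweet. This is a minimal sketch, assuming per-token language tags are available ("en", "hi", and a hypothetical "univ" tag for language-independent tokens such as hashtags, mentions, emoji, and punctuation), of how the three tweet-level classes could be derived from them. The tag names and the rule are assumptions, not the repository's labelling scheme.

```python
from collections import Counter


def tweet_label(token_tags):
    """Map per-token language tags (hypothetical scheme) to one of the three tweet classes."""
    # Only "en" and "hi" tokens carry language information; "univ" tokens are ignored.
    counts = Counter(tag for tag in token_tags if tag in ("en", "hi"))
    if counts["en"] and counts["hi"]:
        return "Code-mixed"
    if counts["hi"]:
        return "Hindi"
    # Assumption: tweets with only English tokens (or none that carry a language) count as English.
    return "English"


print(tweet_label(["hi", "hi", "en", "univ"]))
```

This rule only looks at which languages occur. Detecting the mixing of grammatical constructions described above would need more than token counts.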

The data is provided in the data folder along with the code.

Requirements

Keras
TensorFlow or Theano (we experimented with Theano)
Gensim
xgboost
NLTK
scikit-learn
NumPy

Instructions to run

There are two main approaches:

Character-aware neural networks, with character-level features modelled by a CNN. The code for this approach is in code/NN_approach.ipynb.
An SVM classifier. The code for this approach is in code/train_shub.ipynb.
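Character-aware networks build a vector for each word from its characters: characters are embedded, a 1-D convolution slides filters over the character sequence, and max-over-time pooling keeps the strongest response per filter. The following is a minimal, dependency-free sketch of that feature extractor. The function names, sizes, and random initialization are hypothetical and are not taken from code/NN_approach.ipynb; in a real model these weights are learned, for example with Keras layers.

```python
import random


def init_params(alphabet, dim, num_filters, width, seed=0):
    """Randomly initialise character embeddings and convolution filters (hypothetical sizes)."""
    rng = random.Random(seed)
    # Boundary markers, an unknown-character symbol and a padding symbol are added to the alphabet.
    symbols = list(alphabet) + ["<", ">", "<unk>", "<pad>"]
    embeddings = {s: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for s in symbols}
    # Each filter spans `width` consecutive characters, i.e. a width x dim weight matrix.
    filters = [
        [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
        for _ in range(num_filters)
    ]
    return embeddings, filters


def char_cnn_features(word, embeddings, filters, width):
    """One feature per filter: convolve over the character sequence, then max-pool over time."""
    chars = ["<"] + list(word.lower()) + [">"]
    vecs = [embeddings.get(c, embeddings["<unk>"]) for c in chars]
    # Pad short words so that at least one full window exists.
    while len(vecs) < width:
        vecs.append(embeddings["<pad>"])
    features = []
    for f in filters:
        best = float("-inf")
        for start in range(len(vecs) - width + 1):
            window = vecs[start:start + width]
            # Convolution at this position: element-wise product of filter and window, summed.
            score = sum(
                w * x
                for w_row, x_row in zip(f, window)
                for w, x in zip(w_row, x_row)
            )
            best = max(best, score)
        # ReLU is monotonic, so applying it after max-pooling equals pooling the ReLU outputs.
        features.append(max(0.0, best))
    return features


embeddings, filters = init_params("abcdefghijklmnopqrstuvwxyz", dim=8, num_filters=16, width=3)
print(char_cnn_features("yaar", embeddings, filters, width=3))
```

Because the features come from characters rather than a fixed vocabulary, spelling variants common in romanized Hindi (for example "yaar" and "yar") still receive related vectors.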
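The README does not list the features used by the SVM approach. Character n-grams are a common choice for language identification on romanized text, so this sketch assumes them. To stay dependency-free it trains a linear SVM with the Pegasos sub-gradient method, swapped in for a library solver such as scikit-learn's, and uses one-vs-rest over the three classes. All function names and the toy tweets are hypothetical; this is not the code in code/train_shub.ipynb.

```python
import random
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Bag of character n-grams; spaces around the text mark word boundaries."""
    padded = " " + text.lower() + " "
    feats = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats


def dot(w, x):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def pegasos_train(samples, labels, lam=0.01, epochs=20, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty) fitted with Pegasos sub-gradient steps.

    samples: list of sparse feature dicts; labels: list of +1 / -1.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            margin = y * dot(w, x)  # margin under the current weights
            for f in w:  # shrink step from the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """One binary SVM per class (that class vs. all others)."""
    feats = [char_ngrams(t) for t in texts]
    return {
        cls: pegasos_train(feats, [1 if lab == cls else -1 for lab in labels], **kwargs)
        for cls in sorted(set(labels))
    }


def predict(models, text):
    """Pick the class whose SVM gives the highest decision value."""
    x = char_ngrams(text)
    return max(models, key=lambda cls: dot(models[cls], x))


# Toy, made-up examples (not the repository's data).
texts = ["what a great day", "kya baat hai yaar", "great day hai yaar"]
labels = ["English", "Hindi", "Code-mixed"]
models = train_one_vs_rest(texts, labels)
print(predict(models, "kya great day"))
```

A linear model over sparse n-gram counts keeps training cheap, and one-vs-rest turns the binary SVM into a three-way classifier.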

Transliteration is carried out using the litcm library (https://github.com/irshadbhat/litcm).

Repository: shubham745/tweet-classifier