This project is a simple implementation of bag-of-words and TF-IDF. It classifies documents into the following categories from the 20 Newsgroups dataset -
- talk.politics.misc
- misc.forsale
- rec.motorcycles
- comp.sys.mac.hardware
- sci.med
- talk.religion.misc
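As a rough illustration of the two techniques named above, here is a minimal sketch of bag-of-words counts and textbook TF-IDF weights in plain Python. The function names (`tokenize`, `bag_of_words`, `tf_idf`) and the three sample sentences are made up for this example and are not taken from the project's code. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ from this unsmoothed version.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real tokenizer would also
    # strip punctuation and stop words.
    return text.lower().split()


def bag_of_words(docs):
    # One Counter of raw term counts per document.
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    bows = bag_of_words(docs)
    n_docs = len(bows)
    # Document frequency: how many documents contain each term.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            # Term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights


# Hypothetical sample documents, not from the dataset.
docs = [
    "selling a used motorcycle helmet",
    "motorcycle engine repair tips",
    "mac hardware upgrade for sale",
]
bows = bag_of_words(docs)
weights = tf_idf(docs)
```

In this sketch "helmet" appears in one document and "motorcycle" in two, so "helmet" gets the higher weight in the first document. A term that appears in every document would get a weight of zero.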
Dependencies -
- scikit-learn
- pickle
- Kafka
Download and extract Kafka
wget http://www-us.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz
tar -xvf kafka_2.11-1.0.0.tgz
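Start ZooKeeper and the Kafka broker (each in its own terminal, from the extracted kafka_2.11-1.0.0 directory), then create the topic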
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic velotio
Start Kafka producer
python twitter_kafka_prodcer.py
Start message classifier
python doc_classifier.py
Output format -
"message" => category
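The classification loop and its output can be sketched roughly as follows. This is not the code in doc_classifier.py, which is not shown here. The `messages` iterable stands in for a Kafka consumer on the `velotio` topic, `keyword_predict` is a hypothetical placeholder for the trained model, and every function name is an assumption made for illustration.

```python
def format_prediction(message, category):
    # Render one result in the '"message" => category' form.
    return '"{}" => {}'.format(message, category)


def classify_stream(messages, predict):
    # `messages` is any iterable of strings (assumed here to stand in
    # for a Kafka consumer); `predict` maps a message to a category.
    for message in messages:
        yield format_prediction(message, predict(message))


def keyword_predict(message):
    # Hypothetical stand-in for a trained classifier.
    if "motorcycle" in message.lower():
        return "rec.motorcycles"
    return "misc.forsale"


lines = list(classify_stream(
    ["New motorcycle tyres", "Desk for sale"],
    keyword_predict,
))
for line in lines:
    print(line)
```

Keeping the message source and the predictor as parameters means the same loop can be exercised with a plain list before a broker is running.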