INTRODUCTION
Hate speech is any speech intended to arouse and propagate hatred or prejudice against an individual or a group. It is characterized by the use of insulting and derogatory words to stigmatize, minimize or
The Python programming language has become mainstream, and its applications cut across several industries, including information technology, finance, government, market research and many others. We propose to examine the application of Python to discovering information in text documents.
Our motivation comes from a reality TV competition in Nigeria called Big Brother Nigeria. The show has attracted wide public attention and generated a lot of controversy among Nigerians. Perhaps one thing that makes the show popular is its eviction system, which requires the public to vote by text message or through an online voting platform. The contestant with the lowest number of votes during a voting period is evicted. Interestingly, many Nigerians now take to Twitter to share their opinions and perceptions of their favorite contestants. As a result, tens of thousands of tweets about the show are posted daily, mostly by Nigerians. We want to extract this data from Twitter, analyze it and look for interesting information in it.
We are taking advantage of the fact that the show is currently ongoing, and we are gathering tweets about it using the hashtag #bbnaija. We plan to extract between 500,000 and 1,000,000 tweets with this hashtag. Twitter provides a feature that lets developers download streams of tweets into a database as they are posted; it is accessed through the Twitter streaming API. We have already connected to Twitter through this API and have gathered 262,801 tweets so far. This number is increasing daily. We will stop when we reach the 500,000-tweet target or, if time permits, continue collecting up to 1,000,000 tweets.
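The collection step described above can be illustrated with a minimal sketch. It is not our actual collector: it assumes the stream is any iterable of tweet dictionaries with "id" and "text" keys (in practice a streaming client would supply these), and the function name store_tweets, the table layout and the use of SQLite are hypothetical choices made only for illustration.

```python
import json
import sqlite3

TARGET = 500_000  # lower collection target stated in the text


def store_tweets(stream, conn, hashtag="#bbnaija", target=TARGET):
    """Save tweets mentioning `hashtag` until `target` rows are stored.

    `stream` is assumed to be an iterable of dicts with "id" and "text"
    keys; a real streaming client would take its place.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id TEXT PRIMARY KEY, payload TEXT)"
    )
    stored = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    for tweet in stream:
        if stored >= target:
            break  # target reached, stop consuming the stream
        if hashtag.lower() not in tweet.get("text", "").lower():
            continue  # skip tweets that do not carry the hashtag
        cur = conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?)",
            (str(tweet["id"]), json.dumps(tweet)),
        )
        stored += cur.rowcount  # 0 when the tweet id was already stored
    conn.commit()
    return stored


# Made-up sample standing in for the live stream.
sample = [
    {"id": 1, "text": "Voting for my fave tonight #BBNaija"},
    {"id": 2, "text": "Unrelated tweet"},
    {"id": 1, "text": "Voting for my fave tonight #BBNaija"},
    {"id": 3, "text": "Eviction night! #bbnaija"},
]
print(store_tweets(sample, sqlite3.connect(":memory:")))
```

Using the tweet id as the primary key means a tweet delivered twice is stored only once, so the running count reflects distinct tweets and can be compared directly against the target.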