toxic_comment_classification

Building toxicity models that operate fairly across a diverse range of conversations.

Here’s the background: when the Conversation AI team first built toxicity models, they found that the models incorrectly learned to associate the names of frequently attacked identities with toxicity. Models predicted a high likelihood of toxicity for comments containing those identities (e.g. "gay"), even when those comments were not actually toxic (such as "I am a gay woman"). This happens because the training data was pulled from available sources where, unfortunately, certain identities are overwhelmingly referred to in offensive ways. Training a model on data with these imbalances risks simply mirroring those biases back to users.

In this competition, you're challenged to build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. Develop strategies to reduce unintended bias in machine learning models, and you'll help the Conversation AI team, and the entire industry, build models that work well for a wide range of conversations.

Disclaimer: The dataset contains text that may be considered profane, vulgar, or offensive.

External data set: FastText crawl 300d 2M file

File source: https://fasttext.cc/docs/en/english-vectors.html

FastText crawl 300d 2M file: 300-dimensional pretrained FastText English word vectors released by Facebook.

The first line of the file contains the number of words in the vocabulary and the size of the vectors. Each following line contains a word followed by its vector values, as in the default fastText text format. Values are space-separated. Words are ordered by descending frequency.
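The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the code used in the notebooks: the function name `load_vectors`, its `limit` parameter, and the tiny in-memory sample are assumptions made for illustration. It uses plain Python lists so that it has no dependencies; in practice you would likely convert each vector to a NumPy array.

```python
import io


def load_vectors(lines, limit=None):
    """Parse fastText text-format vectors from an iterable of lines.

    Hypothetical helper: returns (vocabulary size, vector size, {word: vector}).
    """
    lines = iter(lines)
    # Header line: "<number of words> <vector size>"
    n_words, dim = map(int, next(lines).split())
    vectors = {}
    for line in lines:
        if limit is not None and len(vectors) >= limit:
            break  # words are sorted by frequency, so this keeps the most frequent ones
        tokens = line.rstrip("\n").split(" ")
        word, values = tokens[0], tokens[1:]
        if len(values) != dim:
            continue  # skip malformed or empty lines
        vectors[word] = [float(v) for v in values]
    return n_words, dim, vectors


# Tiny in-memory sample in the same layout as the real file (2 words, 3 dimensions).
sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\nof 0.4 0.5 0.6\n")
n_words, dim, vectors = load_vectors(sample)
print(dim, vectors["of"])  # prints: 3 [0.4, 0.5, 0.6]

# For the real file, pass an open file handle instead (the local path is an assumption):
# with open("crawl-300d-2M.vec", encoding="utf-8", errors="ignore") as f:
#     n_words, dim, vectors = load_vectors(f, limit=200_000)
```

Because the words are ordered by descending frequency, a `limit` is a cheap way to keep only the most common words and reduce memory use when the full 2M-word vocabulary is not needed.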

