This repo shows how to train a BERT model on the Kaggle Jigsaw Unintended Bias in Toxicity Classification competition.
Star the repo and I will keep updating the code.
The code is modified from Google's open-source BERT code; thanks to Jon Mischo for the advice here.
- 2019-04-06: 0.91216
- 2019-04-07: 0.91455 (added a text cleaning method; reference here)
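The exact cleaning method is the one referenced above and is not reproduced here. As a rough illustration only, a common Kaggle-style cleaning step for this competition isolates punctuation and normalizes curly quotes before tokenization; everything below is an assumption, not the repo's code, and whether it actually helps BERT's WordPiece tokenizer is worth checking.

```python
# Illustrative text cleaning, NOT the repo's actual method (see the reference above).
import re

PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"

def clean_text(text):
    text = str(text)
    # map a few common curly quotes to ASCII (assumption: these occur in the data)
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    # put a space around each punctuation mark so it becomes its own token
    for p in PUNCT:
        text = text.replace(p, " " + p + " ")
    # collapse repeated whitespace
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Don\u2019t be toxic!!!"))  # -> "Don ' t be toxic ! ! !"
```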
- download the pretrained BERT model (uncased_L-12_H-768_A-12, as used in the command below)
- download the competition data and unzip it into the input folder
- split the train and dev data (for convenience I just typed the command below; this split is not a recommended one)
```
cat train.csv | tail -n 1000 > dev_1000.csv
```
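Note that the one-liner above only copies the last 1000 rows into dev_1000.csv (without a header, and without removing them from train.csv). As an alternative, here is a minimal sketch of a random split with pandas; the output file name is made up for illustration, and you may need to adjust headers and columns to whatever the repo's csv handler expects:

```python
# Sketch of a random train/dev split with pandas (assumes the standard Kaggle
# train.csv is in input/); the repo itself uses the tail one-liner above instead.
import pandas as pd

df = pd.read_csv("input/train.csv")

# hold out ~1000 random rows for dev, keep the rest for training
dev = df.sample(n=1000, random_state=42)
train = df.drop(dev.index)

train.to_csv("input/train_split.csv", index=False)  # hypothetical file name
dev.to_csv("input/dev_1000.csv", index=False)
print(len(train), len(dev))
```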
- run run_classifier.py
```
python run_classifier.py \
  --data_dir=input/ --vocab_file=uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=uncased_L-12_H-768_A-12/bert_model.ckpt \
  --task_name=toxic \
  --do_train=True \
  --do_eval=True \
  --do_predict=True \
  --output_dir=model_output/
```
- the model will train for 10 epochs, but you can stop it early depending on how much time you have
- the checkpoints will be saved in model_output, along with the predictions on the test data (see model_output/test_result.tsv)
- run encode.py to build the submission file (a rough sketch of this step appears after this list)
- upload output/sub.csv to Kaggle
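The repo's encode.py is the script that turns the raw predictions into output/sub.csv; its actual contents are not shown here. A minimal sketch of such a step, assuming model_output/test_result.tsv holds tab-separated class probabilities in the same row order as input/test.csv and that the second column is the toxic class:

```python
# Rough sketch of building the submission file; the repo's real encode.py may differ.
# Assumes test_result.tsv has tab-separated probabilities (non-toxic, toxic), one
# row per test example, in the same order as input/test.csv.
import os
import pandas as pd

test = pd.read_csv("input/test.csv")
probs = pd.read_csv("model_output/test_result.tsv", sep="\t", header=None)

sub = pd.DataFrame({
    "id": test["id"],
    "prediction": probs.iloc[:, 1],  # probability of the toxic class (assumed column order)
})

os.makedirs("output", exist_ok=True)
sub.to_csv("output/sub.csv", index=False)
```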
- added a csv handler (line 243 in run_classifier.py)
- added a ToxicProcessor (line 264 in run_classifier.py)
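The actual ToxicProcessor lives at the line number above and is not reproduced here. As an illustrative sketch only, a processor for this task following the DataProcessor/InputExample interface from Google's run_classifier.py might look like the following; column names, header handling, and the 0.5 binarization of the fractional toxicity target are assumptions:

```python
# Illustrative sketch only; the repo's real ToxicProcessor (line 264) may differ.
# DataProcessor and InputExample are the classes defined in Google's run_classifier.py.
import csv
import os

class ToxicProcessor(DataProcessor):
  def get_train_examples(self, data_dir):
    return self._create_examples(os.path.join(data_dir, "train.csv"), "train")

  def get_dev_examples(self, data_dir):
    return self._create_examples(os.path.join(data_dir, "dev_1000.csv"), "dev")

  def get_test_examples(self, data_dir):
    return self._create_examples(os.path.join(data_dir, "test.csv"), "test")

  def get_labels(self):
    return ["0", "1"]

  def _create_examples(self, path, set_type):
    examples = []
    with open(path, encoding="utf-8") as f:
      for i, row in enumerate(csv.DictReader(f)):
        guid = "%s-%d" % (set_type, i)
        text = row["comment_text"]
        if set_type == "test":
          label = "0"  # dummy label; only needed to satisfy the interface at predict time
        else:
          # binarize the competition's fractional target at 0.5 (assumption)
          label = "1" if float(row["target"]) >= 0.5 else "0"
        examples.append(InputExample(guid=guid, text_a=text, text_b=None, label=label))
    return examples
```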
- text cleaning and OOV handling
- cross-validation (CV)
- averaging the predictions from different checkpoints
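For the last item, a minimal sketch of averaging is shown below, assuming each checkpoint has already been run through prediction and encode.py so that there is one id,prediction csv per checkpoint (the file names are made up):

```python
# Sketch of averaging submission files produced by different checkpoints.
# The file names below are hypothetical; point them at your own per-checkpoint outputs.
import pandas as pd

files = [
    "output/sub_ckpt_10000.csv",
    "output/sub_ckpt_20000.csv",
    "output/sub_ckpt_30000.csv",
]

subs = [pd.read_csv(f) for f in files]
avg = subs[0][["id"]].copy()
avg["prediction"] = sum(s["prediction"] for s in subs) / len(subs)
avg.to_csv("output/sub_avg.csv", index=False)
```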