Integrate BERT into Hedwig #11

achyudh · 2019-04-14T04:09:15Z

Changes:

Fix package imports
Update README.md
Fix bug due to TAR/AR attribute check
Add BERT models
Add BERT tokenizer
Return logits from the model.py
Remove unused classes in models/bert
Return logits from the model.py (DocBERT model weights #12)
Remove unused classes in models/bert (Update READMEs #13)
Add initial main file
Add args for BERT
Add partial support for BERT
Initialize training and optimization
Draft the structure of Trainers for BERT
Remove duplicate tokenizer
Add utils
Move optimization to utils
Add more structure for trainer
Refactor the trainer (Remove unwanted args from models/bert #15)
Refactor the trainer
Add more edits
Add support for our datasets
Add evaluator
Split data4bert module into multiple processors
Refactor BERT tokenizer
Integrate BERT into Castor framework (Fix bug where model wasn't in training mode every epoch #17)
Remove unused classes in models/bert
Split data4bert module into multiple processors
Refactor BERT tokenizer
Add multilabel support in BertTrainer
Add multilabel support in BertEvaluator
Add get_test_samples method in dataset processors
Fix args.py for BERT
Add support for Reuters, IMDB datasets for BERT

Revert "Integrate BERT into Castor framework (#17)" (this reverts commit e4244ec)
Fix paths to datasets in dataset classes and args
Add SST dataset
Add hedwig-data instructions to README.md
Fix KimCNN README
Fix RegLSTM README
Fix typos in README
Remove trec_eval from README
Add tensorboardX to requirements.txt
Rename processors module to bert_processors
Add method to print metrics after training
Add model check-pointing and early stopping for BERT
Add logos
Update README.md
Fix code comments in classification trainer
Add support for AAPD, Sogou, AGNews and Yelp2014
Fix bug that deleted saved models
Update README for HAN
Update README for XML-CNN
Remove redundant TODOs from the READMEs
Fix logo in README.md
Update README for Char-CNN
Fix all the READMEs
Resolve conflict
Fix Typos
Re-Add SST2 Processor
Add support for evaluating trained model
Update args.py
Resolve issues due to DataParallel wrapper on saved model
Remove redundant Yelp processor
Fix bug for safely creating the saving directory
Change checkpoint paths to timestamps
Remove unwanted string.strip() from tokenizer
Create save path if it doesn't exist
Decouple model checkpoints from code
Remove model choice restrictions for BERT
Remove model/distill driver
Simplify checkpoint directory creation
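
For readers unfamiliar with the checkpoint-related items above (timestamped checkpoint paths, creating the save path if it doesn't exist, and the DataParallel wrapper issue on saved models), here is a rough sketch of the general idea. It is not the code from this PR: the function names `make_checkpoint_dir` and `strip_dataparallel_prefix`, the directory layout, and the timestamp format are all assumptions made for illustration. The only library fact relied on is that a model wrapped in PyTorch's `DataParallel` saves its parameters under keys prefixed with `module.`.

```python
import os
import tempfile
from datetime import datetime


def make_checkpoint_dir(base_dir, model_name, now=None):
    """Create (if needed) and return base_dir/model_name/<timestamp>.

    Hypothetical helper: the layout and timestamp format are assumptions.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    path = os.path.join(base_dir, model_name, stamp)
    # exist_ok=True makes this safe to call when the directory already exists
    os.makedirs(path, exist_ok=True)
    return path


def strip_dataparallel_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading 'module.' removed from keys.

    Hypothetical helper: lets a checkpoint saved from a DataParallel-wrapped
    model be loaded into an unwrapped model.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    save_dir = make_checkpoint_dir(tempfile.mkdtemp(), "bert")
    print("checkpoint directory:", save_dir)
    print(strip_dataparallel_prefix({"module.classifier.weight": 0}))
```

An alternative to renaming keys at load time is to save the state dict of the wrapped model's `.module` attribute in the first place, so the prefix never reaches disk.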

Ashutosh-Adhikari · 2019-04-14T04:11:09Z

Why would we create a duplicate PR? Or is there something that I am missing?

achyudh · 2019-04-14T04:11:55Z

I cannot resolve conflicts and push directly to karkaroff. It has to be on a fork I have write access to.

Ashutosh-Adhikari · 2019-04-14T04:12:31Z

Don't worry, I have resolved the conflicts. Please close this duplicate PR. Thanks.

achyudh · 2019-04-14T04:14:34Z

Hmm, I don't see any changes in your original pull request yet.

Ashutosh-Adhikari

LGTM.

achyudh and others added 3 commits April 13, 2019 23:25

Resolve conflicts in the dev fork

cb14201

Merge branch 'karkaroff-master'

8346514

achyudh requested a review from daemon April 14, 2019 04:10

achyudh assigned achyudh and Ashutosh-Adhikari Apr 14, 2019

Resolve merge conflicts in README.md

fff8e0a

Ashutosh-Adhikari requested review from Ashutosh-Adhikari and removed request for daemon April 14, 2019 04:27

Ashutosh-Adhikari approved these changes Apr 14, 2019

Ashutosh-Adhikari merged commit 7d24958 into castorini:master Apr 14, 2019
