This repository contains data and scripts pertaining to work done by Avijit Thawani while at Northeastern University (summer 2018) under the guidance of Dr. Byron C. Wallace (College of Computer and Information Science, Northeastern University, Boston, MA).
Thawani A. Paul M J. Sarkar U. Wallace B C.
Are Online Reviews of Physicians Biased Against Female Providers?
In Proceedings of Machine Learning Research. 106:1-17, 2019.
The paper was presented at the Machine Learning for Healthcare (MLHC) 2019 conference in Ann Arbor, Michigan. A poster summarizing our work, slides from the talk, and a video presentation are also available.
Please cite us and mail me at [email protected] for feedback, errors, ideas for future work, or just to say Hi!
The easiest way to explore our project, without installing or downloading anything, is to copy our RateMDs/ folder from the Google Drive link and give this Google Colab notebook access to it.
processed_1.csv: clean data containing 37646 reviews and ratings, along with specialty and gender.
ratemds.model: pretrained word embedding (gensim) model. Scripts to play with are in the scripts/ folder as well as on Google Colab.
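As a rough illustration of how the pretrained embeddings might be queried outside the notebooks, here is a minimal sketch. It assumes `ratemds.model` was saved with gensim's `Word2Vec.save()` (the README only says it is a gensim word embedding model) and that gensim is installed. The helper names `load_vectors` and `nearest_words` are hypothetical and not part of this repository.

```python
def load_vectors(model_path="ratemds.model"):
    """Load the embedding model and return its word vectors.

    Assumption: the file is a gensim Word2Vec model saved with .save().
    """
    from gensim.models import Word2Vec  # lazy import; requires gensim
    return Word2Vec.load(model_path).wv


def nearest_words(wv, word, topn=5):
    """Return up to `topn` (word, similarity) pairs, or [] if `word` is unknown."""
    if word not in wv:
        # Out-of-vocabulary words would otherwise raise a KeyError.
        return []
    return wv.most_similar(word, topn=topn)


# Example usage (needs the model file from the Google Drive folder):
# wv = load_vectors("ratemds.model")
# print(nearest_words(wv, "doctor"))
```

If the model was saved in a different format, or with an older gensim release, the loading call may need to change; the notebooks in scripts/ remain the authoritative reference.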
raw data: parsed HTML files from RateMDs.com
unclean.csv: id, review, physician specialty, physician gender, physician name, document label
processed_1.csv: review id, physician id, physician specialty, physician gender, rating staff, rating punctuality, rating helpfulness, rating knowledgeability, review text (tokenized)
all_Github.csv: physician_id.review_id, physician_id, physician name, physician specialty, physician gender, rating staff, rating punctuality, rating helpfulness, rating knowledgeability, review text
scripts: Jupyter Notebooks to reproduce our results (corresponding section from the paper in parentheses):
  clean.ipynb: Data preprocessing (Section 2.1)
  regression.ipynb: Rating Analysis (Section 2.2)
  LR.ipynb: Lexical Regression (Section 2.3.1)
  match.ipynb: Embeddings (Section 2.3.2)
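For a quick look at the data before opening the notebooks, the following sketch averages one rating column per physician gender using only the Python standard library. It is not the regression analysis from the paper. The column names passed in the usage comment (`gender`, `rating_helpfulness`) are assumptions, as is the presence of a header row; check the actual header of processed_1.csv before running it.

```python
import csv
from collections import defaultdict


def mean_rating_by_group(rows, group_col, rating_col):
    """Average `rating_col` within each value of `group_col`.

    `rows` is any iterable of dicts, e.g. the output of csv.DictReader.
    Rows with a missing or non-numeric rating are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        try:
            rating = float(row[rating_col])
        except (KeyError, TypeError, ValueError):
            continue
        group = row.get(group_col)
        totals[group] += rating
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def load_rows(csv_path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Example usage (column names are hypothetical):
# rows = load_rows("processed_1.csv")
# print(mean_rating_by_group(rows, "gender", "rating_helpfulness"))
```

Keeping the aggregation separate from file loading makes it easy to try the same function on all_Github.csv or on a filtered subset of rows.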
Avijit Thawani, University of Southern California (work done when interning at Northeastern in Summer 2018).
Michael J. Paul, University of Colorado Boulder.
Urmimala Sarkar, University of California San Francisco.
Byron C. Wallace, Northeastern University.