conText provides a fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021).
To install conText from CRAN, run:
install.packages("conText")
To use conText you will need three objects:
- A (quanteda) corpus with the documents and corresponding document variables you want to evaluate.
- A set of (GloVe) pre-trained embeddings.
- A transformation matrix specific to the pre-trained embeddings.
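As a rough sketch of what these three objects look like, the snippet below inspects the sample objects bundled with the package. It assumes they are named `cr_sample_corpus`, `cr_glove_subset`, and `cr_transform`; check the package documentation if your installed version differs.

```r
library(conText)
library(quanteda)

# 1. A quanteda corpus with document variables (sample object; name assumed)
corp <- cr_sample_corpus
head(docvars(corp))

# 2. Pre-trained embeddings: a matrix with one row per word (sample subset)
emb_dim <- dim(cr_glove_subset)

# 3. Transformation matrix: square, with side equal to the embedding dimension
trans_dim <- dim(cr_transform)

# The transformation matrix must be conformable with the embeddings
stopifnot(trans_dim[1] == emb_dim[2], trans_dim[2] == emb_dim[2])
```

The final check is only a sanity test: a transformation matrix estimated for one set of embeddings cannot be reused with embeddings of a different dimension.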
conText includes sample objects for all three, but keep in mind that these are only meant to illustrate how the functions work. In this Dropbox folder we have included the raw versions of these objects, including the full Stanford GloVe 300-dimensional embeddings (labeled glove.rds) and the corresponding transformation matrix estimated by Khodak et al. (2018) (labeled khodakA.rds).
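Once the two files are downloaded, they can be read with base R's `readRDS()`. The helper below is a minimal sketch: `load_raw_objects` and `data_dir` are hypothetical names introduced here for illustration, and the directory is wherever you saved the files locally.

```r
# Hypothetical helper: load the raw objects from a local copy of the folder.
# `data_dir` is a placeholder for the directory holding the downloaded files.
load_raw_objects <- function(data_dir) {
  glove_path  <- file.path(data_dir, "glove.rds")
  khodak_path <- file.path(data_dir, "khodakA.rds")
  paths <- c(glove_path, khodak_path)

  # Fail early with a clear message if either file is missing
  missing <- !file.exists(paths)
  if (any(missing)) {
    stop("Missing file(s): ", paste(paths[missing], collapse = ", "))
  }

  list(
    pre_trained      = readRDS(glove_path),   # full GloVe embeddings
    transform_matrix = readRDS(khodak_path)   # Khodak et al. (2018) matrix
  )
}

# Example usage (the path is an assumption, adjust to your setup):
# raw <- load_raw_objects("~/Downloads/conText_data")
```

The returned list uses the names `pre_trained` and `transform_matrix` only as a convenience for passing the objects on to later calls.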
Check out this Quick Start Guide to get going with conText (last updated: 08/04/2023).
We are hugely thankful to Will Hobbs and Breanna Green for bringing to our attention clear examples where finite sample bias was larger than we had anticipated when implementing our main estimation routine, conText. We are actively collaborating with them to evaluate alternative fixes. In the meantime, we have implemented Jackknife debiasing and recommend using it. Please refer to the Finite Sample Bias vignette for additional information on the issue and for simulation results using various debiasing methods.
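A minimal sketch of a regression with Jackknife debiasing switched on is shown below. It assumes the installed version of conText exposes a `jackknife` argument and ships the sample objects `cr_sample_corpus`, `cr_glove_subset`, and `cr_transform`; the target word, covariates, and preprocessing choices are illustrative only, so consult `?conText` and the vignette for the recommended settings.

```r
library(conText)
library(quanteda)

# Tokenize the sample corpus (preprocessing choices here are illustrative)
toks <- tokens(cr_sample_corpus, remove_punct = TRUE, remove_numbers = TRUE)

set.seed(2021L)
model <- conText(
  formula = immigration ~ party + gender,  # target word ~ covariates
  data = toks,
  pre_trained = cr_glove_subset,
  transform = TRUE,
  transform_matrix = cr_transform,
  jackknife = TRUE,        # assumed argument name for Jackknife debiasing
  confidence_level = 0.95,
  permute = FALSE,         # skip permutation p-values to keep the run short
  window = 6,
  case_insensitive = TRUE,
  verbose = FALSE
)

# Normed coefficient estimates for each covariate
model@normed_coefficients
```

Setting `permute = FALSE` is a shortcut for this sketch; in applied work you will likely want permutation-based p-values as well.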
For those working in languages other than English, we have a set of data and code resources here: https://alcembeddings.org/