Code for https://arxiv.org/abs/2410.02472
Files:
base_classifier.py: experiments comparing meta-models with a baseline that feeds the text directly to a meta-model and asks the question
data.py: all the data (contains a lot of duplicated code)
elicit_activations.py: gets activations from a finetuned input-model
finetune2.py: finetunes an input-model LoRA
hftrain.py: trains a meta-model
incontext.py: short experiment to create a meta-model from in-context examples (unsuccessful so far)
make_main_figure.py: makes the main figure
make_question_ablations.py: makes the question-ablations figure
phi2_meta_model.py: the meta-model code