Stars
A library for mechanistic interpretability of GPT-style language models
Locating and editing factual associations in GPT (NeurIPS 2022)
Utility for behavioral and representational analyses of Language Models
Stanford NLP Python library for Representation Finetuning (ReFT)
PAIR.withgoogle.com and friends' work on interpretability methods
Simple retrieval from LLMs at various context lengths to measure accuracy
✨ Fast Coreference Resolution in spaCy with Neural Networks
Public repo with code and dataset for the Textual Time Travel project
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
The nnsight package enables interpreting and manipulating the internals of deep learning models.
Interpretability for sequence generation models 🐛 🔍
Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities".
Stanford Open Information Extraction made simple!
Using sparse coding to find distributed representations used by neural networks.
[ICML 2024] Language Models Represent Beliefs of Self and Others
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
Function Vectors in Large Language Models (ICLR 2024)
A unified interface for computing surprisal (log probabilities) from language models! Supports neural, symbolic, and black-box API models.
Code for the paper "Neural Metaphor Detection in Context".
Probing and Generalization of Metaphorical Knowledge in Pre-Trained Language Models [ACL 2022]
Machine Theory of Mind reading list, built upon the EMNLP Findings 2023 paper: Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
Inspecting and Editing Knowledge Representations in Language Models
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT
A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF)
Evaluating the Moral Beliefs Encoded in LLMs
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/