
A library for mechanistic interpretability of GPT-style language models

Python 1,956 349 Updated Mar 13, 2025

Locating and editing factual associations in GPT (NeurIPS 2022)

Python 607 135 Updated Apr 20, 2024

Utility for behavioral and representational analyses of Language Models

Python 131 32 Updated Mar 12, 2025

Stanford NLP Python library for Representation Finetuning (ReFT)

Python 1,443 124 Updated Feb 6, 2025

PAIR.withgoogle.com and friends' work on interpretability methods

JavaScript 170 31 Updated Feb 11, 2025

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,758 190 Updated Aug 17, 2024

Structured Text Generation

Python 11,065 571 Updated Mar 15, 2025

✨Fast Coreference Resolution in spaCy with Neural Networks

C 2,866 478 Updated Apr 13, 2023

Public repo with code and dataset for Textual time travel project

Lua 4 Updated Nov 8, 2021

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python 513 39 Updated Jan 28, 2025

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Jupyter Notebook 518 45 Updated Mar 12, 2025

Interpretability for sequence generation models 🐛 🔍

Python 408 37 Updated Nov 10, 2024

Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities".

Python 17 2 Updated Aug 16, 2023

Stanford Open Information Extraction made simple!

Python 651 103 Updated Jan 11, 2024
Jupyter Notebook 28 5 Updated Jul 16, 2023
Python 437 46 Updated Jul 19, 2024

Using sparse coding to find distributed representations used by neural networks.

Jupyter Notebook 220 30 Updated Nov 10, 2023

[ICML 2024] Language Models Represent Beliefs of Self and Others

Python 31 1 Updated Sep 26, 2024

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

Jupyter Notebook 2,130 262 Updated Mar 10, 2025

Function Vectors in Large Language Models (ICLR 2024)

Python 144 33 Updated Oct 11, 2024

A unified interface for computing surprisal (log probabilities) from language models! Supports neural, symbolic, and black-box API models.

Python 37 8 Updated Dec 17, 2024

Code for the paper "Neural Metaphor Detection in Context".

Python 59 22 Updated Jun 19, 2023
Jupyter Notebook 4 1 Updated Jul 12, 2024

Probing and Generalization of Metaphorical Knowledge in Pre-Trained Language Models [ACL 2022]

Python 21 4 Updated May 15, 2022

Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

120 7 Updated Feb 18, 2025

Inspecting and Editing Knowledge Representations in Language Models

Python 112 5 Updated Jul 24, 2023

Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), to train and fine-tune the LLaMA2 model to follow human instructions, similar to Instru…

Jupyter Notebook 48 10 Updated Mar 9, 2024

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Python 4,597 479 Updated Jan 8, 2024

Evaluating the Moral Beliefs Encoded in LLMs

Python 24 3 Updated Dec 17, 2024

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Python 3,090 412 Updated Jul 25, 2024