Commit

doc

Victorwz committed Jun 13, 2023
1 parent 2e3ce9e commit cfe8769
Showing 2 changed files with 21 additions and 15 deletions.
22 changes: 21 additions & 1 deletion README.md
@@ -1,5 +1,14 @@
# LongMem

Official implementation of our paper "[Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)".

## Environment Setup
* torch: Please follow the [torch official installation guide](https://pytorch.org/get-started/previous-versions/). We recommend torch>=1.8.0; select the GPU build that matches your CUDA driver version.

* Faiss-GPU: For NVIDIA V100 GPUs, simply install via ``pip install faiss-gpu``. For NVIDIA A100 GPUs, run ``conda install faiss-gpu cudatoolkit=11.0 -c pytorch``. The A100 GPU is not officially supported by faiss-gpu and can produce errors; this faiss [issue](https://github.com/facebookresearch/faiss/issues/2064) may help.

* fairseq: ``pip install --editable ./fairseq`` installs the revised `fairseq` along with the other required packages. We strongly recommend Python 3.8 for stability. A quick sanity check for the finished setup is sketched below.
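
A quick way to confirm that torch and faiss-gpu can both see the GPU is a snippet along these lines (a minimal sketch; the 128-dimensional toy index is illustrative, not part of the repo):

```python
# Sanity-check the environment: torch must see a CUDA device and
# faiss-gpu must be able to build and query an index on it.
import faiss
import torch

assert torch.cuda.is_available(), "torch does not see a CUDA device"
print(f"torch {torch.__version__}, CUDA runtime {torch.version.cuda}")

res = faiss.StandardGpuResources()        # GPU scratch space for faiss
index = faiss.GpuIndexFlatL2(res, 128)    # toy 128-dim L2 index on GPU 0
index.add(torch.randn(16, 128).numpy())   # faiss expects float32 numpy arrays
print(f"faiss-gpu OK, index holds {index.ntotal} vectors")
```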

## Project Structure
Pre-trained LLM Class (L24 = 24 layers, E1024 = 1024-dim embeddings, ALiBi positional encoding): ``fairseq/fairseq/models/newgpt.py``

@@ -9,4 +9,15 @@ Transformer Language Model with SideNetwork Class: ``fairseq/fairseq/models/tran

Memory Bank and Retrieval: ``fairseq/fairseq/modules/dynamic_memory_with_chunk.py``

Joint Attention for Memory Fusion: ``fairseq/fairseq/modules/joint_multihead_attention_sum.py``
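
These modules are easiest to read with the data flow in mind: the frozen backbone caches attention key/value pairs into a chunked memory bank, the SideNetwork retrieves the nearest chunks for the current context, and joint attention fuses them with local attention. A rough, self-contained sketch of the retrieval step (class and method names here are illustrative, not the repo's API; the real logic lives in `dynamic_memory_with_chunk.py`):

```python
# Illustrative chunk-level memory bank: pool one key per chunk, retrieve
# the token-level values of the best-matching chunks. Hypothetical names.
import torch

class ChunkedMemoryBank:
    def __init__(self, chunk_size=4, capacity=65536, dim=1024):
        self.chunk_size, self.capacity, self.dim = chunk_size, capacity, dim
        self.keys = torch.empty(0, dim)                # one pooled key per chunk
        self.values = torch.empty(0, chunk_size, dim)  # token-level values

    def write(self, k, v):
        """Cache keys/values, pooling keys per chunk; evict oldest on overflow."""
        n = k.size(0) - k.size(0) % self.chunk_size
        pooled = k[:n].view(-1, self.chunk_size, self.dim).mean(dim=1)
        self.keys = torch.cat([self.keys, pooled])[-self.capacity:]
        self.values = torch.cat(
            [self.values, v[:n].view(-1, self.chunk_size, self.dim)]
        )[-self.capacity:]

    def retrieve(self, q, topk=2):
        """Return the top-k chunks whose pooled key best matches each query."""
        scores = q @ self.keys.t()                     # (num_queries, num_chunks)
        idx = scores.topk(min(topk, self.keys.size(0)), dim=-1).indices
        return self.values[idx]                        # (queries, topk, chunk, dim)

bank = ChunkedMemoryBank()
bank.write(torch.randn(64, 1024), torch.randn(64, 1024))
mem = bank.retrieve(torch.randn(2, 1024))              # shape (2, 2, 4, 1024)
```

The retrieved chunks then feed the joint attention module, which, as its name suggests, sums attention over local tokens and retrieved memory.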

## Citation
Please cite our paper if you find this repository helpful in your research:
```
@article{LongMem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```
14 changes: 0 additions & 14 deletions fairseq/fairseq/models/newgpt.py
@@ -9,7 +9,6 @@
from typing import Dict, List, Optional

from .hf_newgpt import NewGPTConfig, NewGPTForCausalLM
from fairseq.modules.dynamic_memory_memtrm import External_Memory

import torch
import torch.nn as nn
@@ -68,9 +67,6 @@ def build_model(cls, args, task):
        if args.gpt_model_path != "":
            state = checkpoint_utils.load_checkpoint_to_cpu(args.gpt_model_path)
            model.load_state_dict(state["model"], strict=True, args=args)
        if getattr(args, "use_external_memory", False):
            # Zero-initialized per-head bias used when attending over retrieved memory
            model.decoder.model.transformer.h[args.retrieval_layer_index].attn.memory_bias = nn.Parameter(torch.zeros(args.num_attention_heads))

        return model

@@ -100,16 +96,6 @@ def __init__(self, args, task):
        self.pad_idx = task.target_dictionary.pad()
        self.model.transformer.wte.weight.data[self.pad_idx].zero_()

        self.external_memory = External_Memory(args) if getattr(args, "use_external_memory", False) else None
        if getattr(args, "use_external_memory", False):
            # Freeze the even-indexed transformer layers plus the embedding
            # and LM-head weights, so only the remaining parameters train
            for i in range(args.num_layers // 2):
                for name, param in self.model.transformer.h[i * 2].named_parameters():
                    param.requires_grad = False
            self.model.transformer.wte.weight.requires_grad = False
            self.model.lm_head.weight.requires_grad = False
            self.model.lm_head.bias.requires_grad = False
            # self.model.transformer.ln_f.weight.requires_grad = False
    def forward(
        self,
        prev_output_tokens,
