Commit

doc

Victorwz committed Jun 13, 2023
1 parent 2e3ce9e commit cfe8769
Showing 2 changed files with 21 additions and 15 deletions.
22 changes: 21 additions & 1 deletion README.md
@@ -1,5 +1,14 @@
# LongMem

Official implementation of our paper "[Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)".

## Environment Setup
* torch: Please follow the [torch official installation guide](https://pytorch.org/get-started/previous-versions/). We recommend torch>=1.8.0; select the GPU build that matches your CUDA driver version.

* Faiss-GPU: For NVIDIA V100 GPUs, simply install via ``pip install faiss-gpu``. For NVIDIA A100 GPUs, run ``conda install faiss-gpu cudatoolkit=11.0 -c pytorch``. The A100 GPU is not officially supported by faiss-gpu and can produce errors; this faiss [issue](https://github.com/facebookresearch/faiss/issues/2064) may help.

* fairseq: ``pip install --editable ./fairseq`` installs the revised `fairseq` along with the other required packages. We strongly recommend Python 3.8 for stability. A quick sanity check for the finished setup is sketched below.
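
A quick way to confirm that torch and faiss-gpu can both see the GPU is a snippet along these lines (a minimal sketch; the 128-dimensional toy index is illustrative, not part of the repo):

```python
# Sanity-check the environment: torch must see a CUDA device and
# faiss-gpu must be able to build and query an index on it.
import faiss
import torch

assert torch.cuda.is_available(), "torch does not see a CUDA device"
print(f"torch {torch.__version__}, CUDA runtime {torch.version.cuda}")

res = faiss.StandardGpuResources()        # GPU scratch space for faiss
index = faiss.GpuIndexFlatL2(res, 128)    # toy 128-dim L2 index on GPU 0
index.add(torch.randn(16, 128).numpy())   # faiss expects float32 numpy arrays
print(f"faiss-gpu OK, index holds {index.ntotal} vectors")
```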

## Project Structure
Pre-trained LLM Class (L24 = 24 layers, E1024 = 1024-dim embeddings, ALiBi positional encoding): ``fairseq/fairseq/models/newgpt.py``

@@ -9,4 +9,15 @@ Transformer Language Model with SideNetwork Class: ``fairseq/fairseq/models/tran

Memory Bank and Retrieval: ``fairseq/fairseq/modules/dynamic_memory_with_chunk.py``

Joint Attention for Memory Fusion: ``fairseq/fairseq/modules/joint_multihead_attention_sum.py``
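
These modules are easiest to read with the data flow in mind: the frozen backbone caches attention key/value pairs into a chunked memory bank, the SideNetwork retrieves the nearest chunks for the current context, and joint attention fuses them with local attention. A rough, self-contained sketch of the retrieval step (class and method names here are illustrative, not the repo's API; the real logic lives in `dynamic_memory_with_chunk.py`):

```python
# Illustrative chunk-level memory bank: pool one key per chunk, retrieve
# the token-level values of the best-matching chunks. Hypothetical names.
import torch

class ChunkedMemoryBank:
    def __init__(self, chunk_size=4, capacity=65536, dim=1024):
        self.chunk_size, self.capacity, self.dim = chunk_size, capacity, dim
        self.keys = torch.empty(0, dim)                # one pooled key per chunk
        self.values = torch.empty(0, chunk_size, dim)  # token-level values

    def write(self, k, v):
        """Cache keys/values, pooling keys per chunk; evict oldest on overflow."""
        n = k.size(0) - k.size(0) % self.chunk_size
        pooled = k[:n].view(-1, self.chunk_size, self.dim).mean(dim=1)
        self.keys = torch.cat([self.keys, pooled])[-self.capacity:]
        self.values = torch.cat(
            [self.values, v[:n].view(-1, self.chunk_size, self.dim)]
        )[-self.capacity:]

    def retrieve(self, q, topk=2):
        """Return the top-k chunks whose pooled key best matches each query."""
        scores = q @ self.keys.t()                     # (num_queries, num_chunks)
        idx = scores.topk(min(topk, self.keys.size(0)), dim=-1).indices
        return self.values[idx]                        # (queries, topk, chunk, dim)

bank = ChunkedMemoryBank()
bank.write(torch.randn(64, 1024), torch.randn(64, 1024))
mem = bank.retrieve(torch.randn(2, 1024))              # shape (2, 2, 4, 1024)
```

The retrieved chunks then feed the joint attention module, which, as its name suggests, sums attention over local tokens and retrieved memory.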

## Citation
Please cite our paper if you find this repository helpful in your research:
```
@article{LongMem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```
14 changes: 0 additions & 14 deletions fairseq/fairseq/models/newgpt.py
@@ -9,7 +9,6 @@
from typing import Dict, List, Optional

from .hf_newgpt import NewGPTConfig, NewGPTForCausalLM
from fairseq.modules.dynamic_memory_memtrm import External_Memory

import torch
import torch.nn as nn
@@ -68,9 +67,6 @@ def build_model(cls, args, task):
        if args.gpt_model_path != "":
            state = checkpoint_utils.load_checkpoint_to_cpu(args.gpt_model_path)
            model.load_state_dict(state["model"], strict=True, args=args)
        if getattr(args, "use_external_memory", False):
            # Zero-initialized per-head bias used when attending over retrieved memory
            model.decoder.model.transformer.h[args.retrieval_layer_index].attn.memory_bias = nn.Parameter(torch.zeros(args.num_attention_heads))

        return model

@@ -100,16 +96,6 @@ def __init__(self, args, task):
        self.pad_idx = task.target_dictionary.pad()
        self.model.transformer.wte.weight.data[self.pad_idx].zero_()

        self.external_memory = External_Memory(args) if getattr(args, "use_external_memory", False) else None
        if getattr(args, "use_external_memory", False):
            # Freeze the even-indexed transformer layers plus the embedding
            # and LM-head weights, so only the remaining parameters train
            for i in range(args.num_layers // 2):
                for name, param in self.model.transformer.h[i * 2].named_parameters():
                    param.requires_grad = False
            self.model.transformer.wte.weight.requires_grad = False
            self.model.lm_head.weight.requires_grad = False
            self.model.lm_head.bias.requires_grad = False
            # self.model.transformer.ln_f.weight.requires_grad = False
    def forward(
        self,
        prev_output_tokens,
