
OOM #45

Closed

IT-five opened this issue Dec 9, 2023 · 3 comments

IT-five commented Dec 9, 2023

I wanted to compare the effects of direct extrapolation, interpolation, and NTK, so I removed the truncation logic in pred.py. At seq_len = 21000, my A800 then threw an OOM error. Is this normal, and how should I fix it? I am not using xformers.
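
(For context, the truncation logic referred to above has roughly the following shape; this is a reconstruction from the description, not necessarily the repository's exact pred.py code, and the names prompt, max_length, and tokenizer are assumptions. It keeps the head and tail of the tokenized prompt and drops the middle so the input never exceeds max_length tokens.)

    # Reconstruction of the kind of mid-truncation described above (assumed,
    # not necessarily the exact pred.py code).
    tokenized_prompt = tokenizer(prompt, truncation=False, return_tensors="pt").input_ids[0]
    if len(tokenized_prompt) > max_length:
        half = max_length // 2  # keep the first and last half, cut the middle
        prompt = (tokenizer.decode(tokenized_prompt[:half], skip_special_tokens=True)
                  + tokenizer.decode(tokenized_prompt[-half:], skip_special_tokens=True))

Removing this guard means the model receives the full untruncated prompt, which is what exposes the 21000-token attention computation in the first place.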

bys0318 (Member) commented Dec 10, 2023

Hello! Baichuan's modeling code currently appears to use xops.memory_efficient_attention to enable flash attention. Without flash attention, memory usage grows quadratically with sequence length, so inference at a 20k length can indeed OOM. We suggest enabling flash attention via from xformers import ops as xops.
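
For illustration, here is a minimal sketch (not the exact Baichuan modeling code; tensor names and shapes are assumptions, following xformers' (batch, seq_len, n_heads, head_dim) convention) contrasting the two attention paths:

    import torch
    from xformers import ops as xops

    def standard_attention(q, k, v):
        # Materializes the full (seq_len x seq_len) score matrix per head,
        # so activation memory grows quadratically with sequence length.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # -> (B, H, L, D)
        scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        out = torch.softmax(scores, dim=-1) @ v
        return out.transpose(1, 2)                         # -> (B, L, H, D)

    def memory_efficient_attention(q, k, v):
        # Computes the same result in tiles without ever storing the full
        # score matrix; memory grows roughly linearly with sequence length.
        return xops.memory_efficient_attention(q, k, v)

At seq_len = 21000, the standard path allocates a 21000 × 21000 attention matrix per head on top of the usual activations, which can readily exhaust even an 80 GB A800; the tiled path avoids that allocation.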

IT-five (Author) commented Dec 12, 2023

> Hello! Baichuan's modeling code currently appears to use xops.memory_efficient_attention to enable flash attention. Without flash attention, memory usage grows quadratically with sequence length, so inference at a 20k length can indeed OOM. We suggest enabling flash attention via from xformers import ops as xops.

Thank you for the answer. However, installing xformers in the environment below keeps reinstalling torch for me, and the reinstall ends up broken. What exactly should I do?

bitsandbytes                  0.41.1
open-clip-torch               2.20.0
peft                          0.5.0
pytorch-lightning             1.7.7
pytorch-metric-learning       2.3.0
pytorch-wavelets              1.3.0
pytorch-wpe                   0.0.1
pytorch3d                     0.7.4
rotary-embedding-torch        0.3.0
sentencepiece                 0.1.99
taming-transformers-rom1504   0.0.6
torch                         2.0.1+cu118
torch-complex                 0.4.3
torch-scatter                 2.1.1
torchaudio                    2.0.2+cu118
torchmetrics                  0.11.4
torchsummary                  1.5.1
torchvision                   0.15.2+cu118
transformers                  4.34.1
transformers-stream-generator 0.0.4
DEPRECATION: pytorch-lightning 1.7.7 has a non-standard dependency specifier torch>=1.9.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch, xformers
  Attempting uninstall: triton
    Found existing installation: triton 2.0.0
    Uninstalling triton-2.0.0:
      Successfully uninstalled triton-2.0.0
  Attempting uninstall: torch
    Found existing installation: torch 2.0.1+cu118
    Uninstalling torch-2.0.1+cu118:
      Successfully uninstalled torch-2.0.1+cu118
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fairseq 0.12.2 requires hydra-core<1.1,>=1.0.7, but you have hydra-core 1.3.2 which is incompatible.
fairseq 0.12.2 requires omegaconf<2.1, but you have omegaconf 2.3.0 which is incompatible.
fastai 2.7.12 requires torch<2.1,>=1.7, but you have torch 2.1.1 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.1.1 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.1.1 which is incompatible.
wenetruntime 1.11.0 requires torch==1.11.0, but you have torch 2.1.1 which is incompatible.
Successfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 torch-2.1.1 triton-2.1.0 xformers-0.0.23
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@dsw-5143-9979c9f66-knjrl:/mnt/workspace/extend_length/test# python
Python 3.8.16 (default, Jun 12 2023, 18:09:05) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

bys0318 (Member) commented Dec 12, 2023

What you are running into looks like a conflict between packages; you could try asking about it on the xformers GitHub.
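
For reference, the log above shows the actual failure mode: pip installed xformers 0.0.23, which pins torch 2.1.1, so it pulled in CUDA 12 wheels (the nvidia-*-cu12 packages) that the machine's CUDA 11.8 driver cannot initialize, hence torch.cuda.is_available() returning False. A plausible workaround, assuming an xformers release exists whose torch pin matches the installed torch 2.0.1+cu118 (check the xformers changelog for the exact version; 0.0.21 below is an assumed example, not confirmed in the thread), is to install that release so pip leaves the existing torch in place:

    pip install xformers==0.0.21

If no prebuilt wheel matches, building xformers from source against the existing torch is another option.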

bys0318 closed this as completed Dec 30, 2023