Fixes for running on T4 GPU #3

Open · DmitryKey wants to merge 3 commits into main
Conversation

DmitryKey

Hello,

Thanks for open-sourcing rStar!

I gave it a spin on Google Colab with a T4 GPU (15 GB of GPU memory) and had to make several changes, which are included in this PR. Hopefully they will be useful for anyone else trying to run the code.

The following changes also had to be made to fit within the hardware limits, but I'm not committing them to the code, since specific hardware may vary:

  1. scripts/run_gsm8k_generator.sh: add the parameter --half_precision so that the file looks like this:
CUDA_VISIBLE_DEVICES=0 python run_src/do_generate.py \
    --dataset_name GSM8K \
    --test_json_filename test_all \
    --model_ckpt mistralai/Mistral-7B-v0.1 \
    --note default \
    --num_rollouts 16 \
    --half_precision
  2. In models/vLLM_API.py, change the swap space from 16 to 4 to avoid the "Too large swap space" error: in the LLM constructor, add swap_space=4, so that the full file looks like this (a rough sizing sketch for other machines follows this list):
# Licensed under the MIT license.

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import numpy as np
import math


def load_vLLM_model(model_ckpt, seed, tensor_parallel_size=1, half_precision=False, max_num_seqs=256):
    tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

    if half_precision:
        llm = LLM(
            model=model_ckpt,
            dtype="half",
            tensor_parallel_size=tensor_parallel_size,
            seed=seed,
            trust_remote_code=True,
            max_num_seqs=max_num_seqs,
            swap_space=4,  # reduced from 16 to avoid "Too large swap space" on the Colab T4 runtime
        )
    else:
        llm = LLM(
            model=model_ckpt,
            tensor_parallel_size=tensor_parallel_size,
            seed=seed,
            trust_remote_code=True,
            max_num_seqs=max_num_seqs,
            swap_space=4,  # reduced from 16 to avoid "Too large swap space" on the Colab T4 runtime
        )

    return tokenizer, llm


def generate_with_vLLM_model(
    model,
    input,
    temperature=0.8,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
    n=1,
    max_tokens=256,
    logprobs=1,
    stop=[],
):
    sampling_params = SamplingParams(
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
        n=n,
        logprobs=logprobs,
        max_tokens=max_tokens,
        stop=stop,
    )

    output = model.generate(input, sampling_params, use_tqdm=False)
    return output


if __name__ == "__main__":
    model_ckpt = "mistralai/Mistral-7B-v0.1"
    tokenizer, model = load_vLLM_model(model_ckpt, seed=42, tensor_parallel_size=1, half_precision=False)
    input = "What is the meaning of life?"
    output = generate_with_vLLM_model(model, input)
    breakpoint()
    print(output[0].outputs[0].text)
  3. Export the following variable to help with CUDA running out of memory (the error in the next point suggests the form expandable_segments:True; see also the note after this list on making it take effect in a notebook):
!export PYTORCH_CUDA_ALLOC_CONF=expandable_segments
  4. The full run of the script still fails on the T4 because of the lack of GPU memory (a sketch of further memory-saving knobs follows this list):
Loading model weights took 13.4966 GB
...
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 241.06 MiB is free. Process 122360 has 14.51 GiB memory in use. Of the allocated memory 14.38 GiB is allocated by PyTorch, and 4.61 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
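A rough aid for point 2 on other machines: the sketch below is my addition, not part of the repo. It uses psutil (already a vLLM dependency) to suggest a swap_space value that stays under the host's currently free RAM; the function name and the 50% safety margin are my own choices.

# Sketch: pick a swap_space value (GiB of CPU swap per GPU) that fits the host's free RAM.
import psutil

def suggest_swap_space(max_gib: int = 16, safety_fraction: float = 0.5) -> int:
    """Largest whole-GiB value that is at most `safety_fraction` of the free host RAM."""
    free_gib = psutil.virtual_memory().available / (1024 ** 3)
    return max(1, min(max_gib, int(free_gib * safety_fraction)))

print(suggest_swap_space())  # on a Colab T4 runtime this tends to land around 4-6 GiB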
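A note on point 3: in a Colab notebook, !export runs in a throwaway child shell, so the variable never reaches the Python process that loads the model. Setting it from Python before torch/vLLM is imported (or via %env) is more reliable, and the error message itself suggests the expandable_segments:True form:

# Set the allocator config before torch or vLLM is imported in the same process.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

When launching the scripts from a shell instead, prefixing the command has the same effect, e.g. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=0 python run_src/do_generate.py ...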
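On point 4: the model weights alone take ~13.5 GB per the log, which leaves almost nothing of the T4's 14.75 GiB for the KV cache, so the OOM is expected. For anyone who wants to squeeze further, vLLM's LLM constructor also accepts max_model_len and gpu_memory_utilization, which change the KV-cache budget. The values below are illustrative and untested on my side, not a change proposed for the repo:

# Sketch only: extra vLLM knobs that trade context length / batch size for memory.
from vllm import LLM

llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    dtype="half",
    swap_space=4,
    seed=42,
    trust_remote_code=True,
    max_num_seqs=64,              # fewer concurrent sequences -> smaller KV cache
    max_model_len=2048,           # shorter context window -> smaller KV cache
    gpu_memory_utilization=0.98,  # let vLLM claim nearly all of the 14.75 GiB
)

Even with these, a 7B model in half precision is a very tight fit on a single T4.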

Commits:
- Added missing mode parameter
- Added requirements.txt as seen on the Google Colab T4 GPU runtime
- Added a list of missing packages
@VictoryLoveJessica

Hello, did you encounter this error message when running the code?
(screenshot of the error message attached)

@DmitryKey (Author)

@VictoryLoveJessica yes, the original code produces this error.
