
probability tensor contains either inf, nan or element < 0 #657

Open
alvaropastor7 opened this issue Nov 26, 2024 · 2 comments
alvaropastor7 commented Nov 26, 2024

Hi,
I'm trying to run inference on an AWQ-quantized model and I constantly get this error when generating text.
I'm using Qwen2.5-72B-Instruct-AWQ.
Some code for context:

    self._model = AutoAWQForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        token=hf_token,
        attn_implementation="flash_attention_2",
        torch_dtype=torch.float16,
    )

    # Load the tokenizer
    self._tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        token=hf_token,
    )

def _call(self, prompt: str, **kwargs) -> str:
    # Default generation parameters
    max_new_tokens = kwargs.get("max_new_tokens", 400)
    temperature = kwargs.get("temperature", 1)
    top_k = kwargs.get("top_k", 30)

    model = self.load_model()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Encode the prompt
    inputs = self._tokenizer(prompt, return_tensors="pt").to(device)
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    # Generation parameters
    generation_params = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": True,
        "top_k": top_k,
        "eos_token_id": self._tokenizer.eos_token_id,
        "pad_token_id": self._tokenizer.eos_token_id,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
    }

    # Generate text
    outputs = model.generate(**generation_params)

    # Decode the output
    generated_text = self._tokenizer.decode(
        outputs[0],
        skip_special_tokens=True
    )
    return generated_text[len(prompt):].strip()

Thanks :)

casper-hansen (Owner) commented

Please use from_quantized instead of from_pretrained
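
A minimal sketch of what that change might look like (assuming AutoAWQ's usual from_quantized signature; the fuse_layers setting below is an assumption, not something stated in this thread):

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_name = "Qwen/Qwen2.5-72B-Instruct-AWQ"

    # Load the already-quantized weights directly; from_pretrained is meant for
    # an unquantized model (e.g. before quantizing), so pointing it at an AWQ
    # checkpoint can produce invalid logits and sampling errors like this one.
    model = AutoAWQForCausalLM.from_quantized(
        model_name,
        fuse_layers=True,   # assumption: fuse layers for faster inference
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)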


dcdmm commented Dec 4, 2024

Please use from_quantized instead of from_pretrained

The official AWQ model can be loaded directly by the transformers library. How does that work?

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct-AWQ"

model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
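
For reference, a minimal generation sketch with the model loaded this way (the prompt and generation settings below are placeholders, not taken from this thread):

    # Build a chat prompt using the tokenizer's chat template, then generate.
    messages = [{"role": "user", "content": "Hello, who are you?"}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens.
    response = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print(response)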

Their model is 1.6 GB, but my quantized model is only 1.1 GB.

thanks
