How to use CPU inference? #210

tanbw · 2025-01-25T15:08:36Z

I tried:
model = AutoModel.from_pretrained("/app/models/Qwen2.5-7B-Instruct", device='cpu')
but it does not work.

Error Logs:
saved layers already found in /app/models/Qwen2.5-7B-Instruct/splitted_model
either BetterTransformer or attn_implementation='sdpa' is available, creating model directly
either BetterTransformer or attn_implementation='sdpa' is available, creating model directly
running layers(cpu): 3%|█▍ | 1/31 [00:01<00:56, 1.88s/it]
Traceback (most recent call last):
  File "/app/test.py", line 25, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 2255, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/generation/utils.py", line 3254, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/airllm/airllm_base.py", line 369, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/airllm/airllm_base.py", line 569, in forward
    new_seq = layer(seq, **kwargs)[0]
              ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 259, in forward
    hidden_states, self_attn_weights = self.self_attn(
                                       ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 165, in forward
    cos, sin = position_embeddings
    ^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

Here is the code:
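```python
from airllm import AutoModel
import torch

MAX_LENGTH = 128

# Load the local checkpoint and request CPU execution.
model = AutoModel.from_pretrained("/app/models/Qwen2.5-7B-Instruct", device='cpu')

input_text = ['What is the capital of China?']

# Tokenize the prompt; only input_ids are passed to generate() below.
input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH)

# This call raises the TypeError shown in the error logs above.
generation_output = model.generate(
    input_tokens['input_ids'],
    max_new_tokens=5,
    use_cache=True,
    return_dict_in_generate=True)

# Print the decoded text so the result is visible when run as a script.
print(model.tokenizer.decode(generation_output.sequences[0]))
```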
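
The traceback ends inside `transformers/models/qwen2/modeling_qwen2.py`, where `position_embeddings` is `None`, so the installed package versions may be relevant for reproducing this. Below is a minimal sketch for collecting them. It assumes the distributions are installed under the names `airllm`, `transformers`, and `torch`; the helper name `report_versions` is made up for illustration and is not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("airllm", "transformers", "torch")):
    """Return a {package: version} dict; missing packages map to 'not installed'."""
    found = {}
    for name in packages:
        try:
            # Look up the installed distribution's version string.
            found[name] = version(name)
        except PackageNotFoundError:
            # Assumed distribution name is not installed in this environment.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Running this in the same environment as `/app/test.py` prints one line per package, which can be pasted alongside the error logs.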
