
Update quantized_generation_demo.ipynb #715

Merged: 2 commits merged into huggingface:main on Jun 3, 2024
Conversation

@dnoliver (Contributor) commented May 16, 2024

Fix version of torch to 2.0.1.
See pytorch/pytorch#125109

What does this PR do?

Running this notebook with the current PyTorch version fails.
Test Platform: Windows 11 - Intel Core Ultra 7 165HL
Error Message: OSError: [WinError 126] ... Error loading "C:\Users\xyz\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll" or one of its dependencies.
Applying the recommendation in pytorch/pytorch#125109 and rolling the version back to 2.0.1 fixes the error.
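For reference, a minimal sketch of what the pin could look like in the notebook's install cell (hypothetical; the notebook's actual cell and package list may differ):

    # Hypothetical install cell: pin torch to a version known to work,
    # per the recommendation in pytorch/pytorch#125109.
    %pip install torch==2.0.1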

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@eaidova (Collaborator) commented May 17, 2024

@dnoliver is it really necessary to use such an old PyTorch version? As far as I can see in the issue, downgrading to 2.2.2 should be enough.

@dnoliver (Contributor, Author)

Will try with 2.2.2 and update you!

Commit: Bump version of torch to 2.2.2
@dnoliver (Contributor, Author)

Torch 2.2.2 works. I tested with CPU (we are having problems with the GPU and NPU drivers on the target machine right now, but those will be the next test targets).
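For context, a minimal sketch of how the target device is selected when loading the model with optimum-intel (the model path is hypothetical; the notebook's own loading cell may differ):

    from optimum.intel import OVModelForCausalLM

    # Load the exported OpenVINO model on a specific device.
    # "CPU" is what was tested here; "GPU" and "NPU" need working drivers.
    model = OVModelForCausalLM.from_pretrained(
        "quantized_model_dir",  # hypothetical path to the exported model
        device="CPU",
    )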

@dnoliver (Contributor, Author) commented May 17, 2024

Ran into an error in a later cell:

[screenshot]

The stack trace is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 7
      4 # Tokenize the sample
      5 inputs = tokenizer([sample], return_tensors='pt')    
----> 7 out = stateless_model.generate(
      8     **inputs,
      9     max_new_tokens=128,
     10     streamer=TextStreamer(tokenizer=tokenizer, skip_special_tokens=True),
     11     pad_token_id=tokenizer.eos_token_id,
     12     prompt_lookup_num_tokens=3,
     13 )    

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\generation\utils.py:1559, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1549     candidate_generator = self._get_candidate_generator(
   1550         generation_config=generation_config,
   1551         input_ids=input_ids,
   (...)
   1555         model_kwargs=model_kwargs,
   1556     )
   1558     # 12. run assisted generate
-> 1559     result = self._assisted_decoding(
   1560         input_ids,
   1561         candidate_generator=candidate_generator,
   1562         do_sample=generation_config.do_sample,
   1563         logits_processor=prepared_logits_processor,
   1564         logits_warper=self._get_logits_warper(generation_config) if generation_config.do_sample else None,
   1565         stopping_criteria=prepared_stopping_criteria,
   1566         pad_token_id=generation_config.pad_token_id,
   1567         output_scores=generation_config.output_scores,
   1568         output_logits=generation_config.output_logits,
   1569         return_dict_in_generate=generation_config.return_dict_in_generate,
   1570         synced_gpus=synced_gpus,
   1571         streamer=streamer,
   1572         **model_kwargs,
   1573     )
   1574 if generation_mode == GenerationMode.GREEDY_SEARCH:
   1575     # 11. run greedy search
   1576     result = self._greedy_search(
   1577         input_ids,
   1578         logits_processor=prepared_logits_processor,
   (...)
   1586         **model_kwargs,
   1587     )

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\generation\utils.py:4684, in GenerationMixin._assisted_decoding(self, input_ids, candidate_generator, do_sample, logits_processor, logits_warper, stopping_criteria, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, output_logits, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
   4681     model_inputs["num_logits_to_keep"] = candidate_length + 1
   4683 # 2.2. Run a forward pass on the candidate sequence
-> 4684 outputs = self(
   4685     **model_inputs,
   4686     output_attentions=output_attentions,
   4687     output_hidden_states=output_hidden_states,
   4688 )
   4690 # 2.3. Process the new logits
   4691 new_logits = outputs.logits[:, -candidate_length - 1 :]  # excludes the input prompt if present

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\optimum\modeling_base.py:92, in OptimizedModel.__call__(self, *args, **kwargs)
     91 def __call__(self, *args, **kwargs):
---> 92     return self.forward(*args, **kwargs)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\optimum\intel\openvino\modeling_decoder.py:466, in OVModelForCausalLM.forward(self, input_ids, attention_mask, past_key_values, position_ids, **kwargs)
    464 # Run inference
    465 self.request.start_async(inputs, share_inputs=True)
--> 466 self.request.wait()
    467 logits = torch.from_numpy(self.request.get_tensor("logits").data).to(self.device)
    468 if self.stateful:
    469     # Need a marker to differentiate the first generate iteration from the others in
    470     # the first condition at the function beginning above.
    471     # It should be something that is not None and it should be True when converted to Boolean.

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Exception from ..\pyopenvino/core/infer_request.hpp:54:
Caught exception: Exception from src\plugins\intel_cpu\src\node.cpp:1620:
Shape inference of Select node with name __module.model/aten::masked_fill/Select_1 failed: Exception from src\plugins\intel_cpu\src\shape_inference\custom\eltwise.cpp:45:
Eltwise shape infer input shapes dim index: 3 mismatch

Is this related or unrelated to the change I am making to the torch version?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dnoliver (Contributor, Author)

@eaidova we were using this notebook to validate a Meteor Lake system that, we later found, had bad NPU and GPU drivers.

We have validated it on another Windows 11 device with good drivers, and it works fine:

[screenshot]

@eaidova (Collaborator) commented May 24, 2024

@dnoliver great, I also tested your notebook and it works for me too. Should we add a note about the driver version requirements (e.g., for the NPU, referring to this page: https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-npu.html)?

@dnoliver (Contributor, Author)

This is what a healthy system looks like:

  • Display Adapters => Intel(R) Arc(TM) Graphics is healthy
  • Neural Processors => Intel(R) AI Boost is healthy

[screenshot]

And this is what an unhealthy system looks like:

  • Both devices show a warning sign and report error 43 in the Device Status.

[screenshot]

The referenced NPU documentation is not accurate for Windows. The docs say "the NPU is most likely listed in “Other devices” as “Multimedia Video Controller”", but it actually appears under "Neural Processors". Other than that, installing the latest driver versions should address most issues.
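A quick way to confirm that the drivers actually expose the accelerators to OpenVINO is to enumerate the available devices (a minimal sketch; exact device names depend on the hardware):

    import openvino as ov

    # On a healthy Meteor Lake system this should print something like
    # ['CPU', 'GPU', 'NPU']; a missing entry usually points to a driver problem.
    print(ov.Core().available_devices)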

@echarlaix merged commit 6529306 into huggingface:main on Jun 3, 2024
10 of 12 checks passed