
Update quantized_generation_demo.ipynb #715

Merged: 2 commits merged into huggingface:main on Jun 3, 2024
Conversation

@dnoliver (Contributor) commented May 16, 2024

Fix version of torch to 2.0.1.
See pytorch/pytorch#125109

What does this PR do?

Running this notebook with the current PyTorch version fails.
Test Platform: Windows 11 - Intel Core Ultra 7 165HL
Error Message: OSError: [WinError 126] ... Error loading "C:\Users\xyz\AppData\Roaming\Python\Python312\site-packages\torch\lib\shm.dll" or one of its dependencies.
Applying the recommendation in pytorch/pytorch#125109 and rolling the version back to 2.0.1 fixes the error.
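For reference, a minimal sketch of what the pin could look like in the notebook's install cell (hypothetical; the notebook's actual cell and package list may differ):

    # Hypothetical install cell: pin torch to a version known to work,
    # per the recommendation in pytorch/pytorch#125109.
    %pip install torch==2.0.1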

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@eaidova (Collaborator) commented May 17, 2024

@dnoliver is it really necessary to use such an old PyTorch version? As far as I can see in the issue, downgrading to 2.2.2 should be enough.

@dnoliver (Contributor, Author)

Will try with 2.2.2 and update you!

Commit: Bump version of torch to 2.2.2
@dnoliver (Contributor, Author)

Torch 2.2.2 works. I tested with CPU (we are having problems with the GPU and NPU drivers on the target machine right now, but those will be the next test targets).
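For context, a minimal sketch of how the target device is selected when loading the model with optimum-intel (the model path is hypothetical; the notebook's own loading cell may differ):

    from optimum.intel import OVModelForCausalLM

    # Load the exported OpenVINO model on a specific device.
    # "CPU" is what was tested here; "GPU" and "NPU" need working drivers.
    model = OVModelForCausalLM.from_pretrained(
        "quantized_model_dir",  # hypothetical path to the exported model
        device="CPU",
    )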

@dnoliver (Contributor, Author) commented May 17, 2024

Ran into an error in a later cell:

[screenshot]

The stack trace is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[10], line 7
      4 # Tokenize the sample
      5 inputs = tokenizer([sample], return_tensors='pt')    
----> 7 out = stateless_model.generate(
      8     **inputs,
      9     max_new_tokens=128,
     10     streamer=TextStreamer(tokenizer=tokenizer, skip_special_tokens=True),
     11     pad_token_id=tokenizer.eos_token_id,
     12     prompt_lookup_num_tokens=3,
     13 )    

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\generation\utils.py:1559, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1549     candidate_generator = self._get_candidate_generator(
   1550         generation_config=generation_config,
   1551         input_ids=input_ids,
   (...)
   1555         model_kwargs=model_kwargs,
   1556     )
   1558     # 12. run assisted generate
-> 1559     result = self._assisted_decoding(
   1560         input_ids,
   1561         candidate_generator=candidate_generator,
   1562         do_sample=generation_config.do_sample,
   1563         logits_processor=prepared_logits_processor,
   1564         logits_warper=self._get_logits_warper(generation_config) if generation_config.do_sample else None,
   1565         stopping_criteria=prepared_stopping_criteria,
   1566         pad_token_id=generation_config.pad_token_id,
   1567         output_scores=generation_config.output_scores,
   1568         output_logits=generation_config.output_logits,
   1569         return_dict_in_generate=generation_config.return_dict_in_generate,
   1570         synced_gpus=synced_gpus,
   1571         streamer=streamer,
   1572         **model_kwargs,
   1573     )
   1574 if generation_mode == GenerationMode.GREEDY_SEARCH:
   1575     # 11. run greedy search
   1576     result = self._greedy_search(
   1577         input_ids,
   1578         logits_processor=prepared_logits_processor,
   (...)
   1586         **model_kwargs,
   1587     )

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\generation\utils.py:4684, in GenerationMixin._assisted_decoding(self, input_ids, candidate_generator, do_sample, logits_processor, logits_warper, stopping_criteria, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, output_logits, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
   4681     model_inputs["num_logits_to_keep"] = candidate_length + 1
   4683 # 2.2. Run a forward pass on the candidate sequence
-> 4684 outputs = self(
   4685     **model_inputs,
   4686     output_attentions=output_attentions,
   4687     output_hidden_states=output_hidden_states,
   4688 )
   4690 # 2.3. Process the new logits
   4691 new_logits = outputs.logits[:, -candidate_length - 1 :]  # excludes the input prompt if present

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\optimum\modeling_base.py:92, in OptimizedModel.__call__(self, *args, **kwargs)
     91 def __call__(self, *args, **kwargs):
---> 92     return self.forward(*args, **kwargs)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\optimum\intel\openvino\modeling_decoder.py:466, in OVModelForCausalLM.forward(self, input_ids, attention_mask, past_key_values, position_ids, **kwargs)
    464 # Run inference
    465 self.request.start_async(inputs, share_inputs=True)
--> 466 self.request.wait()
    467 logits = torch.from_numpy(self.request.get_tensor("logits").data).to(self.device)
    468 if self.stateful:
    469     # Need a marker to differentiate the first generate iteration from the others in
    470     # the first condition at the function beginning above.
    471     # It should be something that is not None and it should be True when converted to Boolean.

RuntimeError: Exception from src\inference\src\cpp\infer_request.cpp:245:
Exception from ..\pyopenvino/core/infer_request.hpp:54:
Caught exception: Exception from src\plugins\intel_cpu\src\node.cpp:1620:
Shape inference of Select node with name __module.model/aten::masked_fill/Select_1 failed: Exception from src\plugins\intel_cpu\src\shape_inference\custom\eltwise.cpp:45:
Eltwise shape infer input shapes dim index: 3 mismatch

Is this related or unrelated to the change I am making to the torch version?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dnoliver (Contributor, Author)

@eaidova we were using this notebook to validate a Meteor Lake system that, we later found, had bad NPU and GPU drivers.

We have validated it on another Windows 11 device with good drivers, and it works fine:

[screenshot]

@eaidova (Collaborator) commented May 24, 2024

@dnoliver great, I also tested your notebook and it works for me too. Should we add a note about the driver version requirements (e.g., for the NPU, referring to this page: https://docs.openvino.ai/2024/get-started/configurations/configurations-intel-npu.html)?

@dnoliver (Contributor, Author)

This is what a healthy system looks like:

  • Display Adapters => Intel(R) Arc(TM) Graphics is healthy
  • Neural Processors => Intel(R) AI Boost is healthy

[screenshot]

And this is what an unhealthy system looks like:

  • Both devices show a warning sign and report error 43 in the Device Status.

[screenshot]

The referenced NPU documentation is not accurate for Windows. The docs say "the NPU is most likely listed in “Other devices” as “Multimedia Video Controller”", but it actually appears under "Neural Processors". Other than that, installing the latest driver versions should address most issues.
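A quick way to confirm that the drivers actually expose the accelerators to OpenVINO is to enumerate the available devices (a minimal sketch; exact device names depend on the hardware):

    import openvino as ov

    # On a healthy Meteor Lake system this should print something like
    # ['CPU', 'GPU', 'NPU']; a missing entry usually points to a driver problem.
    print(ov.Core().available_devices)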

@echarlaix merged commit 6529306 into huggingface:main on Jun 3, 2024
10 of 12 checks passed