
Commit 22b098d

Merge remote-tracking branch 'upstream/main' into feature/directml
2 parents 471f803 + 99bc877

12 files changed: +244 -301 lines

.github/workflows/dev_test_benckmark.yml  (+1)

@@ -22,6 +22,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
+          pip install -U setuptools
           pip install wheel
           pip install .[tests,onnxruntime,benchmark] datasets
           pip install -U git+https://github.com/huggingface/evaluate

.github/workflows/test_benckmark.yml  (+1)

@@ -29,6 +29,7 @@ jobs:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
+          pip install -U setuptools
           pip install wheel
           pip install .[tests,onnxruntime,benchmark] datasets
       - name: Test with unittest

README.md  (+18 -202)

@@ -52,76 +52,6 @@ python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/op
 
 The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with a command line.
 
-### Features summary
-
-| Features | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
-|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
-| Graph optimization | :heavy_check_mark: | N/A | :heavy_check_mark: | N/A |
-| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A | :heavy_check_mark: |
-| Post-training static quantization | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
-| Quantization Aware Training (QAT) | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-| FP16 (half precision) | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: |
-| Pruning | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-| Knowledge Distillation | N/A | :heavy_check_mark: | :heavy_check_mark: | N/A |
-
-
-### OpenVINO
-
-Before you begin, make sure you have all the necessary libraries installed :
-
-```bash
-pip install --upgrade --upgrade-strategy eager optimum[openvino]
-```
-
-It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:
-
-```bash
-optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
-```
-
-If you add `--weight-format int8`, the weights will be quantized to `int8`, check out our [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/export) for more detail. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/intel/openvino/optimization#static-quantization).
-
-To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.
-
-```diff
-- from transformers import AutoModelForSequenceClassification
-+ from optimum.intel import OVModelForSequenceClassification
-  from transformers import AutoTokenizer, pipeline
-
-  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
-  tokenizer = AutoTokenizer.from_pretrained(model_id)
-- model = AutoModelForSequenceClassification.from_pretrained(model_id)
-+ model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
-
-  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
-  results = classifier("He's a dreadful magician.")
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/main/intel/openvino/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).
-
-### Neural Compressor
-
-Before you begin, make sure you have all the necessary libraries installed :
-
-```bash
-pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
-```
-
-Dynamic quantization can be applied on your model:
-
-```bash
-optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
-```
-
-To load a model quantized with Intel Neural Compressor, hosted locally or on the 🤗 hub, you can do as follows :
-```python
-from optimum.intel import INCModelForSequenceClassification
-
-model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
-model = INCModelForSequenceClassification.from_pretrained(model_id)
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).
 
 ### ONNX + ONNX Runtime
 
@@ -131,43 +61,11 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[exporters,onnxruntime]
 ```
 
-It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:
-
-```plain
-optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
-```
-
-The model can then be quantized using `onnxruntime`:
-
-```bash
-optimum-cli onnxruntime quantize \
-  --avx512 \
-  --onnx_model roberta_base_qa_onnx \
-  -o quantized_roberta_base_qa_onnx
-```
-
-These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
+It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily.
 
 For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).
 
-#### Run the exported model using ONNX Runtime
-
-Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seemless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
-
-```diff
-- from transformers import AutoModelForQuestionAnswering
-+ from optimum.onnxruntime import ORTModelForQuestionAnswering
-  from transformers import AutoTokenizer, pipeline
-
-  model_id = "deepset/roberta-base-squad2"
-  tokenizer = AutoTokenizer.from_pretrained(model_id)
-- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
-+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
-  qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
-  question = "What's Optimum?"
-  context = "Optimum is an awesome library everyone should use!"
-  results = qa_pipe(question=question, context=context)
-```
+Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend.
 
 More details on how to run ONNX models with `ORTModelForXXX` classes [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).
 
@@ -179,14 +77,21 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[exporters-tf]
 ```
 
-Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:
+Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them.
+You can find more information in our [documentation](https://huggingface.co/docs/optimum/main/exporters/tflite/usage_guides/export_a_model).
 
-```plain
-optimum-cli export tflite \
-  -m deepset/roberta-base-squad2 \
-  --sequence_length 384 \
-  --quantize int8-dynamic roberta_tflite_model
-```
+### Intel (OpenVINO + Neural Compressor + IPEX)
+
+Before you begin, make sure you have all the necessary [libraries installed](https://huggingface.co/docs/optimum/main/en/intel/installation).
+
+You can find more information on the different integrations in our [documentation](https://huggingface.co/docs/optimum/main/en/intel/index) and in the examples of [`optimum-intel`](https://github.com/huggingface/optimum-intel).
+
+
+### Quanto
+
+[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend which allows you to quantize a model either using the python API or the `optimum-cli`.
+
+You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
 
 ## Accelerated training
 
@@ -205,37 +110,7 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install --upgrade --upgrade-strategy eager optimum[habana]
 ```
 
-```diff
-- from transformers import Trainer, TrainingArguments
-+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments
-
-  # Download a pretrained model from the Hub
-  model = AutoModelForXxx.from_pretrained("bert-base-uncased")
-
-  # Define the training arguments
-- training_args = TrainingArguments(
-+ training_args = GaudiTrainingArguments(
-      output_dir="path/to/save/folder/",
-+     use_habana=True,
-+     use_lazy_mode=True,
-+     gaudi_config_name="Habana/bert-base-uncased",
-      ...
-  )
-
-  # Initialize the trainer
-- trainer = Trainer(
-+ trainer = GaudiTrainer(
-      model=model,
-      args=training_args,
-      train_dataset=train_dataset,
-      ...
-  )
-
-  # Use Habana Gaudi processor for training!
-  trainer.train()
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
+You can find examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).
 
 ### ONNX Runtime
 
@@ -246,63 +121,4 @@ Before you begin, make sure you have all the necessary libraries installed :
 pip install optimum[onnxruntime-training]
 ```
 
-```diff
-- from transformers import Trainer, TrainingArguments
-+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
-
-  # Download a pretrained model from the Hub
-  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
-
-  # Define the training arguments
-- training_args = TrainingArguments(
-+ training_args = ORTTrainingArguments(
-      output_dir="path/to/save/folder/",
-      optim="adamw_ort_fused",
-      ...
-  )
-
-  # Create a ONNX Runtime Trainer
-- trainer = Trainer(
-+ trainer = ORTTrainer(
-      model=model,
-      args=training_args,
-      train_dataset=train_dataset,
-      ...
-  )
-
-  # Use ONNX Runtime for training!
-  trainer.train()
-```
-
-You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
-
-
-### Quanto
-
-[Quanto](https://github.com/huggingface/optimum-quanto) is a pytorch quantization backend.
-
-You can quantize a model either using the python API or the `optimum-cli`.
-
-```python
-from transformers import AutoModelForCausalLM
-from optimum.quanto import QuantizedModelForCausalLM, qint4
-
-model = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3.1-8B')
-qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude='lm_head')
-```
-
-The quantized model can be saved using `save_pretrained`:
-
-```python
-qmodel.save_pretrained('./Llama-3.1-8B-quantized')
-```
-
-It can later be reloaded using `from_pretrained`:
-
-```python
-from optimum.quanto import QuantizedModelForCausalLM
-
-qmodel = QuantizedModelForCausalLM.from_pretrained('Llama-3.1-8B-quantized')
-```
-
-You can see more details and [examples](https://github.com/huggingface/optimum-quanto/tree/main/examples) in the [Quanto](https://github.com/huggingface/optimum-quanto) repository.
+You can find examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).

optimum/exporters/onnx/base.py  (+3)

@@ -159,6 +159,9 @@ class OnnxConfig(ExportConfig, ABC):
         "image-to-image": OrderedDict(
             {"reconstruction": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}
         ),
+        "keypoint-detection": OrderedDict(
+            {"heatmaps": {0: "batch_size", 1: "num_keypoints", 2: "height", 3: "width"}}
+        ),
         "mask-generation": OrderedDict({"logits": {0: "batch_size"}}),
         "masked-im": OrderedDict(
             {"reconstruction" if is_transformers_version(">=", "4.29.0") else "logits": {0: "batch_size"}}

optimum/exporters/onnx/convert.py  (+4 -2)

@@ -513,8 +513,10 @@ def export_pytorch(
 
     model_kwargs = model_kwargs or {}
     # num_logits_to_keep was added in transformers 4.45 and isn't added as inputs when exporting the model
-    if is_transformers_version(">=", "4.44.99") and "num_logits_to_keep" in signature(model.forward).parameters.keys():
-        model_kwargs["num_logits_to_keep"] = 0
+    if is_transformers_version(">=", "4.45"):
+        logits_to_keep_name = "logits_to_keep" if is_transformers_version(">=", "4.49") else "num_logits_to_keep"
+        if logits_to_keep_name in signature(model.forward).parameters.keys():
+            model_kwargs[logits_to_keep_name] = 0
 
     with torch.no_grad():
         model.config.return_dict = True
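
The context here: transformers added `num_logits_to_keep` in 4.45 and, per the new check, uses the renamed `logits_to_keep` from 4.49 on, so the export now picks whichever keyword the loaded model's `forward` actually accepts and sets it to 0 (keep all logits) so the full logits tensor is traced. A standalone sketch of the same selection logic, using `packaging` for the version comparison instead of optimum's internal `is_transformers_version` helper; the function name below is hypothetical.

```python
from inspect import signature

import transformers
from packaging import version


def set_logits_to_keep_kwarg(model, model_kwargs: dict) -> dict:
    """Choose the kwarg name the installed transformers version expects and
    disable logits slicing so the whole logits tensor is part of the export."""
    tfm_version = version.parse(transformers.__version__)
    if tfm_version >= version.parse("4.45"):
        # The argument was renamed in later transformers releases.
        name = "logits_to_keep" if tfm_version >= version.parse("4.49") else "num_logits_to_keep"
        if name in signature(model.forward).parameters:
            model_kwargs[name] = 0  # 0 == keep every position's logits
    return model_kwargs
```

A plain `packaging` comparison treats pre-releases such as `4.45.0.dev0` as older than `4.45` (presumably why the old code compared against `"4.44.99"`); the sketch ignores that subtlety and only mirrors the kwarg-name selection.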
