Update default LayerwiseScheduler behavior (#3291)

alexsu52 · web-flow · commit 5c75e22c2888 · 2025-03-04T10:26:07.000+04:00
### Changes

`add_additional_outputs = True` is now set in `LayerwiseScheduler` by default. This means that the nodes that were visited to remove inputs to target nodes are added as additional output nodes.

### Reason for changes

Memory and speed optimization. More details:

Calibration dataset: wikitext-2-raw-v1

```
compressed_model = nncf.compress_weights(
    model,
    dataset=quantization_dataset,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    gptq=True,
    subset_size=128,
    ratio=1,
)
```

1. sequence length > 128
   - Reduces the memory footprint by 2.71x.
   - 1.16x compression time speed-up.

| Model | Accuracy (lambada_openai) Develop | Accuracy (lambada_openai) Branch | Compression time Develop (sec.) | Compression time Branch (sec.) | Compression time speed up | Peak Memory Develop (MiB) | Peak Memory Branch (MiB) | Memory reduction |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| facebook/opt-125m | acc: 0.3369 perplexity: 32.2038 | acc: 0.3369 perplexity: 32.2038 | 219 | 204 | 1.07x | 5575 | 2341 | 2.38x |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | acc: 0.5956 perplexity: 6.7443 | acc: 0.5956 perplexity: 6.7443 | 1889 | 1620 | 1.17x | 25980 | 8413 | 3.09x |
| microsoft/Phi-3.5-mini-instruct | acc: 0.6328 perplexity: 5.6143 | acc: 0.6328 perplexity: 5.6143 | 7994 | 6334 | 1.26x | 61273 | 22968 | 2.67x |

2. sequence length = 32
   - Reduces the memory footprint by 1.42x.
   - 1.07x compression time speed-up.

| Model | Compression time Develop (sec.) | Compression time Branch (sec.) | Compression time speed up | Peak Memory Develop (MiB) | Peak Memory Branch (MiB) | Memory reduction |
| -- | -- | -- | -- | -- | -- | -- |
| facebook/opt-125m | 192 | 183 | 1.05x | 2205 | 1695 | 1.30x |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1596 | 1482 | 1.08x | 9247 | 6182 | 1.50x |
| microsoft/Phi-3.5-mini-instruct | 6724 | 6196 | 1.09x | 25924 | 17765 | 1.46x |

Memory profiler for TinyLlama/TinyLlama-1.1B-Chat-v1.0:

Develop:
![image](https://github.com/user-attachments/assets/2a819b19-89c3-434e-8751-74911a41348e)

Branch:
![image](https://github.com/user-attachments/assets/501652fb-6865-4484-9cc6-149d0899d9a8)

Memory profiler for microsoft/Phi-3.5-mini-instruct:

Develop:
![image](https://github.com/user-attachments/assets/ba64e5ce-36eb-4b57-a35b-9199bfcc4e44)

Branch:
![image](https://github.com/user-attachments/assets/4f742016-0ceb-42f7-bd04-53f7d112b789)

### Related tickets

ref: 153732

### Tests

NNCF/job/manual/job/post_training_weight_compression/329/
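
The per-model ratios and the headline figures in the benchmark tables above can be cross-checked with a few lines of Python. This is only a sketch: the measurements are copied from the tables, and treating the headline numbers as arithmetic means of the per-model develop/branch ratios is an assumption, since the description does not say how they were aggregated.

```python
from statistics import mean

# Rows copied from the tables above:
# (model, time develop [s], time branch [s], peak memory develop [MiB], peak memory branch [MiB])
LONG_SEQ = [
    ("facebook/opt-125m", 219, 204, 5575, 2341),
    ("TinyLlama/TinyLlama-1.1B-Chat-v1.0", 1889, 1620, 25980, 8413),
    ("microsoft/Phi-3.5-mini-instruct", 7994, 6334, 61273, 22968),
]
SHORT_SEQ = [
    ("facebook/opt-125m", 192, 183, 2205, 1695),
    ("TinyLlama/TinyLlama-1.1B-Chat-v1.0", 1596, 1482, 9247, 6182),
    ("microsoft/Phi-3.5-mini-instruct", 6724, 6196, 25924, 17765),
]


def summarize(rows):
    """Return (mean speed-up, mean memory reduction) as develop/branch ratios.

    Assumption: the headline figures are plain arithmetic means over models.
    """
    speedups = [t_dev / t_br for _, t_dev, t_br, _, _ in rows]
    reductions = [m_dev / m_br for _, _, _, m_dev, m_br in rows]
    return mean(speedups), mean(reductions)


for label, rows in (("sequence length > 128", LONG_SEQ), ("sequence length = 32", SHORT_SEQ)):
    speedup, reduction = summarize(rows)
    print(f"{label}: speed-up {speedup:.3f}x, memory reduction {reduction:.3f}x")
```

Under that assumption the memory reductions and the short-sequence speed-up match the stated 2.71x, 1.42x and 1.07x. The long-sequence mean speed-up comes out at roughly 1.167, so the stated 1.16x looks truncated rather than rounded.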
diff --git a/nncf/quantization/algorithms/layerwise/scheduler.py b/nncf/quantization/algorithms/layerwise/scheduler.py
@@ -82,9 +82,10 @@ class LayerwiseScheduler:
     7. Repeat the process until all target nodes have been processed.
     """
 
-    def __init__(self, add_additional_outputs: bool = False):
+    def __init__(self, add_additional_outputs: bool = True):
         """
-        :param strategy: The strategy to use for scheduling.
+        :param add_additional_outputs: If True (default), includes additional output nodes that were visited
+            to remove inputs to target nodes.
         """
         self.add_additional_outputs = add_additional_outputs
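
For callers, the practical effect of the hunk above is that constructing the scheduler without arguments now enables additional outputs, and the previous behavior has to be requested explicitly. The sketch below illustrates this with `SchedulerSketch`, a hypothetical stand-in that mirrors only the constructor shown in the diff; it is not the real `LayerwiseScheduler` and implements no scheduling logic.

```python
class SchedulerSketch:
    """Hypothetical stand-in mirroring only the constructor from the diff."""

    def __init__(self, add_additional_outputs: bool = True):
        # Same attribute assignment as in the diff; nothing else is modeled.
        self.add_additional_outputs = add_additional_outputs


# New default after this change: additional outputs are enabled.
new_default = SchedulerSketch()

# Callers that relied on the old default now opt out explicitly.
old_behavior = SchedulerSketch(add_additional_outputs=False)

print(new_default.add_additional_outputs, old_behavior.add_additional_outputs)
```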