Activity

Attempt to use symbolic for MatMul axis tracking, blocked by insuffic…

slyalin created use_symbolics_for_selected_tokens_fusion • a435a33 •

on May 27, 2024

Fuse selected token indices pruning into the model

slyalin created fuse_selected_token_indices • a7e119e •

on May 26, 2024

Flattening of input_ids and position_ids. Adapt to a new signature in…

slyalin pushed 1 commit to new_paged_attention • da7211c…b237fdf •

on May 15, 2024

WIP: migrate to the new PA

slyalin created new_paged_attention • da7211c •

on May 13, 2024

Fixed dimensions in HW dependent part of PA transformation

slyalin pushed 1 commit to paged_attention_transformation • 642f87a…0bf2880 •

on May 7, 2024

Deduce the number of KV heads and head_size from the model without re…

slyalin pushed 4 commits to paged_attention_transformation • 3193fa8…642f87a •

on May 6, 2024

Merge remote-tracking branch 'lavrenov/openvino-model-executor' into …

slyalin created paged_attention_transformation • 3193fa8 •

on Apr 22, 2024

Enabled int8 weights by default for performance benchmarking purposes

slyalin created int8_enabled_by_default • 4931727 •

on Apr 19, 2024

Merge branch 'openvino-model-executor' into transformers_4_39

slyalin pushed 51 commits to transformers_4_39 • 56bf5b1…fc6302a •

on Apr 17, 2024

Describe weights compression option in the documentation

slyalin pushed 1 commit to disable_int8 • 786c6e5…02a108a •

on Apr 16, 2024

Merge remote-tracking branch 'lavrenov/openvino-model-executor' into …

slyalin pushed 7 commits to disable_int8 • 0acb46c…786c6e5 •

on Apr 15, 2024

Transformers 4.39

slyalin created for_health_check • 4b2ec6e •

on Apr 4, 2024

Disable weight compression on optimum-intel path if model is being co…

slyalin created disable_int8 • 0acb46c •

on Apr 4, 2024

Update Dockerfile.openvino

ilya-lavrenov pushed 1 commit to paged_attention_in_openvino • b691a92…c790199 •

on Apr 4, 2024

Merged from remote

slyalin created transform_fixes • aa002b6 •

on Apr 3, 2024

Removed op factory and dependency on openvino_contrib.

slyalin created paged_attention_in_openvino • b691a92 •

on Apr 3, 2024

Update vllm/model_executor/openvino_model_loader.py

slyalin pushed 1 commit to extended_optimum_transform_plus • e09a0f1…d073339 •

on Apr 1, 2024

Merge remote-tracking branch 'lavrenov/openvino-model-executor' into …

slyalin created extended_optimum_transform_plus • e09a0f1 •

on Apr 1, 2024

Update vllm/model_executor/openvino_model_loader.py

slyalin pushed 1 commit to load_from_ir • 9dd49cb…06cf403 •

on Mar 26, 2024

Load from OpenVINO IR if it exists in model id directory

slyalin created load_from_ir • 9dd49cb •

on Mar 26, 2024

Upgrade transformers to 4.39

slyalin created transformers_4_39 • 56bf5b1 •

on Mar 22, 2024

Passthrough trust_remote_code

slyalin created trust_remote_code_passthrough • d617241 •

on Mar 22, 2024

Reverted accidental change

slyalin pushed 2 commits to python_op • 15eaeec…e683a9a •

on Mar 21, 2024

Removed part of the debug output

slyalin pushed 1 commit to python_op • 14b1dc2…15eaeec •

on Mar 21, 2024

vLLM modeling works with PythonOp, less debug output, no explicit add…

slyalin pushed 8 commits to python_op • 6d87df2…14b1dc2 •

on Mar 21, 2024

Env var control for optimum-intel switch. Debug output in evaluate.

slyalin pushed 6 commits to python_op • 5c21861…6d87df2 •

on Mar 21, 2024

Using stub-like Python Op for PagedAttentionExtension instead of real…

slyalin created python_op • 5c21861 •

on Mar 19, 2024

Enabled optimum-intel path, fixed recent regression with int32/int64 …

slyalin created optimum_models_after_reorg • b98f5ba •

on Mar 18, 2024

Merge pull request #5 from slyalin/fixed_parameter_types

Pull request merge

ilya-lavrenov pushed 6 commits to openvino • 8a9862f…30605c8 •

on Mar 8, 2024

Set position_ids name for optimum-intel based modeling. Fix for model…

slyalin pushed 2 commits to fixed_parameter_types • 0d2ba62…504704c •

on Mar 8, 2024
