
Commit 5f504af

Added Token Merging method for OpenVINO (#669)
* Added Token Merging for OpenVINO
* Updated main README
* Update modules/token_merging/README.md
* Added tests for TokenMerging
* Fixed setup.py issue
* Fixed issues
* Removed .pyc file
* Added license

Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
1 parent 7df7228 commit 5f504af

14 files changed (+1302, -0 lines)

.github/workflows/token_merging.yml

+39
@@ -0,0 +1,39 @@

```yaml
name: Token Merging - Test

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  Precommit:
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.8]

    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Create and start a virtual environment
        run: |
          python -m venv venv
          source venv/bin/activate
      - name: Install dependencies
        run: |
          source venv/bin/activate
          python -m pip install --upgrade pip
          pip install modules/token_merging/[tests]
      - name: Run test
        run: |
          source venv/bin/activate
          python -m pytest modules/token_merging/tests/
```

.gitignore

+2
@@ -3,3 +3,5 @@

```diff
 
 **/*.png
 **/*.jar
+
+__pycache__/
```

README.md

+1
@@ -13,6 +13,7 @@ This list gives an overview of all modules available inside the contrib repository

```diff
 * [**java_api**](./modules/java_api): Inference Engine Java API -- provides Java wrappers for Inference Engine public API.
 * [**Azure Video Analyzer**](./modules/ovms_ai_extension/): Azure Video Analyzer Extension -- enables exchange of video frames and inference results between [Azure Video Analyzer (AVA)](https://docs.microsoft.com/en-us/azure/azure-video-analyzer/video-analyzer-docs/overview) and OpenVINO™ Model Server.
 * [**custom_operations**](./modules/custom_operations/): Collection of Custom Operations -- implement Custom Operations with OpenVINO Extensibility Mechanism.
+* [**Token Merging**](./modules/token_merging/): adaptation of [Token Merging method](https://arxiv.org/abs/2210.09461) for OpenVINO.
 
 ## How to build OpenVINO with extra modules
 You can build OpenVINO, so it will include the modules from this repository. Contrib modules are under constant development and it is recommended to use them alongside the master branch or latest releases of OpenVINO.
```

modules/token_merging/README.md

+61
@@ -0,0 +1,61 @@

# Token Merging for Stable Diffusion running with OpenVINO

This is an OpenVINO-adapted version of the Token Merging method. The method is applied to the PyTorch model before it is exported to the OpenVINO representation. It can also be stacked with 8-bit quantization to achieve a higher inference speed.
The repository contains implementations for:
- Stable Diffusion (HF Diffusers based models), see [example](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino/stable-diffusion).
- OpenCLIP, see [example](https://github.com/AlexKoff88/open_clip/blob/openvino_alt/tutorials/openvino/openvino_tome.ipynb).
- Timm

Here are the results for 100 iterations of 512x512 image generation on CPU.
![ToMe for SD applied on a 512x512 image.](examples/assets/tome_results.png)

This work is based on **ToMe for SD** from the paper:
**[Token Merging for Fast Stable Diffusion](https://arxiv.org/abs/2303.17604)**

ToMe for SD is an extension of the original **ToMe**:
**[Token Merging: Your ViT but Faster](https://arxiv.org/abs/2210.09461)**

**Note:** This also supports most downstream UIs that use these repositories.

## Installation

ToMe for SD requires ``pytorch >= 1.12.1`` (for `scatter_reduce`), which you can get from [here](https://pytorch.org/get-started/locally/). After installing your choice of stable diffusion environment ([supported environments](#supported-environments)), use the corresponding Python environment to install ToMe for SD:

```bash
pip install "git+https://github.com/openvinotoolkit/openvino_contrib.git#egg=tomeov&subdirectory=modules/token_merging"
```

## Usage
* Diffusers:
```py
import torch, tomeov
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Apply ToMe with a 30% merging ratio
tomeov.patch_stable_diffusion(pipe, ratio=0.3) # Can also use pipe.unet in place of pipe here

# Export the patched pipeline to the OpenVINO representation (as in the demo notebook)
save_dir = "stable_diffusion_optimized"
tomeov.export_diffusion_pipeline(pipe, save_dir)
```
* OpenCLIP:
```py
import torch, tomeov
import open_clip
from open_clip import tokenizer

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16-plus-240", pretrained="laion400m_e32")

tomeov.patch_openclip(model, 8) # 8 - number of tokens merged in each MHSA from top down
```
* Timm:
```py
import torch, tomeov
import timm

model_name = 'vit_tiny_patch16_224'
model = timm.create_model(model_name, pretrained=True)

tomeov.patch_timm(model, 4) # 4 - number of tokens merged in each MHSA from top down
```
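
The Usage snippets above stop at patching for the OpenCLIP and Timm backends. The tests added in this commit then export the patched model through ONNX and compile it with OpenVINO; below is a minimal sketch of that flow for the Timm case, following the test code (the dummy input shape and file name are illustrative):

```py
import os
import tempfile

import openvino.runtime as ov
import timm
import tomeov
import torch

# Patch a small ViT so that 4 tokens are merged in each MHSA block, top down
model = timm.create_model("vit_tiny_patch16_224", pretrained=True)
tomeov.patch_timm(model, 4)

dummy_image = torch.rand(1, 3, 224, 224)  # illustrative input shape for this model

with tempfile.TemporaryDirectory() as tmpdir:
    model_file = os.path.join(tmpdir, "model.onnx")  # illustrative file name
    # Export the patched model to ONNX ...
    torch.onnx.export(
        model,
        dummy_image,
        model_file,
        opset_version=14,
        input_names=["image"],
        output_names=["output"],
        dynamic_axes={"image": {0: "batch"}, "output": {0: "batch"}},
    )
    # ... and compile it with OpenVINO for inference
    compiled_model = ov.compile_model(model_file)
```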

modules/token_merging/demo.ipynb

+131
@@ -0,0 +1,131 @@

## Token Merging for Stable Diffusion running with OpenVINO demo
This notebook demonstrates how to use the Token Merging method to accelerate a Stable Diffusion model running with OpenVINO. The method is applied to the PyTorch model before exporting it to the OpenVINO representation.

```py
import tomeov
from diffusers import StableDiffusionPipeline, DDPMScheduler
from diffusers.training_utils import set_seed
from optimum.intel.openvino import OVStableDiffusionPipeline
from IPython.display import display
```

```py
scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012,
                          beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler)
pipe.safety_checker = lambda images, clip_input: (images, False)
```

* Create a pipeline with Token Merging applied to a Stable Diffusion model and export it to the OpenVINO representation.

```py
# Apply ToMe with a 30% merging ratio
tomeov.patch_stable_diffusion(pipe, ratio=0.3)  # Can also use pipe.unet in place of pipe here
```

```py
save_dir = "stable_diffusion_optimized"
tomeov.export_diffusion_pipeline(pipe, save_dir)
```

* Create an OpenVINO-based pipeline. We fix the image size for faster inference.

```py
set_seed(42)
ov_pipe = OVStableDiffusionPipeline.from_pretrained(save_dir, compile=False)
ov_pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
ov_pipe.compile()
```

* Generate and display the image.

```py
set_seed(42)
prompt = "sailing ship in storm by Leonardo da Vinci"  # `prompt` is not defined earlier in the notebook; example prompt taken from the tests
output = ov_pipe(prompt, num_inference_steps=50, output_type="pil")
display(output.images[0])
```
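
The module README above cites timing results for 512x512 generation on CPU; that measurement code is not part of this commit. A rough way to time the compiled pipeline from the notebook's last cells is sketched below (the prompt and step count are illustrative):

```py
import time

prompt = "sailing ship in storm by Leonardo da Vinci"  # illustrative prompt
start = time.perf_counter()
ov_pipe(prompt, num_inference_steps=50, output_type="pil")  # ov_pipe from the notebook above
print(f"One 512x512 generation took {time.perf_counter() - start:.1f} s")
```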

modules/token_merging/setup.py

+23
@@ -0,0 +1,23 @@

```py
# Copyright (C) 2018-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from setuptools import find_packages, setup

EXTRAS_REQUIRE = {
    "tests": ["onnx", "onnxruntime", "accelerate", "diffusers", "openvino", "optimum", "optimum-intel", "open-clip-torch", "timm", "pytest"],
}

setup(
    name="tomeov",
    version="0.1.0",
    author="Alexander Kozlov",
    url="https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/token_merging",
    description="Token Merging for OpenVINO",
    install_requires=["torch~=1.13.1", "torchvision~=0.14.1"],
    dependency_links=["https://download.pytorch.org/whl/cpu"],
    extras_require=EXTRAS_REQUIRE,
    packages=find_packages(exclude=("examples", "build")),
    license="Apache 2.0",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
)
```

modules/token_merging/tests/…

+89

@@ -0,0 +1,89 @@

```py
# Copyright (C) 2018-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import tempfile
import unittest
import os

import numpy as np
from PIL import Image
import torch
import openvino.runtime as ov

import tomeov
from diffusers import StableDiffusionPipeline, DDPMScheduler
from optimum.intel.openvino import OVStableDiffusionPipeline
import open_clip
import timm


class TokenMergingIntegrationTest(unittest.TestCase):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.OV_DIFFUSION_MODEL_ID = "hf-internal-testing/tiny-stable-diffusion-torch"
        self.OPENCLIP_MODEL = ("ViT-B-32", "laion400m_e32")
        self.TIMM_MODEL = "vit_tiny_patch16_224"

    def test_stable_diffusion(self):
        loaded_pipeline = StableDiffusionPipeline.from_pretrained(self.OV_DIFFUSION_MODEL_ID)
        prompt = "sailing ship in storm by Leonardo da Vinci"
        height = 128
        width = 128

        tomeov.patch_stable_diffusion(loaded_pipeline, ratio=0.3)

        with tempfile.TemporaryDirectory() as tmpdirname:
            tomeov.export_diffusion_pipeline(loaded_pipeline, tmpdirname)
            ov_pipe = OVStableDiffusionPipeline.from_pretrained(tmpdirname, compile=False)
            ov_pipe.reshape(batch_size=1, height=height, width=width, num_images_per_prompt=1)
            ov_pipe.compile()
            ov_pipe(prompt, num_inference_steps=1, height=height, width=width, output_type="np").images

    def test_openclip(self):
        model, _, transform = open_clip.create_model_and_transforms(self.OPENCLIP_MODEL[0], pretrained=self.OPENCLIP_MODEL[1])
        tomeov.patch_openclip(model, 8)
        dummy_image = np.random.rand(100, 100, 3) * 255
        dummy_image = Image.fromarray(dummy_image.astype("uint8"))
        dummy_image = transform(dummy_image).unsqueeze(0)

        with tempfile.TemporaryDirectory(suffix=".onnx") as tmpdirname:
            model_file = os.path.join(tmpdirname, "image_encoder.onnx")
            torch.onnx.export(
                model.visual,
                dummy_image,
                model_file,
                opset_version=14,
                input_names=["image"],
                output_names=["image_embedding"],
                dynamic_axes={
                    "image": {0: "batch"},
                    "image_embedding": {0: "batch"},
                }
            )
            compiled_model = ov.compile_model(model_file)
            self.assertTrue(compiled_model)

    def test_timm(self):
        model = timm.create_model(self.TIMM_MODEL, pretrained=False)

        tomeov.patch_timm(model, 4)  # 4 - number of tokens merged in each MHSA from top down

        dummy_image = torch.rand(1, 3, 224, 224)

        with tempfile.TemporaryDirectory(suffix=".onnx") as tmpdirname:
            model_file = os.path.join(tmpdirname, "model.onnx")
            torch.onnx.export(
                model,
                dummy_image,
                model_file,
                opset_version=14,
                input_names=["image"],
                output_names=["output"],
                dynamic_axes={
                    "image": {0: "batch"},
                    "output": {0: "batch"},
                }
            )
            compiled_model = ov.compile_model(model_file)
            self.assertTrue(compiled_model)
```

modules/token_merging/tomeov/__init__.py

+23

@@ -0,0 +1,23 @@

```py
# Copyright (C) 2018-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .import_utils import (
    is_diffusers_available,
    is_openclip_available,
    is_timm_available,
)

__all__ = []

if is_diffusers_available():
    from .stable_diffusion import patch_stable_diffusion
    from .utils import export_diffusion_pipeline
    __all__ += ["patch_stable_diffusion", "export_diffusion_pipeline"]

if is_openclip_available():
    from .openclip import patch_openclip
    __all__ += ["patch_openclip"]

if is_timm_available():
    from .timm import patch_timm
    __all__ += ["patch_timm"]
```
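
The `import_utils` module referenced by this `__init__` is not included in the diff shown here. A minimal sketch of how such availability checks are commonly implemented, assuming a plain importlib probe (function names mirror the imports above; the real implementation may differ):

```py
# Hypothetical sketch of import_utils; the real module is not shown in this commit.
import importlib.util


def _is_available(name: str) -> bool:
    """Return True if the given package can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def is_diffusers_available() -> bool:
    return _is_available("diffusers")


def is_openclip_available() -> bool:
    return _is_available("open_clip")


def is_timm_available() -> bool:
    return _is_available("timm")
```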
