
Commit 66595c3

Author: Ria Cheruvu
Renaming edge AI ref kit (openvinotoolkit#22)
1 parent 22c0cc2 · commit 66595c3

25 files changed, +1809 −1809 lines changed
@@ -1,69 +1,69 @@
# Bringing Adventure Gaming to Life 🧙 on AI PC with the OpenVINO™ toolkit 💻

**Authors:** Arisha Kumar, Garth Long, Ria Cheruvu, Dmitriy Pastushenkov, Paula Ramos, Raymond Lo, Zhuo Wu

**Contact (for questions):** Ria Cheruvu, Dmitriy Pastushenkov

**Tested on:** Intel® Core™ Ultra 7 and 9 Processors

## Pipeline
![SIGGRAPH Drawing](https://github.com/user-attachments/assets/3ce58b50-4ee9-4dae-aeb6-0af5368a3ddd)

## Installation

1. Clone this repository to get started.

2. Download and optimize the required models:
- Nano-Llava (MultiModal) - Image Recognition/Captioning from Webcam
- Whisper - Speech Recognition
- Llama3-8b-instruct - Prompt Refinement
- AI Super-Resolution - Increases the resolution of the generated image
- Latent Consistency Models - Image Generation
- Depth Anything V2 - Creates 3D parallax animations
```
python -m venv model_installation_venv
model_installation_venv\Scripts\activate
pip install -r python3.12_requirements_model_installation.txt
python download_and_prepare_models.py
```
After model installation, you can remove the virtual environment as it isn't needed anymore.

3. Create a virtual environment and install the required Python packages. The requirements.txt file to use depends on the Python version you're running (3.11 or 3.12). <br>
```
python -m venv dnd_env
dnd_env\Scripts\activate
pip install -r requirements.txt
pip install "openai-whisper==20231117" --extra-index-url https://download.pytorch.org/whl/cpu
```
4. To interact with the animated GIF outputs (the final output a player will see), you need to host a simple web server on your system. To do so, please install Node.js via [its download page](https://nodejs.org/en/download/package-manager) and [http-server](https://www.npmjs.com/package/http-server).

Run the following command to start an HTTP server within the repository. You can customize index.html with any additional elements you'd like.
```
http-server
```
5. Open a terminal (or reuse the existing one with the dnd_env environment activated) and start the Gradio GUI: <br>
```
python gradio_ui.py
```
Click the web link to open the GUI in your web browser. <br>
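Optionally, you can run a quick sanity check (not part of the original setup steps) to confirm which OpenVINO inference devices are visible on your AI PC, e.g. CPU, GPU, and NPU:
```
import openvino as ov

# List the devices the OpenVINO runtime can target on this machine.
core = ov.Core()
print(core.available_devices)
```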

## How to Use 🛣️
<img width="1270" alt="quick_demo_screenshot" src="https://github.com/user-attachments/assets/ddfea7f0-3f1d-4d1c-b356-3bc959a23837">

### (Step 1 📷) Take a picture
Take a picture of any object you want via the Gradio image interface! If the object is clearly captured, its description becomes your "theme".
### (Step 2 🗣️) Speak your prompt
Start the recording, wait until the server is listening, and then speak your prompt to life. Click the Stop button to stop recording.
### (Step 3 ➕) Add theme to prompt
Now your prompt is transcribed! Click the "Add Theme to Prompt" button to combine your prompt and theme.
### (Step 4 ⚙️) Refine it with an LLM
You can optionally ask an LLM to refine your prompt by clicking the LLM button. It will try its best to generate a prompt that fuses the elements (although it does hallucinate at times).
### (Step 5 🖼️) Generate your image (and depth map)
Click "Generate Image" to see your image come to life! A depth map will automatically be generated for the image as well. Feel free to adjust the advanced parameters to control the image generation model!
### (Step 6 🪄🖼️) Interact with the animated GIF
To interact with the 3D hoverable animation created from the depth maps, start an HTTP server as explained above; you will then be able to interact with the parallax effect.

**Optionally:** Navigate to *Advanced Parameters*, set OCR to true, and roll a die! 🎲 Take a snapshot using the Gradio image interface. After recognizing the die, the system will try to output the correct die value and a special location associated with the number you rolled (see locations.json for the list) to add a theme to your prompts. You can change the locations and the corresponding themes they set.
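As a purely hypothetical illustration of that mapping (the actual schema of locations.json in this repository may differ), the lookup could work along these lines:
```
import json

# Hypothetical sketch: map a recognized die value to a location theme.
# The real locations.json shipped with the kit may use a different structure.
with open("locations.json") as f:
    locations = json.load(f)  # e.g. {"1": "a misty dwarven mine", "2": "an enchanted forest glade", ...}

die_value = 4  # value recognized from the snapshot via OCR
theme = locations.get(str(die_value), "a mysterious tavern")
print(f"Theme added to prompt: {theme}")
```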
@@ -1,65 +1,65 @@
1-
import openvino as ov
2-
from huggingface_hub import hf_hub_download
3-
from depth_anything_v2.dpt import DepthAnythingV2
4-
import cv2
5-
import torch
6-
import torch.nn.functional as F
7-
import datasets
8-
import nncf
9-
10-
parser = argparse.ArgumentParser()
11-
parser.add_argument("--model_dir", type=str, default="model", help="Directory to place the model in")
12-
args = parser.parse_args()
13-
model_local_dir = Path(args.model_dir + "/depth_anything_v2")
14-
15-
encoder = "vits"
16-
model_type = "Small"
17-
model_id = f"depth_anything_v2_{encoder}"
18-
19-
model_path = hf_hub_download(repo_id=f"{model_local_dir}/Depth-Anything-V2-{model_type}", filename=f"{model_id}.pth", repo_type="model")
20-
21-
model = DepthAnythingV2(encoder=encoder, features=64, out_channels=[48, 96, 192, 384])
22-
model.load_state_dict(torch.load(model_path, map_location="cpu"))
23-
model.eval()
24-
25-
OV_DEPTH_ANYTHING_PATH = Path(f"{model_local_dir}/{model_id}.xml")
26-
27-
if not OV_DEPTH_ANYTHING_PATH.exists():
28-
ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 518, 518), input=[1, 3, 518, 518])
29-
ov.save_model(ov_model, OV_DEPTH_ANYTHING_PATH)
30-
31-
# Fetch `skip_kernel_extension` module
32-
r = requests.get(
33-
url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py",
34-
)
35-
open("skip_kernel_extension.py", "w").write(r.text)
36-
37-
OV_DEPTH_ANYTHING_INT8_PATH = Path(f"{model_local_dir}/{model_id}_int8.xml")
38-
39-
if not OV_DEPTH_ANYTHING_INT8_PATH.exists():
40-
subset_size = 300
41-
calibration_data = []
42-
dataset = datasets.load_dataset("Nahrawy/VIDIT-Depth-ControlNet", split="train", streaming=True).shuffle(seed=42).take(subset_size)
43-
for batch in dataset:
44-
image = np.array(batch["image"])[...,:3]
45-
image = image / 255.0
46-
image = transform({'image': image})['image']
47-
image = np.expand_dims(image, 0)
48-
calibration_data.append(image)
49-
50-
if not OV_DEPTH_ANYTHING_INT8_PATH.exists():
51-
model = core.read_model(OV_DEPTH_ANYTHING_PATH)
52-
quantized_model = nncf.quantize(
53-
model=model,
54-
subset_size=subset_size,
55-
model_type=nncf.ModelType.TRANSFORMER,
56-
calibration_dataset=nncf.Dataset(calibration_data),
57-
)
58-
ov.save_model(quantized_model, OV_DEPTH_ANYTHING_INT8_PATH)
59-
60-
fp16_ir_model_size = OV_DEPTH_ANYTHING_PATH.with_suffix(".bin").stat().st_size / 2**20
61-
quantized_model_size = OV_DEPTH_ANYTHING_INT8_PATH.with_suffix(".bin").stat().st_size / 2**20
62-
63-
print(f"FP16 model size: {fp16_ir_model_size:.2f} MB")
64-
print(f"INT8 model size: {quantized_model_size:.2f} MB")
print(f"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}")
