
Commit 66595c3

Author: Ria Cheruvu
Renaming edge AI ref kit (openvinotoolkit#22)
1 parent 22c0cc2 · commit 66595c3

25 files changed, +1809 −1809 lines changed
@@ -1,69 +1,69 @@
# Bringing Adventure Gaming to Life 🧙 on AI PC with the OpenVINO™ toolkit 💻

**Authors:** Arisha Kumar, Garth Long, Ria Cheruvu, Dmitriy Pastushenkov, Paula Ramos, Raymond Lo, Zhuo Wu

**Contact (for questions):** Ria Cheruvu, Dmitriy Pastushenkov

**Tested on:** Intel® Core™ Ultra 7 and 9 Processors

## Pipeline
![SIGGRAPH Drawing](https://github.com/user-attachments/assets/3ce58b50-4ee9-4dae-aeb6-0af5368a3ddd)

## Installation

1. Clone this repository to get started.

2. Download and optimize the required models:
- Nano-Llava (MultiModal) - Image Recognition/Captioning from Webcam
- Whisper - Speech Recognition
- Llama3-8b-instruct - Prompt Refinement
- AI Super-Resolution - Increases the resolution of the generated image
- Latent Consistency Models - Image Generation
- Depth Anything V2 - Creates 3D parallax animations
```
python -m venv model_installation_venv
model_installation_venv\Scripts\activate
pip install -r python3.12_requirements_model_installation.txt
python download_and_prepare_models.py
```
After model installation, you can remove the virtual environment as it isn't needed anymore.

3. Create a virtual environment and install the required Python packages. The requirements.txt file to use depends on the Python version you're running (3.11 or 3.12). <br>
```
python -m venv dnd_env
dnd_env\Scripts\activate
pip install -r requirements.txt
pip install "openai-whisper==20231117" --extra-index-url https://download.pytorch.org/whl/cpu
```
4. To interact with the animated GIF outputs (the final output a player will see), you need to host a simple web server on your system. To do so, please install Node.js via [its download page](https://nodejs.org/en/download/package-manager) and [http-server](https://www.npmjs.com/package/http-server).

Run the following command to start an HTTP server within the repository. You can customize index.html with any additional elements you'd like.
```
http-server
```
5. Open a terminal (or reuse the existing one with the dnd_env environment activated) and start the Gradio GUI: <br>
```
python gradio_ui.py
```
Click the web link to open the GUI in your web browser. <br>
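Optionally, you can run a quick sanity check (not part of the original setup steps) to confirm which OpenVINO inference devices are visible on your AI PC, e.g. CPU, GPU, and NPU:
```
import openvino as ov

# List the devices the OpenVINO runtime can target on this machine.
core = ov.Core()
print(core.available_devices)
```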

## How to Use 🛣️
<img width="1270" alt="quick_demo_screenshot" src="https://github.com/user-attachments/assets/ddfea7f0-3f1d-4d1c-b356-3bc959a23837">

### (Step 1 📷) Take a picture
Take a picture of any object you want via the Gradio image interface! If the object is clearly captured, its description becomes your "theme".
### (Step 2 🗣️) Speak your prompt
Start the recording, wait until the server is listening, and then speak your prompt to life. Click the Stop button to stop recording.
### (Step 3 ➕) Add theme to prompt
Now your prompt is transcribed! Click the "Add Theme to Prompt" button to combine your prompt and theme.
### (Step 4 ⚙️) Refine it with an LLM
You can optionally ask an LLM to refine your prompt by clicking the LLM button. It will try its best to generate a prompt that fuses the elements (although it does hallucinate at times).
### (Step 5 🖼️) Generate your image (and depth map)
Click "Generate Image" to see your image come to life! A depth map will automatically be generated for the image as well. Feel free to adjust the advanced parameters to control the image generation model!
### (Step 6 🪄🖼️) Interact with the animated GIF
To interact with the 3D hoverable animation created from the depth maps, start an HTTP server as explained above; you will then be able to interact with the parallax effect.

**Optionally:** Navigate to *Advanced Parameters*, set OCR to true, and roll a die! 🎲 Take a snapshot using the Gradio image interface. After recognizing the die, the system will try to output the correct die value and a special location associated with the number you rolled (see locations.json for the list) to add a theme to your prompts. You can change the locations and the corresponding themes they set.
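As a purely hypothetical illustration of that mapping (the actual schema of locations.json in this repository may differ), the lookup could work along these lines:
```
import json

# Hypothetical sketch: map a recognized die value to a location theme.
# The real locations.json shipped with the kit may use a different structure.
with open("locations.json") as f:
    locations = json.load(f)  # e.g. {"1": "a misty dwarven mine", "2": "an enchanted forest glade", ...}

die_value = 4  # value recognized from the snapshot via OCR
theme = locations.get(str(die_value), "a mysterious tavern")
print(f"Theme added to prompt: {theme}")
```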
@@ -1,65 +1,65 @@
1-
import openvino as ov
2-
from huggingface_hub import hf_hub_download
3-
from depth_anything_v2.dpt import DepthAnythingV2
4-
import cv2
5-
import torch
6-
import torch.nn.functional as F
7-
import datasets
8-
import nncf
9-
10-
parser = argparse.ArgumentParser()
11-
parser.add_argument("--model_dir", type=str, default="model", help="Directory to place the model in")
12-
args = parser.parse_args()
13-
model_local_dir = Path(args.model_dir + "/depth_anything_v2")
14-
15-
encoder = "vits"
16-
model_type = "Small"
17-
model_id = f"depth_anything_v2_{encoder}"
18-
19-
model_path = hf_hub_download(repo_id=f"{model_local_dir}/Depth-Anything-V2-{model_type}", filename=f"{model_id}.pth", repo_type="model")
20-
21-
model = DepthAnythingV2(encoder=encoder, features=64, out_channels=[48, 96, 192, 384])
22-
model.load_state_dict(torch.load(model_path, map_location="cpu"))
23-
model.eval()
24-
25-
OV_DEPTH_ANYTHING_PATH = Path(f"{model_local_dir}/{model_id}.xml")
26-
27-
if not OV_DEPTH_ANYTHING_PATH.exists():
28-
ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 518, 518), input=[1, 3, 518, 518])
29-
ov.save_model(ov_model, OV_DEPTH_ANYTHING_PATH)
30-
31-
# Fetch `skip_kernel_extension` module
32-
r = requests.get(
33-
url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py",
34-
)
35-
open("skip_kernel_extension.py", "w").write(r.text)
36-
37-
OV_DEPTH_ANYTHING_INT8_PATH = Path(f"{model_local_dir}/{model_id}_int8.xml")
38-
39-
if not OV_DEPTH_ANYTHING_INT8_PATH.exists():
40-
subset_size = 300
41-
calibration_data = []
42-
dataset = datasets.load_dataset("Nahrawy/VIDIT-Depth-ControlNet", split="train", streaming=True).shuffle(seed=42).take(subset_size)
43-
for batch in dataset:
44-
image = np.array(batch["image"])[...,:3]
45-
image = image / 255.0
46-
image = transform({'image': image})['image']
47-
image = np.expand_dims(image, 0)
48-
calibration_data.append(image)
49-
50-
if not OV_DEPTH_ANYTHING_INT8_PATH.exists():
51-
model = core.read_model(OV_DEPTH_ANYTHING_PATH)
52-
quantized_model = nncf.quantize(
53-
model=model,
54-
subset_size=subset_size,
55-
model_type=nncf.ModelType.TRANSFORMER,
56-
calibration_dataset=nncf.Dataset(calibration_data),
57-
)
58-
ov.save_model(quantized_model, OV_DEPTH_ANYTHING_INT8_PATH)
59-
60-
fp16_ir_model_size = OV_DEPTH_ANYTHING_PATH.with_suffix(".bin").stat().st_size / 2**20
61-
quantized_model_size = OV_DEPTH_ANYTHING_INT8_PATH.with_suffix(".bin").stat().st_size / 2**20
62-
63-
print(f"FP16 model size: {fp16_ir_model_size:.2f} MB")
64-
print(f"INT8 model size: {quantized_model_size:.2f} MB")
print(f"Model compression rate: {fp16_ir_model_size / quantized_model_size:.3f}")
