# Bringing Adventure Gaming to Life 🧙 on AI PC with the OpenVINO™ toolkit 💻

**Authors:** Arisha Kumar, Garth Long, Ria Cheruvu, Dmitriy Pastushenkov, Paula Ramos, Raymond Lo, Zhuo Wu

**Contact (for questions):** Ria Cheruvu, Dmitriy Pastushenkov

**Tested on:** Intel® Core™ Ultra 7 and 9 Processors

## Pipeline

## Installation

1. Clone this repository to get started.

2. Download and optimize the required models:
   - Nano-Llava (MultiModal) - Image Recognition/Captioning from Webcam
   - Whisper - Speech Recognition
   - Llama3-8b-instruct - Prompt Refinement
   - AI Super Resolution - Increase the resolution of the generated image
   - Latent Consistency Models - Image Generation
   - Depth Anything v2 - Create 3D parallax animations

   The commands below use the Windows activation path.
   ```
   python -m venv model_installation_venv
   model_installation_venv\Scripts\activate
   pip install -r python3.12_requirements_model_installation.txt
   python download_and_prepare_models.py
   ```
   After the models are installed, you can delete this virtual environment; it is no longer needed.

3. Create a virtual environment and install the required Python packages. The requirements file to use depends on your Python version (3.11 or 3.12).
   ```
   python -m venv dnd_env
   dnd_env\Scripts\activate
   pip install -r requirements.txt
   pip install "openai-whisper==20231117" --extra-index-url https://download.pytorch.org/whl/cpu
   ```
4. To interact with the animated GIF outputs, host a simple web server on your system; the page it serves is the final output a player sees. Install Node.js via [its Download page](https://nodejs.org/en/download/package-manager), and then install [http-server](https://www.npmjs.com/package/http-server).

   Run the following command to start an HTTP server within the repository. You can customize index.html with any additional elements you like.
   ```
   http-server
   ```
5. Open a terminal (or reuse the existing one with the dnd_env environment activated) and start the Gradio GUI:
   ```
   python gradio_ui.py
   ```
   Click the web link printed in the terminal to open the GUI in your browser.

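
If installing Node.js is not an option, the static server from step 4 can be approximated with Python's standard library. This is a swapped-in alternative, not what the authors use: the helper name `start_static_server` and the port `8080` are illustrative choices, and the sketch assumes the repository's `index.html` only needs plain static file serving.

```python
import functools
import http.server
import threading


def start_static_server(directory=".", port=8080):
    """Serve `directory` over HTTP in a background thread and return the server.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    # Bind the handler to the chosen directory instead of the current one.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Listen on localhost only; port=0 asks the OS for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call `start_static_server()` from a script that keeps running, read the actual port from `server.server_address`, and call `server.shutdown()` when done. For a quick manual check, running `python -m http.server 8080` from the repository root serves the same files.
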
## How to Use 🛣️
<img width="1270" alt="quick_demo_screenshot" src="https://github.com/user-attachments/assets/ddfea7f0-3f1d-4d1c-b356-3bc959a23837">

### (Step 1 📷) Take a picture
Use the Gradio image interface to take a picture of any object you want! If the object is clearly captured, the image description becomes your "theme".
### (Step 2 🗣️) Speak your prompt
Start the recording, wait until the server is listening, and then speak your prompt to life. Click the Stop button to stop the recording.
### (Step 3 ➕) Add theme to prompt
Now your prompt is transcribed! Click the "Add Theme to Prompt" button to combine your prompt and theme.
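
As a rough illustration of what combining the two strings could look like, here is a minimal sketch. The function name `add_theme_to_prompt` and the comma-joined format are assumptions made for this example; the application may merge prompt and theme differently.

```python
def add_theme_to_prompt(prompt: str, theme: str) -> str:
    """Join the transcribed prompt and the image-derived theme into one prompt.

    Hypothetical helper: the joining format is an assumption, not the app's.
    """
    prompt = prompt.strip()
    theme = theme.strip()
    # If either part is missing, fall back to the other one unchanged.
    if not theme:
        return prompt
    if not prompt:
        return theme
    return f"{prompt}, {theme}"
```

For instance, a spoken prompt about a castle and a theme describing a mug would be merged into a single comma-separated prompt.
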
### (Step 4 ⚙️) Refine it with an LLM
You can optionally ask an LLM to refine your prompt by clicking the LLM button. It will try its best to generate a prompt that blends the elements (although it does hallucinate at times).
### (Step 5 🖼️) Generate your image (and depth map)
Click "Generate Image" to see your image come to life! A depth map is generated for the image automatically. Feel free to adjust the advanced parameters to control the image generation model!
### (Step 6 🪄🖼️) Interact with the animated GIF
To interact with the 3D hoverable animation created from the depth map, start an HTTP server as explained in the Installation section; you can then interact with the parallax effect.
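
To give an intuition for how a depth map can drive a parallax effect, the sketch below shifts each pixel sideways in proportion to its depth. It is a generic illustration under stated assumptions, not this repository's implementation: `parallax_shift` is a hypothetical name, images are plain nested lists, and depth values are assumed to lie in [0, 1] with 1 meaning nearest to the viewer.

```python
def parallax_shift(image, depth, offset):
    """Shift each pixel horizontally by round(offset * depth).

    image: list of rows of pixel values.
    depth: same shape as image, values in [0, 1], where 1 is nearest.
    offset: maximum shift in pixels (for example, derived from the pointer position).
    """
    height, width = len(image), len(image[0])
    # Start from a copy of the original so unshifted areas leave no holes.
    out = [row[:] for row in image]
    # Draw far pixels first so that nearer pixels overwrite them on overlap.
    order = sorted(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: depth[p[0]][p[1]],
    )
    for y, x in order:
        new_x = x + round(offset * depth[y][x])
        if 0 <= new_x < width:
            out[y][new_x] = image[y][x]
    return out
```

Rendering the image with a slightly different `offset` for each pointer position makes near objects appear to move more than the background.
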

**Optionally:** Navigate to *Advanced Parameters*, set OCR to true, and roll a die! 🎲 Take a snapshot using the Gradio image interface. After recognizing the die, the system tries to output the correct die value along with a special location associated with the number you rolled (see locations.json for the list), which adds a theme to your prompts. You can change this list and the corresponding themes it sets.
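
The lookup from die value to location can be pictured as a simple dictionary access. The sketch below is only an assumption about how such a mapping might work: the JSON layout (die value as a string key, location as the value), the example locations, and the helper name `theme_for_roll` are all hypothetical, so check locations.json for the real structure.

```python
import json

# Hypothetical stand-in for locations.json: die value (string key) -> location.
EXAMPLE_LOCATIONS_JSON = """
{
  "1": "a misty forest",
  "2": "a mountain pass",
  "3": "a bustling tavern"
}
"""


def theme_for_roll(roll, locations):
    """Return the location mapped to a die roll, or None if the roll is unknown."""
    # JSON object keys are always strings, so convert the roll before lookup.
    return locations.get(str(roll))


locations = json.loads(EXAMPLE_LOCATIONS_JSON)
print(theme_for_roll(2, locations))
```

Returning `None` for an unrecognized value lets the caller skip the theme when the die is misread.
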