2024/08/08: Slides are released at nqobu/nvidia.
- Visit TWCC (https://www.twcc.ai) and click Sign In.
- Enter your email and password, and then log in through iService.
- You should be redirected to the user dashboard (https://www.twcc.ai/user/dashboard) after signing in.
- Click the top-left dropdown and select the 「國網教育訓練用計畫」 wallet (the NCHC education and training project) instead of any other wallet.
WARNING: This is an important step to avoid charges on your other wallets.
- Click the second dropdown: Services > Interactive Container.
- Click Create.
- Search for nemo or triton, depending on which tutorial you want to run, and click it.
- Select the container image nemo-24.05:latest or tritonserver-24.05-trtllm-python-py3:latest, depending on which tutorial you want to run, and then scroll down.
- Select c.super (V100 GPU x1) for the configuration type, and click REVIEW & CREATE.
- Confirm again that you are using the correct wallet 「國網教育訓練用計畫」, and click CREATE.
- Wait for the container to be initialized and ready. You can click REFRESH to check the status after a few minutes.
WARNING: After finishing the tutorial, make sure to check the container and click DELETE to avoid using up all your credits.
- When the container shows Ready, click the container name to enter the details page.
- Scroll down the container details page.
- Click the LAUNCH button in the Jupyter row to open the Jupyter Notebook.
- Click New and then Terminal to open a terminal.
- You can now run commands in the terminal.
Related video: TWCC 開發型容器-基本操作 (TWCC Interactive Container: Basic Operations)
In the terminal, run:
# Clone the repository and link the workspace
cd ~
git clone https://github.com/j3soon/LLM-Tutorial
# Check that /workspace does not contain any important data; everything in this directory is deleted in the next step
ls -al /workspace
sudo rm -rf /workspace
sudo ln -s "$PWD/LLM-Tutorial/workspace" /workspace
# Change ownership for NeMo 24.05, which is required when patching it later in the notebook
sudo chown -R $(id -u):$(id -g) /opt/NeMo
# All done! Go back to Jupyter Notebook / Jupyter Lab
Note: To paste text in the Jupyter terminal webpage, press Ctrl+Shift+V. To copy text, select the text, right-click, and choose Copy.
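Before going back to Jupyter, it can be useful to confirm that the link step worked. The following is a minimal sketch, not part of the original tutorial: `check_link` is a hypothetical helper name, and the repository location `$HOME/LLM-Tutorial` assumes the clone was made from the home directory as in the commands above.

```shell
# Hypothetical helper: succeed only if LINK is a symlink that resolves to TARGET.
check_link() {
    link="$1"
    target="$2"
    [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Assumes the repository was cloned into the home directory.
if check_link /workspace "$HOME/LLM-Tutorial/workspace"; then
    echo "workspace link OK"
else
    echo "workspace link missing or pointing elsewhere"
fi
```

If the second message appears, re-running the rm and ln commands from the home directory is a reasonable first thing to try.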
You should now see the LLM-Tutorial folder in the Jupyter file browser. Navigate to LLM-Tutorial/workspace and open the notebook you want to run.
Since these notebooks take a long time to run, we recommend first running every cell except the last one (the last cell deletes all relevant data), and then going through the notebook while the cells execute.
- NeMo_Training_TinyLlama.ipynb
- TensorRT-LLM.ipynb
- NeMo_Guardrails.ipynb (requires a free NVIDIA NIM API Key)
After you have finished the tutorial, make sure to delete the container to avoid using up all your credits.
The container list should then be empty.
If you have used TWCC in the past, you may encounter unexpected errors during pip install. This is because TWCC automatically mounts your home directory into the container for ease of development, so packages installed by pip are stored under the ~/.local directory and persist across containers. You can back up the .local directory (and ~/.bashrc) to move them out of the way:
mv ~/.local ~/.local.bak
mv ~/.bashrc ~/.bashrc.bak
and then delete and re-create the container (restarting the Jupyter kernel may not be enough). After that, re-run the environment setup steps above (rm/ln/chown).
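To see whether leftover user-level packages are actually present before moving anything, a check along the following lines may help. This is a sketch under assumptions: `has_user_packages` is a hypothetical helper, and it assumes the usual `lib/pythonX.Y/site-packages` layout under `~/.local`.

```shell
# Hypothetical helper: succeed if DIR (e.g. ~/.local) contains anything inside
# a site-packages directory, i.e. packages left by a user-level pip install.
has_user_packages() {
    [ -d "$1/lib" ] || return 1
    find "$1/lib" -path '*/site-packages/*' | head -n 1 | grep -q .
}

if has_user_packages "$HOME/.local"; then
    echo "Found user-level packages under ~/.local"
else
    echo "No user-level packages found under ~/.local"
fi
```

If nothing is found, the pip errors probably have a different cause and the backup step above may not be needed.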
If you encounter the following error message:
ImportError: cannot import name 'ParameterSource' from 'click.core' (/usr/local/lib/python3.10/dist-packages/click/core.py)
Run:
pip install -U click
as mentioned in this post, and re-run the cell.
A "Disk quota exceeded" error may occur if you have used TWCC in the past and have ended up with zero disk quota because you have no subscribed projects.
Example error message:
~$ git clone https://github.com/j3soon/LLM-Tutorial
Cloning into 'LLM-Tutorial'...
error: copy-fd: write returned: Disk quota exceeded
fatal: cannot copy '/usr/share/git-core/templates/hooks/fsmonitor-watchman.sample' to '/home/uXXXXXXX/LLM-Tutorial/.git/hooks/fsmonitor-watchman.sample': Disk quota exceeded
Click VIEW DETAILS in the user dashboard and check if the Total Storage quota is below 100GiB. If so, you have indeed run into the disk quota issue. The HFS Portal should show similar results.
Follow the steps below to resolve the issue:
- Keep the used disk space below 100 GB for both the Home and Work directories by removing unnecessary files.
- In the HFS User Portal, click Change Project and apply the 國網教育訓練用計畫 project.
- The disk quota should be restored to 100 GB. You may need to wait a while for the disk quota information to be updated.
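For the first step, the used space can also be estimated from the terminal. The sketch below is illustrative only: `under_limit` and `LIMIT_KB` are hypothetical names, the limit is taken as 100 GiB expressed in 1 KiB blocks, and the figure reported by `du` may differ somewhat from what the dashboard or HFS Portal shows.

```shell
# Hypothetical helper: succeed if DIR uses fewer than LIMIT_KB blocks of 1 KiB.
# Fails if DIR cannot be measured at all.
under_limit() {
    used_kb=$(du -sk "$1" 2>/dev/null | awk '{print $1}')
    [ -n "$used_kb" ] && [ "$used_kb" -lt "$2" ]
}

# Assumed limit: 100 GiB in 1 KiB blocks.
LIMIT_KB=$((100 * 1024 * 1024))
```

For example, `under_limit "$HOME" "$LIMIT_KB" && echo "Home is under the limit"` checks the Home directory; the same call can be repeated with the path of your Work directory.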
If you encounter the following error message when running sudo rm -rf /workspace:
env: ‘rm’: Permission denied
This may be caused by a sudo alias that was added to ~/.bashrc when installing conda. You can disable the alias by commenting out the following line in ~/.bashrc:
# alias sudo='sudo env PATH=$PA'
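and then open a new terminal. To keep using the current terminal instead, run unalias sudo, because source ~/.bashrc alone does not remove an alias that is already defined in the running shell.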
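To check whether such an alias is still active in the file, a small search like the one below can be used. This is a sketch: `find_sudo_alias` is a hypothetical helper, and it only looks for uncommented lines that start with `alias sudo=`.

```shell
# Hypothetical helper: print active (uncommented) `alias sudo=` lines in FILE,
# with line numbers. Exit status is 0 only if at least one such line exists.
find_sudo_alias() {
    grep -nE '^[[:space:]]*alias[[:space:]]+sudo=' "$1" 2>/dev/null
}

if find_sudo_alias "$HOME/.bashrc"; then
    echo "An active sudo alias was found; comment it out and open a new terminal."
else
    echo "No active sudo alias found in ~/.bashrc."
fi
```

Lines that are already commented out with `#` are not reported, so an empty result after editing suggests the alias has been disabled in the file.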