diff --git a/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb b/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb
new file mode 100644
index 00000000..e85eed5d
--- /dev/null
+++ b/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb
@@ -0,0 +1,150 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Inference with the Yi-1.5-6B-Chat Model Using vLLM\n",
+    "\n",
+    "Welcome to this tutorial! Here we will walk you through running inference with the Yi-1.5-6B-Chat model using vLLM. vLLM is a fast and easy-to-use library for large language model (LLM) inference and serving. Let's get started!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🚀 Run on Colab\n",
+    "\n",
+    "We also provide a one-click [Colab script](https://colab.research.google.com/drive/1KuydGHHbI31Q0WIpwg7UmH0rfNjii8Wl?usp=drive_link) to make development even easier!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installation\n",
+    "\n",
+    "First, we need to install the dependencies. According to the official documentation, installing vLLM with pip requires CUDA 12.1. See the official [documentation](https://docs.vllm.ai/en/stable/getting_started/installation.html) for more details.\n",
+    "\n",
+    "Now let's install vLLM:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "installation"
+   },
+   "source": [
+    "!pip install vllm"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
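+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before loading the model, you can optionally confirm that a CUDA-capable GPU is visible. This is a minimal sanity-check sketch; it assumes PyTorch is importable, which pip pulls in as a vLLM dependency."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "gpu-check"
+   },
+   "source": [
+    "import torch\n",
+    "\n",
+    "# Optional sanity check: vLLM needs a CUDA-capable GPU.\n",
+    "if torch.cuda.is_available():\n",
+    "    print(f\"GPU found: {torch.cuda.get_device_name(0)}\")\n",
+    "else:\n",
+    "    print(\"No CUDA GPU found; loading the model below will fail.\")"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },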
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading the Model\n",
+    "\n",
+    "Next, we will load the Yi-1.5-6B-Chat model. Keep an eye on your machine's GPU memory and disk usage; if an error occurs, it is likely caused by insufficient resources.\n",
+    "\n",
+    "This tutorial uses the Yi-1.5-6B-Chat model. Its GPU memory and disk footprint are listed below:\n",
+    "\n",
+    "| Model | GPU Memory | Disk Space |\n",
+    "|-------|------------|------------|\n",
+    "| Yi-1.5-6B-Chat | 21 GB | 15 GB |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "load-model"
+   },
+   "source": [
+    "from transformers import AutoTokenizer\n",
+    "from vllm import LLM, SamplingParams\n",
+    "\n",
+    "# Load the tokenizer\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"01-ai/Yi-1.5-6B-Chat\")\n",
+    "\n",
+    "# Set the sampling parameters. SamplingParams defaults to max_tokens=16,\n",
+    "# which would truncate the reply, so we raise it explicitly.\n",
+    "sampling_params = SamplingParams(\n",
+    "    temperature=0.8,\n",
+    "    top_p=0.8,\n",
+    "    max_tokens=256)\n",
+    "\n",
+    "# Load the model\n",
+    "llm = LLM(model=\"01-ai/Yi-1.5-6B-Chat\")"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Model Inference\n",
+    "\n",
+    "Now let's build a prompt with the chat template and run inference. In this example, we use a simple greeting as the prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "inference"
+   },
+   "source": [
+    "# Build the prompt from the chat template\n",
+    "prompt = \"你好!\"  # Change the prompt as needed\n",
+    "messages = [\n",
+    "    {\"role\": \"user\", \"content\": prompt}\n",
+    "]\n",
+    "text = tokenizer.apply_chat_template(\n",
+    "    messages,\n",
+    "    tokenize=False,\n",
+    "    add_generation_prompt=True\n",
+    ")\n",
+    "print(text)\n",
+    "\n",
+    "# Generate a response\n",
+    "outputs = llm.generate([text], sampling_params)\n",
+    "\n",
+    "# Print the output\n",
+    "for output in outputs:\n",
+    "    prompt = output.prompt\n",
+    "    generated_text = output.outputs[0].text\n",
+    "    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n",
+    "# Expected reply: \"你好！今天见到你很高兴。我能为你做些什么呢？\""
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
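+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before wrapping up, here is a minimal sketch of how you might batch several prompts and vary the sampling parameters. The example prompts and the parameter values (temperature=0.6, top_p=0.9, max_tokens=512) are illustrative assumptions, not official recommendations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "sampling-experiments"
+   },
+   "source": [
+    "# Illustrative sketch: batch several prompts with a more conservative setup.\n",
+    "prompts = [\"你好!\", \"给我讲一个笑话。\", \"What is vLLM?\"]  # example prompts\n",
+    "texts = [\n",
+    "    tokenizer.apply_chat_template(\n",
+    "        [{\"role\": \"user\", \"content\": p}],\n",
+    "        tokenize=False,\n",
+    "        add_generation_prompt=True,\n",
+    "    )\n",
+    "    for p in prompts\n",
+    "]\n",
+    "\n",
+    "# Lower temperature -> less random output; larger max_tokens -> longer replies\n",
+    "exploratory_params = SamplingParams(\n",
+    "    temperature=0.6,\n",
+    "    top_p=0.9,\n",
+    "    max_tokens=512)\n",
+    "\n",
+    "# vLLM batches all requests in a single generate() call\n",
+    "outputs = llm.generate(texts, exploratory_params)\n",
+    "for output in outputs:\n",
+    "    print(output.outputs[0].text)"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },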
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "That's it! You have successfully run inference with the Yi-1.5-6B-Chat model using vLLM. Feel free to try different prompts and adjust the sampling parameters to see how the model responds. Happy experimenting!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}