diff --git a/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb b/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb
new file mode 100644
index 00000000..e85eed5d
--- /dev/null
+++ b/Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb
@@ -0,0 +1,150 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Inference with the Yi-1.5-6B-Chat Model Using vLLM\n",
+    "\n",
+    "Welcome to this tutorial! Here we will walk you through running inference with the Yi-1.5-6B-Chat model using vLLM. vLLM is a fast and easy-to-use library for large language model (LLM) inference and serving. Let's get started!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 🚀 Run on Colab\n",
+    "\n",
+    "We also provide a one-click [Colab script](https://colab.research.google.com/drive/1KuydGHHbI31Q0WIpwg7UmH0rfNjii8Wl?usp=drive_link) to make development even easier!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installation\n",
+    "\n",
+    "First, we need to install the dependencies. According to the official documentation, installing vLLM with pip requires CUDA 12.1. See the official [documentation](https://docs.vllm.ai/en/stable/getting_started/installation.html) for more details.\n",
+    "\n",
+    "Now let's install vLLM:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "installation"
+   },
+   "source": [
+    "!pip install vllm"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
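+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before loading the model, you can optionally confirm that a CUDA-capable GPU is visible. This is a minimal sanity-check sketch; it assumes PyTorch is importable, which pip pulls in as a vLLM dependency."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "gpu-check"
+   },
+   "source": [
+    "import torch\n",
+    "\n",
+    "# Optional sanity check: vLLM needs a CUDA-capable GPU.\n",
+    "if torch.cuda.is_available():\n",
+    "    print(f\"GPU found: {torch.cuda.get_device_name(0)}\")\n",
+    "else:\n",
+    "    print(\"No CUDA GPU found; loading the model below will fail.\")"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },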
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading the Model\n",
+    "\n",
+    "Next, we will load the Yi-1.5-6B-Chat model. Keep an eye on your machine's GPU memory and disk usage; if an error occurs, it is likely caused by insufficient resources.\n",
+    "\n",
+    "This tutorial uses the Yi-1.5-6B-Chat model. Its GPU memory and disk footprint are listed below:\n",
+    "\n",
+    "| Model | GPU Memory | Disk Space |\n",
+    "|-------|------------|------------|\n",
+    "| Yi-1.5-6B-Chat | 21 GB | 15 GB |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "load-model"
+   },
+   "source": [
+    "from transformers import AutoTokenizer\n",
+    "from vllm import LLM, SamplingParams\n",
+    "\n",
+    "# Load the tokenizer\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"01-ai/Yi-1.5-6B-Chat\")\n",
+    "\n",
+    "# Set the sampling parameters. SamplingParams defaults to max_tokens=16,\n",
+    "# which would truncate the reply, so we raise it explicitly.\n",
+    "sampling_params = SamplingParams(\n",
+    "    temperature=0.8,\n",
+    "    top_p=0.8,\n",
+    "    max_tokens=256)\n",
+    "\n",
+    "# Load the model\n",
+    "llm = LLM(model=\"01-ai/Yi-1.5-6B-Chat\")"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Model Inference\n",
+    "\n",
+    "Now let's build a prompt with the chat template and run inference. In this example, we use a simple greeting as the prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "inference"
+   },
+   "source": [
+    "# Build the prompt from the chat template\n",
+    "prompt = \"你好!\"  # Change the prompt as needed\n",
+    "messages = [\n",
+    "    {\"role\": \"user\", \"content\": prompt}\n",
+    "]\n",
+    "text = tokenizer.apply_chat_template(\n",
+    "    messages,\n",
+    "    tokenize=False,\n",
+    "    add_generation_prompt=True\n",
+    ")\n",
+    "print(text)\n",
+    "\n",
+    "# Generate a response\n",
+    "outputs = llm.generate([text], sampling_params)\n",
+    "\n",
+    "# Print the output\n",
+    "for output in outputs:\n",
+    "    prompt = output.prompt\n",
+    "    generated_text = output.outputs[0].text\n",
+    "    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n",
+    "# Expected reply: \"你好！今天见到你很高兴。我能为你做些什么呢？\""
+   ],
+   "execution_count": null,
+   "outputs": []
+  },
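+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before wrapping up, here is a minimal sketch of how you might batch several prompts and vary the sampling parameters. The example prompts and the parameter values (temperature=0.6, top_p=0.9, max_tokens=512) are illustrative assumptions, not official recommendations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "metadata": {
+    "id": "sampling-experiments"
+   },
+   "source": [
+    "# Illustrative sketch: batch several prompts with a more conservative setup.\n",
+    "prompts = [\"你好!\", \"给我讲一个笑话。\", \"What is vLLM?\"]  # example prompts\n",
+    "texts = [\n",
+    "    tokenizer.apply_chat_template(\n",
+    "        [{\"role\": \"user\", \"content\": p}],\n",
+    "        tokenize=False,\n",
+    "        add_generation_prompt=True,\n",
+    "    )\n",
+    "    for p in prompts\n",
+    "]\n",
+    "\n",
+    "# Lower temperature -> less random output; larger max_tokens -> longer replies\n",
+    "exploratory_params = SamplingParams(\n",
+    "    temperature=0.6,\n",
+    "    top_p=0.9,\n",
+    "    max_tokens=512)\n",
+    "\n",
+    "# vLLM batches all requests in a single generate() call\n",
+    "outputs = llm.generate(texts, exploratory_params)\n",
+    "for output in outputs:\n",
+    "    print(output.outputs[0].text)"
+   ],
+   "execution_count": null,
+   "outputs": []
+  },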
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "That's it! You have successfully run inference with the Yi-1.5-6B-Chat model using vLLM. Feel free to try different prompts and adjust the sampling parameters to see how the model responds. Happy experimenting!"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}