Add files via upload #555

Merged
merged 1 commit on Jul 26, 2024
150 changes: 150 additions & 0 deletions Cookbook/cn/opensource/Inference/vLLM_Inference_tutorial.ipynb
@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用vLLM进行Yi-1.5-6B-Chat模型的推理\n",
"\n",
"欢迎来到本教程!在这里,我们将指导您如何使用vLLM进行Yi-1.5-6B-Chat模型的推理。vLLM是一个快速且易于使用的大型语言模型(LLM)推理和服务库。让我们开始吧!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🚀 在Colab上运行\n",
"\n",
"我们还提供了一键运行的[Colab脚本](https://colab.research.google.com/drive/1KuydGHHbI31Q0WIpwg7UmH0rfNjii8Wl?usp=drive_link),让开发变得更简单!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 安装\n",
"\n",
"首先,我们需要安装相关的依赖。根据官方文档要求,使用pip安装vLLM需要CUDA 12.1。您可以参考官方[文档](https://docs.vllm.ai/en/stable/getting_started/installation.html)获取更多详情。\n",
"\n",
"现在让我们安装vLLM:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "installation"
},
"source": [
"!pip install vllm"
],
"execution_count": null,
"outputs": []
},
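{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before installing, it can help to confirm that your environment's CUDA version matches this requirement. The next cell is an optional, minimal sanity check; it assumes an NVIDIA GPU environment (such as a Colab GPU runtime) where `nvidia-smi`, and possibly `nvcc`, are available."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cuda-check"
},
"source": [
"# Optional sanity check (assumes an NVIDIA GPU environment, e.g. a Colab GPU runtime).\n",
"# nvidia-smi reports the driver and the highest CUDA version it supports;\n",
"# nvcc, if installed, reports the CUDA toolkit version.\n",
"!nvidia-smi\n",
"!nvcc --version"
],
"execution_count": null,
"outputs": []
},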
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 加载模型\n",
"\n",
"接下来,我们将加载Yi-1.5-6B-Chat模型。请注意电脑的显存和硬盘占用情况。如果出现错误,可能是由于资源不足引起的。\n",
"\n",
"本教程使用Yi-1.5-6B-Chat模型。以下是该模型的显存和硬盘占用情况:\n",
"\n",
"| 模型 | 显存使用 | 硬盘占用 |\n",
"|-------|------------|------------------|\n",
"| Yi-1.5-6B-Chat | 21G | 15G |"
]
},
{
"cell_type": "code",
"metadata": {
"id": "load-model"
},
"source": [
"from transformers import AutoTokenizer\n",
"from vllm import LLM, SamplingParams\n",
"\n",
"# 加载分词器\n",
"tokenizer = AutoTokenizer.from_pretrained(\"01-ai/Yi-1.5-6B-Chat\")\n",
"\n",
"# 设置采样参数\n",
"sampling_params = SamplingParams(\n",
" temperature=0.8, \n",
" top_p=0.8)\n",
"\n",
"# 加载模型\n",
"llm = LLM(model=\"01-ai/Yi-1.5-6B-Chat\")"
],
"execution_count": null,
"outputs": []
},
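{
"cell_type": "markdown",
"metadata": {},
"source": [
"If loading fails because VRAM is tight, vLLM's `LLM` constructor exposes a few knobs that can help. The next cell is a minimal sketch of a lower-memory variant; `dtype`, `gpu_memory_utilization`, and `max_model_len` are standard vLLM constructor arguments, but the specific values here are illustrative and depend on your hardware."
]
},
{
"cell_type": "code",
"metadata": {
"id": "load-model-low-memory"
},
"source": [
"# Optional lower-memory loading sketch (values are illustrative):\n",
"# - dtype=\"half\" loads the weights in float16\n",
"# - gpu_memory_utilization caps the fraction of GPU memory vLLM may claim\n",
"# - max_model_len shortens the context window, shrinking the KV cache\n",
"llm = LLM(\n",
"    model=\"01-ai/Yi-1.5-6B-Chat\",\n",
"    dtype=\"half\",\n",
"    gpu_memory_utilization=0.9,\n",
"    max_model_len=4096\n",
")"
],
"execution_count": null,
"outputs": []
},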
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 模型推理\n",
"\n",
"现在,让我们准备一个提示词模版并使用模型进行推理。在这个例子中,我们将使用一个简单的问候语提示词。"
]
},
{
"cell_type": "code",
"metadata": {
"id": "inference"
},
"source": [
"# 准备提示词模版\n",
"prompt = \"你好!\" # 根据需要更改提示词\n",
"messages = [\n",
" {\"role\": \"user\", \"content\": prompt}\n",
"]\n",
"text = tokenizer.apply_chat_template(\n",
" messages,\n",
" tokenize=False,\n",
" add_generation_prompt=True\n",
")\n",
"print(text)\n",
"\n",
"# 生成回复\n",
"outputs = llm.generate([text], sampling_params)\n",
"\n",
"# 打印输出\n",
"for output in outputs:\n",
" prompt = output.prompt\n",
" generated_text = output.outputs[0].text\n",
" print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n",
"# 期望的回复:\"你好!今天见到你很高兴。我能为你做些什么呢?\""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"就是这样!您已经成功地使用vLLM进行了Yi-1.5-6B-Chat模型的推理。请随意尝试不同的提示词并调整采样参数,看看模型会如何响应。祝您实验愉快!"
]
}
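,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a starting point for such experiments, the next cell is a small sketch that regenerates the same prompt under a few different temperatures. `temperature`, `top_p`, and `max_tokens` are standard `SamplingParams` fields in vLLM; the specific values are illustrative only."
]
},
{
"cell_type": "code",
"metadata": {
"id": "sampling-experiments"
},
"source": [
"# Sketch: compare generations for the same prompt under a few temperatures.\n",
"# The values below are illustrative; tune them to your use case.\n",
"for temp in (0.2, 0.8, 1.2):\n",
"    params = SamplingParams(temperature=temp, top_p=0.8, max_tokens=128)\n",
"    outputs = llm.generate([text], params)\n",
"    print(f\"temperature={temp}: {outputs[0].outputs[0].text!r}\")"
],
"execution_count": null,
"outputs": []
}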
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}