Merge branch 'main' of github.com:GaiaNet-AI/docs
Showing 49 changed files with 1,490 additions and 219 deletions.
name: to dapp
on:
  push:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      actions: write
      checks: write
      contents: write
      deployments: write
      issues: write
      packages: write
      pull-requests: write
      repository-projects: write
      security-events: write
      statuses: write
    steps:
      - uses: actions/checkout@v4
        with:
          token: ${{ secrets.GH_TOKEN }}
      - name: Use Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20.x
      - name: Install Dependencies
        run: yarn
      - name: Build Project
        run: |
          yarn build
      - name: Push directory to another repository
        if: ${{ github.event_name == 'push' }}
        uses: cpina/github-action-push-to-another-repository@main
        env:
          API_TOKEN_GITHUB: ${{ secrets.GH_TOKEN }}
        with:
          source-directory: 'build'
          destination-github-username: 'GaiaNet-AI'
          destination-repository-name: 'gaianet-dapp'
          target-directory: 'public/docs'
          user-email: juyichen0413@foxmail.com
          target-branch: main
{
  "label": "GaiaNet Node Creator Guide",
  "position": 4,
  "link": {
    "type": "generated-index",
    "description": "How to finetune your own LLMs and create vector collections based on your own proprietary and private knowledge"
  }
}
{
  "label": "GaiaNet Node with finetuned LLMs",
  "position": 2,
  "link": {
    "type": "generated-index",
    "description": "How to finetune LLMs to speak the way you like."
  }
}
---
sidebar_position: 1
---

# Finetune LLMs

You can finetune an open-source LLM to

* teach it to follow conversations
* teach it to respect and follow instructions
* make it refuse to answer certain questions
* give it a specific "speaking" style
* make it respond in certain formats (e.g., JSON)
* focus it on a specific domain area
* teach it certain knowledge

To do that, you need to create a set of question-and-answer pairs to show the model the prompt and the expected response. Then, you can use a finetuning tool to perform the training and make the model respond with the expected answer for each question.
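For example, a single question-and-answer pair might look like the following (an illustrative sample, not from any real training set):

```
Question: What is the atomic number of mercury?
Answer: The atomic number of mercury is 80.
```

A finetuning tool then formats such pairs into the base model's chat template before training.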
---
sidebar_position: 2
---

# llama.cpp

The popular llama.cpp tool comes with a `finetune` utility. It works well on CPUs! This finetune guide is reproduced with permission from Tony Yuan's [Finetune an open-source LLM for the chemistry subject](https://github.com/YuanTony/chemistry-assistant/tree/main/fine-tune-model) project.

## Build the finetune utility from llama.cpp

The `finetune` utility in llama.cpp can work with quantized GGUF files on CPUs, dramatically reducing the hardware requirements and expenses for finetuning LLMs.

Check out the llama.cpp source code.

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```

Build the llama.cpp binary.

```
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

If you have an Nvidia GPU and the CUDA toolkit installed, you should build llama.cpp with CUDA support.

```
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
cmake --build . --config Release
```

## Get the base model

We are going to use Meta's Llama2 chat 13B model as the base model. Note that we are using a Q5 quantized GGUF model file directly to save computing resources. You can use any of the Llama2-compatible GGUF models on Hugging Face.

```
cd .. # change to the llama.cpp directory
cd models/
curl -LO https://huggingface.co/second-state/Llama-2-13B-Chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf
```

## Create a question and answer set for fine-tuning

Next, we came up with 1700+ pairs of QAs for the chemistry subject. They look like the following in a [CSV file](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.csv).

Question | Answer
----- | -------
What is unique about hydrogen? | It's the most abundant element in the universe, making up over 75% of all matter.
What is the main component of Jupiter? | Hydrogen is the main component of Jupiter and the other gas giant planets.
Can hydrogen be used as fuel? | Yes, hydrogen is used as rocket fuel. It can also power fuel cells to generate electricity.
What is mercury's atomic number? | The atomic number of mercury is 80.
What is Mercury? | Mercury is a silver colored metal that is liquid at room temperature. It has an atomic number of 80 on the periodic table. It is toxic to humans.

> We used GPT-4 to help us come up with many of these QAs.

Then, we wrote a [Python script](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/convert.py) to convert each row in the CSV file into a sample QA in the Llama2 chat template format. Notice that each QA pair starts with `<SFT>` as an indicator for the finetune program to start a sample. The resulting [train.txt](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.txt) file can now be used in fine-tuning.
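The conversion itself is simple string formatting. Here is a hypothetical simplification of what such a script does (the real convert.py linked above applies the exact Llama2 chat template; this naive version also assumes no commas inside the CSV fields):

```
# turn each "question,answer" row into one <SFT>-prefixed training sample
awk -F',' '{ printf "<SFT>[INST] %s [/INST] %s\n", $1, $2 }' train.csv > train.txt
```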

Put the [train.txt](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.txt) file in the `llama.cpp/models` directory with the GGUF base model.

## Finetune!

Use the following command to start the fine-tuning process on your CPUs. We put it in the background so that it can run continuously. It could take several days or even a couple of weeks depending on how many CPUs you have.

```
nohup ../build/bin/finetune --model-base llama-2-13b-chat.Q5_K_M.gguf --lora-out lora.bin --train-data train.txt --sample-start '<SFT>' --adam-iter 1024 &
```

You can check the progress every few hours in the `nohup.out` file. It reports the `loss` for each iteration. You can stop the process when the `loss` goes consistently under `0.1`.
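For example, you can follow the reported `loss` values as they are written to the log (assuming the `finetune` program prints them to standard output, which `nohup` redirects to `nohup.out`):

```
# follow the training log and show only the lines that report loss
tail -f nohup.out | grep -i loss
```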

**Note 1** If you have multiple CPUs (or CPU cores), you can speed up the finetuning process by adding a `-t` parameter to the above command to use more threads. For example, if you have 60 CPU cores, you could add `-t 60` to use all of them.

**Note 2** If your finetuning process is interrupted, you can restart it from `checkpoint-250.gguf`. The next file it outputs is `checkpoint-260.gguf`.

```
nohup ../build/bin/finetune --model-base llama-2-13b-chat.Q5_K_M.gguf --checkpoint-in checkpoint-250.gguf --lora-out lora.bin --train-data train.txt --sample-start '<SFT>' --adam-iter 1024 &
```

## Merge

The fine-tuning process updates several layers of the LLM's neural network. Those updated layers are saved in a file called `lora.bin`, and you can now merge them back into the base LLM to create the new fine-tuned LLM.

```
../build/bin/export-lora --model-base llama-2-13b-chat.Q5_K_M.gguf --lora lora.bin --model-out chemistry-assistant-13b-q5_k_m.gguf
```

The result is this file, which you can also download directly.

```
curl -LO https://huggingface.co/juntaoyuan/chemistry-assistant-13b/resolve/main/chemistry-assistant-13b-q5_k_m.gguf
```

**Note 3** If you want to use a checkpoint to generate a `lora.bin` file, use the following command. This is needed when you believe the final `lora.bin` is overfit.

```
../build/bin/finetune --model-base llama-2-13b-chat.Q5_K_M.gguf --checkpoint-in checkpoint-250.gguf --only-write-lora --lora-out lora.bin
```
{
  "label": "Knowledge bases",
  "position": 1,
  "link": {
    "type": "generated-index",
    "description": "How to create vector collections based on your own proprietary and private knowledge"
  }
}
---
sidebar_position: 1
---

# What is a RAG-based LLM application

Retrieval-augmented generation (RAG) is a way to address the hallucinations of a Large Language Model (LLM) by attaching external data sources to the model. RAG enhances the accuracy and reliability of LLMs with facts retrieved from external knowledge. That's why a GaiaNet node is a RAG-based LLM application.

For example, if you ask ChatGPT the question "What is Layer 2", the answer is that Layer 2 is a concept from computer networking. However, if you ask a blockchain person, they will answer that Layer 2 is a way to scale the original Ethereum network. That's the difference between the original model and a model with RAG.

We will cover the external knowledge preparation and how a RAG-based application completes a conversation. If you already know how a RAG application works, go to [Build a RAG application with GaiaNet](web-tool) to start building one.

1. Create embeddings for your own knowledge
2. Lifecycle of a user query on a RAG-based LLM application

For a RAG-based LLM application, besides the application itself, we will use

* a chat model like Llama-3-8B for generating responses to the user
* a text embedding model like all-miniLM-V2 for creating and retrieving embeddings
* a vector DB like Qdrant for storing embeddings

## Workflow for creating embeddings

The first step is to create embeddings for our knowledge base and store the embeddings in a vector DB.



First of all, we split the long text into small paragraphs (i.e., chunks). All LLMs have a maximum context length, and the model can't read the context if the text is too long.

The most common rule for a GaiaNet node is to put the content of one chapter together. Remember to insert a blank line between two chunks. You can also use other algorithms to chunk your text.

After chunking the document, we can convert these chunks to embeddings using the embedding model. The embedding model is trained to create embeddings based on text and to search for similar embeddings. We will use the latter capability when processing user queries.
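For instance, if the embedding model is served behind an OpenAI-compatible API (the endpoint and model name below are hypothetical, for illustration only), converting a chunk into an embedding vector could look like this:

```
# send one text chunk to a hypothetical OpenAI-compatible embeddings endpoint;
# the JSON response contains the vector for that chunk
curl http://localhost:8080/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model": "all-miniLM-V2", "input": ["Paris is the capital of France."]}'
```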

Additionally, we will also need a vector DB to store the embeddings so that we can retrieve them quickly at any time.

On GaiaNet, we eventually get a database snapshot containing the embeddings to use. Check out how to create your embeddings using the [GaiaNet web tool](web-tool.md), [from a plain text file](text.md), or [from a markdown file](markdown.md).

## Lifecycle of a user query on a RAG-based LLM application

Next, let's learn the lifecycle of a user query on a RAG-based LLM application. We will take [a GaiaNet node with GaiaNet knowledge](https://knowledge.gaianet.network/chatbot-ui/index.html) as an example.



### Ask a question

When you send a question in human language to the node, the embedding model will first convert your question to an embedding.

### Retrieve similar embeddings

Then, the question embedding is used to search all the embeddings stored in the Qdrant vector DB and retrieve the ones most similar to it.
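Against Qdrant directly, this retrieval step is a similarity search. Here is a sketch (the query vector is truncated to 4 numbers for readability; a real one must match the collection's dimension, e.g., 768):

```
# hypothetical similarity search: return the 3 stored points
# closest to the query vector
curl -X POST 'http://localhost:6333/collections/default/points/search' \
  -H 'Content-Type: application/json' \
  -d '{"vector": [0.05, -0.21, 0.33, 0.11], "limit": 3}'
```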
### Respond to the user query

The retrieved content is then passed to the chat model. The chat model uses the retrieved content plus your input question as context to answer your query.
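Conceptually, this last step resembles an OpenAI-style chat completion request in which the retrieved text is injected into the context (the endpoint, model name, and context string below are hypothetical):

```
# hypothetical request: retrieved knowledge goes into the system prompt
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Llama-3-8B",
    "messages": [
      {"role": "system", "content": "Answer using this context: Layer 2 is a way to scale the original Ethereum network."},
      {"role": "user", "content": "What is Layer 2?"}
    ]
  }'
```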

That's how a RAG-based application works.
---
sidebar_position: 4
---

# Knowledge base from a markdown file

In this section, we will discuss how to create a vector collection snapshot from a markdown file. The snapshot file can then be loaded by a GaiaNet node as its knowledge base. You will have the option to create a vector for each markdown section.

## Prerequisites

Install the WasmEdge Runtime, the cross-platform LLM runtime.

```
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml
```

Download an embedding model.

```
curl -LO https://huggingface.co/gaianet/Nomic-embed-text-v1.5-Embedding-GGUF/resolve/main/nomic-embed-text-v1.5-f16.gguf
```

The embedding model is a special kind of LLM that turns sentences into vectors. The vectors can then be stored in a vector database and searched later. When the sentences are from a body of text that represents a knowledge domain, that vector database becomes our RAG knowledge base.

## Start a vector database

By default, we use Qdrant as the vector database. You can start a Qdrant instance on your server using Docker. The following command starts it in the background.

```
mkdir qdrant_storage
mkdir qdrant_snapshots

nohup docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    -v $(pwd)/qdrant_snapshots:/qdrant/snapshots:z \
    qdrant/qdrant
```
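Before proceeding, you can verify that Qdrant is up; the root endpoint should return a small JSON document with version information.

```
# sanity check that the Qdrant REST API is answering on port 6333
curl http://localhost:6333
```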

## Create the vector collection snapshot

Delete the default collection if it exists.

```
curl -X DELETE 'http://localhost:6333/collections/default'
```

Create a new collection called default. Notice that it is 768 dimensions. That is the output vector size of the embedding model `nomic-embed-text-v1.5`. If you are using a different embedding model, you should use a dimension that fits the model.

```
curl -X PUT 'http://localhost:6333/collections/default' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "vectors": {
      "size": 768,
      "distance": "Cosine",
      "on_disk": true
    }
  }'
```

Download a program to chunk a document and create embeddings.

```
curl -LO https://github.com/GaiaNet-AI/embedding-tools/raw/main/markdown_embed/markdown_embed.wasm
```

It chunks the document based on markdown sections. You can check out the [Rust source code](https://github.com/GaiaNet-AI/embedding-tools/tree/main/markdown_embed) and modify it if you need to use a different chunking strategy.

Next, you can run the program by passing a collection name, vector dimension, and the source document. You can pass in the desired markdown heading level for chunking using the `--heading_level` option. Make sure that Qdrant is running on your local machine. The model is preloaded under the name `embedding`. The wasm app then uses the embedding model to create the 768-dimension vectors from `paris.md` and saves them into the default collection.

```
curl -LO https://huggingface.co/datasets/gaianet/paris/raw/main/paris.md

wasmedge --dir .:. \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
  markdown_embed.wasm embedding default 768 paris.md --heading_level 1
```

You can create a snapshot of the collection, which can be shared and loaded into a different Qdrant database. You can find the snapshot file in the `qdrant_snapshots` directory.

```
curl -X POST 'http://localhost:6333/collections/default/snapshots'
```

We also recommend compressing the snapshot file for GaiaNet node use.

```
tar czvf my.snapshot.tar.gz my.snapshot
```

Finally, upload the `my.snapshot.tar.gz` file to Huggingface so that the GaiaNet node can download and use it.
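One way to do that is with the `huggingface_hub` CLI (a sketch that assumes you have a Hugging Face account and have created a repo for the snapshot; the repo id below is a placeholder):

```
pip install huggingface_hub
huggingface-cli login

# replace the repo id with your own Hugging Face repo
huggingface-cli upload your-username/your-knowledge-base my.snapshot.tar.gz
```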

## Next steps

* [Start](../../node-guide/quick-start) a new GaiaNet node
* [Customize](../../node-guide/customize) the GaiaNet node

Have fun!