Merge pull request #60 from intls/fix-docs
Fixed-Docs-Errors
alabulei1 authored Sep 6, 2024
2 parents 417d36f + 5f16b0e commit 54b6997
Showing 16 changed files with 87 additions and 87 deletions.
20 changes: 10 additions & 10 deletions docs/creator-guide/finetune/intro.md
@@ -2,18 +2,18 @@
sidebar_position: 1
---

-# Finetune LLMs
+# Fine-tune LLMs

-You could finetune an open-source LLM to
+You could fine-tune an open-source LLM to

-* teach it to follow conversations
-* teach it to respect and follow instructions
-* make it refuse to answer certain questions
-* give it a specific "speaking" style
-* make it response in certain formats (e.g., JSON)
-* give it focus on a specific domain area
-* teach it certain knowledge
+* Teach it to follow conversations.
+* Teach it to respect and follow instructions.
+* Make it refuse to answer certain questions.
+* Make it response in certain formats (e.g., JSON).
+* Give it a specific "speaking" style.
+* Give it focus on a specific domain area.
+* Teach it certain knowledge.

To do that, you need to create a set of question and answer pairs to show the model the prompt and the expected response.
-Then, you can use a finetuning tool to perform the training and make the model respond the expected answer for
+Then, you can use a fine-tuning tool to perform the training and make the model respond the expected answer for
each question.
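
As a concrete starting point for the workflow this page describes, here is a minimal sketch (not part of the original docs) of preparing such question and answer pairs as a pipe-separated, two-column CSV, the same style used by the chemistry example in the llama.cpp guide below. The file name and sample rows are illustrative assumptions.

```python
# Illustrative only: build a tiny question/answer set for fine-tuning and save
# it as a pipe-separated CSV (two columns: question | answer).
import csv

qa_pairs = [
    ("What is the chemical symbol for gold?",
     "The chemical symbol for gold is Au."),
    ("Answer in JSON. What elements make up water?",
     '{"elements": ["hydrogen", "oxygen"]}'),
]

with open("train.csv", "w", newline="") as f:
    csv.writer(f, delimiter="|").writerows(qa_pairs)
```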
24 changes: 12 additions & 12 deletions docs/creator-guide/finetune/llamacpp.md
@@ -4,14 +4,14 @@ sidebar_position: 2

# llama.cpp

-The popular llama.cpp tool comes with a `finetune` utility. It works well on CPUs! This finetune guide is reproduced with
+The popular llama.cpp tool comes with a `finetune` utility. It works well on CPUs! This fine-tune guide is reproduced with
permission from Tony Yuan's [Finetune an open-source LLM for the chemistry subject](https://github.com/YuanTony/chemistry-assistant/tree/main/fine-tune-model) project.

-## Build the finetune utility from llama.cpp
+## Build the fine-tune utility from llama.cpp

-The `finetune` utility in llama.cpp can work with quantitized GGUF files on CPUs, and hence dramatically reducing the hardware requirements and expenses for finetuning LLMs.
+The `finetune` utility in llama.cpp can work with quantized GGUF files on CPUs, and hence dramatically reducing the hardware requirements and expenses for fine-tuning LLMs.

-Checkout and download the llama.cpp source code.
+Check out and download the llama.cpp source code.

```
git clone https://github.com/ggerganov/llama.cpp
@@ -27,7 +27,7 @@ cmake ..
cmake --build . --config Release
```

-If you have Nvidia GPU and CUDA toolkit installed, you should build llama.cpp with CUDA support.
+If you have NVIDIA GPU and CUDA toolkit installed, you should build llama.cpp with CUDA support.

```
mkdir build
@@ -38,7 +38,7 @@ cmake --build . --config Release

## Get the base model

-We are going to use Meta's Llama2 chat 13B model as the base model. Note that we are using a Q5 quantitized GGUF model file directly to save computing resources. You can use any of the Llama2 compatible GGUF models on Hugging Face.
+We are going to use Meta's Llama2 chat 13B model as the base model. Note that we are using a Q5 quantized GGUF model file directly to save computing resources. You can use any of the Llama2 compatible GGUF models on Hugging Face.

```
cd .. # change to the llama.cpp directory
@@ -60,24 +60,24 @@ What is Mercury? | Mercury is a silver colored metal that is liquid at room temp

> We used GPT-4 to help me come up many of these QAs.
-Then, we wrote a [Python script](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/convert.py) to convert each row in the CSV file into a sample QA in the Llama2 chat template format. Notice that each QA pair starts with `<SFT>` as an indicator for the finetune program to start a sample. The result [train.txt](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.txt) file can now be used in fine-tuning.
+Then, we wrote a [Python script](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/convert.py) to convert each row in the CSV file into a sample QA in the Llama2 chat template format. Notice that each QA pair starts with `<SFT>` as an indicator for the fine-tune program to start a sample. The result [train.txt](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.txt) file can now be used in fine-tuning.

Put the [train.txt](https://raw.githubusercontent.com/YuanTony/chemistry-assistant/main/fine-tune-model/train.txt) file in the `llama.cpp/models` directory with the GGUF base model.

## Finetune!

-Use the following command to start the fine-tuning process on your CPUs. I am putting it in the background so that it can run continuous now.
-It could several days or even a couple of weeks depending on how many CPUs you have.
+Use the following command to start the fine-tuning process on your CPUs. I am putting it in the background so that it can run continuously now.
+It could take several days or even a couple of weeks depending on how many CPUs you have.

```
nohup ../build/bin/finetune --model-base llama-2-13b-chat.Q5_K_M.gguf --lora-out lora.bin --train-data train.txt --sample-start '<SFT>' --adam-iter 1024 &
```

-You can check the process every a few hours in the `nohup.out` file. It will report `loss` for each iteration. You can stop the process when the `loss` goes consistently under `0.1`.
+You can check the process every few hours in the `nohup.out` file. It will report the `loss` for each iteration. You can stop the process when the `loss` goes consistently under `0.1`.

-**Note 1** If you have multiple CPUs (or CPU cores), you can speed up the finetuning process by adding a `-t` parameter to the above command to use more threads. For example, if you have 60 CPU cores, you could do `-t 60` to use all of them.
+**Note 1** If you have multiple CPUs (or CPU cores), you can speed up the fine-tuning process by adding a `-t` parameter to the above command to use more threads. For example, if you have 60 CPU cores, you could do `-t 60` to use all of them.

-**Note 2** If your finetuning process is interrupted, you can restart it from `checkpoint-250.gguf`. The next file it outputs is `checkpoint-260.gguf`.
+**Note 2** If your fine-tuning process is interrupted, you can restart it from `checkpoint-250.gguf`. The next file it outputs is `checkpoint-260.gguf`.

```
nohup ../build/bin/finetune --model-base llama-2-13b-chat.Q5_K_M.gguf --checkpoint-in checkpoint-250.gguf --lora-out lora.bin --train-data train.txt --sample-start '<SFT>' --adam-iter 1024 &
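
The convert step described in this file maps each CSV row to a Llama2 chat-template sample that begins with `<SFT>`, matching the `--sample-start '<SFT>'` flag above. Below is a minimal sketch of that idea, assuming a two-column pipe-separated CSV and a placeholder system prompt; the actual convert.py in the chemistry-assistant project may use a different template.

```python
# Sketch only: turn pipe-separated question|answer rows into Llama2
# chat-template training samples. The system prompt and file names are
# assumptions; adjust to match the real convert.py.
import csv

SYSTEM = "You are a helpful assistant that answers chemistry questions."

with open("train.csv", newline="") as src, open("train.txt", "w") as out:
    for row in csv.reader(src, delimiter="|"):
        question, answer = row[0].strip(), row[1].strip()
        # Each sample starts with <SFT> so the finetune utility knows where a
        # new training example begins (--sample-start '<SFT>').
        out.write(
            "<SFT><s>[INST] <<SYS>>\n"
            f"{SYSTEM}\n"
            "<</SYS>>\n\n"
            f"{question} [/INST] {answer} </s>\n"
        )
```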
22 changes: 11 additions & 11 deletions docs/creator-guide/knowledge/concepts.md
@@ -19,37 +19,37 @@ compatible LLM service that is grounded by long-term knowledge on the server sid
can simply chat with it or provide realtime / short-term memory since the LLM is already aware of the
domain or background.

-> For example, if you ask ChatGPT the question What is Layer 2, the answer is that Layer 2 is a concept from the computer network. However, if you ask a blockchain person, he answers that Layer 2 is a way to scale the original Ethereum network. That's the difference between a generic LLM and knowledge-supplemented LLMs.
+> For example, if you ask ChatGPT the question What is Layer 2, the answer is that Layer 2 is a concept from the computer network. However, if you ask a blockchain person, they answer that Layer 2 is a way to scale the original Ethereum network. That's the difference between a generic LLM and knowledge-supplemented LLMs.
We will cover the external knowledge preparation and how a knowledge-supplemented LLM completes a conversation. If you have learned how a RAG application works, go to [Build a RAG application with Gaia](web-tool) to start building one.

-1. Create embeddings for your own knowledge as the long-term memory
-2. Lifecycle of a user query on a knowledge-supplemented LLM
+1. Create embeddings for your own knowledge as the long-term memory.
+2. Lifecycle of a user query on a knowledge-supplemented LLM.

For this solution, we will use

-* a chat model like Llama-3-8B for generating responses to the user
-* a text embedding model like [nomic-embed-text](https://huggingface.co/second-state/Nomic-embed-text-v1.5-Embedding-GGUF) for creating and retrieving embeddings
-* a Vector DB like Qdrant for storing embeddings
+* a chat model like Llama-3-8B for generating responses to the user.
+* a text embedding model like [nomic-embed-text](https://huggingface.co/second-state/Nomic-embed-text-v1.5-Embedding-GGUF) for creating and retrieving embeddings.
+* a Vector DB like Qdrant for storing embeddings.

## Workflow for creating knowledge embeddings

The first step is to create embeddings for our knowledge base and store the embeddings in a vector DB.

![create-embedding](https://github.com/GaiaNet-AI/docs/assets/45785633/2ff40178-64f4-4e2e-bbd9-f12ce35186b7)

-First of all, we split the long text into sections (ie, chunks). All LLMs have a maximum context length. The model can't read the context if the text is too long.
+First of all, we split the long text into sections (i.e, chunks). All LLMs have a maximum context length. The model can't read the context if the text is too long.
The most used rule for a Gaia node is to put the content in one chapter together. Remember, insert a blank line between two chunks. You can also use other algorithms to chunk your text.

-After chunking the document, we can convert these chunks to embeddings leveraging the embedding model. The embedding model is trained to create embeddings based on text and search for similar embeddings. We will use the latter function in the process of user query.
+After chunking the document, we can convert these chunks into embeddings leveraging the embedding model. The embedding model is trained to create embeddings based on text and search for similar embeddings. We will use the latter function in the process of user query.

-Additionally, we will also need a vector DB to store the embeddings so that we can retrieve these embeddings quickly at any time.
+Additionally, we will need a vector DB to store the embeddings so that we can retrieve these embeddings quickly at any time.

On a Gaia node, we will get a database snapshot with the embeddings to use at last. Check out how to create your embeddings using [Gaia web tool](web-tool.md), [from a plain text file](text.md), and [from a markdown file](markdown.md).

-## Lifecycle of a user query on a knoweldge-supplemented LLM
+## Lifecycle of a user query on a knowledge-supplemented LLM

-Next, let's learn the lifecycle of a user query on a knowledge-supplemented LLM. We will take the [a Gaia Node with Gaia knowledge](https://knowledge.gaianet.network/chatbot-ui/index.html) as an example.
+Next, let's learn the lifecycle of a user query on a knowledge-supplemented LLM. We will take [a Gaia Node with Gaia knowledge](https://knowledge.gaianet.network/chatbot-ui/index.html) as an example.

![user-query-rag](https://github.com/GaiaNet-AI/docs/assets/45785633/c64b85ea-65f0-43d2-8ab3-78889d21c248)

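
The two workflows in this file, creating knowledge embeddings and then answering a user query with retrieved context, can be illustrated with a short end-to-end sketch. This is not the Gaia node's internal implementation: the OpenAI-compatible endpoint `http://localhost:8080/v1`, the model names `nomic-embed-text-v1.5` and `Llama-3-8B`, the collection name, and the 768-dimensional vector size are all placeholder assumptions.

```python
# Minimal RAG sketch (assumptions noted above): chunk a text file on blank
# lines, embed and store the chunks in Qdrant, then answer a query with the
# retrieved chunks as context.
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

llm = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")
qdrant = QdrantClient(url="http://localhost:6333")

# 1. Chunk: one chunk per section, separated by a blank line.
with open("knowledge.txt") as f:
    chunks = [c.strip() for c in f.read().split("\n\n") if c.strip()]

# 2. Embed each chunk and store it in a vector collection.
qdrant.recreate_collection(
    collection_name="default",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)
for i, chunk in enumerate(chunks):
    vec = llm.embeddings.create(model="nomic-embed-text-v1.5", input=chunk).data[0].embedding
    qdrant.upsert(
        collection_name="default",
        points=[PointStruct(id=i, vector=vec, payload={"text": chunk})],
    )

# 3. Query lifecycle: embed the question, retrieve similar chunks, and
#    prepend them to the chat prompt as context.
question = "What is Layer 2?"
qvec = llm.embeddings.create(model="nomic-embed-text-v1.5", input=question).data[0].embedding
hits = qdrant.search(collection_name="default", query_vector=qvec, limit=3)
context = "\n\n".join(h.payload["text"] for h in hits)

answer = llm.chat.completions.create(
    model="Llama-3-8B",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

In practice, a Gaia node ships with a prepared vector database snapshot and performs these steps itself; the sketch is only meant to make the data flow concrete.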
2 changes: 1 addition & 1 deletion docs/creator-guide/knowledge/firecrawl.md
@@ -4,7 +4,7 @@ sidebar_position: 12

# Knowledge base from a URL

-In this section, we will discuss how to create a vector collection snapshot from a Web URL. First, we will parse url to a structured markdown file. Then, we will follow the steps from [Knowledge base from a markdown file](markdown.md) to create embedding for your URL.
+In this section, we will discuss how to create a vector collection snapshot from a Web URL. First, we will parse the URL to a structured markdown file. Then, we will follow the steps from [Knowledge base from a markdown file](markdown.md) to create embedding for your URL.

## Parse the URL content to a markdown file

10 changes: 5 additions & 5 deletions docs/creator-guide/knowledge/web-tool.md
@@ -25,18 +25,18 @@ After formatted, it will look like the following.

```
What is a blockchain?
-A blockchain is a distributed, cryptographically-secure database structure that allows network participants to establish a trusted and immutable record of transactional data without the need for intermediaries. A blockchain can execute a variety of functions beyond transaction settlement, such as smart contracts. Smart contracts are digital agreements that are embedded in code and that can have limitless formats and conditions. Blockchains have proven themselves as superior solutions for securely coordinating data, but they are capable of much more, including tokenization, incentive design, attack-resistance, and reducing counterparty risk. The very first blockchain was the Bitcoin blockchain, which itself was a culmination of over a century of advancements in cryptography and database technology.
+A blockchain is a distributed, cryptographically-secure database structure that allows network participants to establish a trusted and immutable record of transactional data without the need for intermediaries. A blockchain can execute a variety of functions beyond transaction settlement, such as smart contracts. Smart contracts are digital agreements that are embedded in code and can have limitless formats and conditions. Blockchains have proven themselves as superior solutions for securely coordinating data, but they are capable of much more, including tokenization, incentive design, attack-resistance, and reducing counterparty risk. The very first blockchain was the Bitcoin blockchain, which was itself a culmination of over a century of advancements in cryptography and database technology.
What is blockchain software?
Blockchain software is like any other software. The first of its kind was Bitcoin, which was released as open source software, making it available to anyone to use or change. There are a wide variety of efforts across the blockchain ecosystem to improve upon Bitcoin's original software. Ethereum has its own open source blockchain software. Some blockchain software is proprietary and not available to the public.
```

## Generate the snapshot file

-1. Visit this URL: https://tools.gaianet.xyz/, upload the above prepared txt file
-2. Edit your `dbname` . ***Note: Do not include spaces or special characters in the dbname***
-3. Choose Embedding model, we suggest use `nomic-embed-text-v1.5.f16`
-4. Click the "Make RAG" button and wait
+1. Visit this URL: https://tools.gaianet.xyz/, upload the above prepared txt file.
+2. Edit your `dbname` . ***Note: Do not include spaces or special characters in the dbname***.
+3. Choose Embedding model, we suggest use `nomic-embed-text-v1.5.f16`.
+4. Click the "Make RAG" button and wait.

When finished, the chatbot will display GaiaNet Node config info. It is a JSON format as follows.

14 changes: 7 additions & 7 deletions docs/litepaper.md
@@ -18,16 +18,16 @@ The emergence of ChatGPT and Large Language Model (LLM) has revolutionized how h
Agents are software applications that can complete tasks on its own autonomously like a human. The agent can understand the task, plan the steps to complete the task, execute all the steps, handle errors and exceptions, and deliver the results. While a powerful LLM could act as the “brain” for the agent, we need to connect to external data sources (eyes and ears), domain-specific knowledge base and prompts (skills), context stores (memory), and external tools (hands). For agent tasks, we often need to customize the LLM itself

-* to reduce hallucinations in a specific domain,
-* to generate responses in a specific format (eg a JSON schema),
-* to answer “politically incorrect” questions (eg to analyze CVE exploits for an agent in the security domain),
-* and to answer requests in a specific style (eg to mimic a person).
+* to reduce hallucinations in a specific domain.
+* to generate responses in a specific format (e.g., a JSON schema).
+* to answer “politically incorrect” questions (e.g., to analyze CVE exploits for an agent in the security domain).
+* and to answer requests in a specific style (e.g., to mimic a person).

![What is a GaiaNet agent](gaianet_agent.png)

Agents are complex software that require significant amount of engineering and resources. Today, most agents are close-source and hosted on SaaS-based LLMs. Popular examples include GPTs and Microsoft/GitHub copilots on OpenAI LLMs, and Duet on Google’s Gemini LLMs.

-However, as we discussed, a key requirement for agents is to customize and adapt its underlying LLM and software stack for domain-specific tasks — an area where centralized SaaS perform very poorly. For example, with ChatGPT, every small task must be handled by a very large model. It is also enormously expensive to finetune or modify any ChatGPT models. The one-size-fits-all LLMs are detrimental to the agent use case in capabilities, alignment, and cost structure. Furthermore, the SaaS hosted LLMs lack privacy controls on how the agent’s private knowledge might be used and shared. Because of these shortcomings, it is difficult for individual knowledge workers to create and monetize agents for his or her own domain and tasks on SaaS platforms like OpenAI, Google, Anthropic, Microsoft and AWS.
+However, as we discussed, a key requirement for agents is to customize and adapt its underlying LLM and software stack for domain-specific tasks — an area where centralized SaaS perform very poorly. For example, with ChatGPT, every small task must be handled by a very large model. It is also enormously expensive to fine-tune or modify any ChatGPT models. The one-size-fits-all LLMs are detrimental to the agent use case in capabilities, alignment, and cost structure. Furthermore, the SaaS hosted LLMs lack privacy controls on how the agent’s private knowledge might be used and shared. Because of these shortcomings, it is difficult for individual knowledge workers to create and monetize agents for his or her own domain and tasks on SaaS platforms like OpenAI, Google, Anthropic, Microsoft and AWS.

In this paper, we propose a decentralized software platform and protocol network for AI agents for everyone. Specifically, our goals are two-folds.

@@ -39,15 +39,15 @@ In this paper, we propose a decentralized software platform and protocol network
## Open-source and decentralization

-As of April 2024, there are over 6000 open-source LLMs published on Hugging face. Compared with close-source LLMs, such as GPT4, open-source LLMs offer advantages in privacy, cost, and systematic bias. Even with general QA performance, open-source LLMs are closing the gap with close-source counterparties quickly.
+As of April 2024, there are over 6000 open-source LLMs published on Hugging face. Compared with close-source LLMs, such as GPT-4, open-source LLMs offer advantages in privacy, cost, and systematic bias. Even with general QA performance, open-source LLMs are closing the gap with close-source counterparties quickly.

![Open vs close source LLMs](closed_vs_open.jpg)

For AI agent use cases, it has been demonstrated that smaller but task-specific LLMs often outperform larger general models.

However, it is difficult for individuals and businesses to deploy and orchestrate multiple finetuned LLMs on their own heterogeneous GPU infrastructure. The complex software stack for agents, as well as the complex interaction with external tools, are fragile and error-prone.

-Furthermore, LLM agents have entirely different scaling characteristics than past application servers. LLM is extremely computationally intensive. A LLM agent server can typically only serve one user at a time, and it often blocks for seconds at a time. The scaling need is no longer to handle many async requests on a single server, but to load balance among many discreet servers on the internet scale.
+Furthermore, LLM agents have entirely different scaling characteristics than past application servers. LLM is extremely computationally intensive. A LLM agent server can typically only serve one user at a time, and it often blocks for seconds at a time. The scaling need is no longer to handle many async requests on a single server, but to load balance among many discrete servers on the internet scale.

The GaiaNet project provides a cross-platform and highly efficient SDK and runtime for finetuned open-source LLMs with proprietary knowledge bases, customized prompts, structured responses, and external tools for function calling. A GaiaNet node can be started in minutes on any personal, cloud, or edge device. It can then offer services through an incentivized web3 network.
