Skip to content

Commit

Permalink
Merge pull request #29 from GaiaNet-AI/add-firecrawl
Browse files Browse the repository at this point in the history
Add firecrawl
  • Loading branch information
juntao authored Jul 11, 2024
2 parents 8100830 + 5b4e72e commit b09dc50
Show file tree
Hide file tree
Showing 2 changed files with 48 additions and 0 deletions.
Binary file added docs/creator-guide/knowledge/firecrawl-01.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
48 changes: 48 additions & 0 deletions docs/creator-guide/knowledge/firecrawl.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
---
sidebar_position: 5
---

# Knowledge base from a URL

In this section, we will discuss how to create a vector collection snapshot from a Web URL. First, we will parse url to a structured markdown file. Then, we will follow the steps from [Knowledge base from a markdown file](markdown.md) to create embedding for your URL.

## Parse the URL content to a markdown file

Firecrawl can crawl and convert any website into LLM-ready markdown or structured data. It also supports crawling a URL and all accessible subpages.

> To use Firecrawl, you need to sign up on [Firecrawl](https://firecrawl.dev/) and get an API key.
First, install the dependencies. We are assuming that you already have Node.JS 20+ installed.

```
git clone https://github.com/JYC0413/firecrawl-integration.git
cd firecrawl-integration
npm install
```

Then, export the API key in the terminal.

```
export FIRECRAWL_KEY="your_api_key_here"
```

next, we can use the following command line to run the service.

```
node crawlWebToMd.js
```

After the application is running successfully, you will see the prompt appear on the Terminal.

![](firecrawl-01.png)

You can type your URL in the terminal right now. Here we have two choices.

* Multiple pages: input your link with `/` at the end, the program will crawl and convert the page and its subpages to one single markdown file. This way will cost lots of API token usage.
* One single page: input your link without `/` at the end. the program will crawl and convert the current page to one single markdown file.

The output markdown file will be located in this folder named `output.md`.

## Create embeddings from the markdown files

Please follow the tutorial [Knowledge base from a markdown file](markdown.md) to convert your markdown file to a snapshot of embeddings that can be imported into a GaiaNet node.

0 comments on commit b09dc50

Please sign in to comment.