Update troubleshooting.md
juntao authored May 20, 2024
1 parent bb56c32 commit c890e09
Showing 1 changed file with 36 additions and 10 deletions.
46 changes: 36 additions & 10 deletions docs/node-guide/troubleshooting.md
@@ -21,24 +21,30 @@ ln -s /usr/local/lib/python3.10/dist-packages/nvidia/cuda_runtime/lib/libcudart.
ln -s /usr/local/lib/python3.10/dist-packages/nvidia/cublas/lib/libcublasLt.so.12 /usr/lib/libcublasLt.so.12
```

## The "Too many open files" Error on macOS
## Fail to recover from collection snapshot on Windows WSL

When running `gaianet init` to initialize a new node on macOS, you may encounter an error related to snapshot recovery if your snapshot contains a large amount of text. The error message may be the following:
On Windows WSL, you may see the following error while running `gaianet init`:

```
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Too many open files (os error 24)"},"time":1.574064833}
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Too many open files (os error 24)"},"time":1.574064833}
* Import the Qdrant collection snapshot ...
The process may take a few minutes. Please wait ...
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Tokio task join error: task 1242 panicked"},"time":0.697784244}
```

This issue is caused by the default file descriptor (FD) limit on macOS, which is set to a relatively low value of 256.

To resolve this issue, you can increase the default FD limit on your system. To do so, run the following command:
If you look into the `~/gaianet/log/init-qdrant.log` file, you may see this error line:

```
ulimit -n 10000
2024-05-20T07:24:52.900895Z ERROR qdrant::startup: Panic occurred in file /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cgroups-rs-0.3.4/src/memory.rs at line 587: called `Result::unwrap()` on an `Err` value: Error { kind: ReadFailed("/sys/fs/cgroup/memory.high"), cause: Some(Os { code: 2, kind: NotFound, message: "No such file or directory" }) }
```

This temporarily sets the FD limit to 10,000 for the current shell session only. Next, run the `gaianet init` and `gaianet start` commands in the SAME terminal.
The solution is to disable the `autoMemoryReclaim` feature in WSL. To turn this feature off:

1. Edit `C:\Users\<Your user name>\.wslconfig`.
2. Remove or comment out `autoMemoryReclaim` in the `[experimental]` section.
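
Before editing `.wslconfig`, it can help to confirm that the symptom matches. The following is a minimal sketch, to be run inside the WSL distribution; `check_wsl_cgroup` is a hypothetical helper name (not part of the `gaianet` CLI), and its two default paths are the ones quoted in the log line above. A missing file is consistent with the panic, but it does not by itself prove that `autoMemoryReclaim` is the cause.

```shell
# Hypothetical helper: report whether the cgroup file Qdrant tried to read exists,
# and whether the Qdrant init log mentions it.
# $1: cgroup file to check (defaults to the path from the panic message)
# $2: log file to search  (defaults to the log path named in this section)
check_wsl_cgroup() {
  cgroup_file="${1:-/sys/fs/cgroup/memory.high}"
  log_file="${2:-$HOME/gaianet/log/init-qdrant.log}"

  if [ -e "$cgroup_file" ]; then
    echo "cgroup file present: $cgroup_file"
  else
    echo "cgroup file missing: $cgroup_file"
  fi

  # Only search the log if it exists; -F treats the pattern as a fixed string.
  if [ -f "$log_file" ] && grep -qF 'memory.high' "$log_file"; then
    echo "log mentions memory.high: $log_file"
  fi
}
```

Calling `check_wsl_cgroup` with no arguments uses the default paths. After changing `.wslconfig`, WSL generally has to be restarted (for example with `wsl --shutdown` from a Windows terminal) before the new setting takes effect.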


Thanks to [RoggeOhta](https://github.com/RoggeOhta) for discovering this. You can learn more about it [here](https://github.com/GaiaNet-AI/gaianet-node/issues/46).

## File I/O error

@@ -73,4 +79,24 @@ Warning: /main/consensus/consensus.snapshot: No such file or directory
curl: (23) Failure writing output to destination
```

The reason for this type of error is that when executing `gaianet init`, the comments in `config.json` are run. The solution is to delete the comments in `config.json` and re-run the `gaianet init` command.
The reason for this type of error is a mis-configured `config.json` file. The solution is to delete the comments in `config.json` and re-run the `gaianet init` command.
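
To locate comments before deleting them, a rough sketch like the one below may help. `find_json_comment_lines` is a hypothetical helper name, and the default path `~/gaianet/config.json` is an assumption about where the file lives. It is only a heuristic: it flags lines that begin with `//`, `#`, or `/*`, and will not catch a comment that trails a value on the same line.

```shell
# Hypothetical helper: print (with line numbers) the lines of a JSON file
# that start with a comment marker. Returns non-zero when nothing is found.
# $1: file to scan (the default location is an assumption)
find_json_comment_lines() {
  config_file="${1:-$HOME/gaianet/config.json}"
  # Match optional leading whitespace followed by //, #, or /*
  grep -nE '^[[:space:]]*(//|#|/\*)' "$config_file"
}
```

If the function prints nothing, no whole-line comments were found; otherwise, delete the listed lines and re-run `gaianet init`.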

## The "Too many open files" Error on macOS

When running `gaianet init` to initialize a new node on macOS, you may encounter an error related to snapshot recovery if your snapshot contains a large amount of text. The error message may be the following:

```
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Too many open files (os error 24)"},"time":1.574064833}
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Too many open files (os error 24)"},"time":1.574064833}
```

This issue is caused by the default file descriptor (FD) limit on macOS, which is set to a relatively low value of 256.

To resolve this issue, you can increase the default FD limit on your system. To do so, run the following command:

```
ulimit -n 10000
```

This temporarily sets the FD limit to 10,000 for the current shell session only. Next, run the `gaianet init` and `gaianet start` commands in the SAME terminal.
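
As a small sketch of the same fix, the helper below checks the current limit first and only raises it when needed. `raise_fd_limit` is a hypothetical name, not part of the `gaianet` CLI, and the default target of 10000 simply mirrors the value used above.

```shell
# Hypothetical helper: raise the soft FD limit of the current shell to a target
# value, unless the limit is already at or above that value.
# $1: target limit (defaults to 10000, the value used in this section)
raise_fd_limit() {
  target="${1:-10000}"
  current="$(ulimit -n)"

  if [ "$current" = "unlimited" ] || [ "$current" -ge "$target" ]; then
    echo "FD limit already sufficient: $current"
    return 0
  fi

  if ulimit -n "$target" 2>/dev/null; then
    echo "FD limit raised from $current to $(ulimit -n)"
  else
    # Raising the soft limit fails when it would exceed the hard limit.
    echo "Could not raise FD limit above $current (hard limit: $(ulimit -Hn))" >&2
    return 1
  fi
}
```

Call `raise_fd_limit` directly in the terminal (not inside `$(...)`, which would run it in a subshell), then run `gaianet init` and `gaianet start` in that same terminal.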
