This PR sets up the basic infrastructure to run an LLM inside Min, using node-llama-cpp in a utility process. Any llama.cpp-formatted model file should work; the model can be configured by updating `modelPath` inside `llmService.mjs`. My testing so far has been with either this model or this one.

My original intent was to see whether it was possible to generate high-quality page summaries to display in the searchbar. Unfortunately, with llama-3.2-1b the quality of the summaries is quite poor. llama-3.2-3b does much better, but keeping that model loaded requires around 5GB of memory. I think this makes any use case that requires the model to stay loaded in the background infeasible, but it might still work when the user explicitly requests it, since we could load the model for a brief period and then unload it immediately afterwards. I'm planning to experiment with language translation (replacing the current cloud-based version) and with an explicit "summarize page" command, but if anyone has other ideas for where this could be useful, I'd be happy to test them.
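For illustration, here is a minimal sketch of the load-use-unload pattern described above. It assumes the node-llama-cpp v3 API (`getLlama`, `loadModel`, `LlamaChatSession`); the function name, model path, and prompt are placeholders, not what `llmService.mjs` actually does:

```js
// Sketch only: load a llama.cpp model on demand, run one prompt, then free the memory.
// Assumes the node-llama-cpp v3 API; the model path below is a placeholder.
import { getLlama, LlamaChatSession } from 'node-llama-cpp'

export async function summarizeOnce (pageText) {
  const llama = await getLlama()
  const model = await llama.loadModel({
    modelPath: '/path/to/llama-3.2-3b-instruct.Q4_K_M.gguf' // placeholder path
  })
  try {
    const context = await model.createContext()
    const session = new LlamaChatSession({ contextSequence: context.getSequence() })
    return await session.prompt('Summarize this page:\n\n' + pageText)
  } finally {
    // Unload immediately so the ~5GB stays resident only while the
    // user-requested task is running.
    await model.dispose()
  }
}
```

The point of the `try`/`finally` is that the memory cost is only paid for the duration of an explicit user action, rather than keeping the model resident in the background.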