Default ramalama serve to only listen on localhost #876
base: main
Conversation
Reviewer's Guide by Sourcery

This pull request modifies the default host address that `ramalama serve` listens on.

Sequence diagram for ramalama serve interaction:

    sequenceDiagram
        participant User
        participant CLI
        participant Container
        User->>CLI: ramalama serve
        CLI->>CLI: Reads config (host = 127.0.0.0)
        CLI->>Container: Starts container with llama.cpp listening on 127.0.0.0
        Container-->>CLI: Container running
        CLI-->>User: Service running on localhost
Hey @rhatdan - I've reviewed your changes - here's some feedback:
Overall Comments:
- It might be worth mentioning the security implications of binding to 0.0.0.0 vs 127.0.0.1 in the commit message.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
docs/ramalama.conf (Outdated)

    @@ -42,7 +42,7 @@

     # IP address for llama.cpp to listen on.
     #
    -#host = "0.0.0.0"
    +#host = "127.0.0.0"
question (bug_risk): Clarify the loopback IP change from 0.0.0.0 to 127.0.0.0.
Typically, '127.0.0.1' is used as the loopback address. Using '127.0.0.0' might unintentionally cause binding issues. Please confirm if this change is intentional.
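A quick way to see the concern, as a hedged illustration rather than anything from this PR: probe whether the OS will even bind a socket to each candidate address. On Linux the whole 127.0.0.0/8 block is routed to the loopback interface, so 127.0.0.0 may bind there, while macOS typically only configures 127.0.0.1 on loopback, which matches the bind failure reported later in this thread.

```python
# Hedged illustration (not part of the PR): check whether the OS accepts a
# bind to each candidate address.  Results are platform-dependent.
import socket

def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))  # port 0 -> any free ephemeral port
        return True
    except OSError:
        return False

for addr in ("127.0.0.1", "127.0.0.0", "0.0.0.0"):
    print(addr, can_bind(addr))
```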
ramalama/cli.py (Outdated)

    @@ -856,7 +856,7 @@ def serve_parser(subparsers):
         run_serve_perplexity_args(parser)
         add_network_argument(parser, "")
         parser.add_argument("-d", "--detach", action="store_true", dest="detach", help="run the container in detached mode")
    -    parser.add_argument("--host", default=config.get('host', "0.0.0.0"), help="IP address to listen")
    +    parser.add_argument("--host", default=config.get('host', "127.0.0.0"), help="IP address to listen")
question (bug_risk): Review the host IP default in CLI arguments.
The default is now set to '127.0.0.0', which is uncommon for a loopback address. Confirm if this specific IP should be used or if '127.0.0.1' is more appropriate.
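For context, a minimal sketch (not the actual ramalama code) of how such a config-backed default resolves, assuming `config` behaves like a dict loaded from ramalama.conf and using the 127.0.0.1 value the reviewers suggest:

```python
# Minimal sketch of a config-backed CLI default; `config` here is a plain dict
# standing in for values parsed from ramalama.conf.
import argparse

config = {}  # empty means "no host set in ramalama.conf", so fall back to the built-in default

parser = argparse.ArgumentParser(prog="ramalama serve")
parser.add_argument(
    "--host",
    default=config.get("host", "127.0.0.1"),  # reviewers suggest 127.0.0.1, not 127.0.0.0
    help="IP address to listen",
)

args = parser.parse_args([])
print(args.host)  # -> 127.0.0.1 when neither the config file nor the CLI overrides it
```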
docs/ramalama.conf.5.md (Outdated)

    @@ -78,7 +78,7 @@ Run RamaLama using the specified container engine.
     Valid options are: Podman and Docker
     This field can be overridden by the RAMALAMA_CONTAINER_ENGINE environment variable.

    -**host**="0.0.0.0"
    +**host**="127.0.0.0"
issue (bug_risk): Potential typo in IP address

The IP address 127.0.0.0 appears to be a typo. If the intention is to bind to the loopback interface, it should be 127.0.0.1. If all interfaces are intended, then 0.0.0.0 should be used. Given the previous value was 0.0.0.0, this is likely what was intended.
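To make the distinction concrete, here is a small standard-library illustration (not part of the PR) of why 127.0.0.0 reads as a typo: it is the network address of the loopback block, 127.0.0.1 is the conventional loopback host, and 0.0.0.0 is the unspecified "all interfaces" address.

```python
# Illustration of the three addresses discussed in this review.
import ipaddress

loopback_net = ipaddress.ip_network("127.0.0.0/8")

# 127.0.0.0 is the network address of the loopback block, not a usual host address.
print(ipaddress.ip_address("127.0.0.0") == loopback_net.network_address)  # True

# 127.0.0.1 is the conventional loopback host.
print(ipaddress.ip_address("127.0.0.1").is_loopback)                      # True

# 0.0.0.0 is the unspecified address, i.e. "listen on all interfaces" when binding.
print(ipaddress.ip_address("0.0.0.0").is_unspecified)                     # True
```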
Good catch here
Thank you for your feedback, we will generate more comments like this in the future.
    ./bin/ramalama serve --port 9000 tinyllama
    build: 4763 (f777a73e) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 12
    system_info: n_threads = 8 (n_threads_batch = 8) / 12 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
    main: couldn't bind HTTP server socket, hostname: 127.0.0.0, port: 9000
on macOS it needs to be 127.0.0.1
I think this is a good idea, but just confirming: will we still be able to access the server outside the container when we run "ramalama serve"? I think that's why we set it to 0.0.0.0 at least initially. I think this is worth a quick manual test to check. I wouldn't be surprised if this config has multiple contexts too, like being passed to podman, llama.cpp and/or vllm separately, can't remember.
On all platforms it needs to be 127.0.0.1, I think it's just a typo :)
@ericcurtin to your question: no, it doesn't work with containers, so I can't reach the port, as it's not exposed by the process inside the container.
I think we want that server somewhat accessible externally. Although 0.0.0.0 does seem over the top and insecure, 127.0.0.1 and only accessible inside the container also seems over the top, too secure. I think what the user would expect as a default is that it behaves sort of as if we weren't containerized at all, like if we ran llama-server on 127.0.0.1 with no containers.
And yeah, it's important to test this change in particular with and without containers, because what's considered a local network is kinda different in both cases.
On macOS/Windows, there is a process on the host listening on 127.0.0.1 that redirects the traffic to the container, but the process inside the container needs to bind on 0.0.0.0, so maybe a special case when running inside containers? Because right now, with this PR, it's failing on macOS/Windows when using containers and the default command.
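One way to get that special case, sketched here only as an assumption about how it could look (this is not what the PR implements; the image name and server command are illustrative): publish the container port only on the host's loopback address using podman's ip:hostPort:containerPort syntax, while the server inside the container keeps binding 0.0.0.0.

```python
# Hedged sketch: restrict host-side exposure to loopback while the in-container
# server still binds 0.0.0.0.  Image name and server command are illustrative.
import subprocess

host_port = 8080
container_port = 8080
image = "quay.io/ramalama/ramalama"  # hypothetical image reference

cmd = [
    "podman", "run", "--rm",
    # Publish only on the host's loopback address; podman forwards traffic
    # into the container, where the server listens on 0.0.0.0.
    "-p", f"127.0.0.1:{host_port}:{container_port}",
    image,
    "llama-server", "--host", "0.0.0.0", "--port", str(container_port),
]
print(" ".join(cmd))              # inspect the command that would be run
# subprocess.run(cmd, check=True)  # uncomment to actually start the container
```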
So --host is only used with --nocontainer then? They should conflict if set.
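If those flags should conflict, here is a hedged argparse sketch of one way to enforce it; the flag names mirror the ones discussed here, but the check itself is hypothetical, not existing ramalama behavior.

```python
# Hypothetical sketch: reject --host unless --nocontainer is also given.
import argparse

DEFAULT_HOST = "127.0.0.1"

parser = argparse.ArgumentParser(prog="ramalama serve")
parser.add_argument("--host", default=DEFAULT_HOST, help="IP address to listen")
parser.add_argument("--nocontainer", action="store_true",
                    help="run llama.cpp directly on the host")

args = parser.parse_args(["--host", "0.0.0.0"])  # no --nocontainer -> conflict
if args.host != DEFAULT_HOST and not args.nocontainer:
    parser.error("--host is only supported together with --nocontainer")
```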
Force-pushed from 24e3a58 to c9b37af (compare)
I tested this on macOS: with no containers it's fine, works as expected. When using containers it's not fine, the port is inaccessible.
macOS is a funny case because we are also dealing with podman-machine networking, as well as the container, as well as the macOS host OS.
Force-pushed from f689528 to 7dcb7fa (compare)
Currently RamaLama is listening on 0.0.0.0, which means that it listens on all network interfaces, including interfaces that are exposed outside of the host. Moving to 127.0.0.1 means that the service is only available on the local system. This will only affect llama.cpp running without containers.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Tested this again now that it's green:

- via native macOS - works (127.0.0.1:8080 is accessible on the machine in the local network, as no encapsulation is involved)

If we merge this and people start using it, they will complain there is no connectivity to the server; maybe some of these options fix things, I dunno.
With containers, you have to open the firewall rules to access it; the same settings without this change would work.
Test on the Mac or Linux without the patch and the ports on the host will not be exposed, I believe.
Without this PR, all 3 scenarios expose the port.
I think it can't be merged like that, else the default experience for macOS/Windows users using containers will be painful. The default should not be changed, at least for these use cases.
Summary by Sourcery

Documentation:

- Update the documented host default in docs/ramalama.conf and docs/ramalama.conf.5.md to match the new localhost default.