Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/incus: add AppArmor rules #386060

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

MakiseKurisu
Copy link
Contributor

@MakiseKurisu MakiseKurisu commented Mar 1, 2025

Add the necessary AppArmor rules when incusd tries to extract an image to create a new instance.

Fix #350012

Details can be found here.

CC @NixOS/lxc @umbra @broizter since I can't select them as reviewers.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Mar 1, 2025
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Mar 1, 2025
@MakiseKurisu
Copy link
Contributor Author

Right now this ruleset is hand picked, since I err on the safe side to avoid broad rules, though there are some very broad rules defined in apparmor-profiles-4.0.3/etc/apparmor.d/abstractions/base. So alternatively, it can also be those rules to be fixed to solve the original issue (they don't make sense on our system any way):

  # we might as well allow everything to use common libraries
  /{usr/,}lib{,32,64}/**                r,
  /{usr/,}lib{,32,64}/**.so*       mr,
  /{usr/,}lib/@{multiarch}/**            r,
  /{usr/,}lib/@{multiarch}/**.so*   mr,
  /{usr/,}lib/tls/i686/{cmov,nosegneg}/*.so*    mr,
  /{usr/,}lib/i386-linux-gnu/tls/i686/{cmov,nosegneg}/*.so*    mr,

@adamcstephens
Copy link
Contributor

I'd love to get this support added, but we'll want a nixos test to validate it. Both for this PR and to ensure it continues to work. Are you willing to look into adding one? This test suite is a not trivial, but hopefully the patterns are clear enough. If not, feel free to ask here or on matrix #lxc:nixos.org

@MakiseKurisu MakiseKurisu requested review from adamcstephens and removed request for adamcstephens March 3, 2025 05:04
@mkg20001
Copy link
Member

mkg20001 commented Mar 4, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 386060


x86_64-linux

⏩ 2 packages blacklisted:
  • nixos-install-tools
  • tests.nixos-functions.nixos-test

@mkg20001
Copy link
Member

mkg20001 commented Mar 4, 2025

When running nixpkgs-review against this pr and building nixosTests.incus this error happens:

(please rebase against master and then fix the error)

error:
       … while evaluating the attribute 'drvPath'
         at /home/maciej/.cache/nixpkgs-review/pr-386060/nixpkgs/lib/customisation.nix:418:7:
          417|     // {
          418|       drvPath =
             |       ^
          419|         assert condition;

       … while calling the 'derivationStrict' builtin
         at <nix/derivation-internal.nix>:34:12:
           33|
           34|   strict = derivationStrict drvAttrs;
             |            ^
           35|

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: The option `nodes.machine.security.dbus' does not exist. Definition values:
       - In `makeTest parameters':
           {
             apparmor = "";
           }

```

@adamcstephens
Copy link
Contributor

-      security = {
-        apparmor.enable = appArmor;
-        dbus.apparmor = lib.optionalString appArmor "enabled";
-      };
+      security.apparmor.enable = appArmor;
+
+      services.dbus.apparmor = lib.optionalString appArmor "enabled";

But that's not enough to get things working. The tests fail on dnsmasq getting blocked.

vm-test-run-incus> machine # [   18.050936] audit: type=1400 audit(1741144712.157:8): apparmor="DENIED" operation="open" class="file" profile="incus_dnsmasq-incusbr0_</var/lib/incus>" name="/nix/store/0hwjxv6z4myq1s2ncj1kapbb8vzbj15p-dbus-1.14.10-lib/lib/libdbus-1.so.3.32.4" pid=1049 comm="dnsmasq" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

Copy link
Contributor

@adamcstephens adamcstephens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start but needs some fixing. You can try this locally by running nix-build -A nixosTests.incus.apparmor and you should immediately see the failure mkg posted.

appArmor = incusTest {
inherit lts pkgs system;
appArmor = true;
instanceContainer = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reduce the scope a bit more and disable legacy init for now. initLegacy = false; will also make the tests quicker by reducing the number of nixos instances.

@@ -4,6 +4,7 @@ import ../make-test-python.nix (
lib,

lts ? true,
appArmor ? false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you'll be in here anyway, please put this with the other test variants in alphabetical order. Thankd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that AA is more like a general setting like LTS, and not an individual test itself. So we might want to check more combinations later on.

I'll still change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we iron out the rules for all the incus things, I hope to turn apparmor on by default for the tests.

@MakiseKurisu
Copy link
Contributor Author

The tests fail on dnsmasq getting blocked.

This is because I'm only trying to fix instance creation initially, which just need to handle tar's AA usage. I don't use Incus network since I just bridge it with host interfaces.

dnsmasq also has its own AA profile, so its rule must also be studied. Though at this point I might as well use wildcard rules since it will become unmaintainable very quickly.

@adamcstephens
Copy link
Contributor

Yeah, I assumed you were just bridging. That's my typical use as well, but using the built in DHCP server is really the default setup and what the tests are using.

I tried to include the dnsmasq profile but that didn't seem to work. I'm admittedly pretty inexperienced with apparmor though.

I agree we should keep this maintainable. In the end, what's the real risk of letting processes read shared libraries out of the store?

@MakiseKurisu
Copy link
Contributor Author

I think it's more about removing the potential attack vector and the principle of minimal permission. Nix store is world readable which is why you should not put clear secret there, but well misconfiguration happens, and that's when a strict rule can make a difference. If you use AA you probably care about those things.

Though a correct implementation will probably need to parse the selected Incus derivation, find the underlying binaries (tar, xz, dnsmasq, etc.), and then generate a list of used so's. That feels way too complex, so I'm doing some broad rules now. However, the "container CPU limits can be hotplug changed" failed, and I don't see any obvious AA denies (apparmor="DENIED"):

machine: Waiting for successful exec: nproc | grep '^1$'
(finished: Waiting for successful exec: nproc | grep '^1$', in 0.12 seconds)
(finished: subtest: container CPU limits can be managed, in 4.77 seconds)
subtest: container CPU limits can be hotplug changed
machine: must succeed: incus config set container-systemd1 limits.cpu 2
(finished: must succeed: incus config set container-systemd1 limits.cpu 2, in 0.10 seconds)
machine: must succeed: sleep 1
(finished: must succeed: sleep 1, in 1.04 seconds)
machine: Waiting for successful exec: nproc | grep '^2$'
machine # [   44.713307] dhcpcd[763]: vethba0376a9: probing for an IPv4LL address
machine # [   49.009586] dhcpcd[763]: vethba0376a9: using IPv4LL address 169.254.201.150
machine # [   49.010516] dhcpcd[763]: vethba0376a9: adding route to 169.254.0.0/16
Test "container CPU limits can be hotplug changed" failed with error: "action timed out after 90 seconds"
cleanup

@MakiseKurisu MakiseKurisu changed the title nixos/incus: add AppArmor rules to support instance creation nixos/incus: add AppArmor rules Mar 5, 2025
@mkg20001
Copy link
Member

mkg20001 commented Mar 5, 2025

you may want to take a look at closure-info.nix: https://github.com/NixOS/nixpkgs/blob/9ba655c111d44efe1f127f7aabd12148842988f5/pkgs/build-support/closure-info.nix

this may help with getting the binaries incus uses

@adamcstephens
Copy link
Contributor

However, the "container CPU limits can be hotplug changed" failed, and I don't see any obvious AA denies

If you run the normal container tests does this pass?

@MakiseKurisu
Copy link
Contributor Author

Seems so:

(finished: Waiting for successful exec: nproc | grep '^1$', in 0.51 seconds)
(finished: subtest: container CPU limits can be managed, in 11.41 seconds)
subtest: container CPU limits can be hotplug changed
machine: must succeed: incus config set container-legacy1 limits.cpu 2
(finished: must succeed: incus config set container-legacy1 limits.cpu 2, in 0.35 seconds)
machine: must succeed: sleep 1
(finished: must succeed: sleep 1, in 1.20 seconds)
machine: Waiting for successful exec: nproc | grep '^2$'
(finished: Waiting for successful exec: nproc | grep '^2$', in 0.56 seconds)
(finished: subtest: container CPU limits can be hotplug changed, in 2.11 seconds)
subtest: container memory limits can be managed
machine: must succeed: incus stop container-legacy1

@MakiseKurisu
Copy link
Contributor Author

Well I ran AA test again and it passed that specific sub test, but now failed on a different one:

subtest: containers stop with incus-startup.service
machine: must succeed: incus info container-systemd1 | grep 'PID'
(finished: must succeed: incus info container-systemd1 | grep 'PID', in 0.36 seconds)
machine: must succeed: ps 5006
(finished: must succeed: ps 5006, in 0.20 seconds)
machine: must succeed: systemctl stop incus-startup.service
machine # [  210.055861] systemd[1]: Stopping Incus Instances Startup/Shutdown...
machine # [  210.627582] incusd[5446]: time="2025-03-06T12:06:53Z" level=warning msg="Failed shutting down instance, forcefully stopping" err="Instance is busy running a \"update\" operation" instance=container-systemd1 project=default
machine # [  210.651829] incusd[5446]: time="2025-03-06T12:06:53Z" level=warning msg="Failed forcefully stopping instance" err="Instance is busy running a \"update\" operation" instance=container-systemd1 project=default
machine # [  210.783793] incus-preseed-start[5535]: Error: Failed to update profile "default": The following instances failed to update (profile change still saved):
machine # [  210.791654] incus-preseed-start[5535]:  - Project: default, Instance: container-systemd2: Failed to create instance update operation: Instance is busy running a "stop" operation
machine # [  210.821691] systemd[1]: incus-preseed.service: Main process exited, code=exited, status=1/FAILURE
machine # [  210.842782] systemd[1]: incus-preseed.service: Failed with result 'exit-code'.
machine # [  210.853898] systemd[1]: Failed to start Incus initialization with preseed file.
machine # [  211.939745] veth0741ff3a: renamed from physfyZy6R
machine # [  211.948626] incusbr0: port 2(veth338bbc2f) entered disabled state
machine # [  211.357546] dhcpcd[749]: veth338bbc2f: carrier lost
machine # [  211.689915] dhcpcd[749]: veth338bbc2f: deleting route to 169.254.0.0/16
machine # [  211.896885] (udev-worker)[5598]: Network interface NamePolicy= disabled on kernel command line.
machine # [  211.972518] (udev-worker)[5598]: physfyZy6R: Process '/nix/store/4k90qpzh1a4sldhnf7cxwkm9c0agq4fp-bash-interactive-5.2p37/bin/sh -c 'echo 2 > /proc/sys/net/ipv6/conf/physfyZy6R/use_tempaddr'' failed with exit code 1.
machine # [  212.924978] veth338bbc2f: left allmulticast mode
machine # [  212.932694] veth338bbc2f: left promiscuous mode
machine # [  212.939616] incusbr0: port 2(veth338bbc2f) entered disabled state
machine # [  212.734620] dhcpcd[749]: dhcpcd_prestartinterface: No such device
machine # [  212.738740] dhcpcd[749]: veth0741ff3a: waiting for carrier
machine # [  212.745666] dhcpcd[749]: veth338bbc2f: removing interface
machine # [  213.122908] dhcpcd[749]: veth0741ff3a: removing interface
machine # [  214.658502] audit: type=1400 audit(1741262816.543:29): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus-container-systemd2_</var/lib/incus>" pid=5655 comm="apparmor_parser"
machine # [  214.814538] veth4fab0c49: left allmulticast mode
machine # [  214.817911] veth4fab0c49: left promiscuous mode
machine # [  214.821499] incusbr0: port 1(veth4fab0c49) entered disabled state
machine # [  214.243739] dnsmasq[5515]: error binding DHCP socket to device incusbr0
machine # [  214.249759] dnsmasq-dhcp[5515]: router advertisement on fd42:ee43:56f:b7a7::,
machine # [  215.771938] audit: type=1400 audit(1741262817.657:30): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="incus_dnsmasq-incusbr0_</var/lib/incus>" pid=5663 comm="apparmor_parser"
machine # [  215.516855] systemd[1]: incus-startup.service: Deactivated successfully.
machine # [  215.528856] systemd[1]: Stopped Incus Instances Startup/Shutdown.
machine # [  215.535649] systemd[1]: incus.service: Deactivated successfully.
machine # [  215.542903] systemd[1]: incus.service: Unit process 4975 (lxcfs) remains running after unit stopped.
machine # [  215.552875] systemd[1]: incus.service: Consumed 4.237s CPU time, 1.6G memory peak, 3M read from disk, 4M written to disk, 808B incoming IP traffic, 891B outgoing IP traffic.
(finished: must succeed: systemctl stop incus-startup.service, in 5.77 seconds)
machine: waiting for failure: ps 5006
Test "containers stop with incus-startup.service" failed with error: "action timed out after 120 seconds"

@MakiseKurisu
Copy link
Contributor Author

I think this is probably something on my computer though. When I reran normal container test, it failed at the same sub test.

@MakiseKurisu
Copy link
Contributor Author

Just did another run when I booted up the computer, and it clears for all tests (there is apparmor="STATUS" log which indicates that AA is enabled):

machine: must succeed: systemctl start incus-startup.service
machine # [   82.659328] systemd[1]: Starting Incus Instances Startup/Shutdown...
machine # [   83.276376] systemd[1]: Starting Incus Container and Virtual Machine Management Daemon...
machine # [   85.012275] audit: type=1400 audit(1741349261.921:32): apparmor="STATUS" operation="profile_load" profile="unconfined" name="incus_dnsmasq-incusbr0_</var/lib/incus>" pid=5911 comm="apparmor_parser"
machine # [   84.736851] dnsmasq[5912]: started, version 2.90 cachesize 150
machine # [   84.737816] dnsmasq[5912]: compile time options: IPv6 GNU-getopt DBus no-UBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset nftset auth cryptohash DNSSEC loop-detect inotify dumpfile
machine # [   84.739457] dnsmasq-dhcp[5912]: DHCP, IP range 10.0.10.2 -- 10.0.10.254, lease time 1h
machine # [   84.740723] dnsmasq-dhcp[5912]: DHCPv6 stateless on incusbr0
machine # [   84.742277] dnsmasq-dhcp[5912]: DHCPv4-derived IPv6 names on incusbr0
machine # [   84.743588] dnsmasq-dhcp[5912]: router advertisement on incusbr0
machine # [   84.745450] dnsmasq-dhcp[5912]: DHCPv6 stateless on fd42:e911:e611:7bc9::, constructed for incusbr0
machine # [   84.747236] dnsmasq-dhcp[5912]: DHCPv4-derived IPv6 names on fd42:e911:e611:7bc9::, constructed for incusbr0
machine # [   84.748906] dnsmasq-dhcp[5912]: router advertisement on fd42:e911:e611:7bc9::, constructed for incusbr0
machine # [   84.750901] dnsmasq-dhcp[5912]: IPv6 router advertisement enabled
machine # [   84.754493] dnsmasq-dhcp[5912]: DHCP, sockets bound exclusively to interface incusbr0
machine # [   84.756412] dnsmasq[5912]: using only locally-known addresses for incus
machine # [   84.758261] dnsmasq[5912]: reading /etc/resolv.conf
machine # [   84.759773] dnsmasq[5912]: using nameserver 10.0.2.3#53
machine # [   84.761661] dnsmasq[5912]: using only locally-known addresses for incus
machine # [   84.770627] dnsmasq[5912]: read /etc/hosts - 5 names
machine # [   84.771552] dnsmasq-dhcp[5912]: read /var/lib/incus/networks/incusbr0/dnsmasq.hosts/container-systemd1.eth0
machine # [   84.773279] dnsmasq-dhcp[5912]: read /var/lib/incus/networks/incusbr0/dnsmasq.hosts/container-systemd2.eth0
machine # [   85.360669] systemd[1]: Finished Incus Instances Startup/Shutdown.
(finished: must succeed: systemctl start incus-startup.service, in 2.75 seconds)
(finished: subtest: containers stop with incus-startup.service, in 5.37 seconds)
(finished: subtest: container remains running when softDaemonRestart is enabled and service is stopped, in 8.48 seconds)
machine: must succeed: incus list --format json --all-projects
machine # [   85.468658] incusd[5857]: time="2025-03-07T12:07:42Z" level=warning msg="Failed to update instance types: Get \"https://images.linuxcontainers.org/meta/instance-types/.yaml\": lookup images.linuxcontainers.org: no such host"
machine # [   85.474883] incusd[5857]: time="2025-03-07T12:07:42Z" level=error msg="Failed updating instance types" err="Get \"https://images.linuxcontainers.org/meta/instance-types/.yaml\": lookup images.linuxcontainers.org: no such host"
(finished: must succeed: incus list --format json --all-projects, in 0.12 seconds)
subtest: Stopping all running instances
(finished: subtest: Stopping all running instances, in 0.00 seconds)
(finished: run the VM test script, in 86.92 seconds)
machine # [   85.998281] incusbr0: port 1(vethb310388f) entered blocking state
machine # [   86.001698] incusbr0: port 1(vethb310388f) entered disabled state
machine # [   86.003758] vethb310388f: entered allmulticast mode
machine # [   86.006706] vethb310388f: entered promiscuous mode
machine # [   85.644512] dhcpcd[758]: vethb310388f: waiting for carrier
test script finished in 87.05s
cleanup
kill machine (pid 11)
qemu-system-x86_64: terminating on signal 15 from pid 8 (/nix/store/26yi95240650jxp5dj78xzch70i1kzlz-python3-3.12.9/bin/python3.12)
kill vlan (pid 9)
(finished: cleanup, in 0.01 seconds)
/nix/store/aw99v6lf37yjzys9j4cxs9hwksz7gdq7-vm-test-run-incus

So I can only think something on my computer is causing the issue. I am using a virtual machine as my primary PC so might be some nested virtualization qwirks.

Probably worth for reviewing again?

@broizter
Copy link

broizter commented Mar 7, 2025

I wanna test this pull request. How would I add the changes this PR makes into my NixOS configuration? I'm using the 25.05 channel. No flakes. Is this correct?

nixpkgs.overlays = [
  (self: super: {
    incus = super.incus.overrideAttrs (old: {
      patches = (old.patches or []) ++ [
        (super.fetchpatch {
          url = "https://github.com/NixOS/nixpkgs/pull/386060.patch";
          sha256 = "xxxx";
        })
      ];
    });
  })
];

@bjornfor
Copy link
Contributor

bjornfor commented Mar 7, 2025

I wanna test this pull request. How would I add the changes this PR makes into my NixOS configuration? I'm using the 25.05 channel. No flakes. Is this correct?

No, that would apply this PR, a nixpkgs patch, to the incus package -- and incus and nixpkgs are two entirely different code bases. I think there was a proposal for a NixOS module that would apply nixpkgs patches onto itself, but I cannot find anything now.

Anyway, I think for one-off testing the easiest is to clone nixpkgs and checkout this PR, then build your NixOS with NIX_PATH=nixpkgs=/path/to/local/nixpkgs nixos-rebuild ....

@MakiseKurisu
Copy link
Contributor Author

MakiseKurisu commented Mar 8, 2025

Channel can serve a similar purpose as flake input, but it is 100% easier to use flake, and I usually mention how I dogfood my PR in my own nixos-config in the PR message (check my past PRs), but this was quite trivial so I didn't do it this time.

The easiest for you though is to just copy the extra AA rules in nixos/modules/virtualisation/incus.nix in your own config.

If you actually want to add PR as an input without flake, it should be something like this:

  1. run sudo nix-channel --add https://github.com/MakiseKurisu/nixpkgs/archive/refs/heads/incusd.tar.gz incusd && sudo nix-channel --update
  2. Add following to your nixos-config:
  disabledModules = [ "virtualisation/incus.nix" ];
  imports = [
    <incusd/nixos/modules/virtualisation/incus.nix>
   ];

@adamcstephens adamcstephens dismissed their stale review March 10, 2025 15:15

addressed

@adamcstephens
Copy link
Contributor

@MakiseKurisu are you interested in completing the apparmor support? I was curious so ran this against the full test suite, but the VM support isn't currently working either.

If not, we can get this merged, but it would be great to work out the remaining blockers so we can enable apparmor by default in the tests.

vm-test-run-incus> machine # [  159.781286] audit: type=1400 audit(1741616116.423:81): apparmor="DENIED" operation="exec" class="file" profile="incus-vm-legacy1_</var/lib/incus>" name="/nix/store/r9vdy6ssql2r7f51pwc1wyzqrd0kar12-qemu-host-cpu-only-9.2.0/bin/.qemu-system-x86_64-wrapped" pid=14051 comm="qemu-system-x86" requested_mask="x" denied_mask="x" fsuid=0 ouid=0

@MakiseKurisu
Copy link
Contributor Author

I can check this tomorrow to see how to handle this. Just reinstalled my main system today.

Currently limit the test scope to instanceContainer since there is a known issue.
allTests might cause too many false positives.

initLegacy is also disabled to reduce test size.
@MakiseKurisu
Copy link
Contributor Author

Update rule now clears nixosTests.incus.all. Let me know if you want to enable this as default in this PR as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incus: Creating new containers/vms blocked by apparmor
5 participants