Add scalalib.ApeModule to produce APE (actually portable executable) assembly jar #4503

kiendang · 2025-02-07T10:16:06Z

Add an ApeModule to support producing actually portable executable assembly jars.

object foo extends JavaModule with scalalib.ApeModule

./mill show foo.assemblyApe

out.jar would run cross-platform (tested with arm Mac, x86 Linux and Windows (on cmd) with this example hello world project).

lefou

I can't tell what's the best name, but Ape feels a bit cryptic. It's IMHO not a well established term. Since we seem to use just one tool/technology jere, we might as well name the module after it. This is what we do in many other modules and it also leaves room for alternative implementations for APE. CosmoModule might be the better choice.

scalalib/src/mill/scalalib/ApeModule.scala

lefou · 2025-02-07T15:01:13Z

scalalib/src/mill/scalalib/ApeModule.scala

+    PathRef(Task.dest / "cosmocc" / "bin" / "cosmocc")
+  }
+
+  def launcherScript: T[Option[String]] = Task {


This needs a prefix. Either ape or cosmo.

lefou · 2025-02-07T15:04:06Z

scalalib/src/mill/scalalib/ApeModule.scala

+    }
+  }
+
+  def assemblyApe: T[PathRef] = Task {


cosmoAssembly or apeAssembly. Depends on which name we choose above. Should be consistent though.

lefou · 2025-02-07T15:08:14Z

scalalib/src/mill/scalalib/Assembly.scala

      destJar: os.Path,
      inputPaths: Agg[os.Path],
      manifest: mill.api.JarManifest = mill.api.JarManifest.MillDefault,
-      prependShellScript: Option[String] = None,
+      prepend: Option[os.Source] = None,


I assume you need a byte array instead of a string. Woud be a Option[String|Array[Byte]] sufficient? Or do you expect large data here? A os.Source feels a bit too generic.

Option[String|Array[Byte]] would work as well. The data is around 300K in this case.

@lihaoyi What do you prefer?

os.Source does feel a bit unusual, I think a 300kb byte array is probably fine for now

Basic question: What is the simplest way to pass a x: String | Array[Byte] to os.write? Looks like there is no implicit conversion for unions. os.write(..., x) does not work. x: os.Source, x.asInstanceOf[os.Source] doesn't work either. Is there any other way that's less clunky than x match { case x: String => os.write(..., x); case x: Array[Byte] => os.write(..., x) }?

I don't know the answer. Could be, this isn't fully implemented in upickle. But we could also either fallback to always use a byte array, which sounds a bit inconvenient for the other cases, or go with a Option[Either[Array[Byte],String]] for now.

I think just using byte array is fine. Strings can be converted with just .getBytes, so it's no big hardship. Definitely easier than wrapping in Eithers and Lefts and Rights

kiendang · 2025-02-07T16:07:08Z

@lefou Thanks for the review. Admittedly this is just a POC PR I quickly drafted up to demonstrate the idea. I haven't thought too much about organizing things yet, which obviously needs fleshing out if this is to be developed further.

Just want to mention a few drawbacks to this approach

The prepended compiled binary is around 300KB, quite a bit larger than the current shell script.
As I have tested, the assembly out.jar runs on arm Mac, arm and x86 Linux. On x86 Windows it runs on CMD, but doesn't on PowerShell (it just doesn't show anything). I'm not a Windows user so not yet sure what went wrong or even where to look. I'm using the same hello world example project here.
Even binaries compiled with cosmocc have hiccups with a couple of environments (https://github.com/jart/cosmopolitan?tab=readme-ov-file#platform-notes)
My launcher script use execvp which evokes the execve syscall. According to this table. execve has some problem on Windows, but that might have been solved in a recent update.

lihaoyi · 2025-02-08T02:06:13Z

In terms of naming, I think ActuallyPortableExecutableModule would be best. It's bit of an unusual/niche thing so having the name be verbose is probably better than conciseness.

300kb is fine; any Scala program gets 4-5mb from scala-library anyway, and most Java programs will have similar dependency trees. Mill's own executable assemblies bundle Coursier with Graal for a >60mb jar prepend and it works great

We need to see what's going on with Powershell to see if it can be resolved. Currently Powershell is the default shell for Windows Terminal, and the whole point of APEs is to work seamlessly on Windows environments where previously you would need to rename out.jar to out.bat. If it's technically impossible at the moment I don't mind closing out the bounty regardless, but we should dig in a bit to see why it doesn't work.

One end-to-end test that should be done is to build the Mill dist.executable jar with an APE prefix replacing the shell prefix, and verify that it can be run without the assembly needing to be renamed with a .bat extension.

kiendang · 2025-02-08T05:21:42Z

@lihaoyi I figured out the problem with PowerShell. It looks like PowerShell strictly follows PATHEXT. It won't run any binary if the file extension is not included in PATHEXT. Changing the file name to out.jar.com works. This means that we would have this naming problem on PowerShell regardless of the file content. It's impossible to make ./out.jar run as is on PowerShell, be it APE or any other format, unless the user already set $Env:PATHEXT += ';.JAR'. Now I understand why in most of Cosmopolitan examples the files have the .com extension.

kiendang · 2025-02-08T05:46:36Z

This section on PATHEXT is interesting

To ensure that scripts for another scripting language run in the current console session, add the file extension used by the scripting language. For example, to run Python scripts in the current console, add the .py extension to the environment variable. For Windows to support the .py extension as an executable file you must register the file extension using the ftype and assoc commands of the CMD command shell.

and I checked

cmd /c ftype jarfile

jarfile="C:\Program Files\Java\jdk-17\bin\javaw.exe" -jar "%1" %*

This means that if the user installs java using the Windows installer, which registers jarfile with ftype, and we could change the assembly zip to be able to run java -jar out.jar instead of java -cp out.jar <main class> then we could do ./out.jar on PowerShell. That's a big condition though. I suppose it wouldn't work if you get java via other methods, i.e. download the archive, through coursier, ...

lefou · 2025-02-08T12:18:36Z

In terms of naming, I think ActuallyPortableExecutableModule would be best

I don't like that or a shorter version ProtableExecModule. We should name it after the tool at work, CosmoModule. We also have ScalafmtModule, ProguardModule or JmhModule instead of FormatModule, ShrinkAndObfuscateModule or BenchmarkModule.

lihaoyi · 2025-02-10T01:39:25Z

@kiendang if the extension matters, what if we name the output file .exe? That should be executable on all windows shells right? And since Linux/Mac doesn't look at extensions, it should work there as well?

sake92 · 2025-02-11T19:00:50Z

This is the default for PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
So any of these should work: mill.com, mill.exe, mill.cmd.
Depends what we think makes most sense. Mill.cmd feels like middle ground to me.
Mill.com sounds like a website, mill.exe sounds windows specific.. might confuse people

kiendang · 2025-02-11T20:41:54Z

@lihaoyi Yup that would work.

@sake92 I actually think .EXE is the more appropriate here. The APE file format is a binary file format. It even starts with MZ which is the executable format for .EXE files. .CMD would be shell script files to be run by the Windows CMD shell. I don't think using either extension would mean much on Linux/Unix, but on Windows using .CMD instead of .EXE extension for these binaries would not be fitting.

@ all, I have a bit of problem doing this

One end-to-end test that should be done is to build the Mill dist.executable jar with an APE prefix

One way of enabling dist.ape.executable with the least amount of code I can think of is having dist.ape "extends" dist via an export clause

dist/build.mill

package build.dist

object `package` extends RootModule with InstallModule {
  ...
  
  object ape
    extends RootModule
    with InstallModule
    with mill.scalalib.ApeModule {

    export `package`.{executableRaw as _, *}

    override def executableRaw = ...
  }
}

but looks like mill doesn't support export clauses? If I do ./mill show dist.ape.executable I got

1 tasks failed
generateScriptSources dist/package.mill:334:21 expected "}"
    export `package`.{executableRaw as _, *}
                    ^
dist/package.mill:334:21 expected "}"
    export `package`.{executableRaw as _, *}

also inside dist/build.mill I can't seem able to access the mill.scalalib.ApeModule trait I added. Even putting something simple like trait x extends mill.scalalib.ApeModule would give me

[build.mill-61/65] compile
[build.mill-61] [info] compiling 1 Scala source to /home/kien/mill/out/mill-build/compile.dest/classes ...
[build.mill-61] [error] /home/kien/mill/dist/package.mill:333:35: type ApeModule is not a member of package mill.scalalib
[build.mill-61] [error] did you mean BomModule, RunModule, or SbtModule?
[build.mill-61] [error]   trait ape extends mill.scalalib.ApeModule {
[build.mill-61] [error]                                   ^
[build.mill-61] [error] one error found

Not sure if I miss something.

lihaoyi · 2025-02-11T22:59:51Z

@kiendang Let's go with @lefou's suggestion of CosmoModule?

I'm not sure about export, but your snippet looks wrong. You should not need to have nested RootModules, and you shouldn't need to use exports to share tasks between modules as you should use traits instead.

However, for now you can't use stuff defined in your PR in Mill's own {build,package}.mill scripts until we publish a new version and update .config/mill-version. So it should be fine to just exercise your code in unit/integration/example tests, and we can optionally integrate it into mill proper in a follow up

kiendang · 2025-02-12T14:43:53Z

Added the tests to integration/feature/cosmopolitan.

I moved mill.scalalib.ApeModule to mill.scalalib.cosmopolitan.CosmopolitanModule. Almost everywhere in the Cosmopolitan official repo and documentation it's referred to as Cosmopolitan instead of Cosmo (which I think is just a informal shortened name) so I think it's a better fit. If you think this is too long I'll change it to CosmoModule (sorry for the bikeshedding).

Right now where should I cache the Cosmo toolchain? I see #3930 but looks like nothing is standardized yet?

lihaoyi · 2025-02-12T15:08:38Z

Yes there's no standard global cache yet, so caching i the task.dest folder is fine for now

scalalib/src/mill/scalalib/cosmopolitan/CosmopolitanModule.scala

lefou · 2025-02-13T09:39:00Z

scalalib/src/mill/scalalib/cosmopolitan/CosmopolitanModule.scala

+        requests.get.stream(s"https://cosmo.zip/pub/cosmocc/${versionedCosmocc}.zip")
+      )
+      os.remove.all(Task.dest / versionedCosmocc)
+      os.call(("unzip", zip, "-d", Task.dest / versionedCosmocc))


Let's use os.unzip instead.

I did try os.unzip but it doesn't preserve permissions. Would have to have an extra step to give cosmocc executable permission. cosmoc.zip also contains other files. I'm not sure if not having the original permissions changes anything (though unlikely).

Let's go with os.zip then and os.perms.set to fix anything necessary, and open an issue on OS-Lib about os.unzip not preserving permissions so we can fix that

Unfortunately this doesn't look like a simple fix. With os.unzip after giving executable perm to files inside cosmocc/bin compilation still fails with

x86_64-linux-cosmo-gcc: fatal error: cannot execute 'cc1': posix_spawnp: No such file or directory compilation terminated.

Looks like os.unzip lose stuff beyond permissions, e.g. if I'm not wrong symlinks are not preserved but replaced with copied files, and I'm not sure what exactly fails the compilation.

ok, using a subprocess is probably fine for now, but can you open an issie upstream

lefou

The name you settled with is fine. But I'd like to have some more consistent naming. We now have the Cosmopolitan, cosmo and ape as prefix. Would be nice to get rid of ape in favor to cosmo.

kiendang · 2025-02-14T09:05:34Z

One more thing. The current shell script approach uses the JAVA_OPTS env var: java $JAVA_OPTS -cp .... It's not possible to do the exact same thing with C's execvp. execvp("java", {"java", getenv("JAVA_OPTS", ...)}) just doesn't work if JAVA_OPTS contains multiple values. Parsing and splitting JAVA_OPTS ourselves is out of the question.

The cleanest approach is to use java argument files, i.e. write the values of JAVA_OPTS to a temp file then do java @<path/to/temp/file> .... Would have the exact same effect. However argument files require Java 9 minimum.

Another approach is to append JAVA_OPTS to JAVA_TOOL_OPTIONS or JDK_JAVA_OPTIONS which will be picked up by java. The former works with Java 8.

There is a very small edge case with appending to JAVA_TOOL_OPTIONS approach in which the behavior differs from the current shell script approach though. Command line arguments take precedence over JDK_JAVA_OPTIONS, which in turns takes precedence over JAVA_TOOL_OPTIONS, so in case the same argument is set in both JAVA_OPTS and JDK_JAVA_OPTIONS, the value in JDK_JAVA_OPTIONS will win whereas with the current shell script approach the value in JAVA_OPTS wins. This looks like a rare edge case though.

In summary there are 3 approaches

Argument files. Cleanest and ensures same behavior with the current shell script approach, but requires Java 9.
Append to JDK_JAVA_OPTIONS. Also requires Java 9, so should just use argument files instead.
Append to JAVA_TOOL_OPTIONS. Works with Java 8, but there's a small edge case where the behavior differs from the current shell script approach.

sideeffffect · 2025-02-14T17:01:26Z

Parsing and splitting JAVA_OPTS ourselves is out of the question.

Maybe it's possible after all:

#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

int main() {
    char *env_var = getenv("JAVA_OPTS");
    if (!env_var) {
        printf("JAVA_OPTS is not set.\n");
        return 1;
    }

    wordexp_t p;
    if (wordexp(env_var, &p, 0) != 0) {
        perror("wordexp failed");
        return 1;
    }

    // Print parsed options
    printf("Parsed JAVA_OPTS:\n");
    for (size_t i = 0; i < p.we_wordc; i++) {
        printf("  Arg[%zu]: %s\n", i, p.we_wordv[i]);
    }

    wordfree(&p); // Free allocated memory
    return 0;
}

Courtesy of ChatGPT

While wordexp doesn't seem to be available in Cosmpolitan, there are permissively-licensed implementations:

https://github.com/leleliu008/libwordexp

https://www.gnu.org/software/libc/manual/html_node/Wordexp-Example.html

lihaoyi · 2025-02-19T12:39:23Z

Looks good to me overall. Just some nits and docs and I think we can merge it

lihaoyi · 2025-02-19T12:40:10Z

integration/feature/cosmo/resources/build.mill

@@ -0,0 +1,12 @@
+package build
+
+import mill._, javalib._, scalalib._


Let's put an alias javalib.cosmo so people don't need to import it from scalalib._ in Java projects

kiendang mentioned this pull request Feb 7, 2025

Make assembly jars Actually Portable Executables (500USD Bounty) #4445

Open

kiendang force-pushed the ape branch 2 times, most recently from e8dc45c to df9f091 Compare February 7, 2025 10:18

lefou reviewed Feb 7, 2025

View reviewed changes

kiendang force-pushed the ape branch from df9f091 to e0f38d7 Compare February 11, 2025 18:25

kiendang force-pushed the ape branch from e0f38d7 to d45dc95 Compare February 11, 2025 20:52

kiendang force-pushed the ape branch from d45dc95 to 234d07a Compare February 12, 2025 14:36

lefou reviewed Feb 13, 2025

View reviewed changes

scalalib/src/mill/scalalib/cosmopolitan/CosmopolitanModule.scala Outdated Show resolved Hide resolved

lefou reviewed Feb 13, 2025

View reviewed changes

scalalib/src/mill/scalalib/cosmopolitan/CosmopolitanModule.scala Outdated Show resolved Hide resolved

kiendang force-pushed the ape branch from 5d3c75d to 9d40f40 Compare February 13, 2025 09:08

lefou reviewed Feb 13, 2025

View reviewed changes

lihaoyi mentioned this pull request Feb 14, 2025

Properly support permissions and symlinks in Zip files (500USD Bounty) com-lihaoyi/os-lib#356

Open

kiendang force-pushed the ape branch from 9d40f40 to 2c6e373 Compare February 14, 2025 16:55

kiendang marked this pull request as ready for review February 14, 2025 16:55

kiendang force-pushed the ape branch 2 times, most recently from ef7c23e to b37159b Compare February 15, 2025 07:10

lihaoyi reviewed Feb 19, 2025

View reviewed changes

kiendang force-pushed the ape branch 2 times, most recently from b985591 to f71da73 Compare February 20, 2025 07:45

kiendang added 25 commits February 20, 2025 18:07

Add scalalib.ApeModule to produce APE assembly jar

011bbd5

Make cosmocc downloading task persistent

d9ddb3f

Add appropriate prefix to task names

22d3065

Add support for forkArgs

b527037

Move ApeModule to cosmopolitan.CosmopolitanModule

ba4dd57

Add tests

1447d02

Simplify

9867745

Stop re-downloading cosmocc if already exists

aa4a7a6

Adjust failure message

144df10

Narrow type

32b2783

Cosmo everything

d866f1c

Add compiler optim

59eafe2

Fix

a50aac9

Require a main class

b80ab89

Add example test

86606bb

Add support for JAVA_OPTS

6e8fea6

Add test for JAVA_OPTS

03758f1

Remove duplicated test

2c32bfd

Disable tests on Windows

c11be49

Refactor

b81b1b6

Add javalib.cosmo.CosmoModule alias

b36aaf2

Move launcher.c template to a file

e470e72

Move example test to javalib

ac24385

Update doc

faaabc9

Use os.unzip instead of calling subprocess

126dd9b

kiendang force-pushed the ape branch from f71da73 to 126dd9b Compare February 20, 2025 10:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add scalalib.ApeModule to produce APE (actually portable executable) assembly jar #4503

Add scalalib.ApeModule to produce APE (actually portable executable) assembly jar #4503

kiendang commented Feb 7, 2025 •

edited

Loading

lefou left a comment

lefou Feb 7, 2025

lefou Feb 7, 2025

lefou Feb 7, 2025

kiendang Feb 7, 2025

lefou Feb 13, 2025

lihaoyi Feb 13, 2025

kiendang Feb 13, 2025

lefou Feb 13, 2025

lihaoyi Feb 13, 2025

kiendang commented Feb 7, 2025

lihaoyi commented Feb 8, 2025 •

edited

Loading

kiendang commented Feb 8, 2025

kiendang commented Feb 8, 2025

lefou commented Feb 8, 2025

lihaoyi commented Feb 10, 2025

sake92 commented Feb 11, 2025

kiendang commented Feb 11, 2025

lihaoyi commented Feb 11, 2025

kiendang commented Feb 12, 2025

lihaoyi commented Feb 12, 2025

lefou Feb 13, 2025

kiendang Feb 13, 2025

lihaoyi Feb 13, 2025

kiendang Feb 13, 2025

lihaoyi Feb 13, 2025

lefou left a comment •

edited

Loading

kiendang commented Feb 14, 2025

sideeffffect commented Feb 14, 2025 •

edited

Loading

lihaoyi commented Feb 19, 2025

lihaoyi Feb 19, 2025

		@@ -0,0 +1,12 @@
		package build

		import mill._, javalib._, scalalib._

Add scalalib.ApeModule to produce APE (actually portable executable) assembly jar #4503

Are you sure you want to change the base?

Add scalalib.ApeModule to produce APE (actually portable executable) assembly jar #4503

Conversation

kiendang commented Feb 7, 2025 • edited Loading

lefou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kiendang commented Feb 7, 2025

lihaoyi commented Feb 8, 2025 • edited Loading

kiendang commented Feb 8, 2025

kiendang commented Feb 8, 2025

lefou commented Feb 8, 2025

lihaoyi commented Feb 10, 2025

sake92 commented Feb 11, 2025

kiendang commented Feb 11, 2025

lihaoyi commented Feb 11, 2025

kiendang commented Feb 12, 2025

lihaoyi commented Feb 12, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lefou left a comment • edited Loading

Choose a reason for hiding this comment

kiendang commented Feb 14, 2025

sideeffffect commented Feb 14, 2025 • edited Loading

lihaoyi commented Feb 19, 2025

Choose a reason for hiding this comment

kiendang commented Feb 7, 2025 •

edited

Loading

lihaoyi commented Feb 8, 2025 •

edited

Loading

lefou left a comment •

edited

Loading

sideeffffect commented Feb 14, 2025 •

edited

Loading