Use mlkem-native as AWS-LC's ML-KEM implementation
This imports mlkem-native (https://github.com/pq-code-package/mlkem-native)
into AWS-LC, replacing the reference implementation.

This commit focuses on the minimal configuration of mlkem-native: No assembly
and no FIPS-202 code are imported.

mlkem-native is a high-performance, high-assurance C90 implementation of
ML-KEM developed under the Post-Quantum Cryptography Alliance (PQCA) and
the Linux Foundation. It is a fork of the reference implementation that
AWS-LC previously relied on, and remains close to it. mlkem-native is the
default ML-KEM implementation in
[libOQS](https://github.com/open-quantum-safe/liboqs).

**Import Mechanism**

The mlkem-native source code is unmodified and imported using the importer
script `crypto/fipsmodule/ml_kem/importer.sh`; the details of the import
are in META.yml.

Future updates to the source tree would ideally happen through a re-import
of a different version of mlkem-native, though a temporary change-log is
conceivable, similar to how the changes from the reference implementation
were documented so far.

**Import Scope**

mlkem-native has a C-only version as well as native 'backends' in AVX2 and
Neon for high performance. This commit only imports the C-only
version. Integration of native backends will be done separately.

mlkem-native offers its own FIPS-202 implementation, including fast
versions of batched FIPS-202. However, this commit does not import those,
but instead provides glue-code around AWS-LC's own FIPS-202
implementation. The path to leveraging the FIPS-202 performance
improvements in mlkem-native would be to integrate them directly
into [crypto/fipsmodule/sha](crypto/fipsmodule/sha).

**Side-channels**

mlkem-native's CI uses a patched version of valgrind to check, across a
range of compilers and compile flags, that there are no secret-dependent
memory accesses, branches, or divisions. The relevant assertions have been kept
but are unused unless `MLK_CT_TESTING_ENABLED` is set, which is the case if
and only if `BORINGSSL_CONSTANT_TIME_VALIDATION` is set.
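
As a rough illustration of this kind of gating, the sketch below maps the build flag to a testing macro and marks a secret as "undefined" for valgrind only when that macro is set. All `TOY_*` / `toy_*` names are hypothetical and not taken from mlkem-native; only `BORINGSSL_CONSTANT_TIME_VALIDATION` and the valgrind client requests are real. Detection of secret-dependent divisions relies on the patched valgrind mentioned above and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wiring: enable the annotations iff the AWS-LC flag is set. */
#if defined(BORINGSSL_CONSTANT_TIME_VALIDATION)
#define TOY_CT_TESTING_ENABLED
#endif

#if defined(TOY_CT_TESTING_ENABLED)
#include <valgrind/memcheck.h>
/* Secret data is marked "undefined", so valgrind reports any branch or
 * memory access that depends on it. */
#define TOY_CT_SECRET(ptr, len) VALGRIND_MAKE_MEM_UNDEFINED((ptr), (len))
#define TOY_CT_DECLASSIFY(ptr, len) VALGRIND_MAKE_MEM_DEFINED((ptr), (len))
#else
/* Annotations compile to nothing in normal builds. */
#define TOY_CT_SECRET(ptr, len) do { (void)(ptr); (void)(len); } while (0)
#define TOY_CT_DECLASSIFY(ptr, len) do { (void)(ptr); (void)(len); } while (0)
#endif

/* Branch-free comparison: returns 0 iff the buffers are equal. */
static uint8_t toy_ct_memcmp(const uint8_t *a, const uint8_t *b, size_t len)
{
  uint8_t acc = 0;
  size_t i;
  for (i = 0; i < len; i++)
  {
    acc |= (uint8_t)(a[i] ^ b[i]);
  }
  return acc;
}

/* Returns 1 if candidate matches secret, 0 otherwise. */
static int toy_check_tag(const uint8_t *secret, const uint8_t *candidate,
                         size_t len)
{
  uint8_t diff;
  TOY_CT_SECRET(secret, len);
  diff = toy_ct_memcmp(secret, candidate, len);
  /* The comparison result is public, so it may be branched on. */
  TOY_CT_DECLASSIFY(&diff, sizeof(diff));
  TOY_CT_DECLASSIFY(secret, len);
  return diff == 0;
}
```

In a normal build the annotations vanish and `toy_check_tag` behaves like an ordinary comparison; under valgrind with the flag set, removing the declassification of `diff` would make the final `diff == 0` test show up as a use of undefined data.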

Similar to AWS-LC, mlkem-native uses value barriers to block potentially
harmful compiler reasoning and optimization. Where standard gcc/clang
inline assembly is not available, mlkem-native falls back to a slower 'opt
blocker' based on a volatile global (an idea by DjB) -- both are described
in
[verify.h](https://github.com/aws/aws-lc/blob/df5b09029e27d54b2b117eeddb6abd983528ae15/crypto/fipsmodule/ml_kem/mlkem/verify.h).
It will be interesting to see if the opt-blocker variant works on all
platforms that AWS-LC cares about.
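
For readers unfamiliar with the two techniques, here is a minimal sketch of a value barrier with an inline-assembly variant and a volatile-global fallback. The function and variable names are made up for illustration; the actual definitions live in `verify.h` and may differ in detail.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Empty inline assembly that claims to read and write v. The compiler
 * can no longer assume anything about the value that comes out. */
static inline uint32_t toy_value_barrier_u32(uint32_t v)
{
  __asm__ volatile("" : "+r"(v));
  return v;
}
#else
/* Fallback 'opt blocker': XOR with a volatile global that is always 0 at
 * runtime. The compiler must reload it and cannot fold the XOR away. */
static volatile uint32_t toy_opt_blocker_u32 = 0;
static inline uint32_t toy_value_barrier_u32(uint32_t v)
{
  return v ^ toy_opt_blocker_u32;
}
#endif

/* Branch-free select: returns a if cond != 0, else b. The barrier hides
 * from the compiler that mask is either all-zeros or all-ones, which
 * discourages it from turning the arithmetic back into a branch. */
static inline uint32_t toy_ct_select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
  uint32_t nonzero = (cond | (0u - cond)) >> 31; /* 1 iff cond != 0 */
  uint32_t mask = toy_value_barrier_u32(0u - nonzero);
  return b ^ (mask & (a ^ b));
}
```

The fallback costs an extra memory load per barrier, which is why it is described as the slower option.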

**Formal Verification**

All C-code imported in this commit is formally verified using the C Bounded
Model Checker ([CBMC](https://github.com/diffblue/cbmc/)) to be free of
various classes of undefined behaviour, including out-of-bounds memory
accesses and arithmetic overflow; the latter is of particular interest for
ML-KEM because of the use of lazy modular reduction for improved
performance.
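
To make the overflow concern concrete, the following sketch accumulates several coefficients in an `int16_t` without reducing after each addition, then reduces once. The helper names and the runtime bound checks are illustrative assumptions: the CBMC proofs establish such bounds statically, whereas this sketch checks them at runtime to stay self-contained. It also assumes that `>>` on negative values is an arithmetic shift, as on all mainstream compilers.

```c
#include <stdint.h>

#define TOY_Q 3329 /* ML-KEM modulus */

/* 9 * 3329 = 29961 still fits in int16_t; 10 * 3329 = 33290 does not. */
#define TOY_MAX_LAZY_TERMS 9

/* Barrett reduction: returns the representative of a mod TOY_Q lying in
 * [-(TOY_Q-1)/2, (TOY_Q-1)/2]. */
static int16_t toy_barrett_reduce(int16_t a)
{
  const int32_t v = 20159; /* round(2^26 / TOY_Q) */
  int16_t t = (int16_t)((v * a + (1 << 25)) >> 26);
  return (int16_t)(a - t * TOY_Q);
}

/* Sums n terms with |term| <= TOY_Q, reducing only once at the end.
 * Returns 0 on success, -1 if the inputs could overflow the accumulator. */
static int toy_lazy_sum_mod_q(int16_t *out, const int16_t *terms, unsigned n)
{
  int16_t acc = 0;
  unsigned i;
  if (n > TOY_MAX_LAZY_TERMS)
  {
    return -1;
  }
  for (i = 0; i < n; i++)
  {
    if (terms[i] > TOY_Q || terms[i] < -TOY_Q)
    {
      return -1;
    }
    /* Invariant: |acc| <= (i + 1) * TOY_Q <= 29961, so no overflow. */
    acc = (int16_t)(acc + terms[i]);
  }
  *out = toy_barrett_reduce(acc);
  return 0;
}
```

Skipping the intermediate reductions saves work, but correctness then hinges on the bound in the loop invariant, which is the kind of property a bounded model checker can verify.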

At the heart of the CBMC proofs are function contracts and loop annotations
added to the C code. Function contracts are written as `__contract__(...)`
clauses attached to the function declaration, while loop contracts are
written as `__loop__` clauses that follow the `for` statement.

The function contracts and loop annotations are kept in the source, but
removed by the preprocessor so long as the CBMC macro is undefined. Keeping
them simplifies the import, and care has been taken to make them readable
to the non-expert, and thereby serve as precise documentation of
assumptions and guarantees upheld by the code.
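
A sketch of how annotations of this shape can be compiled away is given below. Only the macro names `__contract__` and `__loop__` and the `CBMC` guard come from the description above; the clause names (`requires`, `ensures`, `invariant`, `array_abs_bound`), the exclusive-bound convention, and the `toy_poly_add` function are illustrative assumptions. The definitions used when `CBMC` is defined are supplied by the proof tooling and are not shown.

```c
#include <stdint.h>

#if !defined(CBMC)
/* In normal builds the annotations expand to nothing. */
#define __contract__(x)
#define __loop__(x)
#endif

#define TOY_N 256

/* Declaration with its contract: inputs bounded in absolute value by
 * 16384 (exclusive), so the sum cannot overflow an int16_t. */
static void toy_poly_add(int16_t r[TOY_N], const int16_t a[TOY_N],
                         const int16_t b[TOY_N])
__contract__(
  requires(array_abs_bound(a, 0, TOY_N, 16384))
  requires(array_abs_bound(b, 0, TOY_N, 16384))
  ensures(array_abs_bound(r, 0, TOY_N, 32768))
);

static void toy_poly_add(int16_t r[TOY_N], const int16_t a[TOY_N],
                         const int16_t b[TOY_N])
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
  __loop__(
    invariant(i <= TOY_N)
    invariant(array_abs_bound(r, 0, i, 32768))
  )
  {
    r[i] = (int16_t)(a[i] + b[i]);
  }
}
```

Because each macro takes a single argument and discards it, the clauses are separated by whitespace rather than commas, and the compiler never sees them.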

The CBMC proofs are automatic and don't require further proof scripts;
yet, they come with their own build system and toolchain dependencies,
which this commit does not attempt to import. See
[proofs/cbmc](https://github.com/pq-code-package/mlkem-native/tree/main/proofs/cbmc)
in the mlkem-native repository. Mid-term, however, CI infrastructure should
be set up that allows the CBMC proofs to be imported and checked as part of
the AWS-LC CI.

**FIPS Compliance**

The current reference implementation in AWS-LC accommodates FIPS (IG) requirements via:
* Adding explicit stack buffer zeroization via `OPENSSL_cleanse`
* Adding a Pairwise Consistency Test (PCT) after key generation (only for
the FIPS-build)

mlkem-native unconditionally includes stack zeroization. mlkem-native's
default secure `memset` is replaced by `OPENSSL_cleanse`.

mlkem-native conditionally includes a PCT, guarded by
`MLK_KEYGEN_PCT`. This is set in the config if and only if `AWSLC_FIPS` is
set.
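
The control flow of a build-guarded PCT can be sketched as follows. The KEM here is a deliberately insecure toy that only makes the example runnable; all `toy_*` / `TOY_*` names are hypothetical, and the real guard is `MLK_KEYGEN_PCT` as described above.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BYTES 32

/* Hypothetical wiring: enable the PCT iff this is a FIPS build. */
#if defined(AWSLC_FIPS)
#define TOY_KEYGEN_PCT
#endif

/* INSECURE toy KEM, not ML-KEM: pk = sk ^ 0x5A, ct = coins ^ sk. */
static void toy_encaps(uint8_t ct[TOY_BYTES], uint8_t ss[TOY_BYTES],
                       const uint8_t pk[TOY_BYTES],
                       const uint8_t coins[TOY_BYTES])
{
  size_t i;
  for (i = 0; i < TOY_BYTES; i++)
  {
    ss[i] = coins[i];
    ct[i] = (uint8_t)(coins[i] ^ pk[i] ^ 0x5A);
  }
}

static void toy_decaps(uint8_t ss[TOY_BYTES], const uint8_t ct[TOY_BYTES],
                       const uint8_t sk[TOY_BYTES])
{
  size_t i;
  for (i = 0; i < TOY_BYTES; i++)
  {
    ss[i] = (uint8_t)(ct[i] ^ sk[i]);
  }
}

/* Pairwise consistency test: encapsulate to pk, decapsulate with sk, and
 * compare the shared secrets. Returns 0 on success, -1 on mismatch. */
static int toy_pct(const uint8_t pk[TOY_BYTES], const uint8_t sk[TOY_BYTES])
{
  uint8_t coins[TOY_BYTES], ct[TOY_BYTES];
  uint8_t ss_enc[TOY_BYTES], ss_dec[TOY_BYTES];
  memset(coins, 0xA5, sizeof(coins)); /* fixed coins, toy only */
  toy_encaps(ct, ss_enc, pk, coins);
  toy_decaps(ss_dec, ct, sk);
  /* A real implementation would compare in constant time. */
  return memcmp(ss_enc, ss_dec, TOY_BYTES) == 0 ? 0 : -1;
}

/* Returns 0 on success, -1 if the (optional) PCT fails. */
static int toy_keygen(uint8_t pk[TOY_BYTES], uint8_t sk[TOY_BYTES],
                      const uint8_t seed[TOY_BYTES])
{
  size_t i;
  memcpy(sk, seed, TOY_BYTES);
  for (i = 0; i < TOY_BYTES; i++)
  {
    pk[i] = (uint8_t)(sk[i] ^ 0x5A);
  }
#if defined(TOY_KEYGEN_PCT)
  if (toy_pct(pk, sk) != 0)
  {
    return -1;
  }
#endif
  return 0;
}
```

Guarding the check at compile time keeps non-FIPS builds free of the extra encapsulation and decapsulation on every key generation.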

**Performance**

It is expected -- but should be checked! -- that the ML-KEM performance
with this PR is comparable to that of the reference implementation. This is
because mlkem-native's fast backends are not yet imported, the FIPS-202
code remains that of AWS-LC, and mlkem-native is otherwise close to the
reference implementation.

**Multilevel build**

At the core, mlkem-native is currently a 'single-level' implementation of
ML-KEM: A build of the main source tree provides an implementation of
exactly one of ML-KEM-512/768/1024, depending on the MLKEM_K
parameter. This property is inherited from the ML-KEM reference
implementation, while AWS-LC's fork of the reference implementation has
changed this behaviour and passes the security level as a runtime
parameter.

To build all security levels, level-specific sources are built 3 times,
once per security level, and linked with a single build of the
level-independent code. The single-compilation-unit approach pursued by
AWS-LC makes this process fairly simple since one merely needs to include
the single-compilation-unit file provided by mlkem-native three times, and
configure it so that the level-independent code is included only once.
Moreover, the final include `#undef`s all macros defined by mlkem-native,
reducing the risk of name clashes with other parts of
crypto/fipsmodule/bcm.c.
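
The idea can be sketched in a single file as shown below. A macro stands in for the level-specific source file that would really be `#include`d three times with a different `MLKEM_K` each time; all `toy_*` / `TOY_*` names are hypothetical. The sizes follow FIPS 203, where an encapsulation key occupies 384*k + 32 bytes.

```c
#include <stddef.h>

/* Level-independent code: defined exactly once, shared by all levels. */
static size_t toy_poly_bytes(void) { return 384; }

/* Stand-in for the level-specific sources. Each instantiation gets its
 * own namespace prefix so that the three copies can coexist in one
 * compilation unit. */
#define TOY_DEFINE_LEVEL(K, BITS)                         \
  static size_t toy_mlkem##BITS##_public_key_bytes(void)  \
  {                                                       \
    return (size_t)(K) * toy_poly_bytes() + 32;           \
  }

TOY_DEFINE_LEVEL(2, 512)  /* ML-KEM-512  */
TOY_DEFINE_LEVEL(3, 768)  /* ML-KEM-768  */
TOY_DEFINE_LEVEL(4, 1024) /* ML-KEM-1024 */

/* Clean up after the last instantiation so the helper macro cannot clash
 * with anything else in the enclosing compilation unit. */
#undef TOY_DEFINE_LEVEL
```

With real source files, the same effect is obtained by defining the level parameter and namespace before each `#include` and undefining them afterwards.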

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
hanno-becker committed Feb 20, 2025
1 parent 2ae4f21 commit 0cd0d85
Showing 36 changed files with 7,310 additions and 13 deletions.
5 changes: 5 additions & 0 deletions crypto/fipsmodule/ml_kem/META.yml
@@ -0,0 +1,5 @@
name: mlkem-native
source: pq-code-package/mlkem-native.git
branch: main
commit: bb8b6838919c27aa68064f538fd6d9eb6cdaf9e6
imported-at: 2025-02-19T10:08:38+0000
18 changes: 5 additions & 13 deletions crypto/fipsmodule/ml_kem/README.md
@@ -1,17 +1,9 @@
-# AWS-LC ML-KEM readme file
+# ML-KEM

 The source code in this folder implements ML-KEM as defined in FIPS 203 Module-Lattice-Based Key-Encapsulation Mechanism Standard ([link](https://csrc.nist.gov/pubs/fips/203/final).

-**Source code origin and modifications.** The source code was imported from a branch of the official repository of the Crystals-Kyber team that follows the standard draft: https://github.com/pq-crystals/kyber/tree/standard. The code was taken at [commit](https://github.com/pq-crystals/kyber/commit/11d00ff1f20cfca1f72d819e5a45165c1e0a2816) as of 03/26/2024. At the moment, only the reference C implementation is imported.
+## Source code origin and modifications.**

-The code was refactored in [this PR](https://github.com/aws/aws-lc/pull/1763) by parametrizing all functions that depend on values that are specific to a parameter set, i.e., that directly or indirectly depend on the value of KYBER_K. To do this, in `params.h` we defined a structure that holds those ML-KEM parameters and functions
-that initialize a given structure with values corresponding to a parameter set. This structure is then passed to every function that requires it as a function argument. In addition, the following changes were made to the source code in `ml_kem_ref` directory:
-- `randombytes.{h|c}` are deleted because we are using the randomness generation functions provided by AWS-LC.
-- `kem.c`: call to randombytes function is replaced with a call to RAND_bytes and the appropriate header file is included (openssl/rand.h).
-- `fips202.{h|c}` are deleted as all SHA3/SHAKE functionality is provided instead by AWS-LC fipsmodule/sha rather than the reference implementation.
-- `symmetric-shake.c`: unnecessary include of fips202.h is removed.
-- `api.h`: `pqcrystals` prefix substituted with `ml_kem` (to be able to build alongside `crypto/kyber`).
-- `poly.c`: the `poly_frommsg` function was modified to address the constant-time issue described [here](https://github.com/pq-crystals/kyber/commit/9b8d30698a3e7449aeb34e62339d4176f11e3c6c).
-- All internal header files were updated with unique `ML_KEM_*` include guards.
-
-**Testing.** The KATs were obtained from an independent implementation of ML-KEM written in SPARK Ada subset: https://github.com/awslabs/LibMLKEM.
+The source code in [mlkem](mlkem) is imported without change from [mlkem-native](https://github.com/pq-code-package/mlkem-native) using [importer.sh](importer.sh); see [META.yml](META.yml) for
+the exact hash. mlkem-native's FIPS-202 code is not imported, but glue code [fips202_glue.h](fips202_glue.h) and [fips202x4_glue.h](fips202x4_glue.h) provided to use AWS-LC's own FIPS-202
+implementation from [crypto/fipsmodule/sha](../sha). [mlkem_native_bcm.c](mlkem_native_bcm.c) is imported using [importer.sh](importer.sh) from the mlkem-native file `examples/monolithic_build/mlkem_native_monobuild.c` which is a [`crypto/fipsmodule/bcm.c`](../bcm.c)-like file including all mlkem-native compilation units. This file is imported once per security level in [mlkem_c.c](mlkem_c.c) in such a way that level-independent code is shared.
73 changes: 73 additions & 0 deletions crypto/fipsmodule/ml_kem/fips202_glue.h
@@ -0,0 +1,73 @@
/*
 * Copyright (c) 2024-2025 The mlkem-native project authors
 * SPDX-License-Identifier: Apache-2.0
 */

#ifndef MLK_AWSLC_FIPS202_GLUE_H
#define MLK_AWSLC_FIPS202_GLUE_H
#include <stddef.h>
#include <stdint.h>

#include "../sha/internal.h"

#define SHAKE128_RATE 168
#define SHAKE256_RATE 136
#define SHA3_256_RATE 136
#define SHA3_384_RATE 104
#define SHA3_512_RATE 72

#define shake128ctx KECCAK1600_CTX

static MLK_INLINE void shake128_init(shake128ctx *state)
{
  /* Return code checks can be omitted
   * SHAKE_Init always returns 1 when called with correct block size value */
  (void) SHAKE_Init(state, SHAKE128_BLOCKSIZE);
}

static MLK_INLINE void shake128_release(shake128ctx *state)
{
  (void) state;
}

static MLK_INLINE void shake128_absorb_once(shake128ctx *state,
                                            const uint8_t *input, size_t inlen)
{
  /* TODO: Document why this function does not fail in the context
   * of the calls made by mlkem-native. */
  (void) SHAKE_Absorb(state, input, inlen);
}

static MLK_INLINE void shake128_squeezeblocks(uint8_t *output, size_t nblocks,
                                              shake128ctx *state)
{
  /* TODO: Document why this function does not fail in the context
   * of the calls made by mlkem-native. */
  (void) SHAKE_Squeeze(output, state, nblocks * SHAKE128_RATE);
}

static MLK_INLINE void shake256(uint8_t *output, size_t outlen,
                                const uint8_t *input, size_t inlen)
{
  /* TODO: Document why this function does not fail in the context
   * of the calls made by mlkem-native. */
  (void) SHAKE256(input, inlen, output, outlen);
}

static MLK_INLINE void sha3_256(uint8_t *output, const uint8_t *input,
                                size_t inlen)
{
  /* TODO: Document why this function does not fail in the context
   * of the calls made by mlkem-native. */
  (void) SHA3_256(input, inlen, output);
}

static MLK_INLINE void sha3_512(uint8_t *output, const uint8_t *input,
                                size_t inlen)
{
  /* TODO: Document why this function does not fail in the context
   * of the calls made by mlkem-native. */
  (void) SHA3_512(input, inlen, output);
}

#endif /* MLK_AWSLC_FIPS202_GLUE_H */
71 changes: 71 additions & 0 deletions crypto/fipsmodule/ml_kem/fips202x4_glue.h
@@ -0,0 +1,71 @@
/*
 * Copyright (c) 2024-2025 The mlkem-native project authors
 * SPDX-License-Identifier: Apache-2.0
 */

/*
 * This is a shim establishing the FIPS-202 API required by
 * mlkem-native from the API exposed by AWS-LC.
 */

#ifndef MLK_AWSLC_FIPS202X4_GLUE_H
#define MLK_AWSLC_FIPS202X4_GLUE_H

#include <stddef.h>
#include <stdint.h>

#include "fips202_glue.h"

typedef shake128ctx shake128x4ctx[4];

static MLK_INLINE void shake128x4_absorb_once(shake128x4ctx *state,
                                              const uint8_t *in0,
                                              const uint8_t *in1,
                                              const uint8_t *in2,
                                              const uint8_t *in3, size_t inlen)
{
  shake128_absorb_once(&(*state)[0], in0, inlen);
  shake128_absorb_once(&(*state)[1], in1, inlen);
  shake128_absorb_once(&(*state)[2], in2, inlen);
  shake128_absorb_once(&(*state)[3], in3, inlen);
}

static MLK_INLINE void shake128x4_squeezeblocks(uint8_t *out0, uint8_t *out1,
                                                uint8_t *out2, uint8_t *out3,
                                                size_t nblocks,
                                                shake128x4ctx *state)
{
  shake128_squeezeblocks(out0, nblocks, &(*state)[0]);
  shake128_squeezeblocks(out1, nblocks, &(*state)[1]);
  shake128_squeezeblocks(out2, nblocks, &(*state)[2]);
  shake128_squeezeblocks(out3, nblocks, &(*state)[3]);
}

static MLK_INLINE void shake128x4_init(shake128x4ctx *state)
{
  shake128_init(&(*state)[0]);
  shake128_init(&(*state)[1]);
  shake128_init(&(*state)[2]);
  shake128_init(&(*state)[3]);
}

static MLK_INLINE void shake128x4_release(shake128x4ctx *state)
{
  shake128_release(&(*state)[0]);
  shake128_release(&(*state)[1]);
  shake128_release(&(*state)[2]);
  shake128_release(&(*state)[3]);
}

static MLK_INLINE void shake256x4(uint8_t *out0, uint8_t *out1, uint8_t *out2,
                                  uint8_t *out3, size_t outlen, uint8_t *in0,
                                  uint8_t *in1, uint8_t *in2, uint8_t *in3,
                                  size_t inlen)
{
  shake256(out0, outlen, in0, inlen);
  shake256(out1, outlen, in1, inlen);
  shake256(out2, outlen, in2, inlen);
  shake256(out3, outlen, in3, inlen);
}

#endif /* MLK_AWSLC_FIPS202X4_GLUE_H */
52 changes: 52 additions & 0 deletions crypto/fipsmodule/ml_kem/importer.sh
@@ -0,0 +1,52 @@
#!/bin/bash

GITHUB_SERVER_URL=https://github.com/
GITHUB_REPOSITORY=${GITHUB_REPOSITORY:=pq-code-package/mlkem-native.git}
GITHUB_SHA=${GITHUB_SHA:=main}

SRC=mlkem
TMP=$(mktemp -d) || exit 1
echo "Temporary working directory: $TMP"

# Check if source directory already exists
if [ -d "$SRC" ]; then
    echo "Source directory or symlink $SRC does already exist -- please remove it before re-running the importer"
    exit 1
fi

# Work in temporary directory
pushd $TMP

# Fetch repository
echo "Fetching repository ..."
git init >/dev/null
git remote add origin $GITHUB_SERVER_URL/$GITHUB_REPOSITORY >/dev/null
git fetch origin --depth 1 $GITHUB_SHA >/dev/null
git checkout FETCH_HEAD >/dev/null
GITHUB_COMMIT=$(git rev-parse FETCH_HEAD)

popd

echo "Pull source code from remote repository..."

# Copy mlkem-native source tree -- C-only, no FIPS-202
mkdir $SRC
cp $TMP/mlkem/* $SRC
# Copy and statically simplify BCM file
unifdef -DMLK_MONOBUILD_CUSTOM_FIPS202 \
        -UMLK_MONOBUILD_WITH_NATIVE_ARITH \
        -UMLK_MONOBUILD_WITH_NATIVE_FIPS202 \
        $TMP/examples/monolithic_build/mlkem_native_monobuild.c \
        > mlkem_native_bcm.c

echo "Remove temporary artifacts ..."
rm -rf $TMP

echo "Generating META.yml file ..."
cat <<EOF > META.yml
name: mlkem-native
source: $GITHUB_REPOSITORY
branch: $GITHUB_SHA
commit: $GITHUB_COMMIT
imported-at: $(date "+%Y-%m-%dT%H:%M:%S%z")
EOF