HonestCyberEval focuses on models' ability to detect vulnerabilities in real-world software by generating structured inputs that trigger known sanitizers.
The vulnerability exploitation task is based on the challenge projects released for the DARPA AIxCC competition.
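In practice, an attempt succeeds when a model-generated input makes a sanitizer-instrumented build of the target crash with a sanitizer report. A minimal sketch of that check, with hypothetical binary and input names (not part of the benchmark):

```
# Illustration only: run a sanitizer-instrumented target on a model-generated
# input and look for a sanitizer report (both file names are hypothetical).
./target_asan_build model_generated_input.bin 2>&1 \
  | grep -q "ERROR: AddressSanitizer" && echo "sanitizer triggered"
```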
- Install dependencies:
  sudo apt install make
- Install yq, e.g.:
  sudo snap install yq
- To avoid issues with address randomisation (more info), run:
  sudo sysctl vm.mmap_rnd_bits=28
  echo "vm.mmap_rnd_bits=28" | sudo tee -a /etc/sysctl.conf
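To confirm the setting took effect (a quick sanity check, not part of the original setup steps):

```
# Should print: vm.mmap_rnd_bits = 28
sysctl vm.mmap_rnd_bits
```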
Set up the environment variables and API keys:
- Rename the .env.example file:
  cp .env.example .env
- Generate a new personal access token (PAT) (https://github.com/settings/tokens) with read:packages permissions. Fill in the GITHUB_USER and GITHUB_TOKEN values.
- Fill in API keys for the LLM(s) that are to be evaluated (ANTHROPIC_API_KEY, AZURE_API_KEY, OPENAI_API_KEY).
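A filled-in .env might look like the following; the variable names come from the steps above, the values are placeholders:

```
GITHUB_USER=<your GitHub username>
GITHUB_TOKEN=<PAT with read:packages>
ANTHROPIC_API_KEY=<key, if evaluating Anthropic models>
AZURE_API_KEY=<key, if evaluating Azure-hosted models>
OPENAI_API_KEY=<key, if evaluating OpenAI models>
```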
The evaluation challenge projects are run inside Docker containers. If Docker is unavailable, install it by following the documentation. Then, enable managing Docker as a non-root user.
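The standard way to do the latter, per Docker's post-installation steps (log out and back in afterwards for the group change to apply):

```
sudo groupadd docker          # may report that the group already exists
sudo usermod -aG docker $USER
```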
To pull the Docker images for the challenge projects, log in to ghcr.io using your PAT:
echo "<token>" | docker login ghcr.io -u <user> --password-stdin
replacing <user> with your GitHub username and <token> with your generated PAT.
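Since these credentials are already stored in .env, one convenience (assuming .env holds plain KEY=value lines, as sketched above) is to source it instead of retyping the token:

```
# Log in to ghcr.io reusing the credentials from .env
. ./.env
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_USER" --password-stdin
```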
First, configure which challenge project should be downloaded by (un)commenting the appropriate entries in config/cp_config.yaml.
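The actual schema of config/cp_config.yaml is defined by the repository; the snippet below is only a hypothetical illustration of the (un)commenting pattern, with assumed keys:

```
# Hypothetical shape -- keys are assumptions; only the commenting pattern matters.
challenge_projects:
  - nginx-cp       # will be downloaded
  # - other-cp     # commented out: skipped
```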
Run the make cps command to download the code and Docker images associated with the challenge projects defined in cp_config.yaml. The code will be downloaded to cp_root.
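A quick check that the download worked (assuming cp_root is created in the repository root):

```
# Each configured challenge project should appear as a subdirectory
ls cp_root
```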
Finally, run the evaluation using:
inspect eval exploit.py --model=<model> -T cp=<challenge project> -S max_iterations=<num>
For example:
inspect eval exploit.py --model=openai/o1 -T cp=nginx-cp -S max_iterations=8
will run the nginx-cp project with 8 reflexion loops.
The first run will be slower as it patches and builds multiple copies of the project. We recommend starting with a mock run to create the test projects before running the eval, but this is not required:
inspect eval exploit.py --model=mockllm/model -T cp=nginx-cp -S max_iterations=1
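After a run completes, the results can be browsed with Inspect's log viewer (assuming a default Inspect AI setup, which writes logs to ./logs):

```
# Opens a local web viewer for evaluation transcripts and scores
inspect view
```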
Future work:
- Use the Inspect Docker sandbox instead of the AIxCC Docker scripts for better integration
- Support challenge projects that expect input as bytes
- More tasks