Commit 5db8ff7

Add Performance Thresholds to Test Runner (#36)

Tyler Titsworth and pre-commit-ci[bot] authored
Signed-off-by: Tyler Titsworth <tyler.titsworth@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

1 parent fc9afb4 commit 5db8ff7

10 files changed: +185 -24 lines changed

.gitignore (+1)

@@ -10,6 +10,7 @@
 docs/assets
 docs/repos/
 logs/
+models-perf/
 output/
 site
 venv/

.pre-commit-config.yaml (+2 -2)

@@ -72,12 +72,12 @@ repos:
        language: system
        name: pylint
        types: [python]
-      - entry: bash -c "python -m tox -e py310"
+      - entry: bash -c "python -m tox -e py310,clean"
        files: ^test-runner/
        id: tox
        language: system
        name: tox
-      - entry: bash -c "mkdocs build --clean"
+      - entry: bash -c "rm -rf site/ && mkdocs build --clean"
        # files: ^docs/
        id: mkdocs
        language: system

test-runner/README.md (+20 -5)

@@ -30,6 +30,7 @@ A test is defined as a set of commands to be executed along with their associate
 | [volumes](https://github.com/compose-spec/compose-spec/blob/master/spec.md#volumes) | Optional[List[[Volume](utils/test.py#L13)]] | A list of volumes to be mounted when running the test in a container. |
 | [env](https://github.com/compose-spec/compose-spec/blob/master/spec.md#environment) | Optional[Dict[str, str]] | A list of environment variables to be set when the test is running. |
 | mask | Optional[List[str]] | A list of keys to [mask](#masking) in the test output. |
+| performance | Optional[str] | Check test performance thresholds in the format `perf/path/to/model.yaml:test-id` |
 | notebook | Optional[str] | A flag indicating whether the test utilizes a [jupyter notebook](#notebook-test). |
 | serving | Optional[str] | A flag indicating whether a [serving test](#serving-test) should be invoked. |
 | [cap_add](https://github.com/compose-spec/compose-spec/blob/master/spec.md#cap_add) | Optional[str] | Specifies additional container capabilities. |

@@ -75,12 +76,12 @@ In the example above, the first output will be `hello`, and the second output wi

 Masking is a feature that allows you to hide sensitive information in the logs generated by the test runner. This is useful when you want to prevent benchmark information from being publicly exposed.

-To enable masking, add the `mask` parameter to your `tests.yaml` file as a list of strings. Each string should be a key whose value you want to mask without any kind of delimiter.
+To enbable masking, add the `mask` parameter to your `tests.yaml` file as a list of strings. Each string should be a key whose value you want to mask without any kind of delimiter.

-By default, masking is not enabled. To enable masking, use the `-m` flag when running the test runner application.
+By default, masking is enabled. To disable masking, add `"mask": [false]` to your `.actions.json` file.

 ```bash
-python -m -f path/to/tests.yaml
+python -f path/to/tests.yaml
 ```

 ```bash

@@ -92,6 +93,21 @@ test:

 In the example above, the output will be `hello:***`

+#### Performance Thresholds
+
+You can utilize performance thresholds stored in another github repository by providing the `PERF_REPO` environment variable in GitHub's `org-name/repo-name` format.
+
+```yaml
+test:
+  cmd: "echo 'my-key: 100'"
+  performance: perf/my-model:my-test-id
+```
+
+```bash
+export PERF_REPO=...
+python test-runner/test_runner.py -f path/to/tests.yaml
+```
+
 #### Notebook Test

 A notebook test is a special type of test designed to run Jupyter notebooks. This is indicated by setting the notebook attribute to `True` in the test definition. When a test is marked as a notebook test, the command specified in the cmd attribute is expected to be [papermill](https://github.com/nteract/papermill) command. If papermill is not already installed in the provided `image` property, then it will be installed.

@@ -139,7 +155,7 @@ For more options, see the `--help` output below:

 ```text
 $ python test_runner.py --help
-usage: test_runner.py [-h] [-a ACTIONS_PATH] -f FILE_PATH [-v] [-l LOGS_PATH] [-m]
+usage: test_runner.py [-h] [-a ACTIONS_PATH] -f FILE_PATH [-v] [-l LOGS_PATH]

 optional arguments:
   -h, --help            show this help message and exit

@@ -150,7 +166,6 @@ optional arguments:
   -v, --verbose         DEBUG Loglevel
   -l LOGS_PATH, --logs LOGS_PATH
                         -l /path/to/logs
-  -m, --mask            Enable mask parameter for sensitive information in logs
 ```

 ### Run Modes
test-runner/dev-requirements.txt (+2)

@@ -1,7 +1,9 @@
 black>=24.4.1
 coverage>=7.5.0
 expandvars>=0.12.0
+gitpython>=3.1.43
 hypothesis>=6.100.1
+Pint>=0.21.1
 pydantic==2.7.2
 pylint>=3.1.0
 pytest>=8.1.1

test-runner/requirements.txt (+2)

@@ -1,4 +1,6 @@
 expandvars>=0.12.0
+gitpython>=3.1.43
+Pint>=0.21.1
 pydantic==2.7.2
 python_on_whales>=0.70.1
 pyyaml>=6.0.1

test-runner/test_runner.py (+2 -2)

@@ -38,7 +38,7 @@
 from expandvars import expandvars
 from python_on_whales import DockerException, docker
 from tabulate import tabulate
-from utils.test import Test
+from utils.test import PerfException, Test
 from yaml import YAMLError, full_load


@@ -187,7 +187,7 @@ def get_test_list(args: dict, tests_yaml: List[dict]):
         # returns the stdout of the test and the RETURNCODE
         try:  # Try for Runtime Failure Conditions
             log = test.container_run() if test.img else test.run()
-        except DockerException as err:
+        except (DockerException, PerfException, YAMLError) as err:
             logging.error(err)
             summary.append([idx + 1, test.name, "FAIL"])
             ERROR = True

test-runner/tests.yaml (+13 -8)

@@ -15,17 +15,17 @@
 test1:
   img: ${REGISTRY}/${REPO}:latest # substitute env from host
   cmd: head -n 1 /workspace/test-runner/requirements.txt # volume mounted file
-    # device: /dev/dri
-    # ipc: host
+  # device: /dev/dri
+  # ipc: host
   notebook: True
   env:
     REGISTRY: ${REGISTRY} # substitute env from host
     DEBUG: 'true' # single quotes
   volumes:
-  - src: /tf_dataset
-    dst: /tmp
-  - src: $PWD
-    dst: /workspace
+    - src: /tf_dataset
+      dst: /tmp
+    - src: $PWD
+      dst: /workspace
 test2:
   cmd: echo -n $TEST && python -c 'print(" World", end="")' # var substitution inline
   env:

@@ -41,8 +41,13 @@ test6:
   img: ${CACHE_REGISTRY}/cache/library/python:3.11-slim-bullseye
   cmd: "echo 'hello: world'"
   mask:
-  - hello
+    - hello
 test7:
   cmd: "echo 'world: hello'"
   mask:
-  - world
+    - world
+test8:
+  cmd: "echo 'test: 123 throughput'"
+  mask:
+    - test
+  performance: perf/test.yaml:test
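The new `test8` can only pass if the cloned perf repository actually contains `perf/test.yaml` with a `test_id: test` entry. A hypothetical shape consistent with the unit tests added below (123 throughput passes while 121 fails, so a boundary of 122 is a guess between the two; the `not-test` entry is what would make the fourth unit-test case raise):

```yaml
# models-perf/perf/test.yaml (hypothetical; shape inferred from check_perf)
- test_id: test
  name: example-threshold
  modelName: example-model
  key: test              # matches "test: 123 throughput" in the output
  boundary: 122          # guess: 123 throughput passes, 121 throughput fails
  lower_is_better: false
  unit: throughput       # custom unit from models-perf/definitions.txt
- test_id: not-test
  name: unreachable-threshold
  modelName: example-model
  key: test
  boundary: 1000         # guess: high enough that 125 throughput always fails
  lower_is_better: false
  unit: throughput
```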

test-runner/tests/utest.py (+46 -3)

@@ -21,7 +21,7 @@
 from hypothesis import given
 from hypothesis.strategies import dictionaries, text
 from test_runner import get_test_list, parse_args, set_log_filename
-from utils.test import Test
+from utils.test import PerfException, Test


 @pytest.fixture

@@ -143,6 +143,11 @@ def test_get_test_list(test_args_input, test_json_input):
             "mask": ["hello"],
         },
         "test7": {"cmd": "echo 'world: hello'", "mask": ["world"]},
+        "test8": {
+            "cmd": "echo 'test: 123 throughput'",
+            "mask": ["test"],
+            "performance": "perf/test.yaml:test",
+        },
     }

     test_fn, disable_masking = get_test_list(test_args_input, test_json_input)

@@ -154,9 +159,47 @@ def test_masking(test_class_input):
     "test masking."
     for test in test_class_input:
         if test.mask != [] and test.img:
-            assert ":***" in test.container_run()
+            assert ": ***" in test.container_run()
         if test.mask != [] and not test.img:
-            assert ":***" in test.run()
+            assert ": ***" in test.run()
+
+
+def test_perf_thresholds():
+    "test performance thresholds."
+    test_cases = [
+        {
+            "cmd": "echo 'test: 123 throughput'",
+            "performance": "perf/test.yaml:test",
+            "expected_output": "test: 123 throughput",
+            "should_raise_exception": False,
+        },
+        {
+            "cmd": "echo 'test: 121 throughput'",
+            "performance": "perf/test.yaml:test",
+            "should_raise_exception": True,
+        },
+        {
+            "cmd": "echo 'test: 123 millithroughput'",
+            "performance": "perf/test.yaml:test",
+            "should_raise_exception": True,
+        },
+        {
+            "cmd": "echo 'test: 125 throughput'",
+            "performance": "perf/test.yaml:not-test",
+            "should_raise_exception": True,
+        },
+    ]
+
+    for test_case in test_cases:
+        test = Test(name="test", **test_case)
+        if test_case["should_raise_exception"]:
+            try:
+                with pytest.raises(Exception, match="Failed") as exc_info:
+                    test.run()
+            except:
+                assert isinstance(exc_info.value, PerfException)
+        else:
+            assert test_case["expected_output"] in test.run()


 @given(name=text(), arguments=dictionaries(text(), text()))
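The `try`/`except` around `pytest.raises(Exception, match="Failed")` in the new test is doing double duty: it tolerates both the exception escaping `test.run()` and the match check itself failing. If `PerfException` reliably propagates out of `Test.run()`, a more direct assertion is possible. A minimal sketch, assuming the same hypothetical `perf/test.yaml:test` threshold as above and a populated `models-perf` checkout:

```python
import pytest
from utils.test import PerfException, Test


def test_perf_threshold_violation():
    # Hypothetical standalone check: a measured value below the boundary
    # (121 < 122 throughput, higher-is-better) should raise PerfException.
    test = Test(
        name="test",
        cmd="echo 'test: 121 throughput'",
        performance="perf/test.yaml:test",
    )
    with pytest.raises(PerfException):
        test.run()
```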

test-runner/utils/test.py (+90 -4)

@@ -21,9 +21,27 @@
 from subprocess import PIPE, Popen
 from typing import Dict, List, Optional

+import pint
 from expandvars import expandvars
+from git import Repo
 from pydantic import BaseModel
 from python_on_whales import DockerException, docker
+from yaml import YAMLError, full_load
+
+units = pint.UnitRegistry()
+
+
+class PerfException(Exception):
+    "Constructs a PerfException class."
+
+
+class Threshold(BaseModel):
+    "Constructs a Threshold class."
+    name: str
+    modelName: str
+    boundary: float
+    lower_is_better: bool
+    unit: str


 class Volume(BaseModel):

@@ -49,12 +67,28 @@ class Test(BaseModel):
     groups_add: Optional[List[str]] = ["109", "44"]
     hostname: Optional[str] = None
     ipc: Optional[str] = None
+    performance: Optional[str] = None
     privileged: Optional[bool] = False
     pull: Optional[str] = "missing"
     user: Optional[str] = None
     shm_size: Optional[str] = None
     workdir: Optional[str] = None

+    def __init__(self, **data):
+        super().__init__(**data)
+        if self.performance:
+            perf_repo = os.environ.get("PERF_REPO")
+            if perf_repo:
+                if not os.path.exists("models-perf"):
+                    Repo.clone_from(
+                        f"https://github.com/{perf_repo}", "models-perf", progress=None
+                    )
+            else:
+                logging.error(
+                    "Performance mode enabled, but PERF_REPO environment variable not set"
+                )
+            units.load_definitions("./models-perf/definitions.txt")
+
     def get_path(self, name):
         """Given a filename, find that file from the users current working directory

@@ -171,6 +205,54 @@ def notebook_run(self, img: str):
             load=True,
         )

+    def check_perf(self, content):
+        """
+        Check the performance of the test against the thresholds.
+
+        Args:
+            content (str): test output log
+
+        Raises:
+            PerfException: if the performance does not meet the target performance
+        """
+        with open(
+            f"models-perf/{self.performance.split(':')[0]}", "r", encoding="utf-8"
+        ) as file:
+            try:
+                thresholds = full_load(file)
+            except YAMLError as yaml_exc:
+                raise YAMLError(yaml_exc)
+        model_thresholds = [
+            threshold
+            for threshold in thresholds
+            if self.performance.split(":")[1] == threshold["test_id"]
+        ]
+        for threshold in model_thresholds:
+            perf = re.search(
+                rf"{threshold['key']}[:]?\s+(.\d+[\s]?.*)",
+                content,
+                re.IGNORECASE,
+            )
+            if perf:
+                if threshold["lower_is_better"]:
+                    if units.Quantity(perf.group(1)) > units.Quantity(
+                        f"{threshold['boundary']} {threshold['unit']}"
+                    ):
+                        if not self.mask:
+                            logging.info("%s: %s", threshold["key"], perf.group(1))
+                        raise PerfException(
+                            f"Performance Threshold {threshold['name']} did not meet the target performance."
+                        )
+                else:
+                    if units.Quantity(perf.group(1)) < units.Quantity(
+                        f"{threshold['boundary']} {threshold['unit']}"
+                    ):
+                        if not self.mask:
+                            logging.info("%s: %s", threshold["key"], perf.group(1))
+                        raise PerfException(
+                            f"Performance Threshold {threshold['name']} did not meet the target performance."
+                        )
+
     def container_run(self):
         """Runs the docker container.

@@ -235,9 +317,11 @@ def container_run(self):
         log = ""
         for _, stream_content in output_generator:
             # All process logs will have the stream_type of stderr despite it being stdout
+            if self.performance:
+                self.check_perf(stream_content.decode("utf-8"))
             for item in self.mask:
                 stream_content = re.sub(
-                    rf"({item}[:=-_\s])(.*)",
+                    rf"({item}[:]?\s+)(.*)",
                     r"\1***",
                     stream_content.decode("utf-8"),
                 ).encode("utf-8")
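The masking regex change above is worth a closer look. In the old pattern, the character class `[:=-_\s]` contains an accidental range `=-_` (every ASCII character from `=` through `_`, including all uppercase letters), and the single matched delimiter ended group 1, which is why masked output used to read `hello:***`. The new pattern keeps the optional colon and the following whitespace in group 1, yielding `hello: ***` and matching the updated assertions in `utest.py`. A quick standalone check:

```python
import re

line = "hello: world"

# Old pattern: "[:=-_\s]" accidentally defines the range '=' through '_',
# and the matched delimiter is the last character of group 1, so no space
# survives before the stars.
old = re.sub(r"(hello[:=-_\s])(.*)", r"\1***", line)
print(old)  # hello:***

# New pattern: the optional colon plus required whitespace stay in group 1.
new = re.sub(r"(hello[:]?\s+)(.*)", r"\1***", line)
print(new)  # hello: ***
```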
@@ -271,14 +355,16 @@ def run(self):
         )
         try:
             stdout, stderr = p.communicate()
+            if self.performance:
+                self.check_perf(stdout.decode("utf-8"))
             for item in self.mask:
                 stdout = re.sub(
-                    rf"({item}[:=-_\s])(.*)", r"\1***", stdout.decode("utf-8")
+                    rf"({item}[:]?\s+)(.*)", r"\1***", stdout.decode("utf-8")
                 ).encode("utf-8")
             if stderr:
-                logging.error(stderr.decode("utf-8"))
+                logging.error(stderr.decode("utf-8").strip())
             if stdout:
-                logging.info("Test Output: %s", stdout.decode("utf-8"))
+                logging.info("Test Output: %s", stdout.decode("utf-8").strip())
             return stdout.decode("utf-8")
         except KeyboardInterrupt:
             os.killpg(os.getpgid(p.pid), SIGKILL)
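`check_perf` leans on Pint to make the comparison unit-aware: both the value scraped from the log and the `boundary`/`unit` pair from the thresholds file are parsed into `Quantity` objects before comparing. That is what makes the `millithroughput` unit-test case fail the threshold even though its magnitude is 123. A minimal sketch, assuming `models-perf/definitions.txt` registers `throughput` as its own base dimension (that file is not part of this commit, so the definition line is a guess):

```python
import pint

units = pint.UnitRegistry()
# Assumption: the perf repo's definitions.txt contains a line like this,
# registering throughput as a new base dimension so Pint can parse it.
units.define("throughput = [throughput]")

boundary = units.Quantity("122 throughput")          # from the thresholds YAML
print(units.Quantity("123 throughput") > boundary)   # True  -> meets the target
print(units.Quantity("121 throughput") > boundary)   # False -> PerfException
# SI prefixes apply to custom units too: 123 millithroughput == 0.123 throughput
print(units.Quantity("123 millithroughput") > boundary)  # False -> PerfException
```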

tox.ini (+7)

@@ -15,6 +15,7 @@ passenv = DOCKER_*
 setenv =
     CACHE_REGISTRY = {env:CACHE_REGISTRY}
     PATH = {env:PATH}:/usr/local/bin/docker
+    PERF_REPO = {env:PERF_REPO}
     PWD = {env:PWD}
     REGISTRY = {env:REGISTRY}
     REPO = {env:REPO}

@@ -52,3 +53,9 @@ python =
     3.11: py311
     3.12: py312
 parallel_show_output = true
+
+[testenv:clean]
+allowlist_externals=/bin/bash
+commands =
+    /bin/bash -c "rm -rf .coverage* models-perf"
+ignore_errors = True
