nightly test polish #1665

Merged
merged 22 commits into from
Mar 17, 2025
Changes from 14 commits
2 changes: 1 addition & 1 deletion Cargo.lock

50 changes: 39 additions & 11 deletions tools/hammer_loop.sh
@@ -29,6 +29,24 @@ if pgrep -fl -U "$(id -u)" "$cds"; then
exit 1
fi

WORK_ROOT=${WORK_ROOT:-/tmp}
TEST_ROOT="$WORK_ROOT/hammer_loop"
REGION_ROOT=${REGION_ROOT:-/var/tmp/hammer_loop}
if [[ ! -d "$TEST_ROOT" ]]; then
mkdir -p "$TEST_ROOT"
if [[ $? -ne 0 ]]; then
echo "Failed to make test root $TEST_ROOT"
exit 1
fi
else
# Delete previous test data
rm -r "$TEST_ROOT"
fi

loop_log="$TEST_ROOT/hammer_loop.log"
test_log="$TEST_ROOT/hammer_loop_test.log"
dsc_ds_log="$TEST_ROOT/hammer_loop_dsc.log"

loops=20

usage () {
@@ -37,23 +55,27 @@ usage () {
}

while getopts 'l:' opt; do
case "$opt" in
case "$opt" in
l) loops=$OPTARG
;;
*) echo "Invalid option"
usage
exit 1
;;
esac
exit 1
;;
esac
done

if ! "$dsc" create --cleanup --ds-bin "$cds" --extent-count 60 --extent-size 50; then
if ! "$dsc" create --cleanup --ds-bin "$cds" --extent-count 60 \
--output-dir "$dsc_ds_log" \
--extent-size 50 --region-dir "$REGION_ROOT"
then
echo "Failed to create region"
exit 1
fi

# Start up dsc, verify it really did start.
"$dsc" start --ds-bin "$cds" &
"$dsc" start --ds-bin "$cds" --region-dir "$REGION_ROOT" \
--output-dir "$dsc_ds_log" &
dsc_pid=$!
sleep 5
if ! pgrep -P $dsc_pid; then
@@ -78,9 +100,6 @@ function ctrl_c() {
fi
exit 1
}

loop_log=/tmp/hammer_loop.log
test_log=/tmp/hammer_loop_test.log
echo "" > ${loop_log}
echo "starting Hammer test on $(date)" | tee ${loop_log}
echo "Tail $test_log for test output"
@@ -138,12 +157,21 @@ printf "[%03d] %d:%02d ave:%d:%02d total:%d:%02d errors:%d last_run_seconds:%d
"$err" $duration | tee -a ${loop_log}

echo "Stopping dsc"
kill $dsc_pid 2> /dev/null
"$dsc" cmd shutdown
wait $dsc_pid

# Also remove any leftover downstairs
if pgrep -fl -U "$(id -u)" "$cds" > /dev/null; then
pkill -f -U "$(id -u)" "$cds"
fi

if [[ $err -eq 0 ]]; then
# No errors, then cleanup all our logs and the region directories.
rm -r "$TEST_ROOT"
rm -r "$REGION_ROOT"/8810
rm -r "$REGION_ROOT"/8820
rm -r "$REGION_ROOT"/8830
# If empty, remove the region directory

Contributor

Are there cases where this wouldn't be empty? If not, should we just do rm -r "$REGION_ROOT" instead of deleting the region dirs individually?

Contributor Author

What I wanted here is something I don't think this code will give me, and you are
right to call out what this half measure was doing.

What I want is for the caller to be able to supply a directory and have this test (and
any other tests) create their regions in it.
The problem arises if that directory is not empty: I don't want this test to blindly
destroy whatever exists. My half measure was to remove only the directories that I
knew this test created.

However, I now think a better solution is to do what we do with TEST_ROOT: create
a test-specific subdirectory inside REGION_ROOT and put all region directories inside that.

This way I can always remove $REGION_ROOT/<my_test_unique_subdir> without having
to care what is in that directory, or even know exactly which directories exist inside it,
which will become a problem when I make these tests use multiple region sets.

I'll refactor this and all the places where we use REGION_ROOT.
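
The refactor described in this reply might look roughly like the sketch below. It is not the code that landed: `region_subdir` and `remove_region_subdir` are hypothetical helper names, and the subdirectory name (`hammer_loop` in the wiring comment) is only an illustration of `<my_test_unique_subdir>`.

```shell
# Sketch only: helper names are hypothetical, not part of this PR.

# Print the private region directory for one test; refuse unsafe inputs so a
# bad name can never expand to the caller's whole REGION_ROOT.
region_subdir() {
    local root="$1" name="$2"
    [ -n "$root" ] || { echo "empty region root" >&2; return 1; }
    case "$name" in
        ""|*/*|.|..) echo "bad test name: '$name'" >&2; return 1 ;;
    esac
    printf '%s/%s\n' "${root%/}" "$name"
}

# Remove everything the test created, whatever it contains, while leaving
# any other content of the region root untouched.
remove_region_subdir() {
    local dir
    dir=$(region_subdir "$1" "$2") || return 1
    rm -rf -- "$dir"
}

# Example wiring (illustrative):
#   my_regions=$(region_subdir "$REGION_ROOT" hammer_loop) || exit 1
#   "$dsc" create --cleanup --ds-bin "$cds" --region-dir "$my_regions" ...
#   remove_region_subdir "$REGION_ROOT" hammer_loop
```

With this shape the cleanup no longer needs to know which port-named directories exist, which is the property the reply asks for once multiple region sets are in play.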

rmdir "$REGION_ROOT"
fi
exit "$err"
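
Note on the TEST_ROOT setup earlier in this file's diff: as shown, the `else` branch removes an existing TEST_ROOT, but nothing visible in these hunks recreates it before `loop_log` is written, so a second run may depend on some later step creating the directory. A minimal sketch of a reset that always leaves an empty directory behind, assuming a hypothetical helper named `reset_test_root` (not part of this PR):

```shell
# Sketch only: reset_test_root is a hypothetical helper, not part of this PR.
# Delete previous test data (if any), then recreate the directory so log
# files can be written on every run, not only the first one.
reset_test_root() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "reset_test_root: empty path" >&2
        return 1
    fi
    rm -rf -- "$dir" || return 1
    if ! mkdir -p -- "$dir"; then
        echo "Failed to make test root $dir" >&2
        return 1
    fi
}

# Example wiring (illustrative):
#   TEST_ROOT="$WORK_ROOT/hammer_loop"
#   reset_test_root "$TEST_ROOT" || exit 1
```

The same helper would apply to the identical setup blocks in test_live_repair.sh and test_repair.sh.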

40 changes: 32 additions & 8 deletions tools/test_live_repair.sh
@@ -26,11 +26,22 @@ mkdir -p "$REGION_ROOT"

# Location of logs and working files
WORK_ROOT=${WORK_ROOT:-/tmp}
mkdir -p "$WORK_ROOT"
TEST_ROOT="$WORK_ROOT/test_live_repair"
if [[ ! -d "$TEST_ROOT" ]]; then
mkdir -p "$TEST_ROOT"
if [[ $? -ne 0 ]]; then
echo "Failed to make test root $TEST_ROOT"
exit 1
fi
else
# Delete previous test data
rm -r "$TEST_ROOT"
fi

loop_log="$WORK_ROOT"/test_live_repair_summary.log
test_log="$WORK_ROOT"/test_live_repair.log
verify_log="$WORK_ROOT/test_live_repair_verify.log"
loop_log="$TEST_ROOT"/test_live_repair_summary.log
test_log="$TEST_ROOT"/test_live_repair.log
verify_log="$TEST_ROOT/test_live_repair_verify.log"
dsc_ds_log="$TEST_ROOT/test_live_repair_dsc.log"

ROOT=$(cd "$(dirname "$0")/.." && pwd)
cd "$ROOT" || (echo failed to cd "$ROOT"; exit 1)
@@ -68,9 +79,8 @@ done

((region_count=region_sets*3))
((region_count+=1))
echo "" > "$loop_log"
echo "" > "$test_log"
echo "starting $(date)" | tee "$loop_log"
echo "Starting $(date)" > "$test_log"
echo "starting $(date)" > "$loop_log"
echo "Tail $test_log for test output"

# No real data was used to come up with these numbers. If you have some data
@@ -91,14 +101,17 @@ fi
if ! ${dsc} create --cleanup \
--region-dir "$REGION_ROOT" \
--region-count "$region_count" \
--output-dir "$dsc_ds_log" \
--ds-bin "$downstairs" \
--extent-size "$extent_size" \
--extent-count 200 >> "$test_log"; then
--extent-count 200 >> "$test_log"
then
echo "Failed to create downstairs regions"
exit 1
fi
${dsc} start --ds-bin "$downstairs" \
--region-dir "$REGION_ROOT" \
--output-dir "$dsc_ds_log" \
--region-count "$region_count" >> "$test_log" 2>&1 &
dsc_pid=$!
sleep 5
@@ -148,4 +161,15 @@ ${dsc} cmd shutdown
wait "$dsc_pid"

echo "$(date) Test ends with $result" | tee -a "$test_log"

if [[ $result -eq 0 ]]; then
rm -rf "$REGION_ROOT"/8810
rm -rf "$REGION_ROOT"/8820
rm -rf "$REGION_ROOT"/8830
rm -rf "$REGION_ROOT"/8840
# If empty, remove the region directory
rmdir "$REGION_ROOT"
rm -rf "$TEST_ROOT"
fi

exit $result
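
The cleanup above lists four region directories, which matches `region_count` when `region_sets` is 1 (3 + 1). A sketch of deriving the names from `region_count` instead follows. It assumes the directories keep the pattern visible in this diff (8810, 8820, ... in steps of 10), and `remove_region_dirs` is a hypothetical helper name.

```shell
# Sketch only: remove_region_dirs is hypothetical, and the port pattern
# (base 8810, step 10) is an assumption read off the directory names above.
remove_region_dirs() {
    local root="$1" count="$2" i=0 port
    [ -n "$root" ] || return 1
    while [ "$i" -lt "$count" ]; do
        port=$((8810 + i * 10))
        rm -rf -- "$root/$port"
        i=$((i + 1))
    done
    # If empty, remove the region directory; otherwise leave it in place.
    rmdir "$root" 2>/dev/null || true
}

# Example wiring (illustrative):
#   remove_region_dirs "$REGION_ROOT" "$region_count"
```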
62 changes: 41 additions & 21 deletions tools/test_nightly.sh
@@ -17,6 +17,7 @@ export BINDIR=${BINDIR:-$ROOT/target/release}
echo "Nightly starts at $(date)" | tee "$output_file"
echo "$(date) hammer start" >> "$output_file"
banner hammer
banner loop
./tools/hammer_loop.sh -l 200
res=$?
if [[ "$res" -eq 0 ]]; then
@@ -25,65 +26,84 @@ else
echo "$(date) hammer fail with: $res" >> "$output_file"
(( err += 1 ))
fi
echo ""

sleep 1
banner test
banner replay
echo "$(date) replay start" >> "$output_file"
echo "$(date) test_replay start" >> "$output_file"
./tools/test_replay.sh -l 200
res=$?
if [[ "$res" -eq 0 ]]; then
echo "$(date) replay pass" >> "$output_file"
echo "$(date) test_replay pass" >> "$output_file"
else
echo "$(date) replay fail with: $res" >> "$output_file"
echo "$(date) test_replay fail with: $res" >> "$output_file"
(( err += 1 ))
fi
echo ""

sleep 1
banner "test"
banner repair
echo "$(date) repair start" >> "$output_file"
echo "$(date) test_repair start" >> "$output_file"
./tools/test_repair.sh -l 500
res=$?
if [[ "$res" -eq 0 ]]; then
echo "$(date) repair pass" >> "$output_file"
echo "$(date) test_repair pass" >> "$output_file"
else
echo "$(date) repair fail with: $res" >> "$output_file"
echo "$(date) test_repair fail with: $res" >> "$output_file"
(( err += 1 ))
exit 1
fi
echo ""

banner restart_repair
echo "$(date) restart_repair start" >> "$output_file"
./tools/test_restart_repair.sh -l 200
sleep 1
banner restart
banner repair
echo "$(date) test_restart_repair start" >> "$output_file"
./tools/test_restart_repair.sh -l 50
res=$?
if [[ "$res" -eq 0 ]]; then
echo "$(date) restart_repair pass" >> "$output_file"
echo "$(date) test_restart_repair pass" >> "$output_file"
else
echo "$(date) restart_repair fail with: $res" >> "$output_file"
echo "$(date) test_restart_repair fail with: $res" >> "$output_file"
(( err += 1 ))
exit 1
fi
echo ""

banner live_repair
echo "$(date) live_repair start" >> "$output_file"
sleep 1
banner live
banner repair
echo "$(date) test_live_repair start" >> "$output_file"
./tools/test_live_repair.sh -l 20
res=$?
if [[ "$res" -eq 0 ]]; then
echo "$(date) live_repair pass" >> "$output_file"
echo "$(date) test_live_repair pass" >> "$output_file"
else
echo "$(date) live_repair fail with: $res" >> "$output_file"
echo "$(date) test_live_repair fail with: $res" >> "$output_file"
(( err += 1 ))
exit 1
fi
echo ""

banner replace_reconcile
echo "$(date) replace_reconcile start" >> "$output_file"
./tools/test_replace_special.sh -l 20
sleep 1
banner replace
banner special
echo "$(date) test_replace_special start" >> "$output_file"
./tools/test_replace_special.sh -l 30
res=$?
if [[ "$res" -eq 0 ]]; then
echo "$(date) replace_reconcile pass" >> "$output_file"
echo "$(date) test_replace_special pass" >> "$output_file"
else
echo "$(date) replace_reconcile fail with: $res" >> "$output_file"
echo "$(date) test_replace_special fail with: $res" >> "$output_file"
(( err += 1 ))
exit 1
fi
duration=$SECONDS

banner results
cat "$output_file"
printf "Tests took %d:%02d errors:%d\n" \
$((duration / 60)) $((duration % 60)) "$err"
$((duration / 60)) $((duration % 60)) "$err" | tee -a "$output_file"
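
Each stanza above repeats the same pattern: log a start line, run the script, then log pass or fail and bump `err`. A sketch of how that could be factored into one function, assuming `output_file` is already set as in this script; `run_test` is a hypothetical helper and not part of this PR.

```shell
# Sketch only: run_test is a hypothetical helper, not part of this PR.
# Assumes $output_file is already set by the caller.
err=0

# Run one test command, record start/pass/fail in the summary file,
# and count failures in $err.
run_test() {
    local name="$1" res
    shift
    echo "$(date) $name start" >> "$output_file"
    if "$@"; then
        res=0
    else
        res=$?
    fi
    if [ "$res" -eq 0 ]; then
        echo "$(date) $name pass" >> "$output_file"
    else
        echo "$(date) $name fail with: $res" >> "$output_file"
        err=$((err + 1))
    fi
    return 0
}

# Example wiring, mirroring two of the stanzas above:
#   run_test test_replay ./tools/test_replay.sh -l 200
#   run_test test_repair ./tools/test_repair.sh -l 500
```

The `banner` calls and `sleep 1` pauses are left out of the sketch; they could stay at the call sites or move into the helper.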

56 changes: 41 additions & 15 deletions tools/test_repair.sh
@@ -34,7 +34,7 @@ dsc="$BINDIR/dsc"

for bin in $cds $ct $dsc; do
if [[ ! -f "$bin" ]]; then
echo "Can't find crucible binary at $bin" >&2
echo "Can't find required binary at $bin" >&2
exit 1
fi
done
@@ -43,41 +55,55 @@ done
# For buildomat, the regions should be in /var/tmp
REGION_ROOT=${REGION_ROOT:-/var/tmp/test_repair}
if [[ -d ${REGION_ROOT} ]]; then
rm -rf ${REGION_ROOT}
rm -rf "$REGION_ROOT"/8810
rm -rf "$REGION_ROOT"/8820
rm -rf "$REGION_ROOT"/8830
fi

# Location of logs and working files
WORK_ROOT=${WORK_ROOT:-/tmp}
mkdir -p "$WORK_ROOT"
TEST_ROOT="$WORK_ROOT/test_live_repair"
if [[ ! -d "$TEST_ROOT" ]]; then
mkdir -p "$TEST_ROOT"
if [[ $? -ne 0 ]]; then
echo "Failed to make test root $TEST_ROOT"
exit 1
fi
else
# Delete previous test data
rm -r "$TEST_ROOT"
fi

verify_file="$WORK_ROOT/test_repair_verify.data"
test_log="$WORK_ROOT/test_repair_out.txt"
ds_log_prefix="$WORK_ROOT/test_repair_ds"
dsc_output_dir="$WORK_ROOT/dsc"
verify_file="$TEST_ROOT/test_repair_verify.data"
test_log="$TEST_ROOT/test_repair_out.txt"
ds_log_prefix="$TEST_ROOT/test_repair_ds"
dsc_output_dir="$TEST_ROOT/test_repair_dsc"
loops=100

usage () {
echo "Usage: $0 [-l #] [N]" >&2
echo " -l loops Number of test loops to perform (default 100)" >&2
echo " -N Don't dump color output"
echo " -N Don't dump color output"
}

dump_args=()
while getopts 'l:N' opt; do
case "$opt" in
case "$opt" in
l) loops=$OPTARG
;;
N) echo "Turn off color for downstairs dump"
N) echo "Turn off color for downstairs dump"
dump_args+=(" --no-color")
;;
*) echo "Invalid option"
usage
exit 1
;;
esac
exit 1
;;
esac
done

if ! "$dsc" create --cleanup --ds-bin "$cds" --extent-count 30 --extent-size 20 --region-dir "$REGION_ROOT" --output-dir "$dsc_output_dir"; then
if ! "$dsc" create --cleanup --ds-bin "$cds" --extent-count 30 \
--extent-size 20 --region-dir "$REGION_ROOT" \
--output-dir "$dsc_output_dir"; then
echo "Failed to create region"
exit 1
fi
@@ -104,6 +118,10 @@ ds1_pid=$!
${cds} run -d "${REGION_ROOT}/8830" -p 8830 &> "$ds_log_prefix"8830.txt &
ds2_pid=$!

# TODO: Some programmatic way to wait for all the downstairs to start before we
# continue here.
sleep 20

os_name=$(uname)
if [[ "$os_name" == 'Darwin' ]]; then
# stupid macos needs this to avoid popup hell.
@@ -199,7 +217,7 @@ while [[ $count -lt $loops ]]; do
then
echo "Exit on verify fail, loop: $count, choice: $choice"
echo "Check $test_log for details"
cleanup
cleanup
exit 1
fi
set +o errexit
@@ -224,3 +242,11 @@ duration=$SECONDS
printf "%d:%02d Test duration\n" $((duration / 60)) $((duration % 60))
echo "Test completed"
cleanup

# Errors exit directly, so arrival here indicates success.
rm -rf "$TEST_ROOT"
rm -rf "$REGION_ROOT"/8810
rm -rf "$REGION_ROOT"/8820
rm -rf "$REGION_ROOT"/8830
# If empty, remove the region directory
rmdir "$REGION_ROOT"
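
The TODO above asks for a programmatic way to wait for the downstairs to start instead of `sleep 20`. One possible shape is a bounded polling loop, sketched below. `wait_until` and `port_is_open` are hypothetical helpers, and treating "the port accepts a TCP connection" as "the downstairs has started" is an assumption, not something this PR establishes.

```shell
# Sketch only: both helpers are hypothetical, not part of this PR.

# Retry a probe command once per second until it succeeds or the timeout
# (in seconds) is reached. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1" waited=0
    shift
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Probe: succeeds once something accepts TCP connections on the given local
# port. Relies on bash's /dev/tcp redirection, so it needs bash, not plain sh.
port_is_open() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Example wiring for the downstairs ports used in this script:
#   for port in 8810 8820 8830; do
#       if ! wait_until 20 port_is_open "$port"; then
#           echo "downstairs on port $port did not start" >&2
#           exit 1
#       fi
#   done
```

Keeping the probe separate from the retry loop means a stronger readiness check could be swapped in later without touching the loop.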