Remove the Active → Faulted transition #1260
@@ -812,7 +807,7 @@ async fn run_live_repair(mut harness: TestHarness) {
     let mut ds2_buffered_messages = vec![];
     let mut ds3_buffered_messages = vec![];
 
-    for eid in 0..10 {
+    for eid in 0..25 {
This probably should have been some test global to begin with, so you did not have to find it and change it to match the default_config extent size.
Thanks for the nudge, done in 9760bad
77faa50 to 1e2b998
I did a performance sweep on the Madrid cluster, using the new
#!/bin/bash
set -e
# Fail early if the crutest binary location is not provided.
if [[ -z "${CRUTEST_BIN:-}" ]]; then
    echo "Error: must run with CRUTEST_BIN set" >&2
    exit 1
fi
# Options shared by every run; $FLAGS is left unquoted below so it word-splits.
FLAGS="\
-t [fd00:1122:3344:104::1]:28830 \
-t [fd00:1122:3344:103::1]:28830 \
-t [fd00:1122:3344:101::1]:28830 \
--io-depth=8 -q --time=60 --sample-time=5 --subsample-count=2 \
--key tCw7zw0hAsPuxMOTWwnPEFYjBK9qJRtYyGdEXKEnrg0= \
"
${CRUTEST_BIN} rand-write $FLAGS --gen $(date "+%s") --io-size=1024
${CRUTEST_BIN} rand-write $FLAGS --gen $(date "+%s") --io-size=1
${CRUTEST_BIN} rand-write $FLAGS --gen $(date "+%s") --io-size=256
${CRUTEST_BIN} rand-write $FLAGS --gen $(date "+%s") --io-size=1024
${CRUTEST_BIN} rand-read $FLAGS --gen $(date "+%s") --io-size=1
${CRUTEST_BIN} rand-read $FLAGS --gen $(date "+%s") --io-size=256
${CRUTEST_BIN} rand-read $FLAGS --gen $(date "+%s") --io-size=1024
afcce41 to 1e2b998
All makes sense to me.
This PR removes the Active → Faulted transition when too many IO operations are in flight.
See #1252 for details on why this is desirable!
There are a bunch of changes that come along for the ride:
- ... 1 / (1 - f), i.e. it now goes to infinity at our desired maximum value. These maximum values are set at 2x the fault for the Offline → Faulted transition.
- ... BackpressureConfig, so it can be unit tested
- ... Offline → Faulted transition happens at a reasonable timescale (> 1 sec, < 2 minutes)
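
To make the 1 / (1 - f) shape concrete, here is a minimal sketch of a delay curve that diverges at a maximum set to twice a fault threshold. Everything in it is an assumption for illustration: `SketchConfig`, its fields, the `start` offset, and the `- 1` term (which makes the delay zero at `start`) are hypothetical and are not taken from the PR's `BackpressureConfig`.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a backpressure configuration.
/// Field names are illustrative, not those of the PR's `BackpressureConfig`.
#[derive(Debug, Clone, Copy)]
struct SketchConfig {
    /// In-flight count at or below which no delay is applied.
    start: u64,
    /// In-flight count at which the delay diverges.
    max: u64,
    /// Delay applied at f = 0.5, where 1 / (1 - f) - 1 == 1.
    scale: Duration,
}

impl SketchConfig {
    /// Places the divergence point at 2x the given fault threshold
    /// (assumed here to be a count of in-flight operations).
    fn from_fault_threshold(start: u64, fault_threshold: u64, scale: Duration) -> Self {
        let max = 2 * fault_threshold;
        assert!(max > start, "max must be above start");
        Self { start, max, scale }
    }

    /// Returns the delay for `in_flight` operations, or `None` once the
    /// curve has diverged (in_flight >= max).
    fn delay(&self, in_flight: u64) -> Option<Duration> {
        if in_flight >= self.max {
            return None;
        }
        // f is the fraction of the way from `start` to `max`, in [0, 1).
        let f = in_flight.saturating_sub(self.start) as f64
            / (self.max - self.start) as f64;
        // Zero at f = 0, unbounded as f approaches 1.
        let factor = 1.0 / (1.0 - f) - 1.0;
        Some(self.scale.mul_f64(factor))
    }
}
```

With `start = 0`, a fault threshold of 50, and a scale of one second, `delay(50)` is one second and `delay(100)` is `None`. Keeping the curve in a plain struct like this is what makes it easy to check in a unit test without a running system.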