-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed skip flags when calling _ltfs_search_index_wp #493
Changed skip flags when calling _ltfs_search_index_wp #493
Conversation
TEST 1Using file back end Preconditions:
Before code change (current behavior) Result: Tape cannot be mounted
After code change (new behavior) Change on line 1661 in ltfs_mount function: /* Index of IP could be corrupted. So set skip flag to true */
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol); Result: The mount is now successful! since the index could be found from DP:
|
@juliocelon could you please help me to set the issue to be fixed by this PR automatically and set the reviewers? Looks like I am unable to do it with my current rights,. |
TEST 2Using file backend: Preconditions:
Before code change (current behavior) Result: Tape is successfully mounted if the index on DP can be gotten:
After code change (new behavior) Change on line 1691 at ltfs_mount function: /* Index of DP could be corrupted. So set skip flag to false */
ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol); Result: The tape is not mounted
|
Hello @piste-jp, Please refer to the "TEST 2" for the case when the Generations of the MAM Coherency are the same. The current flags change will make the mount to fail. I have not found the part of the code yet where the Generation (Count Value) of the MAM Coherency could be left with the same value after a write perm error on both partitions but looks like the quicker solution is to modify line 1647 at ltfs.c with if (vol->ip_coh.count <= vol->dp_coh.count) { I will look forward to your comments. |
@piste-jp do you think that by making the modification on line 1647 described in above comment could cause any side issue? Can we really trust on any of both partitions if a write perm error happened on both? In my point of view, allow to search on both partitions in those scenarios, where the Generation of the MAM Coherency could be the same, is safe to be implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for me
I don't think so. Thus, I suggested. Do you think there are side effects?
Yes, I think so.
I cannot agree this. The block starts below shall handle the case DP has a recent index, see the comment in L1644. But I believe DP is not a recent if coherency count is same. Lines 1641 to 1650 in 7271446
So it it is equal, we must treat IP shall have a latest index. Only exception is double write perm case. I believe we need to build the reasonable logic which can be followed by other people. But in my point of view, your code doesn't have it and just hides the problem found in TEST2. |
Ok, we can really trust on any of both partitions if a write perm error happened on both
I did not follow this, if we can really trust on any/either of both partitions if a write permanent error happens on both, then it would be okay to search on both partitions, why you do not agree that allow to search on both partitions in that scenario is safe to be implemented?
I saw the comment in L1644 but it is not the case we are talking here; to be clear, I am talking about a write perm error on both partitions where a MAM coherency count is the same.
I meant a double write perm case
The code that is on the PR does not include the change described in comment #493 (comment), which is the following change: if (vol->ip_coh.count <= vol->dp_coh.count) { so are you seeing that change is not the way to solve the issue on TEST 2? |
See my suggestion. If coherency counts are equal and both partitions if a write perm, allow skip IP. So what is the problem? |
I believe when "a MAM coherency count is the same.", jump into the What is your point? |
Your suggestion,
I don't want to handle 2 cases above in a same condition block. |
Correct, I do not see any problem if we allow skip IP if coherency counts are equals.
The point is that it is not true now with the current change on commit c05bad9, since now, the "else" block will have the skip flag set to false, so the tape will be unable to be mounted
okay, the change in this PR to avoid the problem on TEST2 should be the following: vol->ip_coh.count < vol->dp_coh.count: vol->ip_coh.count == vol->dp_coh.count: vol->ip_coh.count > vol->dp_coh.count (else block): Tape drive wrote down the index of IP, so skip_flag set to false to prevent search on DP. Do you agree? Specially with the vol->ip_coh.count == vol->dp_coh.count case. |
It looks you made a big misunderstanding... Please take a look my suggestion carefully with line numbers. My suggestion is if (vol->ip_coh.count < vol->dp_coh.count) {
if (vollock != PWE_MAM_IP && vollock != PWE_MAM) {
/*
* The index on DP is newer but MAM shows write perm doesn't happen in IP.
* If LTFS failed to write IP with non-medium reason error (like cable pull on locate)
* while write error handling at DP in the previous session, this condition would happen.
*/
ltfsmsg(LTFS_INFO, 17264I, "DP", vl_print);
}
if (volume_change_ref != vol->dp_coh.volume_change_ref) {
/*
* Cannot trust the index info on MAM, search the last indexes
* This would happen when the drive returns an error against acquiring the VCR
* while write error handling.
*/
ltfsmsg(LTFS_INFO, 17283I,
(unsigned long long)vol->dp_coh.volume_change_ref,
(unsigned long long)volume_change_ref);
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
if (ret < 0)
goto out_unlock;
} else {
ltfsmsg(LTFS_INFO, 17286I, "DP", (unsigned long long)volume_change_ref);
seekpos.partition = ltfs_part_id2num(vol->label->partid_dp, vol);
seekpos.block = vol->dp_coh.set_id;
}
} else {
if (vollock != PWE_MAM_DP && vollock != PWE_MAM) {
/*
* The index on IP is newer but MAM shows write perm doesn't happen in DP.
* LTFS already have written an index on DP when it is writing an index on IP,
* so this condition wouldn't happen logically.
*/
ltfsmsg(LTFS_INFO, 17264I, "IP", vl_print);
}
if (volume_change_ref != vol->ip_coh.volume_change_ref) {
/*
* Cannot trust the index info on MAM, search the last indexes
* This would happen when the drive returns an error against acquiring the VCR
* while write error handling.
*/
ltfsmsg(LTFS_INFO, 17283I,
(unsigned long long)vol->dp_coh.volume_change_ref,
(unsigned long long)volume_change_ref);
/* Index of IP could be corrupted. So set skip flag */
if (vollock == PWE_MAM_BOTH) {
/* Index of IP could be corrupted (because of double write perm). So set skip flag to true */
ret = _ltfs_search_index_wp(recover_symlink, true, &seekpos, vol);
} else {
/* Index of DP could be corrupted. So set skip flag to false */
ret = _ltfs_search_index_wp(recover_symlink, false, &seekpos, vol);
}
if (ret < 0)
goto out_unlock;
} else {
ltfsmsg(LTFS_INFO, 17286I, "IP", (unsigned long long)volume_change_ref);
seekpos.partition = ltfs_part_id2num(vol->label->partid_ip, vol);
seekpos.block = vol->ip_coh.set_id;
}
} |
I do not think there is a big misunderstanding since your suggestion follow the idea we have talked before and the last comment I wrote. I really appreciate your support @piste-jp. I will implement the suggestion you wrote and test it. Regards |
TEST 3Same preconditions than in TEST 2:
Now, including change on a4e44c2 the tape with double write perm error is able to be mounted successfully.
Message |
e04297a
to
a4e44c2
Compare
I have added and tested your proposal in this PR, see: The only modification I did was checking if the MAM count for IP is really newer than the MAM count for DP, as follow: Line 1672 in a4e44c2
This is to avoid the message Please let me know if there any comment, I think the change looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Missael! It looks great to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved
Merged to the |
Summary of changes
This pull request includes following changes or fixes.
_ltfs_search_index_wp
Description
When a tape cartridge with a write permanent error is trying to be mounted and the MAM (cartridge memory) attribute of the Index Partion (IP) stores a generation number different than the MAM attribute of the Data Partition (DP), the mount process fails even when ltfs can still search for the index in the other partition.
The code that belongs to the logic that handles WP happens on IP, defined "can_skip_ip" flag as false and it shall be true. On the other hand, the code that handles WP happens on DP, defined the "can_skip_ip" as true and it should be false. The logic is now adjusted.
Type of change
Checklist: