Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving spawn and exec syscalls #4785

Merged
merged 16 commits into from
Feb 11, 2025
Merged

Conversation

mohanson
Copy link
Contributor

@mohanson mohanson commented Jan 17, 2025

What problem does this PR solve?

Problem Summary:

The spawn and exec system calls have some usage issues. The overall operation is too complicated and requires additional checks when reading args from the vm. We found that these checks are unnecessary and can be avoided by other means.

Note: This PR is based on ckb-vm develop branch; need to wait for this PR to be merged: nervosnetwork/ckb-vm#450 and then release ckb-vm v0.24.13

What is changed and how it works?

What's Changed:

  • Use ckb-vm's new FlattenedArgsReader to simplify reading args in spawn and exec syscalls.
  • Add exec v2 implementation.

Related changes

  • PR to update owner/repo:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code ci-runs-only: [ quick_checks,linters ]

Side effects

  • Performance regression
  • Breaking backward compatibility

Release note

Note: Add a note under the PR title in the release note.

@mohanson mohanson requested a review from a team as a code owner January 17, 2025 02:00
@mohanson mohanson requested review from quake and removed request for a team January 17, 2025 02:00
Copy link
Collaborator

@eval-exec eval-exec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase onto develop.

@eval-exec eval-exec added the m:vm label Jan 17, 2025
@mohanson mohanson marked this pull request as draft January 17, 2025 02:08
@xxuejie
Copy link
Collaborator

xxuejie commented Jan 17, 2025

Just a note that a significant portion of the changes here, is due to the fact that the PR is now applied directly to the develop branch. Per CKB-VM's release schedule, CKB will for now stick to v0.24.x series of CKB-VM. So when the PR at CKB-VM is merged, cherry-picked, and release to a proper v0.24.x series, many of changes here will be reverted.

@mohanson mohanson marked this pull request as ready for review January 17, 2025 09:01
@mohanson mohanson requested review from xxuejie and eval-exec January 17, 2025 09:22
machine.set_register(A0, Mac::REG::from_u8(SLICE_OUT_OF_BOUND));
return Ok(true);
}
}
Copy link
Collaborator

@eval-exec eval-exec Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should ExecV2#ecall do when length == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will try to load an empty elf file, which will result in elf parsing errors.

Copy link
Collaborator

@eval-exec eval-exec Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle the check length == 0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please forgive me, I made a mistake earlier, when length == 0, data[offset...] will be read instead of empty bytes;

The current code behavior is correct.

return Ok(true);
}
if length > 0 {
let end = offset.checked_add(length).ok_or(VMError::MemOutOfBound)?;
Copy link
Collaborator

@eval-exec eval-exec Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both offset and length are <= u32::MAX, so offset.checked_add(length) will be always a Some
So I think L92->96 can be omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both offset and length are <= u32::MAX, so offset.checked_add(length) will be always a Some.

This description is correct, and we can probably remove this redundant check.

L92->L96 is used to determine whether the data offset exceeds the total length of the data. Why can it be omitted?

Copy link
Collaborator

@eval-exec eval-exec Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L92->L96 is used to determine whether the data offset exceeds the total length of the data. Why can it be omitted?

I think

        if offset >= full_length {
            machine.set_register(A0, Mac::REG::from_u8(SLICE_OUT_OF_BOUND));
            return Ok(true);
        }
        if length > 0 {
            let end = offset.checked_add(length).ok_or(VMError::MemOutOfBound)?;
            if end > full_length {
                machine.set_register(A0, Mac::REG::from_u8(SLICE_OUT_OF_BOUND));
                return Ok(true);
            }
        }

is equal to:

        let end = offset.checked_add(length).ok_or(VMError::MemOutOfBound)?;
        if end >= full_length {
            machine.set_register(A0, Mac::REG::from_u8(SLICE_OUT_OF_BOUND));
            return Ok(true);
        }

Is above replacement correct?
And should ExecV2#ecall check length == 0 here?

Copy link
Contributor Author

@mohanson mohanson Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we can combine the two checks and don't need to check the case where length == 0.

We can't combine it. Considering offset == 0 and length == data.len(), we should expect it to succeed, but in the code you gave, it will return SLICE_OUT_OF_BOUND error,

Comment on lines 104 to 105
let argc = machine.registers()[A4].clone();
let argv = machine.registers()[A5].clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L53->L55 is:

        let index = machine.registers()[A0].to_u64();
        let mut source = machine.registers()[A1].to_u64();
        let place = machine.registers()[A2].to_u64();

So how about use .to_u64() too

Suggested change
let argc = machine.registers()[A4].clone();
let argv = machine.registers()[A5].clone();
let argc = machine.registers()[A4].to_u64();
let argv = machine.registers()[A5].to_u64();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we don't have to clone the data here.

data_piece_id: &DataPieceId,
offset: u64,
length: u64,
args: Option<(u64, u64, u64)>,
Copy link
Collaborator

@eval-exec eval-exec Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about creating a new struct:

struct VmArgs {
    vm_id: u64,
    argc: u64,
    argv: u64,
}

to replace the anonymouse turple (u64, u64, u64)? This would improve code readability and make the parameters more self-explanatory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous internal review, Xuejie suggested that I use Option<(u64, u64, u64)>. I am not sure whether I should make a struct for them now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either.

Copy link
Collaborator

@xxuejie xxuejie Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did re-check offline communication logs, previously this is represented using the following structure:

pub enum BootArgsType {
    Static,
    Stream { vm_id: u64, argc: u64, argp: u64 },
}

Since the enum only had 2 variant: one that has 3 arguments, while the other is empty. I just believe that this 2-value enum, could in fact simply be an Option.

I have no opinion whether it is Option<(u64, u64, u64)> or Option<VmArgs>, both feel the same to me. I never say (u64, u64, u64) is better in readability to VmArgs or ArgvPointer or another name for the structure, both feel the same to me but I do understand a different opinion might occur.

eval-exec
eval-exec previously approved these changes Jan 17, 2025
Copy link
Collaborator

@xxuejie xxuejie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments on the necessity of following old code behaviors

args.location.length,
) {
Ok(val) => val,
Err(Error::SnapshotDataLoadError) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this, for a V2 clean design, can we just return the snapshot data load error here? I think there is no need to mimic v1 behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should return a VMError

let index = machine.registers()[A0].to_u64();
let mut source = machine.registers()[A1].to_u64();
let place = machine.registers()[A2].to_u64();
// To keep compatible with the old behavior. When Source is wrong, a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to ask here: if old behavior is not a concern, what a proper solution here should be?

Same question goes for Place parsing below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing this part of the compatibility code.

xxuejie
xxuejie previously approved these changes Jan 21, 2025
@mohanson
Copy link
Contributor Author

mohanson commented Jan 22, 2025

@eval-exec I have no idea why pthread lock: Invalid argument, can you take a look? https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35965802195?pr=4785

@eval-exec
Copy link
Collaborator

eval-exec commented Jan 22, 2025

@eval-exec I have no idea why pthread lock: Invalid argument, can you take a look? https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35965802195?pr=4785

Ok, That error occurred before, and it seems to be related to ckb-rocksdb. Investigating...

@eval-exec
Copy link
Collaborator

@eval-exec I have no idea why pthread lock: Invalid argument, can you take a look? https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35965802195?pr=4785

Latest unit test failed:

──── STDERR:             ckb-script verify::tests::ckb_2023::features_since_v2021::check_typical_secp256k1_blake160_2_in_2_out_tx_with_state
thread 'verify::tests::ckb_2023::features_since_v2021::check_typical_secp256k1_blake160_2_in_2_out_tx_with_state' panicked at script/src/verify/tests/ckb_latest/features_since_v2021.rs:878:9:
step_cycles 3242342

https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35970732805?pr=4785

@mohanson
Copy link
Contributor Author

@eval-exec I have no idea why pthread lock: Invalid argument, can you take a look? https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35965802195?pr=4785

Latest unit test failed:

──── STDERR:             ckb-script verify::tests::ckb_2023::features_since_v2021::check_typical_secp256k1_blake160_2_in_2_out_tx_with_state
thread 'verify::tests::ckb_2023::features_since_v2021::check_typical_secp256k1_blake160_2_in_2_out_tx_with_state' panicked at script/src/verify/tests/ckb_latest/features_since_v2021.rs:878:9:
step_cycles 3242342

https://github.com/nervosnetwork/ckb/actions/runs/12885771772/job/35970732805?pr=4785

The incorrect test case has been fixed, please re-execute ci

eval-exec
eval-exec previously approved these changes Jan 22, 2025
@eval-exec eval-exec added the t:enhancement Type: Feature, refactoring. label Jan 22, 2025
xxuejie
xxuejie previously approved these changes Jan 23, 2025
quake
quake previously approved these changes Feb 7, 2025
@zhangsoledad zhangsoledad added this pull request to the merge queue Feb 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 7, 2025
@mohanson mohanson dismissed stale reviews from quake, xxuejie, and eval-exec via 0c31771 February 7, 2025 03:06
@quake quake added this pull request to the merge queue Feb 11, 2025
Merged via the queue into nervosnetwork:develop with commit b24941d Feb 11, 2025
32 checks passed
@doitian doitian mentioned this pull request Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
m:vm t:enhancement Type: Feature, refactoring.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants