DLPX-93331 stbtrace invocations crash on 24.04 #101

@palash-gandhi palash-gandhi commented Feb 10, 2025

Problem

stbtrace invocations fail with a series of errors:

  1. The first is a compile error:

delphix@pg-ntp:~$ sudo python3 /usr/bin/stbtrace io -a op -s avgLatency,latency,count,throughput
/virtual/main.c:45:33: error: incomplete definition of type 'struct request'
   45 |     struct gendisk *diskp = reqp->rq_disk;
      |                             ~~~~^
include/linux/blkdev.h:32:8: note: forward declaration of 'struct request'
   32 | struct request;
...
...

In this kernel commit, the definition of struct request was moved from linux/blkdev.h to linux/blk-mq.h, so blkdev.h now carries only a forward declaration.

  2. After fixing this, I hit this error:
/virtual/main.c:46:35: error: no member named 'rq_disk' in 'struct request'

In this kernel commit, the rq_disk field of struct request was removed; the disk is now reached through the request queue as q->disk.

  3. After fixing this, I hit this error:
[2025-02-11T09:02:46,006][ERROR][common.TracingScriptExecutor#readLine:50][Thread-11][] There were 1 errors in the last 10 minutes for script [io]. The first <= 50 unique error lines seen were:
cannot attach kprobe, Invalid argument
[2025-02-11T09:02:46,160][WARN][analytics.impl.datacollector.TracingDataCollector#reachedEof:205][Thread-10][] Tracing process for script [io] died without collecting any data. This likely means that the script failed to compile.


delphix@dlpx-palashgandhi-os-upgrade-qar-161994-766f97e2:~$ sudo /usr/bin/stbtrace io -a op -s avgLatency,latency,count,throughput
cannot attach kprobe, Invalid argument
Failed to attach BPF program b'disk_io_done' to kprobe b'blk_account_io_done', it's not traceable (either non-existing, inlined, or marked as "notrace")

According to iovisor/bcc#5124 and the associated fix, the kprobe on blk_account_io_done should be replaced with one on blk_mq_end_request.
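
The attach failure above says the old probe point is "either non-existing, inlined, or marked as notrace". One way to guard against that is to choose the probe point from a list of candidates at startup. The following is a minimal sketch of that idea, not the change made in this PR: `parse_kallsyms`, `pick_kprobe`, and the sample symbol table are hypothetical and only illustrate the fallback order. A symbol appearing in /proc/kallsyms does not by itself guarantee that it is traceable.

```python
from typing import Iterable, Optional, Set


def parse_kallsyms(text: str) -> Set[str]:
    """Return the symbol names found in /proc/kallsyms-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: <address> <type> <name> [module]
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def pick_kprobe(candidates: Iterable[str], available: Set[str]) -> Optional[str]:
    """Return the first candidate present in `available`, or None."""
    for name in candidates:
        if name in available:
            return name
    return None


# Hypothetical sample standing in for a kernel where blk_account_io_done
# is no longer a standalone symbol.
SAMPLE_KALLSYMS = """\
0000000000000000 T blk_mq_end_request
0000000000000000 T blk_mq_start_request
0000000000000000 t nvme_complete_rq\t[nvme_core]
"""

# Prefer the old probe point; fall back to the one named in this PR.
IO_DONE_CANDIDATES = ["blk_account_io_done", "blk_mq_end_request"]

chosen = pick_kprobe(IO_DONE_CANDIDATES, parse_kallsyms(SAMPLE_KALLSYMS))
print(chosen)  # blk_mq_end_request
```

In a real script the chosen name would be passed as the `event` argument when attaching the kprobe, and the available set would come from the running kernel rather than a sample string.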

Solution

  1. Include linux/blk-mq.h in our scripts.
  2. Replace uses of rq_disk with q->disk.
  3. Replace the kprobe on blk_account_io_done with one on blk_mq_end_request.
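
Steps 1 and 2 are small edits to the BPF C text embedded in each script. As a rough illustration only, the sketch below applies both edits to a source string. The helper `port_bpf_source` and the `disk_of` function in the sample are hypothetical; only the `reqp->rq_disk` line is taken from the compiler output above. The actual scripts were edited by hand, not rewritten programmatically.

```python
import re

BLKDEV_INCLUDE = "#include <linux/blkdev.h>"
BLK_MQ_INCLUDE = "#include <linux/blk-mq.h>"


def port_bpf_source(source: str) -> str:
    """Apply the include and field changes to BPF C text."""
    # Step 1: add the header that now defines struct request, right after
    # the blkdev.h include. If blkdev.h is not included, nothing is added.
    if BLK_MQ_INCLUDE not in source:
        source = source.replace(
            BLKDEV_INCLUDE, BLKDEV_INCLUDE + "\n" + BLK_MQ_INCLUDE, 1
        )
    # Step 2: reach the gendisk through the request queue.
    return re.sub(r"->rq_disk\b", "->q->disk", source)


# Hypothetical pre-change snippet.
OLD_SOURCE = """\
#include <linux/blkdev.h>

static struct gendisk *disk_of(struct request *reqp)
{
    struct gendisk *diskp = reqp->rq_disk;
    return diskp;
}
"""

NEW_SOURCE = port_bpf_source(OLD_SOURCE)
print(NEW_SOURCE)
```

Running the helper a second time on its own output leaves the text unchanged, since the include is already present and no `rq_disk` references remain.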

Testing Done

pre_checkin: https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/blackbox-self-service/162274/console

11:49:56  test_create_domain - SUCCESS (0:04:37.032303)

storage, which runs the analytics_positive suite: https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/blackbox-self-service/162276/console

os_tests, which runs the performance_diagnostics_positive suite to test estat: https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/blackbox-self-service/162278/console

@palash-gandhi palash-gandhi force-pushed the dlpx/pr/palash-gandhi/0901cec7-db86-49cc-9bcf-fc828b9c397f branch from 8d6ac9b to cab9ef1 Compare February 10, 2025 16:59
@palash-gandhi palash-gandhi force-pushed the dlpx/pr/palash-gandhi/0901cec7-db86-49cc-9bcf-fc828b9c397f branch from 720a8be to 3f9e363 Compare February 10, 2025 17:23
@palash-gandhi palash-gandhi marked this pull request as ready for review February 12, 2025 20:33
@palash-gandhi palash-gandhi enabled auto-merge (squash) February 12, 2025 20:46
@palash-gandhi palash-gandhi merged commit b031173 into os-upgrade Feb 12, 2025
6 of 7 checks passed
@palash-gandhi palash-gandhi deleted the dlpx/pr/palash-gandhi/0901cec7-db86-49cc-9bcf-fc828b9c397f branch February 12, 2025 21:07