Steam library on ksmbd rdma share - (Disk write failure, state 0x202) #498

Open
dcontiveros opened this issue Dec 31, 2024 · 30 comments

@dcontiveros

dcontiveros commented Dec 31, 2024

I'm running into interesting error messages when I attempt to install/update Steam games. They are always of the type:

Disk Write failed
Update failed

Here is dmesg output (debug):

[Tue Dec 31 14:21:24 2024] ksmbd: converted name = SteamLibrary/steamapps/appmanifest_39140.acf~RF2bd9ff7.TMP
[Tue Dec 31 14:21:24 2024] ksmbd: get query maximal access context
[Tue Dec 31 14:21:24 2024] ksmbd: can not get linux path for SteamLibrary/steamapps/appmanifest_39140.acf~RF2bd9ff7.TMP, rc = -2
[Tue Dec 31 14:21:24 2024] ksmbd: Error response: c0000034
[Tue Dec 31 14:21:24 2024] ksmbd: Failed to process 5 [-9]
[Tue Dec 31 14:21:24 2024] ksmbd: credits: requested[1] granted[1] total_granted[1905]

The issue I am having is this is not always reproducible. I have an alternate Samba share that I can also test on to see whether I get the same errors. It seems like ksmbd can't find the file written by Steam. I've also tested the share with fio with extremely good results, so I doubt it is a ksmbd issue; it may be a config issue.
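
For reading the dmesg lines above, a minimal C sketch of the codes involved may help. It assumes Linux errno numbering and the NTSTATUS and SMB2 command values from the Microsoft protocol documents. The helper names nt_status_name and kernel_rc_name are hypothetical and are not part of ksmbd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* NTSTATUS values (MS-ERREF); only the two that show up in this thread. */
#define STATUS_INVALID_PARAMETER     0xC000000Du
#define STATUS_OBJECT_NAME_NOT_FOUND 0xC0000034u

/* SMB2 command code for CREATE (MS-SMB2); the "5" in "Failed to process 5". */
#define SMB2_CREATE_CMD 0x0005u

/* Hypothetical helper: name the code printed on an "Error response" line. */
static const char *nt_status_name(uint32_t status)
{
	switch (status) {
	case STATUS_OBJECT_NAME_NOT_FOUND:
		return "STATUS_OBJECT_NAME_NOT_FOUND";
	case STATUS_INVALID_PARAMETER:
		return "STATUS_INVALID_PARAMETER";
	default:
		return "unknown";
	}
}

/* Hypothetical helper: name a kernel-style return code such as rc = -2. */
static const char *kernel_rc_name(int rc)
{
	assert(rc <= 0); /* kernel return codes are 0 or a negated errno */
	switch (-rc) {
	case 0:
		return "success";
	case 2:  /* Linux ENOENT: no such file or directory */
		return "ENOENT";
	case 9:  /* Linux EBADF: bad file descriptor */
		return "EBADF";
	case 22: /* Linux EINVAL: invalid argument */
		return "EINVAL";
	default:
		return "other";
	}
}
```

Under these assumptions, rc = -2 reads as ENOENT, c0000034 as STATUS_OBJECT_NAME_NOT_FOUND, and "Failed to process 5 [-9]" as an SMB2 CREATE (command 5) that returned EBADF.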

Here is my config:

; see ksmbd.conf(5) for details

[global]
        ; global parameters
        rdma_capable = true
        bind interfaces only = no
        deadtime = 0
        guest account = nobody
        ipc timeout = 0
        map to guest = never
        max active sessions = 1024
        max connections = 128
        max open files = 100000
        netbios name = KSMBD IB SERVER
        restrict anonymous = 0
        server max protocol = SMB3_11
        server min protocol = SMB3_11
        server multi channel support = yes
        server signing = disabled
        server string = SMB IB SERVER
        share:fake_fscaps = 64
        smb3 encryption = auto
        tcp port = 445
        workgroup = WORKGROUP
        smbd max io size = 100G
        smb2 max credits = 8192

        ; default share parameters
        browseable = yes
        create mask = 0744
        crossmnt = yes
        directory mask = 0755
        force create mode = 0000
        force directory mode = 0000
        guest ok = no
        hide dot files = yes
        inherit owner = no

[games]
        comment = Game Share
        read only = no
        path = /smb/games
        valid users = REDACTED
        write list = REDACTED
        force group = smbusers
        guest ok = no
        writable = yes
        oplocks = no

Would love to resolve this matter.

Versions:

Kernel: 6.12.7 with SMB Direct enabled
 
$ ksmbd.control -V
[ksmbd.control/25781]: INFO: ksmbd-tools version : 3.5.3

OS: AlmaLinux 9 (although the custom kernel should make the distribution irrelevant)
@namjaejeon
Owner

The issue I am having is this is not always reproducible.

I need a full packet dump (Wireshark capture) to find the root cause. What is the local filesystem type? ext4?

I've also tested the share with fio with extremely good results

Can you explain more? What is fio, and what counts as good results?

@namjaejeon
Owner

And it is not related to RDMA, right? I am wondering whether this issue is reproducible without RDMA (SMB Direct).

@dcontiveros
Author

@namjaejeon appreciate the quick response. Happy new year! I hope to become more involved in this project.

My setup consists of the following:

KSMBD server:

  1. Mellanox ConnectX-4
  2. XFS formatted Intel P5520
  3. ksmbd with the above versions

Client:
Windows 11 for Workstations - Mellanox ConnectX-4

Client:
macOS Sequoia - regular Ethernet

I can mount and use both for file transfers. It is only Steam that I have issues with.

Fio is a disk benchmarking tool that I use to exhaust the SLC cache on regular NVMe drives. It works on network shares as well. It's the only way I have found to saturate a 40 Gbps link.

I will take dumps from both OSes. Steam is notorious for not allowing SMB shares, so this may be the issue, but some more information on the errors would be helpful. I will test both OSes today. What would you ideally like to see to be able to troubleshoot this?

I see that RDMA negotiates SMB2. I will try and force SMB3, but I only see SMB2 issues in the log. A quick test via Steam on macOS doesn't run into this issue. This is interesting.

@namjaejeon
Owner

I see that RDMA negotiates SMB2. I will try and force SMB3, but I only see SMB2 issues in the log.

I am not sure that this error message is the cause of your issue. The SMB client tries to open the file and, if it is not found, tries to create it, so this may be normal behavior. If you remove the "server multi channel support" parameter from ksmbd.conf, RDMA will be disabled with the Windows client. I will also try to reproduce this issue using Steam.

@dcontiveros
Author

I have disabled RDMA connectivity and mapped a network drive that is reachable only over an Ethernet connection. I have also confirmed that the ksmbd connection originates from an Ethernet interface.

It appears I get these Steam logs:

[2025-01-01 19:11:10] AppID 39140 update changed : Running Update,
[2025-01-01 19:11:10] AppID 39140 update changed : Running Update,Validating,
[2025-01-01 19:11:10] AppID 39140 update changed : Running Update,
[2025-01-01 19:11:10] AppID 39140 update changed : Running Update,Preallocating,
[2025-01-01 19:11:11] AppID 39140 update canceled : Failed to preallocate (Disk write failure) "T:\SteamLibrary\steamapps\downloading\39140\data\field\flevel.lgp"
[2025-01-01 19:11:11] AppID 39140 update changed : Running Update,Preallocating,Stopping,
[2025-01-01 19:11:11] AppID 39140 preallocated 263 files (1187 MB)
[2025-01-01 19:11:11] AppID 39140 update changed : Running Update,Stopping,
[2025-01-01 19:11:11] AppID 39140 update changed : None
[2025-01-01 19:11:11] AppID 228980 state changed : Update Required,Fully Installed, (Disk write failure) (Update delayed for 27846 secs)
[2025-01-01 19:11:11] AppID 39140 state changed : Update Required,Update Queued, (Disk write failure)
[2025-01-01 19:11:11] AppID 39140 state changed : Update Required,Update Paused,
[2025-01-01 19:11:11] AppID 39140 scheduler finished : removed from schedule (result Disk write failure, state 0x202)

However, I have a regular Samba share on another machine (non-RDMA), and I am able to install games on that network drive.

I have successfully written data to the ksmbd share. Before I take any network dumps, are there any options that tell ksmbd how to report free space? The 0x202 error seems to indicate that the disk is out of space, but Steam itself shows the correct space left on the network share.

dcontiveros changed the title from "Steam library on ksmbd rdma share" to "Steam library on ksmbd rdma share - (Disk write failure, state 0x202)" on Jan 2, 2025
@xhebox
Contributor

xhebox commented Jan 2, 2025

I've run into something very similar.

ksmbd: xhebox332 flags=34816,CREAT=64, name=games/steamlibrary/.temp_write_2cbe3402
Thu Jan  2 13:43:08 2025 kern.info kernel: [84677.495791] ksmbd: xhebox32 rc=-9
Thu Jan  2 13:43:08 2025 kern.info kernel: [84677.499157] ksmbd: Error response: c0000034
Thu Jan  2 13:43:08 2025 kern.info kernel: [84677.503353] ksmbd: Failed to process 5 [-9]
--- a/smb2pdu.c	2024-12-02 21:19:09.000000000 +0800
+++ a/smb2pdu.c	2024-12-02 21:19:09.000000000 +0800
@@ -2633,6 +2633,7 @@
 	umode_t mode;
 	int rc;

+	ksmbd_debug(SMB, "xhebox332 flags=%d,CREAT=%d, name=%s\n", open_flags, O_CREAT, name);
 	if (!(open_flags & O_CREAT))
 		return -EBADF;

@@ -3336,6 +3347,7 @@
 #endif
 				posix_mode,
 				req->CreateOptions & FILE_DIRECTORY_FILE_LE);
+		ksmbd_debug(SMB, "xhebox32 rc=%d\n", rc);
 		if (rc) {
 			if (rc == -ENOENT) {
 				rc = -EIO;

I can confirm that, in my case, smb2_creat is called without O_CREAT. Something may be wrong in _create_open_flags.
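
To check the flags value from the log by hand, here is a small sketch. It assumes the flag bits from Linux's asm-generic fcntl.h (the values x86 uses; some architectures differ). has_flag and the KFLAG_ names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* open(2) flag bits, octal, as in include/uapi/asm-generic/fcntl.h. */
#define KFLAG_O_CREAT     00000100u  /* 64 */
#define KFLAG_O_NONBLOCK  00004000u  /* 2048 */
#define KFLAG_O_LARGEFILE 00100000u  /* 32768 */
#define KFLAG_O_PATH      010000000u /* 0x200000 */

/* Return true when every bit of `bit` is set in `open_flags`. */
static bool has_flag(unsigned int open_flags, unsigned int bit)
{
	assert(bit != 0);
	return (open_flags & bit) == bit;
}
```

With these values, 34816 is exactly O_NONBLOCK | O_LARGEFILE (2048 + 32768), so the O_CREAT bit (64) is clear, which matches the -EBADF return in the debug patch.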

EDIT: I found a different error log that may be the real problem here.

Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.603347] ksmbd: xhebox12 fp=games/steamlibrary/steamapps/downloading/881100/data,fn=data,err=0
Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.608472] ksmbd: xhebox39 rc=0
Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.608476] ksmbd: xhebox40 rc=0
Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.611699] ksmbd: Error response: c000000d
Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.614923] ksmbd: xhebox41 rc=0
Thu Jan  2 14:02:14 2025 kern.info kernel: [85823.627054] ksmbd: Failed to process 5 [-22]

@namjaejeon
Owner

@xhebox

I think the Windows client used by the Steam library doesn't set the create flags in the SMB2 create request.

If you can capture the packets at that moment, I can confirm whether it is a ksmbd issue or not.

Thanks!

@xhebox
Contributor

xhebox commented Jan 2, 2025

OK, I can confirm the root cause on my side. The game update actually fails because of the -22 error, which is a failure to open the temporary download file.

When I checked for the temp file game_down_dir\data\data.wak, I found that no such file exists, and game_down_dir\data had been created as a regular file!

Going back to the log, I found this happens because vfs_kern_path does not restore next[0] = '/' on error. As a result, game_down_dir\data is created instead of game_down_dir\data\data.wak. This is a bug, and it corrupts Steam.
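
The effect can be reproduced outside the kernel. The following is a simplified userspace sketch of the pattern, not the ksmbd code: walk_path, lookup_fn and only_dl_exists are hypothetical names, and the lookup callback stands in for the real VFS lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: 0 if the path prefix exists, negative otherwise. */
typedef int (*lookup_fn)(const char *prefix);

/*
 * Walk `path` one component at a time. Each '/' is temporarily overwritten
 * with '\0' so the prefix can be passed to lookup() as a C string.
 * With restore_on_error == false the separator is NOT put back when a
 * lookup fails, which leaves the caller's buffer truncated.
 */
static int walk_path(char *path, lookup_fn lookup, bool restore_on_error)
{
	char *next = path;

	assert(path != NULL && lookup != NULL);

	while ((next = strchr(next, '/')) != NULL) {
		int err;

		next[0] = '\0';		/* cut the string at this separator */
		err = lookup(path);
		if (err && !restore_on_error)
			return err;	/* bug: path stays cut short */
		next[0] = '/';		/* fix: restore before any error exit */
		if (err)
			return err;
		next++;
	}
	return lookup(path);		/* final component */
}

/* Hypothetical lookup: only the top-level directory "dl" exists. */
static int only_dl_exists(const char *prefix)
{
	return strcmp(prefix, "dl") == 0 ? 0 : -2; /* -2 mirrors -ENOENT */
}
```

Calling walk_path on a buffer holding "dl/data/data.wak" with restore_on_error set to false leaves the buffer reading "dl/data" after the failed lookup, which is the kind of truncated name that would later be created as a regular file. With restore_on_error set to true the buffer is left intact.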

And this error in vfs_kern_path traces back to one of the -9 errors:

Thu Jan  2 17:06:45 2025 kern.info kernel: [96894.212455] ksmbd: xhebox291 pre=0,dispos=1,copts=0
...
Thu Jan  2 17:06:45 2025 kern.info kernel: [96894.257808] ksmbd: xhebox311 flags=208800,CREAT=40, name=games/steamlibrary/steamapps/downloading/881100/data
Thu Jan  2 17:06:45 2025 kern.info kernel: [96894.257809] ksmbd: xhebox32 rc=-9,create=0

According to the log, dispos is FILE_OPEN, not FILE_OPEN_IF/FILE_CREATE. My guess is that Steam first tries to open the parent directory, then creates the child file directly, and finally falls back to the equivalent of mkdir -p and touch.
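
For reference, a short sketch of the disposition values, assuming the numbering in the MS-SMB2 CREATE request definition. disposition_may_create is a hypothetical helper, not a ksmbd function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SMB2 CreateDisposition values as listed in MS-SMB2. */
enum create_disposition {
	FILE_SUPERSEDE    = 0,
	FILE_OPEN         = 1,
	FILE_CREATE       = 2,
	FILE_OPEN_IF      = 3,
	FILE_OVERWRITE    = 4,
	FILE_OVERWRITE_IF = 5,
};

/* True when the disposition allows creating a file that does not exist. */
static bool disposition_may_create(uint32_t dispos)
{
	assert(dispos <= (uint32_t)FILE_OVERWRITE_IF);

	switch (dispos) {
	case FILE_SUPERSEDE:
	case FILE_CREATE:
	case FILE_OPEN_IF:
	case FILE_OVERWRITE_IF:
		return true;
	default:
		/* FILE_OPEN and FILE_OVERWRITE fail if the file is missing. */
		return false;
	}
}
```

With dispos=1 (FILE_OPEN) the helper returns false, which is consistent with the open returning -9 for a missing file instead of creating it.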

Finally, the patch that works for me:

--- a/vfs.c	2024-12-02 21:19:09.000000000 +0800
+++ a/vfs.c	2024-12-02 21:19:09.000000000 +0800
@@ -2853,6 +2853,7 @@
 					      filepath,
 					      flags,
 					      path);
+			if (!is_last) next[0] = '/';
 			if (err)
 				goto out2;
 			else if (is_last)

@namjaejeon
Owner

namjaejeon commented Jan 2, 2025

Okay, good catch :)
I have cleaned up your patch. We don't need to restore '/' a second time.

Does it work fine with the Steam library?

diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 88d167a5f8b7..40f08eac519c 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -1264,6 +1264,8 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                                              filepath,
                                              flags,
                                              path);
+                       if (!is_last)
+                               next[0] = '/';
                        if (err)
                                goto out2;
                        else if (is_last)
@@ -1271,7 +1273,6 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                        path_put(parent_path);
                        *parent_path = *path;
 
-                       next[0] = '/';
                        remain_len -= filename_len + 1;
                }

@dcontiveros
Author

dcontiveros commented Jan 2, 2025

How would I apply this patch? I have it staged inside the Linux kernel tree I cloned from git.

Update: LMAO, apologies, I think you changed the patch a while ago. Let me attempt to apply your latest. I have your patch with the 88d1 hash.

@namjaejeon
Owner

namjaejeon commented Jan 2, 2025

@dcontiveros
Yes, I have updated it again. Please apply it and test it.
Can you build the kernel source and install the kernel image on your target?
If yes, you can apply this patch with this command:

patch -p1 < steam_fix.patch

@dcontiveros
Author

dcontiveros commented Jan 2, 2025

Oh, interesting, you are using the patch command rather than git apply:

$ cat vfs.diff
diff --git a/vfs.c b/vfs.c
index e1e9d9c..cae4330 100644
--- a/vfs.c
+++ b/vfs.c
@@ -2852,6 +2852,8 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                                              filepath,
                                              flags,
                                              path);
+                       if (!is_last)
+                               next[0] = '/';
                        if (err)
                                goto out2;
                        else if (is_last)
@@ -2859,7 +2861,6 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                        path_put(parent_path);
                        *parent_path = *path;

-                       next[0] = '/';
                        remain_len -= filename_len + 1;
                }

$ git apply vfs.diff
error: patch failed: vfs.c:2852
error: vfs.c: patch does not apply

Let me try the patch method now.

@namjaejeon
Owner

Ah, please apply this one.

diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 88d167a5f8b7..40f08eac519c 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -1264,6 +1264,8 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                                              filepath,
                                              flags,
                                              path);
+                       if (!is_last)
+                               next[0] = '/';
                        if (err)
                                goto out2;
                        else if (is_last)
@@ -1271,7 +1273,6 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                        path_put(parent_path);
                        *parent_path = *path;
 
-                       next[0] = '/';
                        remain_len -= filename_len + 1;
                }

@dcontiveros
Author

OK, let me try git apply on your latest update.

I pulled down ksmbd via git and have already staged the changes to the kernel build. Let me try getting this patch in.

@namjaejeon
Owner

Okay, let me know if you run into any issues while applying the patch.

@dcontiveros
Author

dcontiveros commented Jan 2, 2025

I pulled down ksmbd from git. I am on the master branch, but I don't see the git hash from your patch:

# log entry
$ git log -1
commit 73903472b33391ffa760ef23c8dae2ee61f23ade (HEAD -> master, origin/master, origin/HEAD)
Author: Wentao Liang <liangwentao@iscas.ac.cn>
Date:   Tue Dec 24 08:58:47 2024 +0900

# check vfs.c hash entries for patch
$  git log --pretty=format:"%H" -- vfs.c | grep -i 88d1
88d1f1ef2f91b1f19530a44e5e7c4f2aabbe6d44

If I check against the latest Linux 6.12.8 tree:

 $ git log -1
commit 77f85ccd3618f324d221f0faaed6d9cdc118c74a (HEAD, tag: v6.12.8, origin/linux-6.12.y)
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Jan 2 10:34:26 2025 +0100

$ git log --pretty=format:"%H" -- fs/smb/server/vfs.c| grep 88d
$ 

(no matches)

Feels like I'm missing something. Are the index lines supposed to match up with a specific commit on vfs.c?

@dcontiveros
Author

dcontiveros commented Jan 2, 2025

Here is my latest attempt:

### cat the patch
$ cat vfs.diff
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 88d167a5f8b7..40f08eac519c 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -1264,6 +1264,8 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                                              filepath,
                                              flags,
                                              path);
+                       if (!is_last)
+                               next[0] = '/';
                        if (err)
                                goto out2;
                        else if (is_last)
@@ -1271,7 +1273,6 @@ int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name,
                        path_put(parent_path);
                        *parent_path = *path;

-                       next[0] = '/';
                        remain_len -= filename_len + 1;
                }

### check patch first
$ git apply --check vfs.diff
error: patch failed: fs/smb/server/vfs.c:1264
error: fs/smb/server/vfs.c: patch does not apply

@namjaejeon
Owner

@dcontiveros
Okay, you were checking against the GitHub ksmbd tree. You can check it with the next branch; I have applied the patch there. Please clone and check it.

git clone https://github.com/namjaejeon/ksmbd --branch=next

@xhebox
Contributor

xhebox commented Jan 3, 2025

I haven't noticed any Steam failures since.

Just a reminder, I think this bug also exists in previous ksmbd releases (before v640, e.g.).

@namjaejeon
Owner

@xhebox can you submit the patch to the mailing list(with rdma patch also) ?

@xhebox
Contributor

xhebox commented Jan 3, 2025

@xhebox can you submit the patch to the mailing list(with rdma patch also) ?

OK, maybe tomorrow. I am on vacation. BTW, do we need to fix other versions...? I have no experience with backporting.

@namjaejeon
Owner

Ah, you can base the patch on the latest Linux kernel version (linux-6.13-rc5). Once this patch is merged into mainline, it will be propagated to the other versions (stable kernels).

@dcontiveros
Author

I am still experiencing errors when installing Counter-Strike 2. I see the same behavior where some games install and others don't. Should I be using ext4? The share is currently on XFS.

I did manage to build against the next branch; I built kernel 6.12.8-dirty.

@dcontiveros
Author

I do not believe I have built this properly, as I cannot get it working on either the stable or the mainline branch.

I am attempting the following procedure:

Installing as a part of the kernel

Here is validation:

# post copy check
$ cd fs/ksmbd
$ git branch
  master
* next

# check Kconfig/Makefile
$ grep ksmbd fs/Kconfig
source "fs/ksmbd/Kconfig"
$ grep ksmbd fs/Makefile
obj-$(CONFIG_SMB_SERVER)        += ksmbd/

# check SMB Direct in config
$ grep SMBDIRECT .config
CONFIG_SMB_SERVER_SMBDIRECT=y

# make output
fs/ksmbd/smb1pdu.c: In function ‘smb_locking_andx’:
fs/ksmbd/smb1pdu.c:1715:30: error: ‘struct file_lock’ has no member named ‘fl_type’
fs/ksmbd/smb1pdu.c:1725:30: error: ‘struct file_lock’ has no member named ‘fl_type’
fs/ksmbd/smb1pdu.c:1726:30: error: ‘struct file_lock’ has no member named ‘fl_flags’
fs/ksmbd/smb1pdu.c:1788:60: error: ‘struct file_lock’ has no member named ‘fl_file’
fs/ksmbd/smb1pdu.c:1789:64: error: ‘struct file_lock’ has no member named ‘fl_file’
fs/ksmbd/smb1pdu.c:1918:22: error: ‘struct file_lock’ has no member named ‘fl_type’
fs/ksmbd/smb1pdu.c:1954:60: error: ‘struct file_lock’ has no member named ‘fl_file’
fs/ksmbd/smb1pdu.c:1955:57: error: ‘struct file_lock’ has no member named ‘fl_file’
fs/ksmbd/smb1pdu.c:2024:22: error: ‘struct file_lock’ has no member named ‘fl_type’
make[4]: *** [scripts/Makefile.build:229: fs/ksmbd/smb1pdu.o] Error 1
make[3]: *** [scripts/Makefile.build:478: fs/ksmbd] Error 2

Feels like I may be doing something wrong here.

@namjaejeon
Owner

Can you turn SMB_INSECURE_SERVER off in the kernel config?

@dcontiveros
Author

dcontiveros commented Jan 3, 2025

I had to do a few more things to compile against the 6.12.8 tag:

  1. Comment out the fs/smb entries in both Kconfig and Makefile
  2. Ensure that SMB_INSECURE_SERVER was disabled in kernel config.
  3. Disable BTL debugging
  4. make clean

I was then able to install Counter-Strike 2 successfully. I will test some more, but it seems we are good with this patch.

I have not tested the mainline tree. What is your typical testing workflow? I was a bit confused by the patch and would like to build the latest against my kernel for future reference.

Thanks!

@namjaejeon
Owner

@dcontiveros Thanks for your confirmation :) I will apply this patch to mainline. You will need to wait a bit longer for this.

@xhebox If you don't have the time, I will directly make the patch and apply it to mainline. Let me know your mail address and your real name to add signed-off-by tag to the patch.

@xhebox
Contributor

xhebox commented Jan 6, 2025

@xhebox If you don't have the time, I will directly make the patch and apply it to mainline. Let me know your mail address and your real name to add signed-off-by tag to the patch.

I got time today. Let me try.

@dcontiveros
Author

Did this patch get sent upstream? wondering what I need to do to try to get it into Mainline.

@xhebox
Contributor

xhebox commented Jan 9, 2025

Did this patch get sent upstream? wondering what I need to do to try to get it into Mainline.

Yes, it is merged into the upstream repo.
