Possible obsolete driver version when installing VCK5000 platform with recent linux kernel #12

Open
jbelot opened this issue Jul 8, 2024 · 3 comments

Comments

@jbelot

jbelot commented Jul 8, 2024

Hi,

As mentioned in this issue, I am having trouble building the driver for the VCK5000 platform. I encountered several problems, so to be as exhaustive as possible, I will detail all the error messages I get and the workarounds I used to "fix" them.

My guess is that the driver is no longer compatible with the more recent linux kernel (mine is 6.5.0-41-generic).

First issue

When running make in the driver directory, I get the following error:

make

make -C /usr/src/linux-headers-`uname -r` M=$PWD
make[1]: Entering directory '/usr/src/linux-headers-6.5.0-41-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  You are using:           gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  CC [M]  ~/ROCm-air-platforms/driver/amdair_chardev.o
In file included from ./include/linux/linkage.h:7,
                 from ./arch/x86/include/asm/cache.h:5,
                 from ./include/linux/cache.h:6,
                 from ./include/linux/time.h:5,
                 from ./include/linux/compat.h:10,
                 from ~/ROCm-air-platforms/driver/amdair_chardev.c:4:
~/ROCm-air-platforms/driver/amdair_chardev.c: In function ‘amdair_chardev_init’:
./include/linux/export.h:29:22: error: passing argument 1 of ‘class_create’ from incompatible pointer type [-Werror=incompatible-pointer-types]
   29 | #define THIS_MODULE (&__this_module)
      |                     ~^~~~~~~~~~~~~~~
      |                      |
      |                      struct module *
~/ROCm-air-platforms/driver/amdair_chardev.c:208:37: note: in expansion of macro ‘THIS_MODULE’
  208 |         amdair_class = class_create(THIS_MODULE, amdair_dev_name());
      |                                     ^~~~~~~~~~~
In file included from ./include/linux/device.h:31,
                 from ~/ROCm-air-platforms/driver/amdair_chardev.c:5:
./include/linux/device/class.h:230:54: note: expected ‘const char *’ but argument is of type ‘struct module *’
  230 | struct class * __must_check class_create(const char *name);
      |                                          ~~~~~~~~~~~~^~~~
~/ROCm-air-platforms/driver/amdair_chardev.c:208:24: error: too many arguments to function ‘class_create’
  208 |         amdair_class = class_create(THIS_MODULE, amdair_dev_name());
      |                        ^~~~~~~~~~~~
./include/linux/device/class.h:230:29: note: declared here
  230 | struct class * __must_check class_create(const char *name);
      |                             ^~~~~~~~~~~~
....
cc1: some warnings being treated as errors
make[3]: *** [scripts/Makefile.build:251: ~/ROCm-air-platforms/driver/amdair_chardev.o] Error 1
make[2]: *** [/usr/src/linux-headers-6.5.0-41-generic/Makefile:2039: ~/ROCm-air-platforms/driver] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.5.0-41-generic'
make: *** [Makefile:14: default] Error 2

First workaround

The error seems to come from the THIS_MODULE argument on line 208 of amdair_chardev.c: the 6.5 headers declare class_create(const char *name), with no module argument. So I removed it, changing:

	amdair_class = class_create(THIS_MODULE, amdair_dev_name());

into:

	amdair_class = class_create(amdair_dev_name());
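
If the driver needs to keep building on both older and newer kernels, one option is to gate the call on the kernel version instead of editing it outright. The following is a minimal sketch, not the maintainers' fix: amdair_class_create() and amdair_class_create_is_single_arg() are hypothetical names, and the 6.4 cutoff is an assumption based on the upstream change that dropped the module argument (distribution kernels with backported patches may need a different threshold). The second helper exists only so the cutoff can be checked outside a kernel build.

```c
#include <linux/version.h>	/* LINUX_VERSION_CODE, KERNEL_VERSION() */

#ifdef __KERNEL__
#include <linux/device.h>	/* class_create() */
#include <linux/module.h>	/* THIS_MODULE */

/*
 * Hypothetical wrapper (not part of the driver): picks the class_create()
 * form that matches the headers the module is being built against.
 * ASSUMPTION: the single-argument form starts at 6.4.
 */
static inline struct class *amdair_class_create(const char *name)
{
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 4, 0)
	/* Newer headers: class_create() takes only the class name. */
	return class_create(name);
#else
	/* Older headers: class_create() also takes the owning module. */
	return class_create(THIS_MODULE, name);
#endif
}
#endif /* __KERNEL__ */

/*
 * The same comparison as a plain function, so the assumed cutoff can be
 * exercised in user space. Returns 1 for the single-argument form.
 */
static inline int amdair_class_create_is_single_arg(unsigned int version_code)
{
	return version_code >= KERNEL_VERSION(6, 4, 0);
}
```

With such a wrapper, line 208 would become amdair_class = amdair_class_create(amdair_dev_name()); and the source would stay buildable on the 5.4 kernel the driver was developed on.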

The build now seems to succeed, though with these warnings (which were also present before, but hidden in the ....):

make -C /usr/src/linux-headers-`uname -r` M=$PWD
make[1]: Entering directory '/usr/src/linux-headers-6.5.0-41-generic'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  You are using:           gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
  CC [M]  ~/ROCm-air-platforms/driver/amdair_chardev.o
~/ROCm-air-platforms/driver/amdair_chardev.c: In function ‘address_store’:
~/ROCm-air-platforms/driver/amdair_chardev.c:651:9: warning: ignoring return value of ‘kstrtoul’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  651 |         kstrtoul(buf, 0, &address);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
~/ROCm-air-platforms/driver/amdair_chardev.c: In function ‘value_store’:
~/ROCm-air-platforms/driver/amdair_chardev.c:686:17: warning: ignoring return value of ‘kstrtouint’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  686 |                 kstrtouint(buf, 0, (uint32_t*)(&arg[1]));
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~/ROCm-air-platforms/driver/amdair_chardev.c: In function ‘create_aie_mem_sysfs’:
~/ROCm-air-platforms/driver/amdair_chardev.c:734:9: warning: ignoring return value of ‘sysfs_create_groups’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  734 |         sysfs_create_groups(&priv->kobj_aie, aie_sysfs_groups);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  LD [M]  ~/ROCm-air-platforms/driver/amdair.o
  MODPOST ~/ROCm-air-platforms/driver/Module.symvers
  LD [M]  ~/ROCm-air-platforms/driver/amdair.ko
  BTF [M] ~/ROCm-air-platforms/driver/amdair.ko
Skipping BTF generation for ~/ROCm-air-platforms/driver/amdair.ko due to unavailability of vmlinux
make[1]: Leaving directory '/usr/src/linux-headers-6.5.0-41-generic'
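
The three warn_unused_result warnings above do not stop the build, but they do mean that a failed parse or a failed sysfs registration goes unnoticed. A minimal sketch of the usual pattern for the kstrtoul case follows. It is an illustration under assumptions, not the driver's code: amdair_store_address() is a hypothetical helper, the real address_store signature is not shown in the log, and the user-space kstrtoul stand-in exists only so the sketch can be compiled outside a kernel build (it approximates the kernel function's "0 or negative errno" contract).

```c
#ifdef __KERNEL__
#include <linux/kernel.h>	/* kstrtoul() */
#else
/*
 * User-space stand-in so this sketch compiles outside a kernel build.
 * It only approximates kstrtoul(): 0 on success, negative errno on failure.
 */
#include <errno.h>
#include <stdlib.h>

static int kstrtoul(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (errno == ERANGE)
		return -ERANGE;		/* value does not fit */
	if (*end == '\n')
		end++;			/* allow a single trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	*res = val;
	return 0;
}
#endif

/*
 * Hypothetical helper (not in the driver): parse a sysfs write into *address.
 * Returns count on success, or the negative errno from kstrtoul().
 */
static inline long amdair_store_address(const char *buf, size_t count,
					unsigned long *address)
{
	unsigned long parsed;
	int ret;

	ret = kstrtoul(buf, 0, &parsed);
	if (ret)
		return ret;	/* report the parse error to the writer */

	*address = parsed;	/* only update the target on success */
	return (long)count;
}
```

The same idea applies to kstrtouint and sysfs_create_groups: capture the return value and propagate a non-zero result instead of discarding it.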

Second issue

However, loading the driver with sudo insmod amdair.ko does not seem to do anything: /dev/amdair is not created, and sudo dmesg | grep amdair returns nothing either.

Second workaround

I noticed the message:
Skipping BTF generation for ~/ROCm-air-platforms/driver/amdair.ko due to unavailability of vmlinux
when building the driver, so I tried to fix it following this Ubuntu forum.

Now the build succeeds without this warning, but I still have the same problem as in the second issue.

Are my workarounds relevant? Do you have any idea on how to fix my problem?

Thank you :)

@muwyse-amd
Collaborator

muwyse-amd commented Jul 9, 2024

Hi, thanks for the questions and for giving the platform a try!

> My guess is that the driver is no longer compatible with the more recent linux kernel (mine is 6.5.0-41-generic).

Correct. We developed the driver and platform using kernel 5.4; at least that is the version on my machine. The kernel header APIs changed between 5.4 and 6.5, which is the root cause of the first issue. If I recall correctly, your proposed "First workaround" was sufficient in my testing to compile, load, and run the weather stencil test on my machine with kernel 6.4 (with the same warn_unused_result warnings present).

I have not encountered your second issue; loading the kernel module worked for me on kernel 6.4 after the first fix. To confirm, did you program the VCK5000, warm reboot, and then check that the card was visible (lspci -vd 10ee:) before loading the driver?

@eddierichter-amd
Collaborator

Hi @jbelot, were you able to resolve this issue?

Also, because you mentioned it in your other issue on mlir-aie: we just added the ability to connect PL components directly to the AIEs via the PLIO in this commit. We provided some documentation, but this feature is very fresh, so we would love some initial feedback on the documentation and the functionality if that is something you are interested in.

@jbelot
Author

jbelot commented Jul 29, 2024

Hi @muwyse-amd, @eddierichter-amd, and thank you for your answers.

I managed to get the weather stencil test to work (but only once!). The problem was that the card had not been correctly programmed before I loaded the driver (I hadn't done a warm reboot).
I then tried the vector vector add example, but without success, and since then I have been unable to get either working properly again.

I have to admit that I quickly abandoned this flow since my team works mainly with XRT drivers and it's not viable to have to reboot the server every time you switch from a “standard” project to MLIR AIE.

So I was more interested in the flow proposed by https://github.com/nqdtan/vck5000_vivado_ulp and modified it a little to insert .elf files generated by MLIR AIE. I have something that works on a very simple example, but I was wondering how to use PLIOs in this case.

Your recent changes seem particularly interesting, but I guess I'll have to reprogram the board to allow the use of PLIOs.
