-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add KOKKOS_FUNCTION etc. documentation #578
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,206 @@ | ||||||||||
|
||||||||||
``Function Annotation Macros`` | ||||||||||
============================== | ||||||||||
|
||||||||||
.. role::cpp(code) | ||||||||||
:language: cpp | ||||||||||
|
||||||||||
Defined in header ``<Kokkos_Macros.hpp>`` | ||||||||||
|
||||||||||
Usage: | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
KOKKOS_FUNCTION void foo(); | ||||||||||
KOKKOS_INLINE_FUNCTION void foo(); | ||||||||||
KOKKOS_FORCEINLINE_FUNCTION void foo(); | ||||||||||
auto l = KOKKOS_LAMBDA(int i) { ... }; | ||||||||||
auto l = KOKKOS_CLASS_LAMBDA(int i) { ... }; | ||||||||||
|
||||||||||
These macros deal with the management of split compilation for device and host code. | ||||||||||
They fullfill the same purpose as the ``__host__ __device__`` markup in CUDA and HIP. | ||||||||||
Generally only functions marked with one of these macros can be used inside of parallel | ||||||||||
Kokkos code - i.e. all code executed in parallel algorithms must be marked up by one | ||||||||||
of these macros. | ||||||||||
|
||||||||||
``KOKKOS_FUNCTION`` | ||||||||||
------------------- | ||||||||||
|
||||||||||
This macro is the equivalent of ``__host__ __device__`` markup in CUDA and HIP. | ||||||||||
Use it primarily on inline-defined member functions of classes and templated | ||||||||||
free functions | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
class Foo { | ||||||||||
public: | ||||||||||
// inline defined constructor | ||||||||||
KOKKOS_FUNCTION Foo() { ... }; | ||||||||||
|
||||||||||
// inline defined member function | ||||||||||
template<class T> | ||||||||||
KOKKOS_FUNCTION void bar() const { ... } | ||||||||||
}; | ||||||||||
|
||||||||||
template<class T> | ||||||||||
KOKKOS_FUNCTION void foo(T v) { ... } | ||||||||||
|
||||||||||
This macro is also used for non-templated free functions in conjunction with relocatable device code - | ||||||||||
i.e. if one wants to compile functions in some compilation unit A but call them from Kokkos | ||||||||||
parallel constructs defined in compilation unit B. | ||||||||||
|
||||||||||
|
||||||||||
``KOKKOS_INLINE_FUNCTION`` | ||||||||||
-------------------------- | ||||||||||
|
||||||||||
This macro is the equivalent of ``__host__ __device__ inline`` markup in CUDA and HIP. | ||||||||||
Use it primarily for non-templated free functions: | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
KOKKOS_INLINE_FUNCTION void foo() {} | ||||||||||
|
||||||||||
Note that it is NOT a bug to use this macro for inline-defined member function of classes, or | ||||||||||
templated free functions. It is simply redundant since they are by default inline. | ||||||||||
|
||||||||||
``KOKKOS_FORCEINLINE_FUNCTION`` | ||||||||||
------------------------------- | ||||||||||
|
||||||||||
This macro is the equivalent of ``__host__ __device__`` markup in CUDA and HIP, but also uses | ||||||||||
compiler dependent hints (if available) to enforce inlining. | ||||||||||
This can help with some functions which are often used, but it may also hurt compilation time, | ||||||||||
as well as runtime performance due to code-bloat. In some instances using ``KOKKOS_FORCEINLINE_FUNCTION`` | ||||||||||
excessively can even cause compilation errors due to compiler specific limits of maximum inline limits. | ||||||||||
Use this macro only in conjunction with performing extensive performance checks. | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
class Foo { | ||||||||||
public: | ||||||||||
KOKKOS_FORCEINLINE_FUNCTION | ||||||||||
Foo() { ... }; | ||||||||||
|
||||||||||
template<class T> | ||||||||||
KOKKOS_FORCEINLINE_FUNCTION | ||||||||||
void bar() const { ... } | ||||||||||
}; | ||||||||||
|
||||||||||
template<class T> | ||||||||||
KOKKOS_FORCEINLINE_FUNCTION | ||||||||||
void foo(T v) { ... } | ||||||||||
|
||||||||||
This macro is also used for non-templated free functions in conjunction with relocatable device code - | ||||||||||
i.e. if one wants to compile functions in some compilation unit A but call them from Kokkos | ||||||||||
parallel constructs defined in compilation unit B. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similarly here.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need some warning here, because otherwise people will pepper everything with the KOKKOS_FUNCTION instead of KOKKOS_INLINE_FUNCTION and it will fail depending on how stuff is set up. This didn't used to be much of a problem because everyone was using KOKKOS_INLINE_FUNCTION by default. But for a while now some team members have strenuously complained about that so folks are moving away from it :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But currently there is nothing special about relocatable device functions; they behave the same as free functions that might need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you want a function which uses relocatable device code capability you can't mark it KOKKOS_INLINE_FUNCTION. Only if you mark it KOKKOS_FUNCTION will it use that compilation/link technique or? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it's the same as for a regular |
||||||||||
|
||||||||||
|
||||||||||
``KOKKOS_LAMBDA`` | ||||||||||
----------------- | ||||||||||
|
||||||||||
This macro provides default capture clause and host device markup for lambdas. It is the equivalent of | ||||||||||
``[=] __host__ __device__`` in CUDA and HIP. | ||||||||||
It is used than creating C++ lambdas to be passed to Kokkos parallel dispatch mechanisms such as | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sentence is weird. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually its not because if you generate inside a KOKKOS_FUNCTION you can't use KOKKOS_LAMBDA currently |
||||||||||
``parallel_for``, ``parallel_reduce`` and ``parallel_scan``. | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
void foo(...) { | ||||||||||
... | ||||||||||
parallel_for("Name", N, KOKKOS_LAMBDA(int i) { | ||||||||||
... | ||||||||||
}); | ||||||||||
... | ||||||||||
parallel_reduce("Name", N, KOKKOS_LAMBDA(int i, double& v) { | ||||||||||
... | ||||||||||
}, result); | ||||||||||
... | ||||||||||
} | ||||||||||
|
||||||||||
.. warning:: Do not use ``KOKKOS_LAMBDA`` inside functions marked as ``KOKKOS_FUNCTION`` etc. or within a lambda marked with ``KOKKOS_LAMBDA``. Specifically do not use ``KOKKOS_LAMBDA`` to define lambdas for nested parallel calls. CUDA does not support that. Use plain C++ syntax instead: ``[=] (int i) {...}``. | ||||||||||
|
||||||||||
.. warning:: When creating lambdas inside of class member functions you may need to use ``KOKKOS_CLASS_LAMBDA`` instead. | ||||||||||
|
||||||||||
``KOKKOS_CLASS_LAMBDA`` | ||||||||||
----------------------- | ||||||||||
|
||||||||||
This macro provides default capture clause and host device markup for lambdas created inside of class member functions. It is the equivalent of | ||||||||||
``[=, *this] __host__ __device__`` in CUDA and HIP, capturing the parent class by value instead of by reference. | ||||||||||
dalg24 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
class Foo { | ||||||||||
public: | ||||||||||
Foo() { ... }; | ||||||||||
int data; | ||||||||||
|
||||||||||
KOKKOS_FUNCTION print_data() const { | ||||||||||
printf("Data: %i\n",data); | ||||||||||
} | ||||||||||
void bar() const { | ||||||||||
parallel_for("Name", N, KOKKOS_CLASS_LAMBDA(int i) { | ||||||||||
... | ||||||||||
print_data(); | ||||||||||
printf("%i %i\n",i,data); | ||||||||||
}); | ||||||||||
} | ||||||||||
}; | ||||||||||
|
||||||||||
Note: If one wants to avoid capturing a copy of the entire class in the lambda, one has to create local | ||||||||||
copies of any accessed data members, and can not use non-static member functions inside the lambda: | ||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
class Foo { | ||||||||||
public: | ||||||||||
Foo() { ... }; | ||||||||||
int data; | ||||||||||
|
||||||||||
KOKKOS_FUNCTION print_data() const { | ||||||||||
printf("Data: %i\n",data); | ||||||||||
} | ||||||||||
void bar() const { | ||||||||||
int data_copy = data; | ||||||||||
parallel_for("Name", N, KOKKOS_LAMBDA(int i) { | ||||||||||
... | ||||||||||
// can't call member functions | ||||||||||
// print_data(); | ||||||||||
// use the copy of data | ||||||||||
printf("%i %i\n",i,data_copy); | ||||||||||
}); | ||||||||||
} | ||||||||||
}; | ||||||||||
|
||||||||||
|
||||||||||
``KOKKOS_DEDUCTION_GUIDE`` | ||||||||||
----------------------- | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
||||||||||
This macro is used to annotate deduciont guides. | ||||||||||
masterleinad marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
|
||||||||||
|
||||||||||
.. code-block:: cpp | ||||||||||
|
||||||||||
template<class T, size_t N> | ||||||||||
class Foo { | ||||||||||
T data[N]; | ||||||||||
public: | ||||||||||
template<class ... Args> | ||||||||||
KOKKOS_FUNCTION | ||||||||||
Foo(Args ... args):data{static_cast<T>(args)...} {} | ||||||||||
|
||||||||||
KOKKOS_FUNCTION void print(int i) const { | ||||||||||
printf("%i\n",static_cast<int>(data[i])); | ||||||||||
} | ||||||||||
}; | ||||||||||
|
||||||||||
template<class T, class ... Args> | ||||||||||
KOKKOS_DEDUCTION_GUIDE | ||||||||||
Foo(T, Args...) -> Foo<T, 1+sizeof...(Args)>; | ||||||||||
|
||||||||||
void bar() { | ||||||||||
Kokkos::parallel_for(1, KOKKOS_LAMBDA(int) { | ||||||||||
Foo f(1, 2., 3.2f); | ||||||||||
f.print(0); | ||||||||||
f.print(1); | ||||||||||
f.print(2); | ||||||||||
}); | ||||||||||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that supposed to work with
SYCL
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in conjunction" implies its not the only thing you need to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I still find the last sentence more confusing then helpful and would wait with saying anything for functions functions used in separable compilarion/relocatable device code until after discussing kokkos/kokkos#5993.