- Fix the package version #2460
- Implement batched serial laswp #2395
- implement batched serial iamax #2399
- Implement batched serial pbtrs #2330
- Implement batched serial pbtrf #2322
- Implement batched serial pttrs #2277
- gemm perf_test: print matrix sizes #2362
- Modify validity checks for output views sizes in svd #2350
- Improved convergence and robustness of Runge-Kutta integrators #2229
- Don't use bulk sort in KokkosSparse::sort_crs_matrix sometimes #2353
- Throw if posix_memalign fails #2368
- Eti extern marking #2292
- Add KokkosKernels::eager_initialize() to common #2317
- Put default types in KokkosKernels namespace #2341
- Add MAGMA TPL support for GESV on HIP backend #2326
- BLAS - gemv: using fallback when mode is 't' or 'c' and onemkl is used #2272
- SerialInverseLU: fix overflow in integer multiplication #2410
- Fix potential overflow issue in spiluk #2409
- Mult result conversion #2405
- Blas1 asum: workaround for openblas error with short vectors #2384
- Set variables to value instead of variable name #2380
- Block Sptrsv fixes #2376
- Fix set-but-unused in Test_ODE_BDF #2355
- sparse_sort_crs: fix column shuffle indices #2346
- Fix #2344: SVD hanging #2345
- Some compilers throw shadow warnings in static functions #2297
- A couple platforms do not correctly handle static complexes #2285
- Help gcc/8.3 with ctad issue #2265
- Clean and replace forbidden names for macros and symbols (see identifiers)
- Update atomic function usage ahead of Kokkos deprecation and removal
- Deprecate redundant team-level sort functions #2306
- Free allocated #2407
- Reduce duplicated code in trsv #2388
- perf_tests: remove false dependence on google test #2385
- #2354
- Remove unneeded volatile qualifier for Kokkos::Single #2333
- CI: sanitizer and most of undefined sanitizer #2408
- Workflow volta70 #2356
- AT-2: adding non-TPL build for HIP backend #2329
- Workflows: Add remaining spr and bdw checks #2321
- Remove review trigger and group github-{BDW,H100,MI201} under github-AT2 #2320
- Don't error out if graph unit tests disabled #2305
- Restore size_t as default offset, in Tribits builds #2313
- Improve crs/bsr sorting performance #2293
- SpAdd handle: delete sort_option getter/setter #2296
- Improve GH action to produce release artifacts #2312
- coo2csr: add parens to function calls #2318
- Add support for BSRs to sptrsv #2281
- clang-format version update, increase column limit to 120. #2255
- Add big reformat commits to ignore revs for blame #2286
- c++17: add attribute #1493
- Performance improvement: disable cuBLAS dot wrapper #2206
- SPMV TPLs: improve profile region labels #2219
- cusparse spgemm: provide non-null row-ptr #2213
- spmv_mv wrappers for rocsparse #2233
- Update rocsparse algo defaults #2245
- cmake: add CMake language support for CUDA/HIP #2173
- FindTPLROC*: updates to fix export of import targets #2250
- Enable 3 at2 builds #2210
- At2 ROCM+TPL fixes, remove volta70 too #2182
- Add AutoTester2 CI Configs (Sans Power9 & ROCM w/ TPLS) #2174
- Kokkos Kernels: initial security policy #2220
- Sparse - BsrMatrix: adding new wiki example for documentation #2228
- Add testing for transpose corner cases #2234
- spgemm unit test: change matrix value distribution #2241
- docs.yml: change kokkos version to latest release #2199
- Bigger sptrsv cleanup #2280
- Sparse - SpGEMM: labeling spgemm_symbolic in TPL layer #2193
- A little sptrsv cleanup before the main block effort #2247
- sparse: replace macros with constexpr bools #2260
- spgemm: add profiling regions to native implementations #2253
- Sparse - SpMV: removing calls to unsupported oneapi - MKL functions #2274
- Sycl gemv beta #2276
- Unify alignPtrTo implementation #2275
- SpMV: Test NaN, fix NaN handling when beta=0 #2188
- KokkosLapack_svd_tpl_spec_decl: defer to MKL spec when LAPACK also enabled #2171
- Fix spmv regressions #2204
- Sparse - CrsToBsr: fix type mismatch #2242
- Fix logic around merge path with TPLs #2240
- In deprecated spmv, fix Controls algorithm mapping #2246
- kokkoskernels_tpls.cmake: remove duplicates arguments when creating a… #2244
- sparse: spadd_symbolic fences before device values used on host #2259
- Fix warning about memcpy #2252
- sycl: use alternative when SYCL is enabled (SpGEMM) #2262
- Rename, allow it to infer argument type #2261
- Workarounds for removed cusparse functions #2270
- handle_t* -> std::unique_ptr<handle_t> in Bsr SpMV unit tests #2269
- sparse: block spiluk fixes #2172
- magma: tpl interaction fixes #2176, #2178, #2181
- trsv: Add early return if numRows == 0 in trsv to avoid integer divide-by-zero error #2180
- blas tpl: resolve potential duplicate symbol #2183
- spmv: performance fix, add back special path for rank-2 x/y with 1 column #2164, #2168
- BsrMatrix: Fix HostMirror typedef #2196
- GA: Fix macOS docs build #2190
4.3.00 (2024-03-19)
- Syr2 #1942
- Adding cuSOLVER #2038
- Fix for MAGMA with CUDA #2044
- Adding rocSOLVER #2034
- Fix rocSOLVER issue with Trilinos dependency #2037
- Lapack - SVD #2092
- Add block support to all SPILUK algorithms #2064
- Sptrsv improvements
- GMRES: Add support for BSR matrices #2097
- Spmv handle #2126
- Option to apply RCM reordering to extracted CRS diagonal blocks #2125
- Adding adaptive BDF methods #1930
- Add HIPManagedSpace support #2079
- Axpby: improvement on unification attempt logic and on the execution of a diversity of situations #1895
- Use execution space operator== #2136
- Add TPL support for KokkosBlas::dot #1949
- Add CUDA/HIP TPL support for KokkosSparse::spadd #1962
- Don't call optimize_gemv for one-shot MKL spmv #2073
- Async matrix release for MKL >= 2023.2 in SpMV #2074
- BLAS - MKL: fixing HostBlas calls to handle MKL_INT type #2112
- Link std::filesystem for IntelLLVM in perf_test/sparse #2055
- Fix Cuda TPL finding #2098
- CMake: error out in certain case #2115
- par_ilut: Update documentation for fill_in_limit #2001
- Wiki examples for BLAS2 functions are added #2122
- github workflows: update to v4 (use Node 20) #2119
- gemm3 perf test: use CUDA, SYCL, or HIP device for Kokkos::initialize #2058
- Lapack: adding svd benchmark #2103
- Benchmark: modifying spmv benchmark to fix interface and run range of spmv tests #2135
- Experimental hip cleanup #1999
- iostream clean-up in benchmarks #2004
- Update: implicit capture of 'this' via '[=]' is deprecated in C++20 warnings #2076
- Remove all mentions of HBWSpace #2101
- Change name of yaml-cpp to yamlcpp (trilinos/Trilinos#12710) #2099
- Hands off namespace Kokkos::Impl - cleanup couple violations that snuck in #2094
- Kokkos Kernels: update version guards to drop old version of Kokkos #2133
- Sparse MKL: changing the location of the MKL_SAFE_CALL macro #2134
- Bspgemm cusparse hang #2008
- bhalf_t fix for isnan function #2007
- Fence Kokkos before timed iterations #2066
- CUDA 11.2.1 / cuSPARSE 11.4.0 changed SpMV enums #2011
- Fix the spadd API #2090
- Axpby reduce deep copy calls #2081
- Correcting BLAS test failures with cuda when ETI_ONLY = OFF (issue #2061) #2077
- Fix weird Trilinos compiler error #2117
- Fix for missing STL inclusion #2113
- Fix build error in trsv on gcc8 #2111
- Add a workaround for compilation errors with cuda-12.2.0 + gcc-12.3 #2108
- Increase tolerance on gesv test (Fix #2123) #2124
- Fix usage of RAII to set cusparse/rocsparse stream #2141
- Spmv bsr matrix fix missing matrix descriptor (rocsparse) #2138
4.2.01 (2024-01-17)
- LAPACK: magma tpl fixes #2044
- BLAS: fix bug in TPL layer of #2052
- ROCm 6 deprecation fixes for rocsparse #2050
4.2.00 (2023-11-06)
- Implement BLAS2 syr() and her() functionalities under kokkos-kernels syr() #1837
- New component added for the implementation of LAPACK algorithms and to support associated TPLs #1985
- Fix some issue with unit-test definition for SYCL backend in the new LAPACK component #2024
- Extract diagonal blocks from a CRS matrix into separate CRS matrices #1947
- Adding exec space instance to spmv #1932
- Add merge-based SpMV #1911
- Stream support for Gauss-Seidel: Symbolic, Numeric, Apply (PSGS and Team_PSGS) #1906
- Add a MergeMatrixDiagonal abstraction to KokkosSparse #1780
- Newton solver #1924
- MDF performance improvements exposing more parallelism in the implementation
- Improvements to the Block Crs Matrix-Vector multiplication algorithm
- Only deep_copy from device to host if supernodal sptrsv algorithms are used #1993
- Improve KokkosSparse_kk_spmv #1979
- Add 5 warm-up calls to get accurate, consistent timing
- Print out the matrix dimensions correctly when loading from disk
- sparse/impl: Make PSGS non-blocking #1917
- ODE: changing layout of temp mem in RK algorithms #1908
- ODE: adding adaptivity test for RK methods #1896
- Common: remove half and bhalf implementations (now in Kokkos Core) #1981
- KokkosKernels: switching from printf macro to function #1977
- OrdinalTraits: constexpr functions #1976
- Parallel prefix sum can infer view type #1974
- BSPGEMM: removing cusparse testing for version older than 11.4.0 #1996
- Revise KokkosBlas::nrm2 TPL implementation #1950
- Add TPL oneMKL GEMV support #1912
- oneMKL spmv #1882
- CMakeLists.txt: Update Kokkos version to 4.2.99 for version check #2003
- CMake: Adding logic to catch bad Kokkos version #1990
- Remove calling tribits_exclude_autotools_files() #1888
- Update create_gs_handle docs #1958
- docs: Add testing table #1876
- docs: Note which builds have ETI disabled #1934
- Generate HTML docs #1921
- github/workflows: Pin sphinx version #1948
- github/workflows/docs.yml: Use up-to-date doxygen version #1941
- Unit-Test: adding specific test for block sparse functions #1944
- Update SYCL docker image to Cuda 11.7.1 #1939
- Remove printouts from the unit tests of ger() and syr() #1933
- update testing scripts #1960
- Speed up BSR spmv tests #1945
- Test_ODE_Newton: Add template parameters for Kokkos::pair #1929
- par_ilut: Update documentation for fill_in_limit #2001
- perf_test/sparse: Update GS perf_test for streams #1963
- Batched sparse perf_tests: Don't write to source tree during build #1904
- ParILUT bench: fix unused IS_GPU warning #1900
- BsrMatrix SpMV Google Benchmark #1886
- Use extraction timestamps for fetched Google Benchmark files #1881
- Improve help text in perf tests #1875
- iostream clean-up in benchmarks #2004
- Rename TestExecSpace to TestDevice #1970
- remove Intel 2017 code (no longer supported) #1920
- clean-up implementations for move of HIP outside of experimental #1999
- upstream iostream removal fix #1991, #1995
- Test and fix gemv stream interface #1987
- Test_Sparse_spmv_bsr.hpp: Workaround cuda 11.2 compiler error #1983
- Fix improper use of execution space instances in ODE tests. Better handling of CudaUVMSpaces during build. #1973
- Don't assume the default memory space is used #1969
- MDF: set default verbosity explicitly to avoid valgrind warnings #1968
- Fix sort_and_merge functions for in-place case #1966
- SPMV_Struct_Functor: initialize numExterior to 0 #1957
- Use rank-1 impl types when rank-2 vector is dynamically rank 1 #1953
- BsrMatrix: Check if CUDA is enabled before checking architecture #1955
- Avoid enum without fixed underlying type to fix SYCL #1940
- Fix SpAdd perf test when offset/ordinal is not int #1928
- Add KOKKOSKERNELS_CUDA_INDEPENDENT_THREADS definition for architectures with independent thread scheduling #1927
- Fix cm_generate_makefile --boundscheck #1926
- Bsr compatibility #1925
- BLAS: fix assignable check in gemv and gemm #1914
- mdf: fix initial value in select pivot functor #1916
- add missing headers, std::vector -> std::vector<...> #1909
- Add missing include to Test_Sparse_MergeMatrix.hpp #1907
- Remove non-existent dir from CMake include paths #1892
- cusparse 12 spmv: check y vector alignment #1889
- Change 'or' to '||' to fix compilation on MSVC #1885
- Add missing KokkosKernels_Macros.hpp include #1884
- Backward-compatible fix with kokkos@4.0 #1874
- Fix for rocblas builds #1871
- Correcting 'syr test' bug causing compilation errors with Trilinos #1870
- Workaround for spiluk and sptrsv stream tests with OMP_NUM_THREADS of 1, 2, 3 #1864
- bhalf_t fix for isnan function #2007
4.1.00 (2023-06-16)
- Adding interface with execution space instance argument to support execution of BLAS on stream
- Improving BLAS level 2 support by adding native implementation and TPL for GER, HER and SYR
- Optimizing algorithms for single input data
- Adding stream support to ILUK/SPTRSV and sort/merge
- Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests #1769
- sparse: Add coo2crs, crs2coo and CooMatrix #1686
- Adds team- and thread-based lower-bound and upper-bound search and predicates #1711
- Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset #1710
- ODE: explicit integration methods #1754
- refactor blas3 tests to use benchmark library #1751
- batched/eti: ETI host-level interfaces #1783
- batched/dense: Add gesv DynRankView runtime checks #1850
- Add support for complex data types in MDF #1776
- Sort and merge improvements #1773
- spgemm handle: check that A,B,C graphs never change #1742
- Fix/enhance backend issues on spadd perftest #1672
- Spgemm perf test enhancements #1664
- add explicit tests of opt-in algorithms in SpMV #1712
- Added TplsVersion file and print methods #1693
- Add basis skeleton for KokkosKernels::print_configuration #1665
- Add git information to benchmark context #1722
- Test mixed scalars: more fixes related to mixed scalar tests #1694
- PERF TESTS: adding utilities and instantiation wrapper #1676
- Refactor MKL TPL for both CPU and GPU usage #1779
- MKL: support indices properly #1868
- Use rocsparse_spmv_ex for rocm >= 5.4.0 #1701
- Do not change memory spaces instantiation defaults based on Kokkos_ENABLE_CUDA_UVM #1835
- KokkosKernels: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) #1817
- CMakeLists.txt: Add alias to match what is exported from Trilinos #1855
- KokkosKernels: Don't list include for non-existent 'batched' build dir (trilinos/Trilinos#11966) #1867
- Remove non-existent subdir kokkos-kernels/common/common (#11921, #11863) #1854
- KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545) #1844
- Enable sphinx werror #1856
- Update cmake option naming in docs/comments #1849
- docs/developer: Add Experimental namespace #1852
- docs: Add profiling for compile times #1843
- Ger: adding documentation stubs in apidocs #1822
- .github/workflows: Summarize github-DOCS errors and warnings #1814
- Blas1: docs update for PR #1803 #1805
- apt-get update in hosted runner docs check #1797
- scripts: Fix github-DOCS #1796
- Add --enable-docs option to cm_generate_makefile #1785
- docs: Add stubs for some sparse APIs #1768
- .github: Update to actions/checkout@v3 #1767
- docs: Include BatchedGemm #1765
- .github: Automation reminder #1726
- Allow an HTML-only docs build #1723
- SYCL CI: Specify the full path to the compiler #1670
- Add github DOCS ci check & disable Kokkos tests #1647
- Add rocsparse,rocblas, to enabled TPLs in cm_test_all_sandia when --spot-check-tpls #1841
- cm_test_all_sandia: update to add caraway queues for MI210, MI250 #1840
- Support rocSparse in rocm 5.2.0 #1833
- Add KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 support, only enable KokkosBlas::gesv where supported #1816
- scripts: Include OMP settings #1801
- Print the patch that clang-format-8 wants to apply #1714
- Benchmark cleanup for par_ilut and spmv #1853
- SpMV: adding benchmark for spmv #1821
- New performance test for par_ilut, ginkgo::par_ilut, and spill #1799
- Include OpenMP environment variables in benchmark context #1789
- Re-enable and clean up triangle counting perf test #1752
- Include google/benchmark lib version in benchmark output #1750
- Refactor blas2 test for benchmark feature #1733
- Adds a better parilut test with gmres #1661
- Refactor blas1 test for benchmark feature #1636
- Drop outdated workarounds for backward compatibility with Kokkos #1836
- Remove dead code guarded #1834
- Remove decl ETI files #1824
- Reorganize par_ilut performance test #1818
- Deprecate Kokkos::Details::ArithTraits #1748
- Drop obsolete workaround #ifdef KOKKOS_IF_ON_HOST #1720
- Drop pre Kokkos 3.6 workaround #1653
- View::Rank -> View::rank #1703
- Prefer Kokkos::View::{R->r}ank #1679
- Call concurrency(), not impl_thread_pool_size() #1666
- Kokkos moves ALL_t out of Impl namespace #1658
- Add KokkosKernels::Impl::are_integral_v helper variable template and quit using Kokkos::Impl::are_integral trait #1652
- Kokkos 4 compatibility: modifying the preprocessor logic #1827
- blas/tpls: Fix gemm include guard typo #1848
- spmv cusparse version check modified for cuda/11.1 #1828
- Workaround for #1777 - cusparse spgemm test hang #1811
- Fix 1798 #1800
- BLAS: fixes and testing for LayoutStride #1794
- Fix 1786: check that work array is contiguous in SVD #1793
- Fix unused variable warnings #1790
- Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp #1784
- Batched Gesv: initializing variable to make compiler happy #1778
- perf test utils: fix device ID parsing #1739
- Fix OOB and improve comments in BsrMatrix COO constructor #1732
- batched/unit_test: Disable simd dcomplex4 test for intel > 19.05 and <= 2021. #1857
- rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0 #1716
- compatibility with 4.0.0 #1709
- team mult: fix type issue in max_error calculation #1706
- cast Kokkos::Impl::integral_constant to int #1697
4.0.01 (2023-04-19)
- Kokkos Kernels version: need to use upper case variables #1707
- CUSPARSE_MM_ALG_DEFAULT deprecated by cuSparse 11.1 #1698
- blas1: Fix a couple documentation typos #1704
- CUDA 11.4: fixing some -Werror #1727
- Remove unused variable in KokkosSparse_spgemm_numeric_tpl_spec_decl.hpp #1734
- Reduce BatchedGemm test coverage time #1737
- Fix kk_generate_diagonally_dominant_sparse_matrix hang #1689
- Temporary spgemm workaround matching Trilinos 11663 #1757
- MDF: Minor changes to interface for ifpack2 impl #1759
- Rocm TPL support upgrade #1763
- Fix BLAS cmake check for complex types #1762
- ParIlut: Adds a better parilut test with gmres #1661
- GMRES: fixing some type issues related to memory space instantiation (partial) #1719
- ParIlut: create and destroy spgemm handle for each usage #1736
- ParIlut: remove par ilut limitations #1755
- ParIlut: make Ut_values view atomic in compute_l_u_factors #1781
4.0.0 (2023-02-21)
- ROTG: implementation of BLAS level1 rotg #1529
- ROT: adding function to rotate two vector using Givens rotation coefficients #1581
- ROTMG: adding rotmg implementation to KokkosBlas #1560
- ROTM: adding blas 1 function for modified rotation #1583
- SWAP: adding implementation of level 1 BLAS function #1612
- Add utility #1681
- Add spgemm TPL support for cuSparse and rocSparse #1513
- Add csr2csc #1446
- Adding my weighted graph coarsening code into kokkos-kernels #1043
- VBD/VBDBIT D1 coloring: support distributed graphs #1598
- New tests for mixed-precision GEMM, some fixes for BLAS tests with non-ETI types #1615
- Spgemm non-reuse: unification layer and TPLs #1678
- Remove "slow mem space" device ETI #1619
- First phase of SpGEMM TPL refactor #1582
- Spgemm TPL refactor #1618
- cleaned messages printed at configuration time #1616
- Batched dense tests: splitting batched dense unit-tests #1608
- sparse/unit_test: Use native spmv impl in bsr unit tests #1606
- ROT* HIP: testing and improving rocBLAS support for ROT* kernels #1594
- Add main functions for batched sparse solver performance tests #1554
- Batched sparse kernels update #1546
- supernodal SpTRSV : require invert-diag option to use SpMV #1518
- Update --verbose option in D2 coloring perftest #1486
- Modular build: allowing to build components independently #1504
- Move GMRES from example to sparse experimental #1620
- Remove Experimental::BlockCrsMatrix (replaced with Experimental::BsrMatrix) #1458
- Move {Team,TeamVector}Gemv to KokkosBlas #1435
- Move SerialGEMV to KokkosBlas #1433
- CMake: export version and subversion to config file #1680
- CMake: update package COMPATIBILITY mode in anticipation of release 4.0 #1645
- FindTPLMKL.cmake: fix naming of mkl arg to FIND_PACKAGE_HANDLE_STANDARD_ARGS #1644
- Fix docs build #1569
- KokkosKernels: Remove listing of undefined TPL deps (trilinos/Trilinos#11152) #1568
- Update nightly SYCL setup #1660
- Add github DOCS ci check & disable Kokkos tests #1647
- docs: Fix RTD build #1490
- sparse/unit_test: Disable spmv_mv_heavy for all A64FX builds #1555
- ROTMG: rocblas TPL turned off #1603
- Fix HIP nightly build on ORNL Jenkins CI server #1544
- Turn on cublas and cusparse in CLANG13CUDA10 CI check #1584
- Add clang13+cuda10 PR build #1524
- .github/workflows: Fix redundant workflow triggers #1527
- Add GCC test options for C++17 and disable perftests for INTEL19 #1511
- Add INTEL19 and CUDA11 CI settings #1505
- .github/workflows: use c++17 #1484
- Workaround for array_sum_reduce if scalar is half_t and N is 3, 5 or 7 #1675
- Fix the nondeterministic issue in SPILUK numeric #1683
- Fix an error in Krylov Handle documentation #1659
- ROTMG: loosen unit-test tolerance for Host TPLs #1638
- SWAP: fixing obvious mistake in TPL layer : ( #1637
- Fix 1631: Use Kokkos::LayoutRight with CrsMatrix values_type (Trilinos compatibility) #1633
- Cuda/12 with CuSPARSE updates #1632
- Fix 1627: cusparse 11.0-11.3 spgemm symbolic wrapper #1628
- Make sure to call ExecutionSpace::concurrency() from an object #1614
- SPGEMM: fixing the rocsparse interface #1607
- Fix Trilinos issue 11033: remove compile time check to allow compilation with non-standard scalar types #1591
- SPMM: fixing cuSPARSE issue with incompatible compute type and op #1587
- ParILUT: convert two lambdas to functors #1580
- Update kk_get_free_total_memory for SYCL #1579
- SYCL: Use KOKKOS_IMPL_DO_NOT_USE_PRINTF instead of printf in kernels #1567
- Rotg fixes for issue 1577 #1578
- Rotg update: fixing the interface #1566
- Fix rotg eti #1534
- Fix to include KokkosBatched_Util.hpp #1565
- TeamGemvInternal: reintroduce 12-arg invoke method #1561
- Rename component options to avoid overloaded usage in Trilinos #1641
- Avoid the SIMD code branch if the batched size is not a multiple of the vector length #1552
- SYCL: Fix linking with ze_loader in Trilinos #1551
- ARMPL Fixes and Workarounds #1543
- Test_Graph_coarsen: replace HostMirror usage with auto #1538
- Fix spgemm cusparse #1535
- Warning fixes: Apple Clang complains about [-Werror,-Wunused-but-set-variable] #1532
- In src/batched/dense: Barrier after broadcast #1520
- Graph coarsen: fix test #1517
- KokkosGraph_CoarsenHeuristics: remove volatile qualifier from join #1510
- Replace capture #1502
- utils: implicit copy-assign deprecated in array_sum_reduce #1494
3.7.01 (2022-12-01)
- Use CRS matrix sort, instead of Kokkos::sort on each row #1553
- Change template type for StaticCrsGraph in BsrMatrix #1531
- Remove listing of undefined TPL deps #1568
- Fix using SpGEMM with nonstandard scalar type, with MKL enabled #1591
- Move destroying dense vector descriptors out of cuSparse sptrsv handle #1590
- Fix to return CUDA_C_64F #1604
- Disable compile-time check in cuda_data_type_from on supported scalar types for cuSPARSE #1605
- Reduce register pressure in batched dense algorithms #1588
- Use new cusparseSpSV TPL for SPTRSV when cuSPARSE is enabled with CUDA >= 11.3 #1574
3.7.00 (2022-08-18)
- Add csc2csr #1342
- csc2csr: update Kokkos_Numeric.hpp header inclusion #1449
- sparse: Remove csc2csr copy #1375
- Added https://kokkos-kernels.readthedocs.io #1451
- Restructure docs #1368
- Add cuSparse TPL files for CrsMatrix-multivector product #1427
- Add template params to forwarding calls in deprecated KokkosKernels::… #1441
- SPILUK: Move host allocations to symbolic #1480
- trsv: remove assumptions about entry order within rows #1463
- Blas serial axpy and nrm2 #1460
- Move Set/Scale unit test to KokkosBlas #1455
- Move {Serial,Team,TeamVector} Set to KokkosBlas #1454
- Move {Serial,Team,TeamVector}Scale to KokkosBlas #1448
- Common Utils: removing dependency on Sparse Utils in Common Utils #1436
- Common cleanup #1431
- Clean-up src: re-organizing the src directory #1398
- Sparse utils namespace #1439
- dot perf test: adding support for HIP and SYCL backend #1453
- Add verbosity parameter to GMRES example. Turn off for testing. #1385
- KokkosSparse_spiluk.cpp perf test: add int-int guards to cusparse codes #1369
- perf_test/blas: Check ARMPL build version #1352
- Clean-up batched block tridiag perf test #1343
- Reduce lots of macro duplication in sparse unit tests #1340
- sycl: re-enabling test now that dpcpp has made progress #1473
- Only instantiate Kokkos's default Cuda mem space #1361
- Sparse and CI updates #1411
- Newer sparse tests were not following the new testing pattern #1356
- Add ETI for D1 coloring #1401
- Add ETI to SpAdd (symbolic and numeric) #1399
- Reformat example/fenl files changed in 1382 #1464
- Change Controls::getParameter error message from stdout to stderr #1416
- Arith traits integral nan #1438
- Kokkos_ArithTraits: re-implementation using Kokkos Core #1406
- Value-initialize result of MaxLoc reduction to avoid maybe uninitialized warning #1383
- Remove volatile qualifiers in reducer join(), init(), and operator+= methods #1382
- Update Batched GMRES #1392
- GEMV: accumulate in float for scalar = bhalf_t #1360
- Restore BLAS-1 MV paths for 1 column #1354
- Minor updates to cluster Gauss-Seidel #1372
- Add unit test for BsrMatrix and BlockCrsMatrix spmv #1338
- Refactor SPGEMM MKL Impl #1244
- D1 coloring: remove unused but set variable #1403
- Minor changes for half precision paper #1429
- Add benchmarks for us-rse escience 2022 half precision paper #1422
- TPLs: adding CUBLAS in the list of dependencies #1482
- Fix MKL build errors #1478
- Fixup drop layout template param in rank-0 views #1476
- BLAS: fixing test that access results before synching #1472
- Fix D1 color ETI with both CudaSpace and UVM #1471
- Fix arithtraits warning #1468
- Fix build when double not instantiated #1467
- Fix -Werror #1466
- Fix GitHub CI failing on broken develop #1461
- HIP: fix warning from ExecSpaceUtils and GEMV #1459
- Removes a duplicate cuda_data_type_from when KOKKOS_HALF_T_IS_FLOAT #1456
- Fix incorrect function call in KokkosBatched::TeamGEMV unit test #1444
- Fix SYCL nightly test #1419
- Fix issues with cuSparse TPL availability for BsrMatrix SpMV #1418
- SpMV: fixing issues with unit-tests tolerance #1412
- Address 1409 #1410
- Fix colliding include guards (copy-paste mistake) #1408
- src/sparse: Fix & check for fence post errors #1405
- Bspgemm fixes #1396
- Fix unused parameter warnings in GEMM test. #1381
- Fixes code deprecation warnings. #1379
- Fix sign-compare warning in SPMV perf test #1371
- Minor MKL fixes #1365
- perf_test/batched: Temporarily disable tests #1359
- Fix nightly builds following promotion of the math functions in Kokkos #1339
3.6.01 (2022-05-23)
- Improve spiluk numeric phase to avoid race conditions and processing in chunks #1390
- Improve sptrsv symbolic phase performance (level scheduling) #1380
- Restore BLAS-1 MV paths for 1 column #1354
- Fix check that view has const type #1370
- Fix check that view has const type part 2 #1394
3.6.00 (2022-02-18)
Kokkos Kernels is adding a new component to the library: batched sparse linear algebra.
Like the existing dense batched algorithms, the new algorithms are called from
the GPU and provide Team and TeamVector levels of parallelism. SpMV also provides a Serial
call on GPU.
- Add Batched CG and Batched GMRES #1155
- Add Jacobi Batched preconditioner #1219
Following the introduction of BsrMatrix in release 3.5.0, new algorithms now support this format.
Release 3.6.0 adds matrix-vector (matvec) multiplication and Gauss-Seidel, as well as an
implementation of matvec that leverages tensor cores on Nvidia GPUs. More kernels are expected to
support the Bsr format in future releases.
- Add Spmv for BsrMatrix #1255
- Add BLAS to SpMV operations for BsrMatrix #1297
- BSR format support in block Gauss-Seidel #1232
- Experimental tensor-core SpMV for BsrMatrix #1090
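
To make the BSR (block sparse row) matvec concrete, here is a minimal serial sketch in plain C++. It is not the KokkosSparse implementation and does not use the BsrMatrix class: the `BsrSketch` struct, its field names, the `bsrSpmv` function, and the row-major layout inside each block are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration only; this is NOT the
// KokkosSparse BsrMatrix class.
struct BsrSketch {
  int blockDim;                // every stored block is blockDim x blockDim
  std::vector<int> rowPtr;     // size numBlockRows + 1, offsets into colInd
  std::vector<int> colInd;     // block-column index of each stored block
  std::vector<double> values;  // blocks stored back to back (assumed row-major inside a block)
};

// Computes y = A * x with plain serial loops over block rows and blocks.
std::vector<double> bsrSpmv(const BsrSketch& A, const std::vector<double>& x) {
  if (A.rowPtr.empty()) return {};
  const std::size_t bs = static_cast<std::size_t>(A.blockDim);
  // Each stored block must contribute exactly bs * bs values.
  assert(A.values.size() == A.colInd.size() * bs * bs);

  const std::size_t numBlockRows = A.rowPtr.size() - 1;
  std::vector<double> y(numBlockRows * bs, 0.0);

  for (std::size_t bi = 0; bi < numBlockRows; ++bi) {
    for (int k = A.rowPtr[bi]; k < A.rowPtr[bi + 1]; ++k) {
      const std::size_t bj = static_cast<std::size_t>(A.colInd[k]);
      const double* blk = &A.values[static_cast<std::size_t>(k) * bs * bs];
      // Dense (bs x bs) block times the matching slice of x.
      for (std::size_t r = 0; r < bs; ++r) {
        double sum = 0.0;
        for (std::size_t c = 0; c < bs; ++c) {
          sum += blk[r * bs + c] * x[bj * bs + c];
        }
        y[bi * bs + r] += sum;
      }
    }
  }
  return y;
}
```

For example, a 4x4 matrix stored as three 2x2 blocks (two in the first block row, one in the second) would use `rowPtr = {0, 2, 3}` and `colInd = {0, 1, 1}`. Storing whole dense blocks is what lets a block kernel reuse one column index per block instead of one per scalar entry.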
rocBLAS and rocSPARSE TPLs are now officially supported and can be enabled at configure time.
The initial kernels that can call rocBLAS are GEMV, GEMM, IAMAX and SCAL, while rocSPARSE can be
called for matrix-vector multiplication. Further support for TPL calls can be requested on Slack
or through GitHub issues.
- Tpl rocBLAS and rocSPARSE #1153
- Add rocBLAS GEMV wrapper #1201
- Add rocBLAS wrappers for GEMM, IAMAX, and SCAL #1230
- SpMV: adding support for rocSPARSE TPL #1221
- bhalf: Unit test Batched GEMM #1251
- and demonstrate GMRES example convergence with bhalf_t (#1300)
- Stream interface: adding stream support in GEMV and GEMM #1131
- Improve double buffering batched gemm performance #1217
- Allow choosing coloring algorithm in multicolor GS #1199
- Batched: Add armpl dgemm support #1256
- Deprecation warning: SpaceAccessibility move out of impl, see #1140 #1141
- Full Blas support on SYCL #1270
- Get sparse tests enabled and working for SYCL #1269
- Changes to make graph run on SYCL #1268
- Allow querying free/total memory for SYCL #1225
- Use KOKKOS_IMPL_DO_NOT_USE_PRINTF instead of printf in kernels #1162
- Work around hipcc size_t/int division with remainder bug #1262
- Replace std::abs with ArithTraits::abs #1312
- Batched/dense: Add Gemm_DblBuf LayoutLeft operator #1299
- KokkosKernels: adding variable that returns version as a single number #1295
- Add KOKKOSKERNELS_FORCE_SIMD macro (Fix #1040) #1290
- Algo::Level{2,3}::Blocked::mb() #1265
- Batched: Use SerialOpt2 for 33 to 39 square matrices #1261
- Prune extra dependencies #1241
- Improve double buffering batched gemm perf for matrix sizes >64x64 #1239
- Improve graph color perf test #1229
- Add custom implementation for strcasecmp #1227
- Replace restrict with KOKKOS_RESTRICT #1223
- Replace array reductions in BLAS-1 MV reductions #1204
- Update MIS-2 and aggregation #1143
- perf_test/blas/blas3: Update SHAs for benchmarking #1139
- Bump ROCm version 4.2 -> 4.5 in nightly Jenkins CI build #1279
- scripts/cm_test_all_sandia: Add A64FX ci checks #1276
- github/workflows: Add osx CI #1254
- Update SYCL compiler version in CI #1247
- Do not set Kokkos variables when exporting CMake configuration #1236
- Add nightly CI check for SYCL #1190
- Update cmake minimum version to 3.16 #866
- Kokkos::Impl: removing a few more instances of throw_runtime_exception #1320
- Remove Kokkos::Impl::throw_runtime_exception from Kokkos Kernels #1294
- Remove unused memory space utility #1283
- Clean up Kokkos header includes #1282
- Remove private Kokkos header include (Cuda/Kokkos_Cuda_Half.hpp) #1281
- Avoid using #ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macro guards #1266
- Rename enumerator Impl::Exec_{PTHREADS -> THREADS} #1253
- Remove all references to the Kokkos QThreads backend #1238
- Replace more occurrences of Kokkos::Impl::is_view #1234
- Do not use Kokkos::Impl::is_view #1214
- Replace Kokkos::Impl::if_c -> std::conditional #1213
- Fix bug in spmv_mv_bsrmatrix() for Ampere GPU arch #1315
- Fix std::abs calls for rocBLAS/rocSparse #1310
- cast literal 0 to fragment scalar type #1307
- Fix 1303: maintain correct #cols on A in twostage #1304
- Add dimension checking to generic spmv interface #1301
- Add missing barriers to TeamGMRES, fix vector len #1285
- Examples: fixing some issues related to type checking #1267
- Restrict BsrMatrix specialization for AMPERE and VOLTA to CUDA #1242
- Fix compilation errors for multi-vectors in kk_print_1Dview() #1231
- src/batched: Fixes #1224 #1226
- Fix SpGEMM crashing on empty rows #1220
- Fix issue #1212 #1218
- example/gmres: Specify half_t namespace #1208
- Check that ordinal types are signed #1188
- Fixing a couple of small issues with tensor core spmv #1185
- Fix #threads setting in pcg for OpenMP #1182
- SpMV: fix catch all case to avoid compiler warnings #1179
- using namespace should be scoped to prevent name clashes #1177
- using namespace should be scoped to prevent name clashes, see issue #1170 #1171
- Fix bug with mkl impl of spgemm #1167
- Add missing $ to KOKKOS_HAS_TRILINOS in sparse_sptrsv_superlu check #1160
- Small fixes to spgemm, and plug gaps in testing #1159
- SpMV: mismatch in #ifdef check and kernel specialization #1151
- Fix values dimension for block sparse matrices #1147
3.5.00 (2021-10-19)
- Batched serial SVD #1107
- Batched: Add BatchedDblBufGemm #1095
- feature/gemv rps test -- RAJAPerf Suite Version of the BLAS2 GEMV Test #1085
- Add new bsrmatrix #1077
- Adding Kokkos GMRES example #1028
- Add fast two-level mode N GEMV (#926) #939
- Batched: Add BatchedGemm interface #935
- OpenMPTarget: adding ETI and CMake logic for OpenMPTarget backend #886
Implemented enhancements Algorithms and Archs:
- Use float as accumulator for GEMV on half_t (Fix #1081) #1082
- Supernodal SpTRSV: add option to use MAGMA TPL for TRTRI #1069
- Updates for running GMRES example with half precision #1067
- src/blas/impl: Explicitly cast to LHS type for ax #1073
- Update BatchedGemm interface to match design proposal #1054
- Move dot-based GEMM out of TPL CUBLAS #1050
- Adding ArmPL option to spmv perf_test #1038
- Add (right) preconditioning to GMRES #1078
- Supernodal SpTRSV: perform TRMM only if TPL CuBLAS is enabled #1027
- Supernodal SpTRSV: support SuperLU version < 5 #1012
- perf_test/blas/blas3: Add dgemm armpl experiment #1005
- Supernodal SpTRSV: run TRMM on device for setup #983
- Merge pull request #951 from vqd8a/move_sort_ifpack2riluk #972
- Point multicolor GS: faster handling of long/bulk rows #993
- Make CRS sorting utils work with unmanaged #963
- Add sort and make sure using host mirror on host memory in kspiluk_symbolic #951
- GEMM: call GEMV instead in certain cases #948
- SpAdd performance improvements, better perf test, fix mtx reader columns #930
Implemented enhancements BuildSystem:
- Automate documentation generation #1116
- Move the batched dense files to specific directories #1098
- cmake: Update SUPERLU tpl option for Tribits #1066
- cmake/Modules: Allow user to use MAGMA_DIR from env #1007
- Supernodal SpTRSV: update TPLs requirements #997
- cmake: Add MAGMA TPL support #982
- Host only macro: adding macro to check for any device backend #940
- Prevent redundant spmv kernel instantiations (reduce library size) #937
- unit-test: refactor infrastructure to remove most *.cpp #906
Implemented enhancements Other:
- Allow reading integer mtx files into floating-point matrices #1100
- Warnings: remove -Wunused-parameter warnings in Kokkos Kernels #962
- Clean up CrsMatrix raw pointer constructor #949
- unit_test/batched: Remove *_half fns from gemm unit tests #943
- Move sorting functionality out of Impl:: #932
- Deprecation warning: SpaceAccessibility move out of impl #1141
- Workaround error with intel #1128
- gmres: disable examples for builds with ibm/xl #1123
- CrsMatrix: deprecate constructor without ncols input #1115
- perf_test/blas/blas3: Disable simd verify for cuda/10.2.2 #1093
- Replace impl/Kokkos_Timer.hpp includes with Kokkos_Timer.hpp #1074
- Remove deprecated ViewAllocateWithoutInitializing #1058
- src/sparse: spadd resolve deprecation warnings #1053
- Give full namespace path for D2 coloring #999
- Fix -Werror=deprecated errors with c++20 standard #964
- Deprecation: a deprecated function is called in the SpADD perf_test #954
Enabled tests:
- HIP: enabling all unit tests #968
- Fix build and add CI coverage for LayoutLeft=OFF #965
- Enable SYCL tests #927
- Fixup HIP nightly builds #907
Fixed Bugs:
- Fix SpGEMM for Nvidia Turing/Ampere #1118
- Fix #1111: spmv tpl instantiations #1112
- Fix C's numCols in spadd simplified interface #1102
- Fix #1089 (failing batched UTV tests) #1096
- Blas GEMM: fix early exit logic, see issue #1088 #1091
- Fix #1048: handle mode C spmv correctly in serial/openmp #1084
- src/batched: Fix multiple definitions of singleton #1072
- Fix host accessing View in non-host space #1057
- Fix559: Intel 18 has trouble with pointer in ternary expr #1042
- Work around team size AUTO issue on kepler #1020
- Supernodal SpTrsv: fix out-of-bound error #1019
- Some fixes for MAGMA TPL and gesv #1008
- Merge pull request #981 from Tech-XCorp/4005-winllvmbuild #984
- This is a PR for 4005 vs2019build, which fixes a few things on Windows #981
- Fix build for no-ETI build #977
- Fix invalid mem accesses in new GEMV kernel #961
- Kokkos_ArithTraits.hpp: Fix isInf and isNan with complex types #936
3.4.01 (2021-05-19)
Fixed Bugs:
- Windows: Fixes for Windows #981
- Sycl: ArithTraits fixes for Sycl #959
- Sparse: Added code to allow KokkosKernels coloring to accept partial colorings #938
- Sparse: Include sorting within spiluk #972
- Sparse: Fix CrsMatrix raw pointer constructor #971
- Sparse: Fix spmv Serial beta==-1 code path #947
3.4.00 (2021-04-25)
- SYCL: adding ETI and CMake logic for SYCL backend #924
Implemented enhancements Algorithms and Archs:
- Two-stage GS: add damping factors #921
- Supernodal SpTRSV, improve symbolic performance #899
- Add MKL SpMV wrapper #895
- Serial code path for spmv #893
Implemented enhancements BuildSystem:
- Cmake: Update ArmPL support #901
- Cmake: Add ARMPL TPL support #880
- IntelClang guarding __assume_aligned with !defined(__clang__) #878
Implemented enhancements Other:
- Add static_assert/throw in batched eigendecomp #931
- Workaround using new/delete in kernel code #925
- Blas perf_test updates #892
Fixed bugs:
- Fix ctor CrsMat mirror with CrsGraph mirror #918
- Fix nrm1, removed cublas nrminf, improved blas tests #915
- Fix and testing coverage mainly in graph coarsening #910
- Fix KokkosSparse for nightly test failure #898
- Fix view types across ternary operator #894
- Make work_view_t typedef consistent #885
- Fix supernodal SpTRSV build with serial+openmp+cuda #884
- Construct SpGEMM C with correct ncols #883
- Matrix Converter: fixing issue with deallocation after Kokkos::finalize #882
- Fix >1024 team size error in sort_crs_* #872
- Fixing seg fault with empty matrix in kspiluk #871
3.3.01 (2021-01-18)
Fixed Bugs:
- With cuSPARSE enabled, too many variants of SpMV were instantiated even when not requested, increasing executable size by up to 1GB.
3.3.00 (2020-12-16)
Implemented enhancements:
- Add permanent RCM reordering interface, and a basic serial implementation #854
- Half_t explicit conversions #849
- Add batched gemm performance tests #838
- Add HIP support to src and perf_test #828
- Factor out coarsening #827
- Allow enabling/disabling components at configuration time #823
- HIP: CMake work on tests and ETI #820
- HIP: KokkosBatched - hip specialization #812
- Distance-2 maximal independent set #801
- Use batched TRTRI & TRMM for Supernode-sptrsv setup #797
- Initial support for half precision #794
Fixed bugs:
- Fix issue with HIP and Kokkos_ArithTraits #844
- HIP: fixing round of issues on AMD #840
- Throw an exception if BLAS GESV is not enabled #837
- Fixes -Werror for gcc with c++20 #836
- Add fallback condition to use spmv_native when cuSPARSE does not work #834
- Fix install testing refactor for inline builds #811
- HIP: fix ArithTraits to support HIP backend #809
- cuSPARSE 11: fix spgemm and spmv_struct_tunning compilation error #804
- Remove pre-3.0 deprecated code #825
3.2.01 (2020-11-17)
Fixed bugs:
- Cpp14 Fixes: #790
3.2.00 (2020-08-19)
Implemented enhancements:
- Add CudaUVMSpace specializations for cuBLAS IAMAX and SCAL #758
- Add wiki examples #735
- Support complex_float, complex_double in cuSPARSE SPMV wrapper #726
- Add performance tests for trmm and trtri #711
- SpAdd requires output values to be zero-initialized, but this shouldn't be needed #694
- SpAdd doesn't merge entries correctly #685
- cusparse SpMV merge algorithm #670
- TPL support for SpMV #614
- Add two BLAS/LAPACK calls needed by: Sptrsv supernode #552 #589
- HashmapAccumulator has several unused members, misnamed parameters #508
Fixed bugs:
- Nightly test failure: spgemm unit tests failing on White (Power8) #780
- supernodal does not build with UVM enabled #633
3.1.01 (2020-05-04)
Fixed bugs:
- KokkosBatched QR PR breaking nightly tests #691
3.1.00 (2020-04-14)
Implemented enhancements:
- Two-stage & Classical Gauss-Seidel #672
- Test transpose utilities #664
- cuSPARSE spmv wrapper doesn't actually use 'mode' #650
- Distance-2 improvements #625
- FindMKL module: which mkl versions to prioritize #480
- Add SuperLU as optional CMake TPL #545
- Revamp the ETI system #460
Fixed bugs:
- 2-stage GS update breaking cuda/10+rdc build #673
- Why CrsMatrix::staticcrsgraph_type uses execution_space and not device_type? #665
- TRMM and TRTRI build failures with clang/7+cuda9+Cuda_OpenMP and gcc/5.3+OpenMP #657
- cuSPARSE spmv wrapper doesn't actually use 'mode' #650
- Block Gauss-Seidel test fails when cuSPARSE is enabled #648
- cuda uvm test failures without launch blocking - expected behavior? #636
- graph_color_d2_symmetric_double_int_int_TestExecSpace seg faults in cuda/10.1 + Volta nightly test on kokkos-dev-2 #634
- Build failures on kokkos-dev with clang/7.0.1 cuda/9.2 and blas/cublas/cusparse tpls #629
- Distance-2 improvements #625
- trsv - internal compiler error with intel/19 #607
- complex_double misalignment still breaking SPGEMM #598
- PortableNumericCHASH can't align shared memory #587
- Remove all references to Kokkos::Impl::is_same #586
- Can I run KokkosKernels spgemm with float or int32 type? #583
- Kokkos Blas: gemv segfaults #443
- Generated kokkos-kernels file names are too long and are crashing cloning Trilinos on Windows #395
3.0.00 (2020-01-27)
Implemented enhancements:
- BuildSystem: Standalone Modern CMake support #491
- Cluster GS and SGS: add cluster gauss-seidel implementation #455
- spiluk: Add sparse ILUK implementation #459
- BLAS gemm: Dot-based GEMM Cuda optimization for C = beta*C + alpha*A^T*B #490
- Sorting utilities: #461
- SGS: Support multiple rhs in SGS efficiently #488
- BLAS trsm: Add support and interface for trsm #513
- BLAS iamax: Implement iamax #87
- BLAS gesv: #449
- sptrsv supernodal: Add supernodal sparse triangular solver #552
- sptrsv: Add cusparse tpl support for sparse triangular solve, cudagraphs to fallback #555
- KokkosGraph: Output colors assigned during graph coloring #444
- MatrixReader: Full matrix market support #466
Fixed bugs:
- gemm: Fix bug for complex types in fallback impl #550
- gemv: Fix degenerate matrix cases #514
- spgemm: Fix cuda build with complex_double misaligned shared memory access #500
- spgemm: Wrong team size heuristic used for SPGEMM when Kokkos deprecated=OFF #474
- dot: Improve accuracy for float and complex_float #574
- SpMV Struct: Fix bug with intel_17_0_1 #456
- readmtx: Fix invalid read due to loop condition #453
- spgemm: Fix hashmap accumulator bug yielding crashes and wrong results #402
- KokkosGraph: Fix distance-1 graph coloring segfault #275
- UniformMemoryPool: does not re-initialize chunks that are freed #530
2.9.00 (2019-06-24)
Implemented enhancements:
- KokkosBatched: Add specialization for float2, float4 and double4 #427
- KokkosBatched: Reduce VectorLength (16 to 8) #432
- KokkosBatched: Remove experimental name space for batched blas #371
- Capability: Initial sparse triangular solve capability #435
- Capability: Add support for MAGMA GESV TPL #409
- cuBLAS: Add CudaUVMSpace specializations for GEMM #397
Fixed bugs:
2.8.00 (2019-02-05)
Implemented enhancements:
- Capability, Tests: C++14 Support and Testing #351
- Capability: Batched getrs #332
- More Kernel Labels for KokkosBlas #239
- Name all parallel kernels and regions #124
Fixed bugs:
- BLAS TPL: BLAS underscore mangling #369
- BLAS TPL, Complex: Promotion 2.7.24 broke MV unit tests in Tpetra with complex types #360
- GEMM: GEMM uses wrong function for computing shared memory allocation size #368
- BuildSystem: BLAS TPL macro not properly enabled with MKL BLAS #347
- BuildSystem: make clean - errors #353
- Compiler Workaround: Internal compiler error in KokkosBatched::Experimental::TeamGemm #349
- KokkosBlas: Some KokkosBlas kernels assume default execution space #14
2.7.24 (2018-11-04)
Implemented enhancements:
- Enhance test_all_sandia script to set scalar and ordinal types #315
- Batched getri need #305
- Deterministic Coloring #271
- MKL - guard minor version for MKL v. 18 #268
- TPL Support for all BLAS functions using CuBLAS #247
- Add L1 variant to multithreaded Gauss-Seidel #240
- Multithreaded Gauss-Seidel does not support damping #221
- Guard 1-phase SpGEMM in Intel MKL #217
- generate makefile with-spaces option #98
- Add MKL version check #7
Fixed bugs:
- Perf test failures w/ just CUDA enabled #257
- Wrong signature for axpy blas functions #329
- Failing unit tests with float - unit test error checking issue #322
- cuda.graph_graph_color* COLORING_VBD test failures with cuda/9.2 + gcc/7.2 on White #317
- KokkosBatched::Experimental::SIMD<T> does not build with T=complex<float> #316
- simple test program fails using 3rdparty Eigen library #309
- KokkosBlas::dot is broken for complex, due to incorrect assumptions about Fortran ABI #307
- strides bug in kokkos tpl interface. #292
- Failing spgemm unit test with MKL #289
- Fix the block_pcg perf-test when offsets are size_t #287
- spotcheck warnings from kokkos #284
- Linking error in tpl things #282
- Build failure with clang 3.9.0 #281
- CMake modification for TPLs. #276
- KokkosBatched warnings #259
- KokkosBatched contraction length bug #258
- Small error in KokkosBatched_Gemm_Serial_Imp.hpp with SerialGemm<Trans::Transpose,*,*> #147
2.7.00 (2018-05-24)
Implemented enhancements:
- Tests: add capability to build a unit test standalone #233
- Make KokkosKernels work without KOKKOS_ENABLE_DEPRECATED_CODE #223
- Add team-based scal, mult, update, nrm2 #214
- Add team based abs #209
- Generated CPP files moving includes inside the ifdef's #199
- Implement BlockCRS in KokkosKernels #184
- Spgemm hash promotion #171
- Batched BLAS enhancement #170
- Document & check CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS=ON in CUDA build #148
Fixed bugs:
- Update drivers in perf_tests/graph to use Kokkos::initialize() #200
- unit tests failing/hanging on Volta #188
- Inner TRSM: SIMD build error; manifests in Ifpack2 #183
- d2_graph_color doesn't have a default coloring mechanism #168
- Unit tests do not build with Serial backend #154
2.6.00 (2018-03-07)
Implemented enhancements:
Fixed bugs:
- d2_graph_color doesn't have a default coloring mechanism #168
- Build error when MKL TPL is enabled #135
2.5.00 (2017-12-15)
Implemented enhancements:
- KokkosBlas: Add GEMM interface #105
- KokkosBlas: Add GEMM default Kernel #125
- KokkosBlas: Add GEMV that wraps BLAS (and cuBLAS) #16
- KokkosSparse: Make SPMV test not print GBs of output if something goes wrong. #111
- KokkosSparse: ETI SpGEMM and Gauss Seidel and take it out of Experimental namespace #74
- BuildSystem: Fix Makesystem to correctly build library after aborted install #104
- BuildSystem: Add option to generate_makefile.bash to define memoryspaces for instantiation #89
- BuildSystem: generate makefile tpl option #66
- BuildSystem: Add a simpler compilation script, README update etc #96
Fixed bugs:
- Internal Compiler Error GCC in GEMM #129
- Batched Team LU: bug for small team_size #110
- Compiler BUG in IBM XL pragma unrolling #92
- Fix Blas TPL enables build #77
- Batched Gemm Failure #73
- CUDA 7.5 (GCC 4.8.4) build errors #72
- Cuda BLAS tests fail with UVM if CUDA_LAUNCH_BLOCKING=1 is not defined on Kepler #51
- CrsMatrix: sumIntoValues and replaceValues incorrectly count the number of valid column indices. #11
- findRelOffset test assumes UVM #32
0.10.03 (2017-09-11)
Implemented enhancements:
- KokkosSparse: Fix unused variable warnings in spmv_impl_omp, spmv Test and graph color perf_test #63
- KokkosBlas: dot: Add unit test #15
- KokkosBlas: dot: Add special case for multivector * vector (or vector * multivector) #13
- BuildSystem: Make KokkosKernels build independently of Trilinos #1
- BuildSystem: Fix ETI System not to depend on Tpetra ETI #5
- BuildSystem: Change CMake to work with new ETI system #19
- BuildSystem: Fix TpetraKernels names to KokkosKernels #4
- BuildSystem: Trilinos/KokkosKernels reports no ETI in almost any circumstance #29
- General: Kokkos::ArithTraits<double>::nan() is very slow #35
- General: Design and Define New UnitTest infrastructure #28
- General: Move Tpetra::Details::OrdinalTraits to KokkosKernels #22
- General: Rename files and NameSpace to KokkosKernels #12
- General: PrepareStandalone: Get rid of Teuchos usage #2
- General: Fix warning with char being either signed or unsigned in ArithTraits #60
- Testing: Make all tests run with -Werror #68
Fixed bugs:
- SPGEMM Test Fails for Cuda when compiled through Trilinos #49
- Fix ArithTraits min for floating points #47
- Pthread ETI error #25
- Fix CMake Based ETI for Threads backend #46
- KokkosKernels_ENABLE_EXPERIMENTAL causes build error #59
- ArithTraits warnings in CUDA build #71
- Graph coloring build warnings #3
* This Change Log was automatically generated by github_changelog_generator