Algorithms for matrix matrix multiplication, dgemm, AVX-256, AVX-512

romz-pl/matrix-matrix-multiply

Algorithms for matrix matrix multiplication, dgemm

The algorithms are taken from the books:

  1. David A. Patterson, John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface, RISC-V Edition",
  2. David A. Patterson, John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface, MIPS Edition"

The following algorithms are implemented:

  1. Basic, unoptimized, dgemm_basic.cpp
  2. Using AVX 256-bit intrinsics, dgemm_avx256.cpp
  3. Using AVX 512-bit intrinsics, dgemm_avx512.cpp
  4. Using AVX 512-bit intrinsics with loop unrolling, dgemm_unrolled.cpp
  5. Blocked version, unoptimized, dgemm_basic_blocked.cpp
  6. Blocked version with AVX 512-bit intrinsics and loop unrolling, dgemm_blocked.cpp
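
To make the difference between the basic and the blocked variants concrete, here is a minimal sketch of both. It is not the code from dgemm_basic.cpp or dgemm_basic_blocked.cpp. It assumes the convention used in the books above: square n x n matrices stored column by column, and the update C += A * B. The function names and the tile size kBlock are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: column-major, element (i, j) of an n x n matrix is at i + j * n.

// Plain triple loop computing C += A * B.
void dgemm_basic_sketch(std::size_t n, const std::vector<double>& A,
                        const std::vector<double>& B, std::vector<double>& C)
{
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];
            for (std::size_t k = 0; k < n; ++k) {
                cij += A[i + k * n] * B[k + j * n];
            }
            C[i + j * n] = cij;
        }
    }
}

// Hypothetical tile size; a real implementation would tune it to the cache.
constexpr std::size_t kBlock = 32;

// Same update, but the work is split into kBlock x kBlock tiles so that the
// parts of A, B and C in use are more likely to stay in cache.
void dgemm_blocked_sketch(std::size_t n, const std::vector<double>& A,
                          const std::vector<double>& B, std::vector<double>& C)
{
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t sj = 0; sj < n; sj += kBlock) {
        for (std::size_t si = 0; si < n; si += kBlock) {
            for (std::size_t sk = 0; sk < n; sk += kBlock) {
                // Clamp tile bounds so n need not be a multiple of kBlock.
                const std::size_t i_end = std::min(si + kBlock, n);
                const std::size_t j_end = std::min(sj + kBlock, n);
                const std::size_t k_end = std::min(sk + kBlock, n);
                for (std::size_t i = si; i < i_end; ++i) {
                    for (std::size_t j = sj; j < j_end; ++j) {
                        double cij = C[i + j * n];
                        for (std::size_t k = sk; k < k_end; ++k) {
                            cij += A[i + k * n] * B[k + j * n];
                        }
                        C[i + j * n] = cij;
                    }
                }
            }
        }
    }
}
```

Both functions perform the same arithmetic; only the order in which elements are visited differs. The AVX variants in the list vectorize the inner update with intrinsics and are not shown here.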

How to build?

To build the system, execute the following commands:

  1. git clone https://github.com/romz-pl/matrix-matrix-multiply
  2. cd matrix-matrix-multiply
  3. mkdir build
  4. cd build
  5. cmake ..
  6. make
  7. ./src/dgemm

The last command, ./src/dgemm, runs the program.

Results

  1. On a Core i7 CPU, with matrix size equal to 128, I obtained the following results, averaged over 1000 randomly generated matrices:
         dgemm_basic:  elapsed-time=      1661
 dgemm_basic_blocked:  elapsed-time=      1260     speed-up=    1.31
        dgemm_avx256:  elapsed-time=       443     speed-up=    3.74
        dgemm_avx512:  elapsed-time=       233     speed-up=    7.12
      dgemm_unrolled:  elapsed-time=       106     speed-up=   15.66
       dgemm_blocked:  elapsed-time=       100     speed-up=   16.61
  2. On a Core i7 CPU, with matrix size equal to 640, I obtained the following results, averaged over 10 randomly generated matrices:
         dgemm_basic:  elapsed-time=    241958
 dgemm_basic_blocked:  elapsed-time=    162224     speed-up=   1.49
        dgemm_avx256:  elapsed-time=     66246     speed-up=   3.65
        dgemm_avx512:  elapsed-time=     35604     speed-up=   6.79
      dgemm_unrolled:  elapsed-time=     16634     speed-up=  14.54
       dgemm_blocked:  elapsed-time=     12981     speed-up=  18.63
  3. On a Core i7 CPU, with matrix size equal to 1280, I obtained the following results, averaged over 5 randomly generated matrices:
         dgemm_basic:  elapsed-time=   4592295
 dgemm_basic_blocked:  elapsed-time=   1626700     speed-up=   2.82
        dgemm_avx256:  elapsed-time=   1227037     speed-up=   3.74
        dgemm_avx512:  elapsed-time=    637091     speed-up=   7.20
      dgemm_unrolled:  elapsed-time=    558080     speed-up=   8.22
       dgemm_blocked:  elapsed-time=    181634     speed-up=  25.28
  4. On a Core i7 CPU, with matrix size equal to 2560, I obtained the following results for a single set of randomly generated matrices:
         dgemm_basic:  elapsed-time=  62731813
 dgemm_basic_blocked:  elapsed-time=  16474759     speed-up=   3.80
        dgemm_avx256:  elapsed-time=  17050012     speed-up=   3.67
        dgemm_avx512:  elapsed-time=   9012450     speed-up=   6.96
      dgemm_unrolled:  elapsed-time=   5958033     speed-up=  10.52
       dgemm_blocked:  elapsed-time=   1837494     speed-up=  34.13
  5. On a Core i7 CPU, with matrix size equal to 5120, I obtained the following results for a single set of randomly generated matrices:
         dgemm_basic:  elapsed-time= 1154120417
 dgemm_basic_blocked:  elapsed-time=  137582063    speed-up=   8.38
        dgemm_avx256:  elapsed-time=  297156247    speed-up=   3.88
        dgemm_avx512:  elapsed-time=  144941094    speed-up=   7.96
      dgemm_unrolled:  elapsed-time=   97428303    speed-up=  11.84
       dgemm_blocked:  elapsed-time=   18558107    speed-up=  62.18
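
The README does not define the two columns, so the following is a sketch under assumptions. The speed-up values are consistent with dividing the dgemm_basic time by the time of each variant, for example 1661 / 100 = 16.61 for dgemm_blocked at size 128. The last digit can differ slightly from a recomputed value (1661 / 1260 is about 1.318, listed as 1.31). The time unit is not stated; the helper below reports microseconds purely as an illustrative choice, and both function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Average wall-clock time of one call to run_once, in microseconds,
// measured over `repeats` consecutive calls.
template <typename F>
double average_elapsed_us(F&& run_once, int repeats)
{
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        run_once();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / repeats;
}

// Assumed definition of the speed-up column: baseline time / variant time.
double speed_up(double baseline_time, double variant_time)
{
    assert(variant_time > 0.0);
    return baseline_time / variant_time;
}
```

With this definition, a value above 1 means the variant is faster than dgemm_basic, and the table rows can be compared directly within one matrix size.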
