Akash Kothari edited this page Jun 13, 2022 · 12 revisions

Hydride: A Retargetable Compiler for Modern Hardware Architectures

Introduction

Domain-specific hardware accelerators and hardware extensions to existing architectures are emerging to efficiently support modern tensor and image processing workloads. These custom accelerators and CPU/GPU extensions provide high performance and energy efficiency for important operations such as matrix multiplication and tensor convolution. Accelerators such as the Qualcomm Hexagon DSP and x86 vector extensions such as Intel AVX-512 provide specialized instructions that help optimize tensor and stencil computations. These specialized instructions are targeted through domain-specific libraries such as oneDNN and Hexagon NN, and through DSL (Domain-Specific Language) compilers such as Halide, TVM, and XLA.

High-level operators in tensor and image processing applications have to be lowered to complex, low-level instructions to fully leverage the capabilities of domain-specific hardware architectures. These instructions come from vector instruction sets such as Intel VNNI, HVX (Hexagon Vector eXtensions), and ARM Neon, which provide a variety of SIMD and non-SIMD (cross-lane) operations. They include specialized instructions that perform dot products and various reduction operations, and instructions that change data layout by moving data across lanes. The mappings between high-level tensor and stencil operators and ISA instructions are implemented manually by engineers and researchers in high-performance libraries and in the backends of DSL compilers, by pattern-matching high-level expressions to low-level instructions. This process requires a lot of manual engineering effort, and is often cumbersome, time-consuming, and error-prone.
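To make the manual pattern-matching approach concrete, here is a minimal sketch of a hand-written lowering rule over a toy expression tree. Everything in it is an assumption made for illustration: the `Var`, `Mul`, `Add`, and `Instr` types, the `lower` function, and the `toy.*` opcodes are hypothetical and do not correspond to Hydride, Halide, TVM, or any real ISA. It only shows the general shape of a rule that recognizes `acc + a * b` and rewrites it to a single fused instruction.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """A named input value (hypothetical toy IR node)."""
    name: str


@dataclass(frozen=True)
class Mul:
    """High-level multiply of two sub-expressions."""
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Add:
    """High-level add of two sub-expressions."""
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Instr:
    """A low-level instruction with a made-up opcode and its operands."""
    opcode: str
    operands: tuple


Expr = Union[Var, Mul, Add, Instr]


def lower(expr):
    """Rewrite a toy expression tree into toy instructions.

    The one hand-written pattern: an Add with a Mul as either operand
    becomes a single 'toy.mul_add' instruction (accumulator first).
    Anything else falls back to one instruction per operator.
    """
    if isinstance(expr, Add):
        # Check the pattern on the un-lowered children, so the Mul is
        # still visible when we look for it.
        if isinstance(expr.rhs, Mul):
            return Instr("toy.mul_add",
                         (lower(expr.lhs), lower(expr.rhs.lhs), lower(expr.rhs.rhs)))
        if isinstance(expr.lhs, Mul):
            return Instr("toy.mul_add",
                         (lower(expr.rhs), lower(expr.lhs.lhs), lower(expr.lhs.rhs)))
        return Instr("toy.add", (lower(expr.lhs), lower(expr.rhs)))
    if isinstance(expr, Mul):
        return Instr("toy.mul", (lower(expr.lhs), lower(expr.rhs)))
    return expr  # Var and already-lowered Instr nodes pass through unchanged


# Usage: acc + a * b is matched by the fused rule.
fused = lower(Add(Var("acc"), Mul(Var("a"), Var("b"))))
```

Even in this toy, each new instruction needs its own hand-written case, and the rule has to account for operand order. Real backends must handle many more variants per target, which is where the manual effort described above comes from.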