Skip to content

Latest commit

 

History

History
97 lines (71 loc) · 2.97 KB

Flink.md

File metadata and controls

97 lines (71 loc) · 2.97 KB
layout title nav_order parent
page
Gluten For Flink with Velox Backend
1
Getting-Started

Supported Version

Type Version
Flink 1.20
OS Ubuntu20.04/22.04, Centos7/8
jdk openjdk11/jdk17
scala 2.12

Prerequisite

Currently, with static build Gluten+Flink+Velox backend supports all the Linux OSes, but is only tested on Ubuntu20.04/Ubuntu22.04/Centos7/Centos8. With dynamic build, Gluten+Velox backend support Ubuntu20.04/Ubuntu22.04/Centos7/Centos8 and their variants.

Currently, the officially supported Flink versions are 1.20.*.

We need to set up the JAVA_HOME env. Currently, Gluten supports java 11 and java 17.

For x86_64

## make sure jdk8 is used
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH

For aarch64

## make sure jdk8 is used
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64
export PATH=$JAVA_HOME/bin:$PATH

Get gluten

## config maven, like proxy in ~/.m2/settings.xml

## fetch gluten code
git clone https://github.com/apache/incubator-gluten.git

Build Gluten Flink with Velox Backend

cd /path/to/gluten/gluten-flink
mvn clean package 

Dependency library deployment

Gluten for Flink depends on Velox4j to call velox. So you need to get the Velox4j packages and used them with gluten. Velox4j jar available now is velox4j-0.1.0-SNAPSHOT.jar.

Submit the Flink SQL job

Submit test script from flink run. You can use the StreamSQLExample as an example.

Flink local cluster

var parquet_file_path = "/PATH/TO/TPCH_PARQUET_PATH"
var gluten_root = "/PATH/TO/GLUTEN"

After deploying flink binaries, please add gluten-flink jar to flink library path, including gluten-flink-runtime-1.4.0.jar, gluten-flink-loader-1.4.0.jar and Velox4j jars above. And make them loaded before flink libraries. Then you can go to flink binary path and use the below scripts to submit the example job.

bin/start-cluster.sh
bin/flink run -d -m 0.0.0.0:8080 \
    -c org.apache.flink.table.examples.java.basics.StreamSQLExample \
    lib/flink-examples-table_2.12-1.20.1.jar

Then you can get the result in log/flink-*-taskexecutor-*.out. And you can see an operator named gluten-cal from the web frontend of your flink job.

Flink Yarn per job mode

TODO

Notes:

Now both Gluten for Flink and Velox4j have not a bundled jar including all jar depends on. So you may have to add these jars by yourself, which may including guava-33.4.0-jre.jar, jackson-core-2.18.0.jar, jackson-databind-2.18.0.jar, jackson-datatype-jdk8-2.18.0.jar, jackson-annotations-2.18.0.jar, arrow-memory-core-18.1.0.jar, arrow-memory-unsafe-18.1.0.jar, arrow-vector-18.1.0.jar, flatbuffers-java-24.3.25.jar, arrow-format-18.1.0.jar, arrow-c-data-18.1.0.jar. We will supply bundled jars soon.