layout | title | nav_order | parent |
---|---|---|---|
page |
Gluten For Flink with Velox Backend |
1 |
Getting-Started |
Type | Version |
---|---|
Flink | 1.20 |
OS | Ubuntu20.04/22.04, Centos7/8 |
jdk | openjdk11/jdk17 |
scala | 2.12 |
Currently, with static build Gluten+Flink+Velox backend supports all the Linux OSes, but is only tested on Ubuntu20.04/Ubuntu22.04/Centos7/Centos8. With dynamic build, Gluten+Velox backend support Ubuntu20.04/Ubuntu22.04/Centos7/Centos8 and their variants.
Currently, the officially supported Flink versions are 1.20.*.
We need to set up the JAVA_HOME
env. Currently, Gluten supports java 11 and java 17.
For x86_64
## make sure jdk8 is used
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
For aarch64
## make sure jdk8 is used
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64
export PATH=$JAVA_HOME/bin:$PATH
Get gluten
## config maven, like proxy in ~/.m2/settings.xml
## fetch gluten code
git clone https://github.com/apache/incubator-gluten.git
cd /path/to/gluten/gluten-flink
mvn clean package
Gluten for Flink depends on Velox4j to call velox. So you need to get the Velox4j packages and used them with gluten. Velox4j jar available now is velox4j-0.1.0-SNAPSHOT.jar.
Submit test script from flink run
. You can use the StreamSQLExample
as an example.
var parquet_file_path = "/PATH/TO/TPCH_PARQUET_PATH"
var gluten_root = "/PATH/TO/GLUTEN"
After deploying flink binaries, please add gluten-flink jar to flink library path, including gluten-flink-runtime-1.4.0.jar, gluten-flink-loader-1.4.0.jar and Velox4j jars above. And make them loaded before flink libraries. Then you can go to flink binary path and use the below scripts to submit the example job.
bin/start-cluster.sh
bin/flink run -d -m 0.0.0.0:8080 \
-c org.apache.flink.table.examples.java.basics.StreamSQLExample \
lib/flink-examples-table_2.12-1.20.1.jar
Then you can get the result in log/flink-*-taskexecutor-*.out
.
And you can see an operator named gluten-cal
from the web frontend of your flink job.
TODO
Now both Gluten for Flink and Velox4j have not a bundled jar including all jar depends on. So you may have to add these jars by yourself, which may including guava-33.4.0-jre.jar, jackson-core-2.18.0.jar, jackson-databind-2.18.0.jar, jackson-datatype-jdk8-2.18.0.jar, jackson-annotations-2.18.0.jar, arrow-memory-core-18.1.0.jar, arrow-memory-unsafe-18.1.0.jar, arrow-vector-18.1.0.jar, flatbuffers-java-24.3.25.jar, arrow-format-18.1.0.jar, arrow-c-data-18.1.0.jar. We will supply bundled jars soon.