[SEDONA-715] Add apache zeppelin notebook support for the docker image (#1826)

* [SEDONA-715] Add apache zeppelin notebook support for the docker image

Did you read the Contributor Guide?
Yes, I have read Contributor Rules and Contributor Development Guide

Is this PR related to a JIRA ticket?
Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-715. The PR name follows the format [SEDONA-XXX] my subject.

What changes were proposed in this PR?
Added Zeppelin notebook support to the Sedona Docker image, along with the Helium visualization plugin.

How was this patch tested?
Tested locally by building and running the image, then rendering a visualization with the sample notebook bundled in this commit.

Did this PR include necessary documentation updates?
Yes, added additional information that may help the users.

* [SEDONA-715] Modified documentation and scripts

1. Modified the Dockerfile to use the key=value form of ENV
2. Added Zeppelin config files to improve the user experience
3. Updated the documentation to reflect the changes

* Modified files based on pre-commit lint rules

* Marked sh files as executable
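
The ENV change mentioned in the commit message (legacy `ENV KEY value` lines rewritten as `ENV KEY=value`) can be sketched mechanically. The snippet below is only an illustration, not part of the commit: the function name `env_to_kv` is hypothetical, and it deliberately handles single-token values only, since values containing spaces would need quoting in the key=value form.

```shell
# Rewrite legacy "ENV KEY value" lines as "ENV KEY=value".
# Assumption: values are a single token with no spaces or "=".
# Lines already in KEY=value form and all non-ENV lines pass through unchanged.
env_to_kv() {
  sed -E 's/^ENV[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)[[:space:]]+([^[:space:]=]+)[[:space:]]*$/ENV \1=\2/'
}

# Example (reads a Dockerfile on stdin, writes the converted text to stdout):
#   env_to_kv < sedona-jupyterlab.dockerfile
```

Keeping the pattern this strict means the function leaves alone anything it cannot convert safely, so a multi-word value would still need a manual edit.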
sshiv012 authored Feb 28, 2025
1 parent 714bc52 commit 5e6a673
Showing 12 changed files with 125,652 additions and 28 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ dependency-reduced-pom.xml
 __pycache__
 /.bsp
 /.scala-build
+.DS_Store
31 changes: 23 additions & 8 deletions docker/sedona-spark-jupyterlab/sedona-jupyterlab.dockerfile
@@ -25,30 +25,44 @@ ARG spark_xml_version=0.16.0
 ARG sedona_version=1.5.1
 ARG geotools_wrapper_version=1.5.1-28.2
 ARG spark_extension_version=2.11.0
+ARG zeppelin_version=0.12.0

 # Set up envs
 ENV SHARED_WORKSPACE=${shared_workspace}
-ENV SPARK_HOME /usr/local/lib/python3.10/dist-packages/pyspark
-ENV SEDONA_HOME /opt/sedona
+ENV SPARK_HOME=/usr/local/lib/python3.10/dist-packages/pyspark
+ENV SEDONA_HOME=/opt/sedona
+ENV ZEPPELIN_HOME=/opt/zeppelin
 RUN mkdir ${SEDONA_HOME}

-ENV SPARK_MASTER_HOST localhost
-ENV SPARK_MASTER_PORT 7077
+ENV SPARK_MASTER_HOST=localhost
+ENV SPARK_MASTER_PORT=7077
 ENV PYTHONPATH=$SPARK_HOME/python
-ENV PYSPARK_PYTHON python3
-ENV PYSPARK_DRIVER_PYTHON jupyter
-
+ENV PYSPARK_PYTHON=python3
+ENV PYSPARK_DRIVER_PYTHON=jupyter
 COPY ./ ${SEDONA_HOME}/

 RUN chmod +x ${SEDONA_HOME}/docker/spark.sh
 RUN chmod +x ${SEDONA_HOME}/docker/sedona.sh
+RUN chmod +x ${SEDONA_HOME}/docker/zeppelin/install-zeppelin.sh
 RUN ${SEDONA_HOME}/docker/spark.sh ${spark_version} ${hadoop_s3_version} ${aws_sdk_version} ${spark_xml_version}

 # Install Python dependencies
 COPY docker/sedona-spark-jupyterlab/requirements.txt /opt/requirements.txt
-RUN pip3 install -r /opt/requirements.txt
+RUN pip3 install --default-timeout=100 -r /opt/requirements.txt

 RUN ${SEDONA_HOME}/docker/sedona.sh ${sedona_version} ${geotools_wrapper_version} ${spark_version} ${spark_extension_version}
+RUN ${SEDONA_HOME}/docker/zeppelin/install-zeppelin.sh ${zeppelin_version} /opt
+# Set up Zeppelin configuration
+COPY docker/zeppelin/conf/zeppelin-site.xml ${ZEPPELIN_HOME}/conf/
+COPY docker/zeppelin/conf/helium.json ${ZEPPELIN_HOME}/conf/
+COPY docker/zeppelin/conf/interpreter.json ${ZEPPELIN_HOME}/conf/
+RUN mkdir ${ZEPPELIN_HOME}/helium
+RUN mkdir ${ZEPPELIN_HOME}/leaflet
+RUN mkdir ${ZEPPELIN_HOME}/notebook/sedona-tutorial
+COPY zeppelin/ ${ZEPPELIN_HOME}/leaflet
+COPY docker/zeppelin/conf/sedona-zeppelin.json ${ZEPPELIN_HOME}/helium/
+COPY docker/zeppelin/examples/*.zpln ${ZEPPELIN_HOME}/notebook/sedona-tutorial/
+COPY docker/zeppelin/examples/arealm.csv /opt/workspace/examples/data/

 COPY docs/usecases/*.ipynb /opt/workspace/examples/
 COPY docs/usecases/*.py /opt/workspace/examples/
@@ -69,6 +83,7 @@ EXPOSE 8888
 EXPOSE 8080
 EXPOSE 8081
 EXPOSE 4040
+EXPOSE 8085

 WORKDIR ${SHARED_WORKSPACE}
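
EXPOSE only documents the ports; they still have to be published when the container is started. As a rough sketch, the helper below builds the `-p` flags for the five ports listed in the Dockerfile. The function name `publish_flags` and the image tag `sedona-zeppelin:local` are hypothetical, and mapping each port to the same host port is an assumption, not something the commit prescribes.

```shell
# Ports taken from the EXPOSE lines in the Dockerfile diff.
SEDONA_PORTS="8888 8080 8081 4040 8085"

# Print one "-p host:container" flag per argument, mapping each port to itself.
publish_flags() {
  flags=""
  for port in "$@"; do
    flags="${flags:+$flags }-p ${port}:${port}"
  done
  printf '%s\n' "$flags"
}

# Example (hypothetical image tag; word splitting of the flags is intentional):
#   docker run --rm $(publish_flags $SEDONA_PORTS) sedona-zeppelin:local
```

Generating the flags from one list keeps the run command in step with the Dockerfile when another port is added later.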

@@ -3,6 +3,7 @@

 # Allow files and folders with a pattern starting with !
 !docker/**
+!zeppelin/**
 !docs/usecases/**
 !python/**
 !spark-shaded/target/**
1 change: 1 addition & 0 deletions docker/sedona-spark-jupyterlab/start.sh
@@ -77,5 +77,6 @@ echo "spark.executor.memory $EXECUTOR_MEM" >> "${SPARK_HOME}"/conf/spark-default
 service ssh start
 "${SPARK_HOME}"/sbin/start-all.sh

+"${ZEPPELIN_HOME}"/bin/zeppelin-daemon.sh start
 # Start jupyter lab
 exec jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root --NotebookApp.token=
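
Because `zeppelin-daemon.sh start` returns before the web server is necessarily ready, anyone scripting against the container may want to poll until Zeppelin answers. The following is a minimal sketch of such a retry helper and is not part of the commit. The name `wait_for` is hypothetical, and the example assumes that Zeppelin listens on port 8085 (the port newly exposed in the Dockerfile; the zeppelin-site.xml contents are not shown here) and that `curl` is available.

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
# The command is always tried at least once.
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (assumed port 8085; /api/version is Zeppelin's REST version endpoint):
#   wait_for 30 2 curl -fsS http://localhost:8085/api/version
```

The helper only reports success or failure through its exit status, so the caller decides whether a timeout is fatal.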
9 changes: 9 additions & 0 deletions docker/zeppelin/conf/helium.json
@@ -0,0 +1,9 @@
+{
+  "enabled": {
+    "sedona-zeppelin": "/opt/zeppelin/leaflet"
+  },
+  "packageConfig": {},
+  "bundleDisplayOrder": [
+    "sedona-zeppelin"
+  ]
+}