apache · jiayuasu · Feb 27, 2025 · Feb 27, 2025
@@ -0,0 +1,198 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# Apache Sedona GeoPackage with Spark
+
+This page shows how to read GeoPackage files with Apache Sedona and Spark.
+
+You’ll learn about the advantages and disadvantages of the GeoPackage file format and how to use them in production settings.
+
+Let’s start by creating a GeoPackage file and then reading it.
+
+## Reading a GeoPackage file with Sedona and Spark
+
+Let’s create a GeoPackage file with a few rows of data.
+
+Start by creating a GeoPandas DataFrame:
+
+```python
+point1 = Point(0, 0)
+point2 = Point(1, 1)
+polygon1 = Polygon([(5, 5), (6, 6), (7, 5), (6, 4)])
+
+data = {
+    "name": ["Point A", "Point B", "Polygon A"],
+    "value": [10, 20, 30],
+    "geometry": [point1, point2, polygon1],
+}
+gdf = gpd.GeoDataFrame(data, geometry="geometry")
+```
+
+Now write the GeoPandas DataFrame to a GeoPackage file:
+
+```python
+gdf.to_file("/tmp/my_file.gpkg", layer="my_layer", driver="GPKG")
+```
+
+GeoPandas knows to write this to a GeoPackage file because the code sets the driver to `GPKG`.
+
+You can think of the layer as the table name.
+
+Now let’s read the GeoPackage file Apache Sedona and Spark:
+
+```python
+df = (
+    sedona.read.format("geopackage")
+    .option("tableName", "my_layer")
+    .load("/tmp/my_file.gpkg")
+)
+df.show()
+```
+
+Here are the contents of the DataFrame:
+
+```
++---+--------------------+---------+-----+
+|fid|                geom|     name|value|
++---+--------------------+---------+-----+
+|  1|         POINT (0 0)|  Point A|   10|
+|  2|         POINT (1 1)|  Point B|   20|
+|  3|POLYGON ((5 5, 6 ...|Polygon A|   30|
++---+--------------------+---------+-----+
+```
+
+The geometry column can contain many different geometric objects like points, polygons, and many more.
+
+You can also see the metadata of the GeoPackage file:
+
+```python
+df = (
+    sedona.read.format("geopackage")
+    .option("showMetadata", "true")
+    .load("/tmp/my_file.gpkg")
+)
+df.show()
+```
+
+Here are the contents:
+
+```
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+|table_name|data_type|identifier|description|         last_change|min_x|min_y|max_x|max_y|srs_id|
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+|  my_layer| features|  my_layer|           |2025-02-25 06:28:...|  0.0|  0.0|  7.0|  6.0| 99999|
++----------+---------+----------+-----------+--------------------+-----+-----+-----+-----+------+
+```
+
+## Reading many GeoPackage files with Sedona and Spark
+
+You can also read many GeoPackage files with Sedona.  Suppose you have the following GeoPackage files:
+
+```
+gpkgs/
+  my_file1.gpkg
+  my_file2.gpkg
+```
+
+Here’s how you can read all the files:
+
+```python
+df = (
+    sedona.read.format("geopackage")
+    .option("tableName", "my_layer")
+    .load("/tmp/gpkgs")
+)
+df.show()
+```
+
+Here are the results:
+
+```
++---+--------------------+---------+-----+
+|fid|                geom|     name|value|
++---+--------------------+---------+-----+
+|  1|         POINT (5 5)|  Point C|   30|
+|  2|POLYGON ((5 5, 6 ...|Polygon A|   40|
+|  1|         POINT (0 0)|  Point A|   10|
+|  2|         POINT (1 1)|  Point B|   20|
++---+--------------------+---------+-----+
+```
+
+You just need to supply the directory containing the GeoPackage files, and Sedona can read all of them into a DataFrame.
+
+Sedona is an excellent option for analyzing many GeoPackage files because it can read and process them in parallel.
+
+## Load raster data stored in GeoPackage files
+
+You can also load data from raster tables in the GeoPackage file. To load raster data, you can use the following code.
+
+```python
+df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
+```
+
+Here are the contents of the DataFrame:
+
+```
++---+----------+-----------+--------+--------------------+
+| id|zoom_level|tile_column|tile_row|           tile_data|
++---+----------+-----------+--------+--------------------+
+|  1|        11|        428|     778|GridCoverage2D["c...|
+|  2|        11|        429|     778|GridCoverage2D["c...|
+|  3|        11|        428|     779|GridCoverage2D["c...|
+|  4|        11|        429|     779|GridCoverage2D["c...|
+|  5|        11|        427|     777|GridCoverage2D["c...|
++---+----------+-----------+--------+--------------------+
+```
+
+Known limitations (v1.7.0):
+
+* webp rasters are not supported
+* ewkb geometries are not supported
+* filtering based on geometries envelopes are not supported
+
+All points above should be resolved soon; stay tuned!
+
+## Advantages of the GeoPackage file format
+
+The GeoPackage file format has many advantages:
+
+* Any engine can support GeoPackage because it’s an open format.
+* It’s mutable, unlike many other formats.
+* It saves CRS information, unlike some other formats.
+* It can store spatial and raster data.
+* It can be read by many engines like GeoPandas, Sedona, and SQLite, of course.
+
+However, the GeoPackage format also has many downsides.
+
+## Disadvantages of GeoPackage
+
+The GeoPackage file format has the following disadvantages:
+
+* It’s row-oriented, so it can’t take advantage of column pruning like columnar file formats.
+* It does not support multi-engine concurrency transactions.
+* SQLite transactions are supported, but building reliable transactions with other engines would be hard.
+* All engines do not fully support it.
+
+## Conclusion
+
+GeoPackage is a solid file format if you’re using SQLite.
+
+It’s excellent that Sedona can read GeoPackage files created by SQLite analyses. This allows you to read GeoPackage files in parallel and analyze massive datasets. You can also run Sedona on a cluster.
+
+If you don’t already use GeoPackage, you should probably use file formats like GeoParquet or Iceberg.
@@ -673,7 +673,7 @@ For Postgis there is no need to add a query to convert geometry types since it's
 		.withColumn("geom", f.expr("ST_GeomFromWKB(geom)")))
 	```
 
-## Load from geopackage
+## Load from GeoPackage
 
 Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
 
@@ -695,67 +695,7 @@ Since v1.7.0, Sedona supports loading Geopackage file format as a DataFrame.
 	df = sedona.read.format("geopackage").option("tableName", "tab").load("/path/to/geopackage")
 	```
 
-Geopackage files can contain vector data and raster data. To show the possible options from a file you can
-look into the metadata table by adding parameter showMetadata and set its value as true.
-
-=== "Scala/Java"
-
-	```scala
-	val df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-	```
-
-=== "Java"
-
-	```java
-	Dataset<Row> df = sedona.read().format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-	```
-
-=== "Python"
-
-	```python
-	df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
-
-Then you can see the metadata of the geopackage file like below.
-
-```
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-|          table_name|data_type|          identifier|description|         last_change|     min_x|            min_y|     max_x|     max_y|srs_id|
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-|gis_osm_water_a_f...| features|gis_osm_water_a_f...|           |2024-09-30 23:07:...|-9.0257084|57.96814069999999|33.4866675|80.4291867|  4326|
-+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
-```
-
-You can also load data from raster tables in the geopackage file. To load raster data, you can use the following code.
-
-=== "Scala/Java"
-
-	```scala
-	val df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-	```
-
-=== "Java"
-
-	```java
-	Dataset<Row> df = sedona.read().format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-	```
-
-=== "Python"
-
-	```python
-	df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
-	```
-
-```
-+---+----------+-----------+--------+--------------------+
-| id|zoom_level|tile_column|tile_row|           tile_data|
-+---+----------+-----------+--------+--------------------+
-|  1|        11|        428|     778|GridCoverage2D["c...|
-|  2|        11|        429|     778|GridCoverage2D["c...|
-|  3|        11|        428|     779|GridCoverage2D["c...|
-|  4|        11|        429|     779|GridCoverage2D["c...|
-|  5|        11|        427|     777|GridCoverage2D["c...|
-+---+----------+-----------+--------+--------------------+
-```
+See [this page](../files/geopackage-sedona-spark) for more information on loading GeoPackage.
 
 ## Load from OSM PBF
 

@@ -62,6 +62,7 @@ nav:
           - Work with GeoPandas and Shapely: tutorial/geopandas-shapely.md
           - Files:
               - CSV: tutorial/files/csv-geometry-sedona-spark.md
+              - GeoPackage: tutorial/files/geopackage-sedona-spark.md
               - GeoParquet: tutorial/files/geoparquet-sedona-spark.md
               - GeoJSON: tutorial/files/geojson-sedona-spark.md
           - Map visualization SQL app: