[WIP] Add Geography user-defined type #1811
@@ -29,6 +29,7 @@
import java.util.stream.Collectors;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.sedona.common.geometryObjects.Circle;
import org.apache.sedona.common.geometryObjects.Geography;
import org.apache.sedona.common.sphere.Spheroid;
import org.apache.sedona.common.subDivide.GeometrySubDivider;
import org.apache.sedona.common.utils.*;

@@ -784,6 +785,10 @@ public static byte[] asEWKB(Geometry geometry) {
    return GeomUtils.getEWKB(geometry);
  }

  public static byte[] geogAsEWKB(Geography geography) {

Review comment: Why not call it asEWKB?
Reply: If I add that it fails to compile with:
(That said, I can't any call to
Reply: There's a (quite complex) syntax for referring to a specific overload of a function in Scala. Even if we support argument-type-based overloading in InferredExpression, we still have to list all the overloads we want to delegate to. For instance:
InferredExpression(
  (g: Geography) => Functions.asEWKB(g),
  (g: Geometry) => Functions.asEWKB(g))
Review comment: Having Geometry and Geography come from different packages / repositories looks a bit strange to me, though I understand the tradeoff here. Is this effort a good opportunity to formally introduce the Sedona geo types?

    return asEWKB(geography.getGeometry());
  }

  public static String asHexEWKB(Geometry geom, String endian) {
    if (endian.equalsIgnoreCase("NDR")) {
      return GeomUtils.getHexEWKB(geom, ByteOrderValues.LITTLE_ENDIAN);

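To make the point about listing overloads concrete, here is a minimal Java sketch. It is not Sedona code: `Geometry` and `Geography` are hypothetical stand-ins for the JTS and Sedona classes, `asEWKB` returns a string instead of bytes for readability, and the `OVERLOADS` table and `dispatch` helper are assumptions that only mimic the idea of an expression delegating to one of several explicitly listed overloads.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class OverloadSketch {
  // Hypothetical stand-in for a JTS Geometry.
  static class Geometry {
    final String wkt;

    Geometry(String wkt) {
      this.wkt = wkt;
    }
  }

  // Hypothetical stand-in for the Geography wrapper in this PR.
  static class Geography {
    private final Geometry geometry;

    Geography(Geometry geometry) {
      this.geometry = geometry;
    }

    Geometry getGeometry() {
      return geometry;
    }
  }

  // Two Java overloads sharing one name, as the reviewer suggests.
  static String asEWKB(Geometry g) {
    return "EWKB(" + g.wkt + ")";
  }

  static String asEWKB(Geography g) {
    return asEWKB(g.getGeometry());
  }

  // Every overload has to be listed explicitly, keyed by its argument type.
  static final Map<Class<?>, Function<Object, String>> OVERLOADS = new LinkedHashMap<>();

  static {
    OVERLOADS.put(Geography.class, o -> asEWKB((Geography) o));
    OVERLOADS.put(Geometry.class, o -> asEWKB((Geometry) o));
  }

  // Picks the first listed overload whose argument type matches the value.
  static String dispatch(Object arg) {
    for (Map.Entry<Class<?>, Function<Object, String>> e : OVERLOADS.entrySet()) {
      if (e.getKey().isInstance(arg)) {
        return e.getValue().apply(arg);
      }
    }
    throw new IllegalArgumentException("No overload for " + arg.getClass().getName());
  }

  public static void main(String[] args) {
    System.out.println(dispatch(new Geometry("POINT (1 2)")));
    System.out.println(dispatch(new Geography(new Geometry("POINT (1 2)"))));
  }
}
```

The casts inside the lambdas are what select a specific overload; sharing the name `asEWKB` does not remove the need to enumerate each variant somewhere.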
@@ -0,0 +1,37 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.sedona.common.geometryObjects;

import org.locationtech.jts.geom.Geometry;

public class Geography {

Review comment: I think
Reply: I agree with Dewey that we'd better define Geography differently to avoid possible misuse. The internal representation of Geography may change as we integrate with libraries that support spherical geometry, at which point the JTS Geometry representation of Geography will become optional. This design gives us the flexibility to opt out of JTS Geometry when it is not needed.

  private Geometry geometry;

Review comment: Is there any extra overhead to using a Geometry internally, such as time or space cost in the constructor?
Reply: There might be (e.g., if we use some other library to do an overlay operation like intersection and have to convert from that library's representation back to JTS). Keeping the field private should, I think, at least provide a route to changing the internal implementation if there are performance issues in the future (but I'm also happy to hear suggestions otherwise!)

  public Geography(Geometry geometry) {
    this.geometry = geometry;
  }

  public Geometry getGeometry() {
    return this.geometry;
  }

  public String toString() {
    return this.geometry.toText();
  }
}
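The replies above argue that keeping the geometry field private leaves room to change the internal representation later. The following is a minimal sketch of that idea, not the PR's implementation: `Geometry` is a hypothetical stand-in for the JTS class, and the `Supplier`-based constructor is an assumption used to model "some other representation that is converted to JTS only on demand".

```java
import java.util.function.Supplier;

public class EncapsulationSketch {
  // Hypothetical stand-in for a JTS Geometry.
  static class Geometry {
    private final String wkt;

    Geometry(String wkt) {
      this.wkt = wkt;
    }

    String toText() {
      return wkt;
    }
  }

  // Geography wrapper whose JTS-like representation is optional until requested.
  static class Geography {
    private final Supplier<Geometry> source; // assumed conversion from another representation
    private Geometry cached; // filled in on first access

    Geography(Geometry geometry) {
      this(() -> geometry);
      this.cached = geometry;
    }

    Geography(Supplier<Geometry> source) {
      this.source = source;
    }

    // Callers only see this accessor, so the storage behind it can change freely.
    Geometry getGeometry() {
      if (cached == null) {
        cached = source.get();
      }
      return cached;
    }

    @Override
    public String toString() {
      return getGeometry().toText();
    }
  }

  public static void main(String[] args) {
    int[] conversions = {0};
    Geography lazy =
        new Geography(
            () -> {
              conversions[0]++;
              return new Geometry("POINT (1 2)");
            });
    System.out.println(lazy);
    System.out.println(lazy);
    System.out.println("conversions: " + conversions[0]);
  }
}
```

Because callers go through `getGeometry()`, the conversion cost is paid at most once and only when a JTS-style geometry is actually needed.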
@@ -0,0 +1,42 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import pickle

from sedona.utils.decorators import require


class Geography:

    def __init__(self, geometry):
        self._geom = geometry
        self.userData = None

    def getUserData(self):
        return self.userData

    @classmethod
    def from_jvm_instance(cls, java_obj):
        return Geography(java_obj.geometry)

    @classmethod
    def serialize_for_java(cls, geogs):
        return pickle.dumps(geogs)

    @require(["Geography"])
    def create_jvm_instance(self, jvm):
        return jvm.Geography(self._geom)
@@ -0,0 +1,61 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.apache.spark.sql.sedona_sql.UDT

import org.apache.sedona.common.geometrySerde.GeometrySerializer;
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.json4s.JsonDSL._
import org.json4s.JsonAST.JValue
import org.apache.sedona.common.geometryObjects.Geography;

Review comment: Honestly, we probably don't even need a new Java Geography in sedona-common, because the storage model of Geography is identical to Geometry (unless we want to annotate on the edge interpolation algorithm?). So I would say we just have a GeographyUDT and it uses JTS Geometry out of the box.
Reply: Sorry for taking a while to circle back here! I am worried that a strict subclass will lead to functions that accept a
Apologies if I'm missing something obvious here!

class GeographyUDT extends UserDefinedType[Geography] {
  override def sqlType: DataType = BinaryType

  override def pyUDT: String = "sedona.sql.types.GeographyType"

  override def userClass: Class[Geography] = classOf[Geography]

  override def serialize(obj: Geography): Array[Byte] =
    GeometrySerializer.serialize(obj.getGeometry())

  override def deserialize(datum: Any): Geography = {
    datum match {
      case value: Array[Byte] => new Geography(GeometrySerializer.deserialize(value))
    }
  }

  override private[sql] def jsonValue: JValue = {
    super.jsonValue mapField {
      case ("class", _) => "class" -> this.getClass.getName.stripSuffix("$")
      case other: Any => other
    }
  }

  override def equals(other: Any): Boolean = other match {
    case _: UserDefinedType[_] => other.isInstanceOf[GeographyUDT]
    case _ => false
  }

  override def hashCode(): Int = userClass.hashCode()
}

case object GeographyUDT
    extends org.apache.spark.sql.sedona_sql.UDT.GeographyUDT
    with scala.Serializable
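
One reading of the subclassing concern raised in the review is that a Geography that is also a Geometry could be passed, unnoticed, to functions written for planar geometries. The sketch below illustrates that reading only; all class and method names are hypothetical stand-ins, not Sedona or JTS APIs.

```java
public class SubclassVsWrapperSketch {
  // Hypothetical stand-in for a JTS Geometry.
  static class Geometry {
    final String wkt;

    Geometry(String wkt) {
      this.wkt = wkt;
    }
  }

  // Option A (assumed alternative): Geography as a strict subclass of Geometry.
  static class GeographySubclass extends Geometry {
    GeographySubclass(String wkt) {
      super(wkt);
    }
  }

  // Option B: Geography as a separate wrapper type, as in this PR.
  static final class GeographyWrapper {
    private final Geometry geometry;

    GeographyWrapper(Geometry geometry) {
      this.geometry = geometry;
    }

    Geometry getGeometry() {
      return geometry;
    }
  }

  // A function that assumes planar (Cartesian) semantics.
  static String planarDescribe(Geometry g) {
    return "planar:" + g.wkt;
  }

  public static void main(String[] args) {
    // Option A compiles and silently applies planar logic to a geography.
    System.out.println(planarDescribe(new GeographySubclass("POINT (1 2)")));

    GeographyWrapper wrapped = new GeographyWrapper(new Geometry("POINT (1 2)"));
    // planarDescribe(wrapped); // does not compile: GeographyWrapper is not a Geometry
    // Option B requires an explicit, visible unwrapping step.
    System.out.println(planarDescribe(wrapped.getGeometry()));
  }
}
```

With the wrapper, treating a geography as a planar geometry remains possible but has to be spelled out at the call site.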
Review comment: Let's put the Geography-related functions / constructors ... into separate files, instead of mixing them with the Geometry functions.
Can you also put the Geography functions into individual files? The old Functions.java / Constructors.java are too large, so it is probably better to put them into individual files such as "GeogFromWKB".
Reply: Totally! I would like to solve the runtime overload problem first... I can try to look harder at how one registers a UDF with more than one signature (maybe it is not possible in Spark?), since that is the part that is currently causing an issue.
Reply: It is possible in Spark; the way to do it is quite flexible but awkward.
The implementation of inputTypes can inspect the types of the actual expressions passed to it and return a suitable function signature. An example is sedona/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterFunctions.scala, lines 140 to 150 in a8da3cd.
We can also inspect the input types in the eval function and run different code depending on them: sedona/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterFunctions.scala, lines 105 to 131 in a8da3cd.
InferredExpression encapsulates this function-overloading mechanism of Spark and supports delegating the Spark expression to Java functions according to their arity. It could be extended to support more complex function-overloading rules.
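
The two-step mechanism described in this reply (choose a signature from the actual input types, then branch on the runtime value during evaluation) can be sketched in plain Java. This does not use Spark's API: `inputType` and `eval` below are hypothetical helpers that only borrow the names from the comment, and `Geometry` and `Geography` are stand-in classes.

```java
import java.util.Arrays;
import java.util.List;

public class TypeDispatchSketch {
  // Hypothetical stand-in for a JTS Geometry.
  static class Geometry {
    final String wkt;

    Geometry(String wkt) {
      this.wkt = wkt;
    }
  }

  // Hypothetical stand-in for the Geography wrapper.
  static class Geography {
    private final Geometry geometry;

    Geography(Geometry geometry) {
      this.geometry = geometry;
    }

    Geometry getGeometry() {
      return geometry;
    }
  }

  // Argument types this single-argument function is willing to accept.
  static final List<Class<?>> ACCEPTED =
      Arrays.<Class<?>>asList(Geography.class, Geometry.class);

  // Step 1: look at the actual argument type and return the matching signature.
  static Class<?> inputType(Class<?> actual) {
    for (Class<?> candidate : ACCEPTED) {
      if (candidate.isAssignableFrom(actual)) {
        return candidate;
      }
    }
    throw new IllegalArgumentException("Unsupported argument type: " + actual.getName());
  }

  // Step 2: at evaluation time, run different code depending on the input type.
  static String eval(Object input) {
    if (input instanceof Geography) {
      return "geography:" + ((Geography) input).getGeometry().wkt;
    }
    if (input instanceof Geometry) {
      return "geometry:" + ((Geometry) input).wkt;
    }
    throw new IllegalArgumentException("Unsupported input: " + input);
  }

  public static void main(String[] args) {
    Geometry point = new Geometry("POINT (1 2)");
    System.out.println(inputType(point.getClass()).getSimpleName());
    System.out.println(eval(point));
    System.out.println(eval(new Geography(point)));
  }
}
```

Dispatching on arity alone would not separate these two cases, since both variants take a single argument; that is why the argument type has to be inspected.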