-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flattening Tuples, finding Variables by name, and converting Time Tuples #115
Changes from 46 commits
86bc418
ceba70a
6167a76
48fbf3d
4785239
4bdb686
1208888
826059a
e3acd0a
5735070
3b3defe
98dd45b
d8b4f40
f13984c
bda9a92
d61fe0e
b346aab
ba33ab8
87c7a85
9dfd4c0
f50ba54
5ef2935
e1c61bd
15ce30a
dcdbc06
16b40da
c1d84ab
9bb3fa6
3e570c8
57b5890
05d652d
e9ea660
93a1ec0
914e0bd
6eca9ca
084226f
f7b3896
adb53c2
8b9135b
76aa7af
d2ef547
67bd76c
0437435
e0ed484
c3acb0f
e815266
85eff89
2103e76
b54c9cc
560da2f
9ee7f76
afb3f25
d119980
3c3b034
ebc368f
dab039a
e6020a5
c4bba7a
0ba05be
64040c0
221b656
d47fd09
e4929d4
106e505
1748012
06170eb
96b2deb
f52d4dd
9623da5
95475e4
3d90a71
b20d39c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,17 +5,19 @@ import latis.metadata._ | |
import latis.util.DefaultDatumOrdering | ||
import latis.util.LatisException | ||
|
||
import scala.collection.mutable.ArrayBuffer | ||
|
||
/** | ||
* Define the algebraic data type for the LaTiS implementation of the | ||
* Functional Data Model. | ||
*/ | ||
sealed trait DataType extends MetadataLike with Serializable { | ||
|
||
// Apply f to Scalars only, for now | ||
/** Recursively apply f to this DataType. */ | ||
def map(f: DataType => DataType): DataType = this match { | ||
case s: Scalar => f(s) | ||
case Tuple(es @ _*) => Tuple(es.map(f)) | ||
case Function(d, r) => Function(d.map(f), r.map(f)) | ||
case t @ Tuple(es @ _*) => f(Tuple(t.metadata, es.map(f))) | ||
case func @ Function(d, r) => f(Function(func.metadata, d.map(f), r.map(f))) | ||
} | ||
|
||
/** | ||
|
@@ -44,6 +46,33 @@ sealed trait DataType extends MetadataLike with Serializable { | |
def findVariable(variableName: String): Option[DataType] = | ||
getVariable(variableName) | ||
//TODO: support aliases | ||
//TODO: could do findAllVariables(variableName).head like in v2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm leaning towards doing this: have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll do that then. |
||
|
||
/** | ||
* Find all Variables within this Variable by the given name. | ||
* TODO: support aliases | ||
*/ | ||
def findAllVariables(variableName: String): Seq[DataType] = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit uneasy about this approach from v2 since it implies that there can be multiple variables with the same name. Maybe this is OK in the context of name (as opposed to id) given aliases. Though, ambiguous aliases smells like a potential problem. The idea is that a user could use an alias in a selection. Should it apply to all variables with that alias? Probably not. This is relevant in v2 where any variable of type We also add the original variable name to the alias list when it is renamed. These uses of alias are convenient, except when they aren't. Mathematical rigor would suggest that we never have this ambiguity problem, but aliases are convenient. Hmm. Maybe we better punt on aliases for now and hide this from the public API. |
||
variableName.split('.') match { | ||
case Array(n) => { | ||
sapols marked this conversation as resolved.
Show resolved
Hide resolved
|
||
val vbuf = ArrayBuffer[DataType]() | ||
if (this.id == variableName) vbuf += this //TODO: use hasName to cover aliases? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One of these days the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah that change is coming. I'm sure you're right, in theory, but as far as I'm aware, |
||
this match { | ||
case _: Scalar => | ||
case Tuple(es @ _*) => | ||
val matchingVars = es.flatMap(_.findAllVariables(variableName)) | ||
vbuf ++= matchingVars | ||
sapols marked this conversation as resolved.
Show resolved
Hide resolved
|
||
case Function(d, r) => | ||
vbuf ++= d.findAllVariables(variableName) | ||
vbuf ++= r.findAllVariables(variableName) | ||
} | ||
vbuf.toSeq | ||
} | ||
case Array(n1, n2 @ _*) => { | ||
findAllVariables(n1).flatMap(_.findAllVariables(n2.mkString("."))) | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* Return the function arity of this DataType. | ||
|
@@ -73,34 +102,42 @@ sealed trait DataType extends MetadataLike with Serializable { | |
* Return this DataType with all nested Tuples flattened to a single Tuple. | ||
* A Scalar will remain a Scalar. | ||
* This form is consistent with Samples which don't preserve nested Functions. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm just noticing this comment. I know this line isn't part of the PR, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should say "... Samples which don't preserve nested Tuples" |
||
* Flattened Tuples retain the ID of the outermost Tuple. | ||
*/ | ||
def flatten: DataType = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a question that's more for @dlindhol and @lindholc . There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Point well taken, but I think the point is to eliminate nested tuples because (1) they cause problems and (2) they aren't preserved on the Data side anyway. So that's definitely what the method should do; maybe the question is more if "flatten" is an appropriate name or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree flattening everything is more useful and it's what we want here. I wouldn't dare suggest a new name for it because that's hard to do. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Eew. I didn't realize that |
||
var tupIds = "" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before I try to wrap my head around the current logic, it smells like this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Like I mentioned in the standup, making |
||
// Recursive helper function that uses an accumulator (acc) | ||
// to accumulate types while in the context of a Tuple | ||
// while the recursion results build up the final type. | ||
def go(dt: DataType, acc: Seq[DataType]): Seq[DataType] = dt match { | ||
case s: Scalar => acc :+ s | ||
case Tuple(es @ _*) => es.flatMap(e => acc ++ go(e, Seq())) | ||
case Function(d, r) => acc :+ Function(d.flatten, r.flatten) | ||
case s: Scalar if tupIds.isEmpty => acc :+ s | ||
//prepend Tuple ID(s) with "." and drop leading "." when needed | ||
case s: Scalar => acc :+ s.rename(s"$tupIds.${s.id}".replaceFirst("^\\.", "")) | ||
case Function(d, r) => acc :+ Function(d.flatten, r.flatten) | ||
case t @ Tuple(es @ _*) => if (tupIds.isEmpty && !t.id.isEmpty) tupIds += t.id else tupIds += s".${t.id}" | ||
es.flatMap(e => acc ++ go(e, Seq())) | ||
} | ||
|
||
val types = go(this, Seq()) | ||
types.length match { | ||
case 1 => types.head | ||
case _ => Tuple(types) | ||
case _ => | ||
if (tupIds.split('.').isEmpty) Tuple(types) | ||
else Tuple(Metadata(tupIds.split('.').head), types) | ||
} | ||
} | ||
|
||
/** | ||
* Return the path within this DataType to a given variable ID | ||
* as a sequence of SamplePositions. | ||
* Note that length of the path reflects the number of nested Functions. | ||
* When searching a Tuple's ID, the path to the first Scalar in the Tuple is returned. | ||
*/ | ||
def getPath(id: String): Option[SamplePath] = { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't about the changes you made but about this function in general. It looks like this is intended to be called on a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is where the idea of a |
||
|
||
// Recursive function to try paths until it finds a match | ||
def go(dt: DataType, id: String, currentPath: SamplePath): Option[SamplePath] = | ||
if (id == dt.id) Some(currentPath) //found it //TODO: use hasName to cover aliases? | ||
if (dt.id.split('.').contains(id)) Some(currentPath) //found it //TODO: use hasName to cover aliases? | ||
else | ||
dt match { //recurse | ||
case _: Scalar => None //dead end | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
package latis.ops | ||
|
||
import cats.implicits._ | ||
import latis.data._ | ||
import latis.metadata.Metadata | ||
import latis.model._ | ||
import latis.time.Time | ||
import latis.util.LatisException | ||
|
||
/** | ||
* Defines an Operation that converts a Tuple of time values to a single time Scalar. | ||
* | ||
* @param name the name of the Tuple that stores the time values | ||
*/ | ||
case class TimeTupleToTime(name: String = "time") extends MapOperation { | ||
|
||
override def applyToModel(model: DataType): DataType = { | ||
val timeTuple = model.findAllVariables(name) match { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if this is a case where we should be returning a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or even better, refactor and use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You may have seen the //TODO: could do findAllVariables(variableName).head like in v2 That would replace what getScalars.find(_.id == variableName) I personally think the new approach would be better, which is why I added the |
||
case Nil => throw new LatisException(s"Cannot find variable: $name") | ||
case vs: Seq[DataType] => vs.head | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any chance There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically |
||
} | ||
|
||
val time: Scalar = timeTuple match { | ||
case Tuple(es @ _*) => | ||
//build up format string | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Style question: Should this kind of comment be elevated a bit with a space and capitol letter? Maybe include the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Eh, personally, I like it better lowercase and without a space. Lowercase and no space is what we normally do for comments, and I don't see this comment as anything special in need of "elevating." I'd be cool with adding |
||
val format: String = es.toList.traverse(_("units")) | ||
.fold(throw new LatisException("A time Tuple must have units defined for each element."))(_.mkString(" ")) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm imagining a better multi-line expression here. Then again, folds still don't look natural to me, yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a strong opinion. @lindholc actually wrote this line like this in a PR comment then I copied it into this code. I don't mind the long one-liner here. |
||
//make the time Scalar | ||
val metadata = Metadata("id" -> "time", "units" -> format, "type" -> "string") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we preserve the tuple's ID instead of forcing it to be "time" here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's smart. We know the time Tuple will have a name because we have to search that name to get the variable. Since someone already went through the trouble of naming the time Tuple—and since, as the default param suggests, the name of the time Tuple will usually be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
Time(metadata) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There might be some metadata in the original time tuple worth preserving here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
} | ||
|
||
model.map { | ||
case t: Tuple if t.id == name => time //TODO: support aliases with hasName | ||
case v => v | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like we should be able to traverse with map first. Then when we stumble upon desired time tuple, construct the time (via a function?) and replace it. Otherwise it feels rather imperative. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 😮 I didn't think of this! That lets me remove so much code. Beautiful. |
||
} | ||
} | ||
|
||
override def mapFunction(model: DataType): Sample => Sample = { | ||
val timePos: SamplePosition = model.getPath(name) match { | ||
case Some(List(sp)) => sp | ||
case _ => throw new LatisException(s"Cannot find path to variable: $name") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it technically can; the scaladoc for
So if the time Tuple was inside a nested Function or two, we'd get our error Maybe it'd be appropriate to add a case between these two cases that catches a List of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
} | ||
val timeLen: Int = model.findAllVariables(name).head match { | ||
case t: Tuple => t.flatten match { | ||
case tf: Tuple => tf.elements.length //TODO: is this "dimensionality"? Should it be a first class citizen? | ||
sapols marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
case _ => throw new LatisException(s"Cannot find variable: $name") | ||
sapols marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
(sample: Sample) => sample match { | ||
case Sample(dd, rd) => | ||
//extract text values and join with space | ||
//TODO: join with delimiter, problem when we use regex? | ||
timePos match { | ||
//TODO: reduce code duplication? Only difference is dd vs rd (types don't match) and replacing domain vs range | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I put this TODO here because the 8 lines in these two cases are nearly identical... But this is the cleanest implementation I could come up. I tried making it so that all the pattern match does is store either I feel like a cleaner refactor is possible but it's not coming to me. If anyone has ideas on how to combine the logic of these two cases, I'm all ears. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could define a function
I think that would allow you to pass
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Brilliant! This is so much better and your solution comes from an angle my brain didn't wanna consider. Thank you. |
||
case DomainPosition(n) => | ||
val time: Datum = { | ||
val timeData = dd.slice(n, n+timeLen) | ||
Data.StringValue( | ||
timeData.map { case d: Datum => d.asString }.mkString(" ") | ||
) | ||
} | ||
val domain = dd.slice(0, n) ++ Seq(time) ++ dd.slice(n+timeLen, dd.length) | ||
Sample(domain, rd) | ||
case RangePosition(n) => | ||
val time: Datum = { | ||
val timeData = rd.slice(n, n+timeLen) | ||
Data.StringValue( | ||
timeData.map { case d: Datum => d.asString }.mkString(" ") | ||
) | ||
} | ||
val range = rd.slice(0, n) ++ Seq(time) ++ rd.slice(n+timeLen, dd.length) | ||
Sample(dd, range) | ||
} | ||
} | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we can always include metadata munging as part of the function
f
. This simply carries the original metadata through so it can do so. That makes me feel better aboutDataType.map
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! I was pretty convinced I nailed the logic here. It's pretty simple actually.