
Protocol v1 ben #19

Merged
merged 47 commits into from
Feb 11, 2025
Conversation

bgaidioz
Contributor

@bgaidioz bgaidioz commented Jan 13, 2025

  • fixed predicates being considered false when not supported (in cache filtering),
  • edited the cache logic to accept a cache that implements the exact same predicates,
  • edited MessageAdapter-related code because it turned out to be a singleton reused for all tasks,
  • added a missing filter on the table name,
  • cleared two warning stack traces indicating bad practices or missing calls to close,
  • made the server read its parameters from config,
  • handled the new cache- and max-size-related message fields in Query,
  • added a test framework for the Postgres DAS.

@bgaidioz bgaidioz force-pushed the protocol-v1-ben branch 2 times, most recently from eea2573 to e7d793b on January 13, 2025 14:29
server {
port = 50051 # the port the server listens on
monitoring-port = 8080 # http port for monitoring
max-chunk-size = 1000 # the maximum number of rows that can be returned in a single chunk
Contributor

What is this one, and why is it different from batch-size below?
Is it necessary, or can we do time-based chunking only?

We have to be careful not to "over-fetch" from the source systems.

Contributor Author

So now the reference.conf has two settings:

server {
   batch-latency = 100 millis # how long we wait for more rows before we send a batch
}
cache {
   batch-size = 1000 # how many rows of data are produced per producerInterval tick
}

And within the protocol we also get a max buffer size as part of execute requests. Setting server.batch-latency impacts how long one waits to fill the max buffer the client specified. The cache.batch-size is the number of rows we pick per producer tick from the DAS. I agree that if DAS => cache queue => gRPC were properly pipelined, we'd be reading as many rows as needed to fill the client's buffer, or until the batch-latency was hit.

@@ -281,7 +281,7 @@ object ExpressionEvaluator {
evalEquals(x, y)
}

case _ => false
case _ => true
Contributor Author

Predicates that aren't implemented are considered true, since they need to be evaluated in Postgres.

Contributor

I don't agree with this part. (And in any case, this comment would have belonged in the source code, not in the PR.)

But this is an ExpressionEvaluator component, so I'd expect it to act as one. This has nothing to do with Postgres: it is its own standalone component, which in no way depends on or relates to how Postgres or the FDW happen to use it.

If the predicates are not implemented, we either say we don't support them (return an option, throw an exception, etc.) or something else along those lines. At the interaction point between components, we can then decide that "not implemented" carries an explicit semantic; but that's for the composing component/layer to do, not the implementation. Providing a semantic like this one here crosscuts two components and concerns that are not related to each other.

Contributor Author

I patched the ExpressionEvaluator to throw when an expression isn't supported. The layer above, QualEvaluator, is the one that logs the warning, ignores the failure, and interprets the predicate as true (meaning it ignores the predicate as if it weren't supported). It's also the one that interprets null as false. The QualEvaluator was changed to evaluate predicates one by one instead of as a single "and", so that it can skip failing ones.
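The layering described above could be sketched like this (illustrative names and predicates, not the actual project API):

```scala
// Illustrative sketch: the evaluator throws on unsupported predicates, and the
// layer above decides that a failure means "keep the row" so Postgres can
// evaluate the predicate itself.
object ExpressionEvaluatorSketch {
  // Evaluates a predicate against a value; throws for unsupported predicates
  // instead of silently returning a Boolean.
  def eval(pred: String, value: Int): Boolean = pred match {
    case "positive" => value > 0
    case "even"     => value % 2 == 0
    case other      => throw new UnsupportedOperationException(s"unsupported predicate: $other")
  }
}

object QualEvaluatorSketch {
  // Predicates are evaluated one by one (not as a single "and"), so an
  // unsupported predicate can be skipped: it is interpreted as true and
  // left for Postgres to evaluate.
  def rowMatches(preds: Seq[String], value: Int): Boolean =
    preds.forall { p =>
      try ExpressionEvaluatorSketch.eval(p, value)
      catch { case _: UnsupportedOperationException => true }
    }
}
```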

@@ -28,12 +30,14 @@ object QualSelectivityAnalyzer {
/**
* @param oldQuals Existing qualifiers
* @param newQuals New qualifiers
* @return None if `newQuals` is NOT strictly more selective than `oldQuals` Some(difference) if it is, where
* 'difference' is the subset of `newQuals` that imposes stricter constraints than already in `oldQuals`.
* @return None if `newQuals` is NOT as selective as `oldQuals`, Some(difference) if it is, where 'difference' is the
Contributor Author

In the original implementation the logic would discard a cache if the newQuals weren't strictly more selective. The exact same query that had produced a cache wasn't reusing its own cache.
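A minimal sketch of the relaxed check, modelling qualifiers as plain sets (the real QualSelectivityAnalyzer is richer than this):

```scala
// Illustrative sketch of the "as selective" acceptance described above.
object SelectivitySketch {
  // None if newQuals is NOT at least as selective as oldQuals; otherwise
  // Some(difference), the extra constraints newQuals imposes. Accepting
  // equality (an empty difference) is what lets the exact same query reuse
  // the cache it produced.
  def coveringDifference(oldQuals: Set[String], newQuals: Set[String]): Option[Set[String]] =
    if (oldQuals.subsetOf(newQuals)) Some(newQuals -- oldQuals) else None
}
```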

@@ -149,7 +147,12 @@ private class CacheManagerBehavior[T](

// We'll keep track of child data source actors by cacheId
private var dataSourceMap = Map.empty[UUID, ActorRef[ChronicleDataSource.ChronicleDataSourceCommand]]

private val dataSourceEventAdapter: ActorRef[ChronicleDataSource.DataSourceLifecycleEvent] =
Contributor Author

Apparently a MessageAdapter is a singleton per type it handles. In the original implementation, a MessageAdapter that handles the messages and passes them to the CacheManager was created per data source, and would encapsulate the cacheId to be passed to the CacheManager.

But since internally it's a singleton, the same message handler was used for two different data sources. For example, if both hit the grace period, we'd get the same cacheId twice, in both messages to the CacheManager.

Here it's created once, and messages carry the cacheId to propagate as a parameter.

Contributor

Ok, didn't know.

@@ -371,16 +375,13 @@ private class CacheManagerBehavior[T](

ctx.spawn(
ChronicleDataSource[T](
cacheId = cacheId,
Contributor Author

Pass the cacheId to the data source since it has to be eventually sent in its messages.

@@ -416,7 +416,10 @@ private class CacheManagerBehavior[T](
val reader = archivedStore.newReader()

Source.fromIterator(() => reader).watchTermination() { (mat, doneF) =>
doneF.onComplete(_ => reader.close())(executionContext)
doneF.onComplete { _ =>
Contributor Author

A missing store close that led to a stack trace.

@@ -395,10 +396,9 @@ private class CacheManagerBehavior[T](
dsRef
.ask[ChronicleDataSource.SubscribeResponse](replyTo => ChronicleDataSource.RequestSubscribe(replyTo))
.map {
case ChronicleDataSource.Subscribed(consumerId) =>
case ChronicleDataSource.Subscribed(consumerId, tailer) =>
Contributor Author

The tailer is now created outside, instead of passing the storage and letting it create the tailer inside. That fixes the stack trace warning about an object being garbage collected before being closed.

final case class DataProductionComplete(sizeInBytes: Long) extends DataSourceLifecycleEvent
final case class DataProductionError(msg: String) extends DataSourceLifecycleEvent
final case object DataProductionVoluntaryStop extends DataSourceLifecycleEvent
final case class DataProductionComplete(cacheId: UUID, sizeInBytes: Long) extends DataSourceLifecycleEvent
Contributor Author

The cacheId is now part of the message, so that the singleton message handler knows what to send to the CacheManager.

@@ -322,6 +327,10 @@ private class ChronicleDataSourceBehavior[T](
case ConsumerTerminated(cid) =>
activeConsumers -= 1
log.info(s"Consumer $cid terminated; activeConsumers=$activeConsumers")
tailerMap.remove(cid).foreach { tailer =>
Contributor Author

Close the tailer that was left open, once the consumer terminates (fixes the stack trace warning).
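The close-on-remove pattern can be sketched with a plain mutable map standing in for the actor's tailerMap (FakeTailer is a made-up stand-in for the Chronicle tailer):

```scala
import scala.collection.mutable

// Illustrative stand-in for the Chronicle tailer resource.
final class FakeTailer extends AutoCloseable {
  var closed = false
  def close(): Unit = closed = true
}

object TailerCleanupSketch {
  val tailerMap: mutable.Map[Long, FakeTailer] = mutable.Map.empty

  // On ConsumerTerminated(cid): remove the consumer's tailer and close it,
  // releasing the resource deterministically instead of relying on GC.
  def onConsumerTerminated(cid: Long): Unit =
    tailerMap.remove(cid).foreach(_.close())
}
```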

@@ -0,0 +1,111 @@
/*
Contributor Author

Unused now. Will delete.


private def quoteIdentifier(ident: String): String = {
// naive approach
s""""$ident""""
Contributor

Yes, needs a regex and adding quotes.

Contributor Author

I fixed a number of things in PostgresDAS (e.g. it was ignoring the schema), and about the ident: we wrap it in double quotes because we're internally using the Postgres identifier, the one that can be wrapped in double quotes. I added the logic to escape double quotes if they're ever part of the identifier.

Fields and tables are advertised such that double-quoting is safe. Clients send identifiers that follow those (the DAS protocol isn't case-insensitive). I think it works like that.
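The escaping described above amounts to the standard Postgres rule of doubling embedded double quotes; a minimal sketch (illustrative name, not the project's actual helper):

```scala
// Wrap the identifier in double quotes and double any embedded double quote,
// per Postgres quoted-identifier rules.
object QuotingSketch {
  def quoteIdentifier(ident: String): String =
    "\"" + ident.replace("\"", "\"\"") + "\""
}
```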


// (Can be used to pass custom parameters/settings in the future?)
DASServer.main(Array())
test("Run the main code with mock services")(DASServer.main(Array()))
Contributor

Will this not hang the CI and leave it running?
Why make it a test? It's not testing anything (?)

Contributor Author

Yes, it was left by accident, as I run live tests like that. I rewrote it as a test, as you used to have, because DASServer.main wouldn't otherwise run within a test.

It's missing the service declaration file. I'll switch it back to an App and add the service file too.

@@ -297,9 +297,15 @@ object ExpressionEvaluator {
// Strings
case (StringVal(s1), StringVal(s2)) =>
s1 < s2

// (Add date/time if needed)
case _ => false
Contributor

Same comment applies elsewhere.
Options may be cumbersome (but cleaner), so exceptions would be easier.
That said, o1pro can write the option code if we go that route.
But this is not to be assumed in this component, as it mixes two different meanings.

Contributor Author

Yes it's using Exceptions now.

val days = localDate.toEpochDay
Some(BigDecimal(days))
} catch {
case _: Throwable => None
Contributor

Throwable is the wrong choice here (and below), as it is way too broad and captures interruption exceptions, out of memory, and other things that have nothing to do with date conversions. At the very least use NonFatal.

Contributor Author

Catching DateTimeException now. If ever we'd get wrong numbers from our Value, LocalDateTime and company would throw.
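The narrower catch can be sketched as follows: only DateTimeException is handled, so interrupts, OutOfMemoryError, and the like still propagate (the method name is illustrative, not the project's actual helper):

```scala
import java.time.{DateTimeException, LocalDate}

// Convert a calendar date to its epoch-day count, returning None only for
// invalid dates rather than swallowing every Throwable.
object DateConversionSketch {
  def dateToEpochDay(year: Int, month: Int, day: Int): Option[BigDecimal] =
    try Some(BigDecimal(LocalDate.of(year, month, day).toEpochDay))
    catch { case _: DateTimeException => None }
}
```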

val micros = totalSeconds * 1_000_000L + (t.getNano.toLong / 1000L)
Some(BigDecimal(micros))
} catch {
case _: Throwable => None
Contributor

Same here.

val micros = epochSec * 1_000_000L + (ts.getNano.toLong / 1000L)
Some(BigDecimal(micros))
} catch {
case _: Throwable => None
Contributor

Same here.


@miguelbranco80 miguelbranco80 marked this pull request as ready for review February 11, 2025 12:20
@miguelbranco80 miguelbranco80 merged commit 068af19 into protocol-v1 Feb 11, 2025
1 of 2 checks passed
@miguelbranco80 miguelbranco80 deleted the protocol-v1-ben branch February 11, 2025 12:20