Basic structure of a builder, how can this work? #11

randytpierce · 2023-02-16T19:14:25Z

Initially I am thinking the algorithm is really a function.

Of course Go doesn't really do eval (it is compiled, not interpreted), and it probably shouldn't even if someone did implement it, because that would lead to incompatibilities when deploying to different architectures. So the algorithm is probably a function. We do need to identify the proper function (i.e. the algorithm) to use, but that can be done with a switch statement.
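A minimal sketch of the switch-based selection described above. The names `statFunc`, `algorithmFor`, `bias` and `rmse`, and the string keys "Bias" and "RMSE", are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"math"
)

// statFunc is a hypothetical signature for an "algorithm": it reduces
// paired model/observation values to a single statistic.
type statFunc func(model, obs []float64) (float64, error)

// checkPaired rejects empty or unequal-length inputs.
func checkPaired(model, obs []float64) error {
	if len(model) == 0 || len(model) != len(obs) {
		return fmt.Errorf("need equal-length, non-empty inputs (got %d and %d)", len(model), len(obs))
	}
	return nil
}

// bias returns the mean of (model - obs).
func bias(model, obs []float64) (float64, error) {
	if err := checkPaired(model, obs); err != nil {
		return 0, err
	}
	sum := 0.0
	for i := range model {
		sum += model[i] - obs[i]
	}
	return sum / float64(len(model)), nil
}

// rmse returns the square root of the mean of (model - obs) squared.
func rmse(model, obs []float64) (float64, error) {
	if err := checkPaired(model, obs); err != nil {
		return 0, err
	}
	sumSq := 0.0
	for i := range model {
		d := model[i] - obs[i]
		sumSq += d * d
	}
	return math.Sqrt(sumSq / float64(len(model))), nil
}

// algorithmFor picks the function by name with a switch; no eval is needed.
func algorithmFor(name string) (statFunc, error) {
	switch name {
	case "Bias":
		return bias, nil
	case "RMSE":
		return rmse, nil
	default:
		return nil, fmt.Errorf("unknown algorithm %q", name)
	}
}

func main() {
	f, err := algorithmFor("RMSE")
	if err != nil {
		fmt.Println(err)
		return
	}
	v, err := f([]float64{1, 2, 3}, []float64{1, 2, 5})
	fmt.Println(v, err)
}
```

Returning an error for an unknown name keeps a misspelled statistic from silently falling through to a default calculation.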

The data, I believe, is a map. You obviously need populationA and populationB, which are arrays of numbers, probably float64. It might be more complicated, like an array of these populations: for example, a list of forecast lead times that each have two populations.
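One way that data shape could look in Go, as a sketch only. The type names `Populations` and `InputData`, the use of an integer lead time in hours as the map key, and the sample numbers are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Populations holds the two samples to be compared (hypothetical type).
type Populations struct {
	A []float64
	B []float64
}

// InputData maps a forecast lead time to its two populations.
// Keying by lead time in hours is an assumption.
type InputData map[int]Populations

// leadTimes returns the keys in ascending order, because Go map
// iteration order is not defined.
func leadTimes(d InputData) []int {
	keys := make([]int, 0, len(d))
	for k := range d {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	// Made-up sample values, only to show the layout.
	data := InputData{
		3: {A: []float64{1.2, 0.8}, B: []float64{1.0, 0.9}},
		6: {A: []float64{1.5, 1.1}, B: []float64{1.3, 1.4}},
	}
	for _, lt := range leadTimes(data) {
		p := data[lt]
		fmt.Printf("lead %dh: %d values in A, %d in B\n", lt, len(p.A), len(p.B))
	}
}
```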

The function has to apply a statistic to the populations and then compare them, so there needs to be a function for each statistic. Those are probably different builder types. In a given scorecard major Row you can have many different statistical calculations (each of those minor rows is a stat/variable combination).

So for Student's t-test, as one example, consider this and this.

So maybe we have builders that are like "TtestRMSE" and "TtestBias".

For data, I don't think it is a good idea to have the calculation builders do QUERIES, like a query that would be like
SELECT AVE(data[*]['temperature']) where ....
Instead, we should query the data into sets of numbers that are put into the proper data structure, and then pass that structure into the calculation builders. Otherwise the calculation builders will get too specific to be applied to other data sets. That said, we probably do have DATA builders which do queries and data manipulation and then pass the data into the calculation builders. Even then, the DATA builders shouldn't be doing math inside the queries. Query for data and then do MATH in code.
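A small sketch of that separation, under assumed names: `DataBuilder` only fetches raw values, and the averaging happens in Go rather than in the query. `staticData` is a stand-in for a real query-backed implementation so the example runs without a database.

```go
package main

import (
	"errors"
	"fmt"
)

// DataBuilder is a hypothetical interface for the data side. An
// implementation would run a plain query that returns raw values and
// would do no math in the query itself.
type DataBuilder interface {
	Fetch() ([]float64, error)
}

// staticData stands in for a query-backed DataBuilder.
type staticData []float64

func (s staticData) Fetch() ([]float64, error) { return s, nil }

// mean is the calculation side; it knows nothing about queries.
func mean(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no values")
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values)), nil
}

// averageFrom wires the two together: fetch first, then compute in code.
func averageFrom(src DataBuilder) (float64, error) {
	values, err := src.Fetch()
	if err != nil {
		return 0, err
	}
	return mean(values)
}

func main() {
	avg, err := averageFrom(staticData{280.5, 281.0, 279.5})
	fmt.Println(avg, err)
}
```

Because `mean` only sees a slice of numbers, the same calculation can be reused with any data source that satisfies `DataBuilder`.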

randytpierce · 2023-03-06T18:56:38Z

This is how I am proceeding with the manager->director->builder interactions. These notes are in the appropriate readme files in the repo and in the prologue of each appropriate Go file, but I wanted to put them here for discussion.

manager....

VxDataProcessor Manager

The Manager has the following responsibilities and transformations.

The manager will maintain a Couchbase connection.
The manager waits for a process_id to be put on a queue from the service. The service will run as many managers, each in a Go worker routine, as it needs to handle multiple service requests simultaneously. The service starts a manager in a Go worker routine and passes it the id of the corresponding scorecard document.
The manager will read the scorecard document associated with the id from Couchbase and maintain it in memory on behalf of its directors.
The manager will start Go workers (which are directors), making sure that the number of workers (directors) does not exceed the maximum number of database connections configured for each kind of director. For example, currently most apps are legacy apps that require a mysql database connection. If the configuration specifies 20 allowed mysql database connections, the manager will allow up to twenty workers. Each worker is a director and each director will maintain its own database connection (e.g. mysql client).
The appname associated with a scorecard block tells the manager what kind of director is needed for each scorecard block. Each block requires an associated database query template. The manager will build a queue of sc_element structures, each of which has an appname (url?) and a pointer to the associated result section (which has the template variables, e.g. region, statistic, variable; they are the keys to the specific row). For example
... results..["rows"]["Row0"]["data"]["All HRRR domain"]["Bias (Model - Obs)"]["2m RH"][.... ]
The director will query the app that is associated with the appname for the associated template using the app REST API.
Each director must derive a query (making appropriate substitutions to the template) for each cell that needs to be calculated, then query the database for the cell data, format the data into an InputDataElement, and send the data element to an appropriate builder in a goroutine. The director uses as many goroutines as necessary to derive all the cells required of it. For example, maybe this is one director per row, and the builder parts are delineated by region and forecastlen.
The builder will process the data for a given cell by...
1. Matching the data by time.
2. Processing the data for the associated statistic (like RMSE or BIAS).
3. Processing the pvalue statistic.
4. Returning the result to the director.
The builders update the in-memory scorecard directly. When enough builders finish, the director notifies the manager that a scorecard upsert is necessary (perhaps when each row is complete, i.e. the director dies?).
The manager upserts the scorecard document with the current new results. There may be many of these upserts.
The manager knows that the results have all been processed when the directors have all died. The manager does a final upsert of the scorecard, provides the return status for the service call, and then politely dies.

Inputs

The rest API service will call the manager with a scorecard document ID.

Type

The type (what kind of builders are needed) is determined by the manager, which will queue the appropriate type of director. It knows what type of director to use from the app name (which is in each scorecard block). When we get multiple types of apps we will need different types of directors.

data set

The scorecard is an unmarshalled JSON structure ....

Result set

The result set is a part of the scorecard structure ...

{
    "SCORECARD": {
      "dateRange": "01/30/2023 20:00 - 03/01/2023 18:00",
      "id": "SC:anonymous--1row-at-202303030114:0:01/30/2023_20_00_-_03/01/2023_18_00",
      "name": "anonymous--1row-at-202303030114",
      "plotParams": {..},
      "processedAt": 0,
      "results": {...}
    }
}

It can be reached like

SCORECARD.results

and subsets of data may be reached like

SCORECARD.results.`rows`.Row0.data.`All HRRR domain`

DIRECTOR

VxDataProcessor mysql_director

The Director has the following responsibilities...

Receive an app URL and a pointer to an sc_row (which is a map).
Query the app for the mysql query template.
Create a query from the template by substituting the necessary variables into the template (these are embedded in the scorecard row).
Retrieve the input data.
Format the input data into the proper DerivedDataElement structures for the builders. A DerivedDataElement has an InputData structure for a specific cell and a pointer to the result structure where the cell result value is to be placed.
For each data element create an inputData.
Fire off builders in Go worker routines to process all the cell DerivedDataElement structures.
1. The builder has to do these steps...
  1. Perform time matching on the input data.
  2. Perform a statistic calculation (RMSE, BIAS, etc.) on the input data and put it into the DerivedDataElement.
  3. Compute the significance for the DerivedDataElement.
  4. Write the result value into the result structure (value is a pointer).
Take the value from each builder and put it into the right part of the result structure.
(maybe we should just give the builder a pointer to the result location?)
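The template substitution step above could look something like the following. The thread does not say what the templates look like, so the `{{name}}` placeholder syntax, the `fillTemplate` helper and the sample query are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fillTemplate replaces every {{name}} placeholder in tmpl with the
// matching value from vars. The placeholder syntax is an assumption.
func fillTemplate(tmpl string, vars map[string]string) string {
	pairs := make([]string, 0, 2*len(vars))
	for name, value := range vars {
		pairs = append(pairs, "{{"+name+"}}", value)
	}
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	// Hypothetical template and variables; a real template would come
	// from the app and the values from the scorecard row.
	tmpl := "SELECT valid_time, value FROM stats WHERE region = '{{region}}' AND fcst_len = {{fcstLen}}"
	query := fillTemplate(tmpl, map[string]string{
		"region":  "All HRRR domain",
		"fcstLen": "6",
	})
	fmt.Println(query)
}
```

Plain string substitution is only safe if the substituted values are trusted or validated first; a real director would probably want to check them before building the query.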

Inputs

The manager starts a director in a goroutine and gives it an sc_element structure, which has an app url and a pointer to the row that the cell is in (which has the template variables, e.g. region, statistic, variable; they are the keys to the specific row).

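// sc_element is what the manager hands to a director.
type sc_element struct {
    app_url    string                  // URL of the app that owns the query template
    row_ptr    *map[string]interface{} // the row the cell is in (key/value types are an assumption)
    result_ptr *int                    // where the cell result is to be written
}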

Type

The type specifies what kind of builder is required for this data set

data set

The data set is a JSON structure ....

Result set

The result set is a JSON structure ...

Algorithm

The algorithm specifies the calculation that is to be applied to the data set to achieve the specified result.

Builder

A builder has to implement the ScorecardCellBuilder interface defined in iBuilder.
A builder has these responsibilities

Perform time matching on the input data.
Perform a statistic calculation (RMSE, BIAS, etc.) on the input data and put it into a DerivedDataElement, using one of the statistic routines from the builder_stats package.
Compute the significance for the DerivedDataElement.
Write the result value into the result structure (value is a pointer).
Politely die and go away.
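The time matching step could be sketched as keeping only the times that appear in both inputs. The layout used here (a map from epoch seconds to value) and the name `matchByTime` are assumptions; the real InputData structure may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// matchByTime keeps only the timestamps present in both a and b and
// returns them in ascending order, with the matching values from each
// series at the same index.
func matchByTime(a, b map[int64]float64) (times []int64, av, bv []float64) {
	for t := range a {
		if _, ok := b[t]; ok {
			times = append(times, t)
		}
	}
	sort.Slice(times, func(i, j int) bool { return times[i] < times[j] })
	for _, t := range times {
		av = append(av, a[t])
		bv = append(bv, b[t])
	}
	return times, av, bv
}

func main() {
	// Made-up sample series keyed by epoch seconds.
	a := map[int64]float64{3600: 1.0, 7200: 2.0, 10800: 3.0}
	b := map[int64]float64{7200: 2.5, 10800: 3.5, 14400: 4.5}
	times, av, bv := matchByTime(a, b)
	fmt.Println(times, av, bv)
}
```

After matching, `av[i]` and `bv[i]` always refer to the same time, which is what the statistic and significance steps need.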

It isn't defined yet, but the ScorecardCellBuilder interface will define a top-level entry point that enables the director to do all these steps in one goroutine call.
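Since the interface is not defined yet, the following is only a guess at its shape. The `Build` method, the fields of `DerivedDataElement` and the toy `signBuilder` are all hypothetical. In particular, `signBuilder` just compares means and writes -1, 0 or 1; it is a placeholder for the real statistic and significance steps, not a real test.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// DerivedDataElement is a hypothetical layout: the inputs for one cell
// plus a pointer to where the result belongs.
type DerivedDataElement struct {
	Ctl    []float64
	Exp    []float64
	Result *int
}

// ScorecardCellBuilder is a guess at the single entry point.
type ScorecardCellBuilder interface {
	Build(e DerivedDataElement) error
}

// signBuilder is a placeholder builder, not a real significance test.
type signBuilder struct{}

func mean(x []float64) float64 {
	sum := 0.0
	for _, v := range x {
		sum += v
	}
	return sum / float64(len(x))
}

// Build writes 1, -1 or 0 through the result pointer, depending on
// whether the Exp mean is above, below or equal to the Ctl mean.
func (signBuilder) Build(e DerivedDataElement) error {
	if e.Result == nil {
		return errors.New("nil result pointer")
	}
	if len(e.Ctl) == 0 || len(e.Exp) == 0 {
		return errors.New("empty input data")
	}
	d := mean(e.Exp) - mean(e.Ctl)
	switch {
	case d > 0:
		*e.Result = 1
	case d < 0:
		*e.Result = -1
	default:
		*e.Result = 0
	}
	return nil
}

func main() {
	var cell int
	elem := DerivedDataElement{Ctl: []float64{1, 1}, Exp: []float64{2, 3}, Result: &cell}
	var b ScorecardCellBuilder = signBuilder{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // the director's one goroutine call per cell
		defer wg.Done()
		if err := b.Build(elem); err != nil {
			fmt.Println("build failed:", err)
		}
	}()
	wg.Wait()
	fmt.Println("cell value:", cell)
}
```

Writing through a pointer is safe here because each builder owns a distinct location. If builders instead wrote into a shared Go map, the writes would need a mutex, since concurrent map writes are not safe.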

randytpierce Mar 7, 2023
Maintainer Author

The manager might not wait for a document id on a queue. It might just be started by the service. We have to explore that.
