Add function for exporting to GUI format #1

daynefiler · 2022-02-11T15:25:47Z

The difficult aspect of doing this -- and the reason for not implementing in v1.0 -- is the transformation functions. The simple fix would be to NOT include transformation functions (i.e. make them all linear) and apply any transformations to input prior to exporting; this breaks down when the same input is included in different slices that may have different transformation functions. While we would advise against duplicating inputs across the model (particularly using different transformation functions), and throw a warning to the user when duplicating inputs, this particular situation is present within the format_C.csv example file. The "simple" fix, therefore would break unless we detected this situation and created new columns with alternately transformed inputs. Off the top of my head, I cannot see an easy way to map the used transformation functions (if possible) to the GUI options. And what would we do if users supplied function(s) not available in the GUI? Thoughts?

davidmreif · 2022-05-31T15:33:52Z

I agree that starting by NOT including transformation functions is the right move for a simple 'exportToGUI' function. R is really the better environment to handle transformations/scaling versus Java. I think that it may be possible to wrest custom user functions as GUI input, but that would be cumbersome.

From other discussions, the other issue that would need to be handled is the need for decimal weights to be rational fractions. Perhaps a warning could be added that decimal weights may be close approximations if input could be added?

daynefiler · 2022-07-11T22:10:05Z

Function merged to dev branch. See #7 for further work.

daynefiler · 2022-07-12T14:54:41Z

I merged the PR request (#6) prematurely. We can continue the work/discussion here.

daynefiler · 2022-07-12T15:00:07Z

Per @SkylarMarvel:

When looking at the data file output, there is an issue with the slice metric indicators being applied to multiple copies of the transformed metrics. I think each metric will need something appended to the name if it occurs multiple times, e.g. append "_1a" for slice 1 component 1 and "_3d" for slice 3 component 4. I think appending something like that should just be done for all metrics even if not repeated just for consistency. However, the main issue I see is with transformed values that are now negative - they will be ignored in the GUI and treated as missing. One workaround might be to push all values up by a constant so everything is positive - I don't think this will change the results because all scaling functions are linear at this point. Also, the names in the output seem to have been replaced with the row number. I've attached the original "format_C.csv" data file and the initial txpExportGui output ("txpModel.csv") and a modified version ("txpModel_modified.csv") that has the adjustments I mention above.

The reason the row names were used is because the txpExportGui function requires the id.var variable. We don’t store ID values until the result stage, so there’s not a way currently to grab the ID column from the import function and pass it to the export. The following works:

tst <- txpImportGui("format_C.csv")
txpExportGui(fileName = "test.csv", 
             input = tst$input, 
             model = tst$model, 
             id.var = "CASRN", 
             fills = tst$fills)

As for the duplicate metrics, I think Skylar’s proposal works. We could also just say any duplicate inputs trigger the export as slices option. Thoughts?
I need a refresher on how we decided to handle the negative values discrepancy between the package and GUI previously.

SkylarMarvel · 2022-07-13T13:27:39Z

Shifting negative values by adding a constant, c, to all values (to allow transformed negative values to work in the GUI) does not work when there are missing values in a slice that has multiple components/columns. This occurs because the slice scores are first computed as sum(x) before scaling to [0, 1], so a missing value would result in something like adding 2*c instead of 3*c. A possible solution for this is to process the transformations separately for each slice. If a slice has multiple components, negative values, and missing values, then those missing values are replaced with 0 before all values are shifted up. One caveat for this handling of transformed negative values in the case of a slice with multiple components and missing values, e.g. running "format_C" through the GUI directly or first through toxpiR txpImportGui() --> txpExportGui() and then the GUI - the slice/toxpi scores will match, but the bootstrapped confidence intervals will not. I anticipate this situation will rarely occur.

daynefiler · 2022-07-14T21:48:38Z

We should probably setup a call to discuss how the package will handle negative values. I had coerced the negatives to 0 on importing, but txpCalculateScores does not do any special handling of negative values. Therefore, if someone were to create a model from scratch in R with data containing negative values, then export it to the GUI, the GUI would give different results. We can simplify the import and export functions if we handle the negatives in the calculate function -- I think the more important question is should we?

daynefiler · 2022-07-14T21:50:20Z

I would make a sleep-deprived argument that because we scale the slice scores to 0-1, having negative values doesn't matter. Would it be better to update the GUI behavior?

daynefiler · 2022-07-14T21:55:22Z

Another argument for keeping negative values -- it's not overly intuitive they would be removed. Especially when doing a standard normal transformation. I imagine users could be very surprised learning half their data is being thrown out of the model.

davidmreif · 2022-07-15T01:38:15Z

Could we chat Friday at 10am? I can setup a Zoom.

On Thu, Jul 14, 2022 at 5:55 PM Dayne Filer ***@***.***> wrote: Another argument for keeping negative values -- it's not overly intuitive they would be removed. Especially when doing a standard normal transformation. I imagine users could be very surprised learning half their data is being thrown out of the model. — Reply to this email directly, view it on GitHub <#1 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AKDS2LSFFLMKVA7AK4TY4OLVUCEFLANCNFSM5OESQLVQ> . You are receiving this because you were assigned.Message ID: ***@***.***>

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *David M. Reif, PhD* Professor | Department of Biological Sciences (BioSci <https://bio.sciences.ncsu.edu/>) Director | Bioinformatics Consulting and Service Core (BCSC <https://brc.ncsu.edu/consulting>) Director | DMAC, Center for Environmental and Health Effects of PFAS ( Superfund <https://superfund.ncsu.edu/>) Co-Director | IHSFC, Center for Human Health and the Environment (CHHE <https://chhe.research.ncsu.edu/>) Bioinformatics Research Center (BRC <https://brc.ncsu.edu/>) North Carolina State University reif-lab.org toxpi.org [image: image.png] <http://reif-lab.org> [image: image.png] <https://toxpi.org> *(contact) ***@***.*** | 919-513-3812 *(mail)* Ricks Hall 344, Box 7566, 1 Lampe Drive, Raleigh NC 27695, USA *(pronouns)* he/him/his ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

davidmreif · 2022-07-15T14:09:40Z

Add handle.neg parameter (or flag) to function (data structure)...

Testing notes (to verify):

toxpiR creation of model with +/- values; check in GUI
toxpiR creation of model with +/- AND missing values; check in GUI
check distributions.csv file (incl neg/missing)

--> Be sure to annotate/justify departures from GUI (by design because different use cases) + document in vignette revisions.

daynefiler · 2022-07-17T21:18:03Z

I added an initial implementation of negative.value.handling (4adbd5e). I am almost certain I did this incorrectly. @SkylarMarvel Suppose (in the GUI) you have an input with negative values, and the user indicates a 'zscore' transformation -- what is the order of operations?

daynefiler · 2022-07-17T21:32:37Z

Shifting negative values by adding a constant, c, to all values (to allow transformed negative values to work in the GUI) does not work when there are missing values in a slice that has multiple components/columns. This occurs because the slice scores are first computed as sum(x) before scaling to [0, 1], so a missing value would result in something like adding 2c instead of 3c. A possible solution for this is to process the transformations separately for each slice. If a slice has multiple components, negative values, and missing values, then those missing values are replaced with 0 before all values are shifted up. One caveat for this handling of transformed negative values in the case of a slice with multiple components and missing values, e.g. running "format_C" through the GUI directly or first through toxpiR txpImportGui() --> txpExportGui() and then the GUI - the slice/toxpi scores will match, but the bootstrapped confidence intervals will not. I anticipate this situation will rarely occur.

This took me some time to wrap my head around. I thought shifting by a constant wouldn't matter, but it changes in the magnitude of the vector. So when I get confused about this in the future:

vec <- c(-100, 0, 10, 15, 40)
norm(vec, type = "2")
# [1] 109.2016
norm(vec + 100, type = "2")
# [1] 234.3608

SkylarMarvel · 2022-07-18T01:01:41Z

@daynefiler In the GUI, negative values are ignored during the slice score computation. I've copied a code segment below where the score summation occurs. Slice scores are normalized to [0, 1] after this step.

scores = new double[pieSlices.size()][chemicals.size()];
for (int i=0; i<pieSlices.size(); i++) {
  pieSlice = pieSlices.get(i);
  for (int j=0; j<chemicals.size(); j++) {
    chemical = chemicals.get(j);
    score = Double.NaN;
    for (Metric metric : pieSlice.getMetrics()) {
      value = toxData.getData(chemical, metric);
      if (!Double.isNaN(value) && value>=0d) {
        scaledValue = Utility.scale(value, metricStats.get(metric), pieSlice.getScaling());
        if (!Double.isNaN(scaledValue)) {
          score = Double.isNaN(score) ? scaledValue : score+scaledValue;
        }
      }
    }
    scores[i][j] = score;
  }
}

davidmreif · 2022-07-19T15:37:12Z

It looks like the checks in /dev/tests (https://github.com/ToxPi/toxpiR/tree/dev/tests) cover the intended cases with some wacky test data, and we can document the corner issues. I wanted to 3x check before closing this issue, though.

daynefiler added the enhancement New feature or request label Feb 11, 2022

daynefiler assigned daynefiler, davidmreif, dtlloyd96, pthunga and SkylarMarvel Feb 11, 2022

daynefiler mentioned this issue Jul 11, 2022

issue-01: added txpExportGui #6

Merged

daynefiler closed this as completed Jul 11, 2022

daynefiler reopened this Jul 12, 2022

davidmreif closed this as completed Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add function for exporting to GUI format #1

Add function for exporting to GUI format #1

daynefiler commented Feb 11, 2022

davidmreif commented May 31, 2022

daynefiler commented Jul 11, 2022

daynefiler commented Jul 12, 2022

daynefiler commented Jul 12, 2022

SkylarMarvel commented Jul 13, 2022 •

edited

Loading

daynefiler commented Jul 14, 2022

daynefiler commented Jul 14, 2022

daynefiler commented Jul 14, 2022

davidmreif commented Jul 15, 2022 via email

davidmreif commented Jul 15, 2022

daynefiler commented Jul 17, 2022

daynefiler commented Jul 17, 2022

SkylarMarvel commented Jul 18, 2022 •

edited

Loading

davidmreif commented Jul 19, 2022

Add function for exporting to GUI format #1

Add function for exporting to GUI format #1

Comments

daynefiler commented Feb 11, 2022

davidmreif commented May 31, 2022

daynefiler commented Jul 11, 2022

daynefiler commented Jul 12, 2022

daynefiler commented Jul 12, 2022

SkylarMarvel commented Jul 13, 2022 • edited Loading

daynefiler commented Jul 14, 2022

daynefiler commented Jul 14, 2022

daynefiler commented Jul 14, 2022

davidmreif commented Jul 15, 2022 via email

davidmreif commented Jul 15, 2022

daynefiler commented Jul 17, 2022

daynefiler commented Jul 17, 2022

SkylarMarvel commented Jul 18, 2022 • edited Loading

davidmreif commented Jul 19, 2022

SkylarMarvel commented Jul 13, 2022 •

edited

Loading

SkylarMarvel commented Jul 18, 2022 •

edited

Loading