Add covariate builder that uses cohorts to build (binary) features #96
Comments
@schuemie in this case, we will have four date combinations
Features may then be constructed based on
Plus - we may want to support observation_period in relation to f.cohort dates, or c.cohort dates. Do you anticipate the covariate builder to support all these scenarios?
No, I was thinking of adhering to the current pattern used in FeatureExtraction, so allowing the user to specify 1 start and 1 end date relative to the index date (= start of the cohort of interest), and maybe an option to choose if the feature cohort start should be in the lookback window, or whether there needs to be overlap with the lookback window.
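The two options described in this comment (feature cohort start inside the lookback window, versus any overlap with the window) can be sketched in base R. This is only an illustration of the date logic under stated assumptions: `inLookbackWindow` and its arguments are hypothetical names and are not part of FeatureExtraction.

```r
# Hypothetical helper (not a FeatureExtraction function): decides whether a
# feature cohort entry counts for a window defined relative to the index date.
inLookbackWindow <- function(indexDate, featureStart, featureEnd,
                             startDay = -365, endDay = -1,
                             requireStartInWindow = TRUE) {
  windowStart <- indexDate + startDay
  windowEnd <- indexDate + endDay
  if (requireStartInWindow) {
    # Option 1: the feature cohort must start inside the window
    featureStart >= windowStart & featureStart <= windowEnd
  } else {
    # Option 2: the feature cohort only needs to overlap the window
    featureStart <= windowEnd & featureEnd >= windowStart
  }
}

# Toy data: one index date and three feature cohort entries
index <- as.Date("2020-01-01")
fStart <- as.Date(c("2019-06-01", "2018-01-01", "2020-02-01"))
fEnd <- as.Date(c("2019-06-30", "2019-03-01", "2020-03-01"))

startRule <- inLookbackWindow(index, fStart, fEnd, requireStartInWindow = TRUE)
overlapRule <- inLookbackWindow(index, fStart, fEnd, requireStartInWindow = FALSE)
```

The second entry starts before the window but ends inside it, so it only counts under the overlap rule. The third entry starts after the index date and counts under neither.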
I have code in the PLP skeletons - we do this for the simple models: https://github.com/OHDSI/StudyProtocolSandbox/blob/master/SkeletonPredictionStudy/R/CohortCovariateCode.R (it even enables counts rather than binary). In recent studies I also included an age interaction: https://github.com/ohdsi-studies/Covid19PredictionStudies/blob/master/CovidVulnerabilityIndex/R/CohortCovariateCode.R This is an example of running it: https://github.com/OHDSI/PredictionComparison/blob/andromeda/R/atriaModel.R
I think I solved this by duplicating
Modified
Can be used like:
# Packages used below; `connection` and `oracleTempSchema` are assumed to be defined already
library(dplyr)
library(tibble)
library(FeatureExtraction)

# cohortSetReference as the table used in CohortDiagnostics
# cohorts defined in atlas-demo.ohdsi.org
cohortSetReference <- tibble(
  atlasId = c(1776012, 1776013, 1776018),
  atlasName = c("Asthma", "Last condition", "First condition"),
  cohortId = c(1776012, 1776013, 1776018),
  name = c("Asthma", "Last condition", "First condition")
)

# create temp table in server
overlapCohortsTable_name <- "tmp_table_cohorts_overlap"
# cohortId is numeric, so filter on numeric ids rather than strings
overlapCohortsTable <- cohortSetReference %>% filter(cohortId %in% c(1776013, 1776018))
DatabaseConnector::insertTable(
  connection,
  tableName = overlapCohortsTable_name,
  data = as.data.frame(overlapCohortsTable),
  dropTableIfExists = TRUE,
  createTable = TRUE,
  tempTable = FALSE,
  oracleTempSchema = oracleTempSchema
)

# analysis backed by the custom CohortOverlap.sql file
cohortOverlap <- createAnalysisDetails(
  analysisId = 51,
  sqlFileName = "CohortOverlap.sql",
  parameters = list(
    analysisId = 51,
    analysisName = "Cohort overlap",
    domainId = "Cohort overlap",
    #domainTable = cohortTable,
    #domainConceptId = "cohort_definition_id",
    domain_start_date = "cohort_start_date",
    domain_end_date = "cohort_end_date",
    #
    cohort_ids = "1776013, 1776018",
    overlap_cohorts_table = overlapCohortsTable_name
  ),
  includedCovariateConceptIds = c(),
  addDescendantsToInclude = FALSE,
  excludedCovariateConceptIds = c(),
  addDescendantsToExclude = FALSE,
  includedCovariateIds = c()
)

# one covariate per cohort per time window (days relative to the index date)
detTempCovSet <- createDetailedTemporalCovariateSettings(
  analyses = list(
    cohortOverlap
  ),
  temporalStartDays = c(-365, 0, 365*5, 365*10, 365*20),
  temporalEndDays = c(-1, 365*5, 365*10, 365*20, 365*50)
)
I can write a more reproducible example if you need it.
@jreps's code has a very nice implementation that allows you to create different kinds of features from the same cohort, such as binary features, counts, etc. We should reuse that here.
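As a rough illustration of deriving binary and count features from the same cohort, here is a base R sketch. It is not @jreps's implementation: `buildCohortCovariates`, the `hits` table, and its column names are hypothetical, and the input is assumed to hold one row per feature cohort entry that already falls in a subject's time window.

```r
# Hypothetical helper: turns window-matched cohort entries into covariates.
# type = "binary" gives 1 if the subject has any entry in the feature cohort;
# type = "count" gives the number of entries.
buildCohortCovariates <- function(hits, type = c("binary", "count")) {
  type <- match.arg(type)
  # Count entries per subject and feature cohort
  counts <- aggregate(n ~ subjectId + featureCohortId,
                      data = transform(hits, n = 1L), FUN = sum)
  counts$covariateValue <- if (type == "binary") 1L else counts$n
  counts$n <- NULL
  counts[order(counts$subjectId, counts$featureCohortId), ]
}

# Toy data: subject 1 has two entries in cohort 10 and one in cohort 20,
# subject 2 has one entry in cohort 10
hits <- data.frame(
  subjectId = c(1, 1, 1, 2),
  featureCohortId = c(10, 10, 20, 10)
)

binaryCovariates <- buildCohortCovariates(hits, type = "binary")
countCovariates <- buildCohortCovariates(hits, type = "count")
```

Only subject and cohort pairs with at least one entry get a row here, so the output is sparse: a subject with no entries has an implicit value of 0 in both variants.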
I'll start working on this, as I need it for a project.
OK, thanks for the heads-up here @schuemie. Do you plan to use the implementation from PLP as you suggested in the earlier comment? I am just curious about the design at a high level.
Here's my initial version: fddbdd6. Some of the thinking so far:
The default covariate builder is mostly based on the occurrence of concepts (or their ancestors). We have a covariate builder based on cohort attributes, but not one based on cohorts.
The builder could create binary features based on the occurrence of a user-defined set of cohorts in some user-specified time window.