requireConceptIntersect running very slow in CPRD-GOLD #436

Open
martaalcalde opened this issue Jan 28, 2025 · 0 comments

I'm trying to run the following in CPRD-GOLD, but it is running very slowly (it has taken more than 24h). The reprex below uses the mock data to reproduce the call and capture the generated SQL:

library(IncidencePrevalence)
library(CohortConstructor)

cdm <- mockCohortConstructor()
#> Note: method with signature 'DBIConnection#Id' chosen for function 'dbExistsTable',
#>  target signature 'duckdb_connection#Id'.
#>  "duckdb_connection#ANY" would also be valid

cdm <- IncidencePrevalence::generateDenominatorCohortSet(
  cdm = cdm,
  name = "denominator",
  daysPriorObservation = 365)
#> ℹ Creating denominator cohorts
#> ! cohort columns will be reordered to match the expected order:
#>   cohort_definition_id, subject_id, cohort_start_date, and cohort_end_date.
#> ✔ Cohorts created in 0 min and 3 sec

dir <- file.path(tempdir(), "sql_folder")
dir.create(dir)
options("omopgenerics.log_sql_path" = dir)

cdm[["denominator"]] <- cdm[["denominator"]] |>
  requireConceptIntersect(
    conceptSet = list("concepts" = c(44022939L)),
    window = c(-Inf,-1),
    intersections = 0,
    inObservation = FALSE, 
    name = "denominator"
  )
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> ! tableName not found in cdm object.
#> 
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder
#> SQL query saved to C:\Users\martaa\AppData\Local\Temp\RtmpIPykv6/sql_folder

files <- list.files(dir, full.names = TRUE)
for(i in seq_along(files)){
  print(paste0("### SQL: ", i))
  cat(readLines(files[i]),
      sep = "\n")
}
#> [1] "### SQL: 1"
#> <SQL>
#> SELECT subject_id, index_date, id, "start"
#> FROM main.tmp_002_og_004_1738068293
#> WHERE ("start" <= -1.0)
#> [1] "### SQL: 2"
#> <SQL>
#> SELECT
#>   q01.*,
#>   datediff('day', cohort_start_date, id_pm) AS id_jy,
#>   datediff('day', cohort_start_date, id_mc) AS id_sz
#> FROM (
#>   SELECT
#>     LHS.*,
#>     observation_period_start_date AS id_pm,
#>     observation_period_end_date AS id_mc
#>   FROM (
#>     SELECT DISTINCT subject_id, cohort_start_date
#>     FROM main.tmp_002_og_012_1738068294
#>   ) LHS
#>   INNER JOIN main.observation_period
#>     ON (LHS.subject_id = observation_period.person_id)
#> ) q01
#> WHERE (cohort_start_date <= id_mc AND cohort_start_date >= id_pm)
#> [1] "### SQL: 3"
#> <SQL>
#> SELECT denominator.*
#> FROM main.denominator
#> INNER JOIN main.og_001_1738068292
#>   ON (
#>     denominator.cohort_definition_id = og_001_1738068292.cohort_definition_id AND
#>     denominator.subject_id = og_001_1738068292.subject_id AND
#>     denominator.cohort_start_date = og_001_1738068292.cohort_start_date AND
#>     denominator.cohort_end_date = og_001_1738068292.cohort_end_date
#>   )

Created on 2025-01-28 with reprex v2.1.1
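
To narrow down where the time goes, each step could be wrapped in a small timer. The sketch below uses base R only. `timeStep()` is a hypothetical helper, not part of CohortConstructor or omopgenerics, and the `Sys.sleep()` call only stands in for the real `requireConceptIntersect()` call shown above.

```r
# Hypothetical helper (an assumption, not a package function):
# evaluates `expr`, reports the elapsed wall-clock time, and
# returns the value of `expr` invisibly.
timeStep <- function(label, expr) {
  # `expr` is a lazy promise, so it is only evaluated inside system.time()
  elapsed <- system.time(result <- expr)[["elapsed"]]
  message(label, ": ", round(elapsed, 2), " sec")
  invisible(result)
}

# Stand-in for the slow step; in the real session this would wrap the
# requireConceptIntersect() call from the reprex instead.
out <- timeStep("stand-in step", {
  Sys.sleep(0.1)
  "done"
})
```

Timing the same call with different `window` or `inObservation` settings might help show which part of the generated SQL is responsible for the slowdown.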
