Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

q_ref_target_valid: change denominator from fields to rows #37

Merged
merged 1 commit into from
May 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion cumulus_library_data_metrics/c_system_use/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ slicing and dicing that field by category and time (if relevant).
### Fields

- category (depending on resource)
- has_text (depending on field)
- year (depending on resource)
- status
- systems
Expand Down
8 changes: 3 additions & 5 deletions cumulus_library_data_metrics/q_ref_target_valid/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,20 +9,18 @@ does that target actual exist in the Patient database?"

### Numerator (flagged rows)

Any source Reference field which has a `.reference`
Any source row which has at least one `.reference` field
that looks like `"TargetResource/xxx"` but the corresponding `xxx` resource
does not actually exist.

Note that this does not check spec-valid but more complex forms like
absolute URLs, contained resources, logical references, or display-only references.
All such forms are not included in the numerator or denominator.

### Denominator

All fields for the resource in question that have a populated reference to
the target resource.
All possible rows for the resource in question.

(e.g. `count(*) from condition where subject.reference like 'Patient/%'`)
(e.g. `count(*) from condition`)

### Debugging

Expand Down
21 changes: 0 additions & 21 deletions cumulus_library_data_metrics/q_ref_target_valid/denominator.jinja

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -7,30 +7,38 @@ CREATE TABLE data_metrics__q_ref_target_valid_{{ src|lower }}_{{ dest|lower }} A
{% else %}

WITH
src_status AS {{ utils.extract_status(src) }}
src_status AS {{ utils.extract_status(src) }},

{%- if is_array %}
, flattened AS (
flattened AS (
SELECT id, t.row AS unnested_field
FROM {{ src }},
UNNEST({{ field }}) AS t (row)
)
),
{%- set src = 'flattened' %}
{%- set field = 'unnested_field' %}
{%- endif %}

grouped_bad_refs AS (
SELECT
src.id,
ARRAY_AGG(src.{{ field }}.reference) AS target
FROM {{ src }} AS src
LEFT JOIN {{ dest }} AS dest
ON SUBSTRING(src.{{ field }}.reference, LENGTH('{{ dest }}/') + 1) = dest.id
WHERE
src.{{ field }}.reference LIKE '{{ dest }}/%'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder that we're supposed to use regexp() instead of LIKE for performance reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes thank you - filed #38 and I'll do that as a separate thing.

AND dest.id IS NULL
GROUP BY src.id
)

SELECT
src.id,
src_status.status,
src.{{ field }} AS target
FROM {{ src }} AS src
{{ utils.array_to_string('target') }} AS target
FROM grouped_bad_refs AS src
LEFT JOIN src_status
ON src.id = src_status.id
LEFT JOIN {{ dest }} AS dest
ON SUBSTRING(src.{{ field }}.reference, LENGTH('{{ dest }}/') + 1) = dest.id
WHERE
src.{{ field }}.reference LIKE '{{ dest }}/%' -- keep in sync with denominator query
AND dest.id IS NULL

{% endif %} -- closing initial "return empty table" check

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,9 +21,7 @@ class TargetValidBuilder(MetricMixin, BaseTableBuilder):
def make_table(self, **kwargs) -> None:
"""Make a single metric table"""
summary_key = f"{kwargs['src'].lower()}_{kwargs['dest'].lower()}"

# Only count filled-in references, not every row
self.summary_entries[summary_key] = self.render_sql("denominator", **kwargs)
self.summary_entries[summary_key] = None

self.queries.append(self.render_sql(self.name, **kwargs))

Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
{"id": "valid", "subject": {"reference": "Patient/A"}, "context": {"encounter": [{"reference": "Encounter/A"}]}}
{"id": "bad-encounter", "subject": {"reference": "Patient/A"}, "context": {"encounter": [{"reference": "Encounter/Nope"}]}}
{"id": "bad-second-encounter-in-list", "subject": {"reference": "Patient/A"}, "context": {"encounter": [{"reference": "Encounter/A"}, {"reference": "Encounter/Nope"}]}}
{"id": "bad-multiple-encounters", "status": "current", "subject": {"reference": "Patient/A"}, "context": {"encounter": [{"reference": "Encounter/Nope2"}, {"reference": "Encounter/Nope"}]}}
{"id": "missing-encounter-field", "subject": {"reference": "Patient/A"}, "context": {"facilityType": {"text": "A nice one"}}}
{"id": "missing-context-field", "subject": {"reference": "Patient/A"}}
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
id,status,target
bad-second-encounter-in-list,,{'reference': Encounter/Nope}
bad-encounter,,{'reference': Encounter/Nope}
bad-second-encounter-in-list,,Encounter/Nope
bad-multiple-encounters,current,Encounter/Nope; Encounter/Nope2
bad-encounter,,Encounter/Nope
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
id,status,target
bad-encounter,,"{'id': NULL, 'display': NULL, 'reference': Encounter/Nope, 'type': NULL}"
bad-encounter,,Encounter/Nope
8 changes: 4 additions & 4 deletions tests/data/q_ref_target_valid/general/expected_summary.csv
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
id,numerator,denominator,percentage
# Various edge cases here in procedure
procedure_patient,1,6,16.67
procedure_encounter,1,6,16.67
procedure_patient,1,9,11.11
procedure_encounter,1,9,11.11
# Rest are just short happy-path checks to confirm that we look at the right json field
observation_patient,0,1,0.00
observation_encounter,0,1,0.00
Expand All @@ -11,8 +11,8 @@ immunization_patient,0,1,0.00
immunization_encounter,0,1,0.00
encounter_patient,0,1,0.00
# Except DocRefs also have some extra cases around encounter array support
documentreference_patient,0,5,0.00
documentreference_encounter,2,4,50.00
documentreference_patient,0,6,0.00
documentreference_encounter,3,6,50.00
diagnosticreport_patient,0,1,0.00
diagnosticreport_encounter,0,1,0.00
device_patient,0,1,0.00
Expand Down