From ba3eca7a9ed47c139ce5b16fd50657270ed48e91 Mon Sep 17 00:00:00 2001
From: Michael Terry <michael.terry@childrens.harvard.edu>
Date: Tue, 11 Feb 2025 15:07:44 -0500
Subject: [PATCH] docs: expand explanation of how to define patient set

---
 README.md | 63 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 4b372a4..4a30336 100644
--- a/README.md
+++ b/README.md
@@ -29,21 +29,6 @@ cp example-config.js my/folder/config.js
 Then edit that config file and enter your settings. Read the comments in the
 file for further details about each option.
 
-## Migration to v2
-1. In v1 the config file could have any name. The path to it was given to the script
-   via `-c` parameter. In v2 that file **must** be called `config.js`, so start by renaming it.
-2. The config file is now `.js` instead of `.ts`. To switch:
-   - Remove type imports like `import { Config } from "../src/types"`
-   - Switch to CommonJS exports. For example, use `module.exports = { ... }` instead
-     of `export default { ... }`
-3. The example config file is now converted to JS so you can see the difference
-4. Pick (or create) a "volume" folder. The script will load config from there. It will
-   also write output files to it.
-5. Place/move your `config.js` file into that "volume" folder.
-6. That should be it. Run it with
-   - Direct: `npm start -- -p /path/to/volume/`
-   - Docker: `docker run -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler`
-
 ## Usage
 1. Running it directly
   `cd` into the `fhir-crawler` folder and run:
@@ -57,24 +42,21 @@ file for further details about each option.
    docker run -it -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler
    ```
 
-## Skipping the patient bulk-data export
-This script does two major things. First it downloads all the patients in a given group
-using Bulk Data Group export, then it download the specified resources associated with
-these patients using a standard FHIR API calls. In some cases people may need to re-run
-the crawler but skip the patient download part. To achieve this do the following:
-1. After successful export locate the patient files. There should be one or more files with
-   names like `1.Patient.ndson` at `/path/to/volume/output/`.
-2. Copy those patient files one level up outside of that `output` folder (because everything in `output`
-   will be deleted before every run)
-3. On the next run pass the patient file names as `--patients` argument. Example:
-   ```
-   npm start -- -p /path/to/volume/ --patients 1.Patient.ndson --patients 2.Patient.ndson
-   ```
-   You can do the same using Docker:
-   ```
-   docker run -it -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler --patients 1.Patient.ndjson
-   ```
-
+## Defining which patients to crawl
+This script does two major things.
+First it gathers a list of patients to operate on,
+then it downloads all configured resources of each of those patients, one by one,
+using standard FHIR API calls.
+
+There are three ways to define the list of patients to crawl:
+1. Set the `groupId` field in the configuration file.
+The crawler will perform a bulk export of that group's patients and then crawl all of them.
+2. Pass `--patients` pointing at a file with a list of EHR patient IDs, one per line.
+For example, `--patients list.txt`.
+3. Pass `--patients` pointing at an NDJSON file of FHIR Patient resources.
+You may have one from a previous run of the crawler or a separate bulk export operation.
+You can provide this argument multiple times if your patients are split across files.
+For example, `--patients 1.Patient.ndjson --patients 2.Patient.ndjson`.
 
 ## Logs
 The script will display some basic stats in the terminal, and will also generate 
@@ -85,6 +67,21 @@ two log files within the output folder (where the NDJSON files are downloaded):
   These logs have a predictable structure so the TSV format was chosen to make them
   easier to consume by both humans and spreadsheet apps.
 
+## Migration to v2
+1. In v1 the config file could have any name. The path to it was given to the script
+   via the `-c` parameter. In v2 that file **must** be called `config.js`, so start by renaming it.
+2. The config file is now `.js` instead of `.ts`. To switch:
+   - Remove type imports like `import { Config } from "../src/types"`
+   - Switch to CommonJS exports. For example, use `module.exports = { ... }` instead
+     of `export default { ... }`
+3. The example config file has been converted to JS so you can see the difference.
+4. Pick (or create) a "volume" folder. The script will load config from there. It will
+   also write output files to it.
+5. Place/move your `config.js` file into that "volume" folder.
+6. That should be it. Run it with:
+   - Direct: `npm start -- -p /path/to/volume/`
+   - Docker: `docker run -v /path/to/volume/:/app/volume/ smartonfhir/fhir-crawler`
+
 ## Contributing
 Contributions to the FHIR Crawler project are welcome and encouraged! If you find
 a bug or have a feature request, please open an issue on the project's GitHub page.