---
title: "Combining chains of Bayesian models with Markov melding"
runtitle: "Chained Markov melding"
author:
  - name: "Andrew A."
    surname: "Manderson"
    email: "andrew.manderson@mrc-bsu.cam.ac.uk"
    label: e1
    addressLabel: A
    sepNext: and
  - name: "Robert J. B."
    surname: "Goudie"
    email: "robert.goudie@mrc-bsu.cam.ac.uk"
    label: e2
    addressLabel: B
affiliation:
  - label: A
    name: MRC Biostatistics Unit, University of Cambridge, United Kingdom, and The Alan Turing Institute
    authorsLabels: e1
  - label: B
    name: MRC Biostatistics Unit, University of Cambridge, United Kingdom
    authorsLabels: e2
year: "`r format(Sys.time(), '%Y')`"
abstract: |
  A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets.
  It may be easier to instead specify distinct submodels for each source of data, then join the submodels together.
  We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof.
  We propose _chained Markov melding_, an extension of Markov melding, a generic method to combine chains of submodels into a joint model.
  One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels.
  Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel.
  We demonstrate our methodology using two examples.
  The first example considers an ecological integrated population model, where multiple data sets are required to accurately estimate population immigration and reproduction rates.
  We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times.
  Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.
keywords:
  - Combining models
  - Markov melding
  - Bayesian graphical models
  - Multi-stage estimation
  - Model/data integration
  - Integrated population model
bibliography: bibliography/multi-phi-bib.bib
biblio-style: ba ## alternative: imsart-number
output:
  rticles::ba_article:
    includes:
      in_header:
        tex-input/pre.tex
    keep_tex: true
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  comment = NA,
  fig.align = "center"
)
```
# Introduction
The Bayesian philosophy is appealing in part because the posterior distribution quantifies all sources of uncertainty.
However, a joint model for all data and parameters is a prerequisite to posterior inference, and in situations where multiple, heterogeneous sources of data are available, specifying such a joint model is a formidable task.
Models that consider such data are necessary to describe complex phenomena at a useful precision.
One possible approach begins by specifying individual submodels for each source of data.
These submodels could guide the statistician when directly specifying the joint model, but to use the submodels only informally seems wasteful.
Instead, it may be preferable to construct a joint model by formally joining the individual submodels together.
Some specific forms of combining data are well established.
Meta-analyses and evidence synthesis methods are widely used to summarise data, often using hierarchical models [@ades_multiparameter_2006; @presanis_synthesising_2014].
Outside of the statistical literature, a common name for combining multiple data is _data fusion_ [@lahat_multimodal_2015; @kedem_statistical_2017], though there are many distinct methods that fall under this general name.
Interest in integrating data is not just methodological; applied researchers often collect multiple disparate data sets, or data of different modalities, and wish to combine them.
For example, to estimate SARS-CoV-2 positivity @donnat_bayesian_2020 build an intricate hierarchical model that integrates both testing data and self-reported questionnaire data, and @parsons_bayesian_2021 specify a hierarchical model of similar complexity to estimate the number of injecting drug users in Ukraine.
Both applications specify Bayesian models with data-specific components, which are united in a hierarchical manner.
In conservation ecology, _integrated population models_ (IPMs) [@besbeas_integrating_2002; @brooks_bayesian_2004; @schaub_integrated_2011; @maunder_review_2013; @zipkin_synthesizing_2018] are used to estimate population dynamics, e.g. reproduction and immigration rates, using multiple data on the same population.
Such data have standard models associated with them, such as the Cormack-Jolly-Seber model [@lebreton_modeling_1992] for capture-recapture data, and the IPM serves as the framework in which the standard models are combined.
More generally, the applications we list illustrate the importance of generic, flexible methods for combining data to applied researchers.
_Markov melding_ [@goudie_joining_2019] is a general statistical methodology for combining submodels.
Specifically, it considers $\Nm$ submodels that share some common quantity $\phi$, with each of the $\modelindex = 1, \ldots, \Nm$ submodels possessing distinct parameters $\psi_{\modelindex}$, data $Y_{\modelindex}$, and form $\pd_{\modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$.
@goudie_joining_2019 then propose to combine the submodels into a joint model, denoted $\pd_{\text{meld}}(\phi, \psi_{1}, \ldots, \psi_{\Nm}, Y_{1}, \ldots, Y_{\Nm})$.
However, it is unclear how to integrate models where there is no single quantity $\phi$ common to all submodels, such as for submodels that are linked in a chain structure.
We propose an extension to Markov melding, which we call _chained Markov melding_[^chained], which facilitates the combination of $\Nm$ submodels that are in a chain structure.
For example, when $\Nm = 3$ we address the case in which submodel 1 and 2 share a common quantity $\phi_{1 \cap 2}$, and submodel 2 and 3 share a different quantity $\phi_{2 \cap 3}$.
Our extension addresses previously unconsidered complications including the distinct domains (and possibly supports) of the common quantities, and the desire to capture possible prior correlation between them.
Two examples serve to illustrate our methodology, which we introduce in the following section.
The computational effort required to fit a complex, multi-response model is a burden to the model development process.
We propose a multi-stage posterior estimation method that exploits the properties of our chained melded model to reduce this burden.
We can parallelise aspects of the computation across the submodels, using less computationally expensive techniques for some submodels.
Reusing existing software implementations of submodels, and subposterior samples where available, is also possible.
Multi-stage samplers can aid in understanding the contribution of each submodel to the final posterior, and are used in many applied settings, including hierarchical modelling [@lunn:etal:13] and joint models [@mauff_joint_2020].
One contribution of our work is to clarify the informal process commonly used in applied analyses of summarising and/or approximating submodels for use in subsequent analyses.
The two most common approximation strategies seem to be (i) approximating the subposterior of the common quantity with a normal distribution for use in subsequent models [see, e.g. @jackson_when_2018; @nicholson_interoperability_2021] and (ii) taking only a point estimate of the subposterior, and treating it as a known value in further models.
These strategies may, but do not always, produce acceptable approximations to the chained melded model.
Both the chained melded model and these approximation strategies are examples of 'multi-phase' and 'multi-source' inference [@meng_trio_2014], with the melding approach most comprehensively accounting for uncertainty.
[^chained]: _"Chained graphs"_ were considered by @lauritzen_chain_2002, however they are unrelated to our proposed model. We use "chained" to emphasise the nature of the relationships between submodels.
## Example introduction
In this section we provide a high-level overview of two applications that require integrating a chain of submodels, with more details in Sections \ref{an-integrated-population-model-for-little-owls-1} and \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1}.
Our first example decomposes a joint model into its constituent submodels and rejoins them.
This simple situation allows us to compare the output from the chained melding process to the complete joint model, and is meant to illustrate both the 'chain-of-submodels' notion and the mechanics of chained melding.
The second example is a realistic and complex setting in which it is not obvious how to combine the submodels without chained Markov melding.
Our comparator is the common technique of summarising previously considered submodels with point estimates; the comparison demonstrates the importance of fully accounting for uncertainty.
### An integrated population model for little owls
Integrated population models (IPMs) [@zipkin_synthesizing_2018] combine multiple data to estimate key quantities governing the dynamics of a specific population.
@schaub_local_2006 and @abadi_estimation_2010 used an IPM to estimate fecundity, immigration, and yearly survival rates for a population of little owls.
These authors collect and model three types of data, illustrated in Figure \ref{fig:owls-simple-dag}.
Capture-recapture data $Y_{1}$, and associated capture-recapture submodel $\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})$, are acquired by capturing and tagging owls each year, and then counting the number of tagged individuals recaptured in subsequent years.
Population counts $Y_{2}$ are obtained by observing the number of occupied nesting sites, and are modelled in $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})$.
Finally, nest-record data $Y_{3}$ count both the number of reproductive successes and the number of possible breeding pairs, and are associated with a submodel for fecundity $\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})$.
The population count model $\pd_{2}$ shares the parameter $\phi_{1 \cap 2}$ with the capture-recapture model $\pd_{1}$, and the parameter $\phi_{2 \cap 3}$ with the fecundity model $\pd_{3}$; each of the $\modelindex = 1, 2, 3$ submodels has distinct, submodel-specific parameters $\psi_{\modelindex}$.
No single source of data is sufficient to estimate all quantities of interest, so it is necessary to integrate the three submodels into a single joint model to produce acceptably precise estimates of fecundity and immigration rates.
We will show that the chained Markov melding framework developed in Section \ref{chained-model-specification} encapsulates the process of integrating these submodels, producing results that are concordant with the original joint IPM.
\input{tex-input/owls-example/0001-owls-simple-dag.tex}
### Survival analysis with time varying covariates and uncertain event times
Our second example considers the time to onset of respiratory failure (RF) amongst patients in intensive care units, and factors that influence the onset of RF.
A patient can be said to be experiencing RF if the ratio of the partial pressure of arterial blood oxygen (\paoii) to the fraction of inspired oxygen (\fioii) is less than 300 mmHg [@the_ards_definition_task_force_acute_2012], though this is not the only definition of RF.
Patients' \pfratio\ (P/F) ratios are typically measured only a few times a day.
The relative infrequency of P/F ratio data, combined with the intrinsic variability in each individual's blood oxygen level, results in considerable uncertainty about the time of onset of RF.
Factors that influence the time to onset of RF are both longitudinal and time invariant.
Both types of data can be considered in _joint models_ [@rizopoulos_joint_2012], which are composed of two distinct submodels, one for each data type.
However, existing joint models are not able to incorporate the uncertainty surrounding the event time, which may result in overconfident and/or biased estimates of the parameters in the joint model.
Chained Markov melding offers a conceptually straightforward, Bayesian approach to incorporating uncertain event times into joint models.
Specifically, we consider the event time as a submodel-derived quantity from a hierarchical regression model akin to @lu_using_1993.
We call this submodel the _uncertain event time_ submodel and denote it $\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})$, where $\phi_{1 \cap 2}$ incorporates the event time.
The survival submodel $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})$ uses the event time within $\phi_{1 \cap 2}$, the common quantity, as the response.
We treat the longitudinal submodel, $\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})$, separately from the survival submodel, as is common in two-stage joint modelling [@mauff_joint_2020], and denote the subject-specific parameters that also appear in the survival model as $\phi_{2 \cap 3}$.
Each of the $\modelindex = 1, 2, 3$ submodels has submodel-specific data $Y_{\modelindex}$ and parameters $\psi_{\modelindex}$.
The high level submodel relationships are displayed as a DAG in Figure \ref{fig:surv-simple-dag}.
\input{tex-input/surv-example/0001-surv-simple-dag.tex}
It is in examples such as this one that we foresee the most use for chained Markov melding; a fully Bayesian approach is desired and the submodels are nontrivial in complexity, with no previously existing or obvious joint model.
## Markov melding
We now review Markov melding [@goudie_joining_2019] before detailing our proposed extension.
As noted in the introduction, Markov melding is a method for combining $\Nm$ submodels $\pd_{1}(\phi, \psi_{1}, Y_{1}), \ldots, \pd_{\Nm}(\phi, \psi_{\Nm}, Y_{\Nm})$ which share the same $\phi$.
When the submodel prior marginals $\pd_{\modelindex}(\phi)$ are identical, i.e. $\pd_{\modelindex}(\phi) = \pd(\phi)$ for all $\modelindex$, it is possible to combine the submodels using _Markov combination_ [@dawid_hyper_1993-1; @massa_combining_2010]
\input{tex-input/orig-markov-melding/0010-markov-combination.tex}\noindent
Markov combination is not immediately applicable when submodel prior marginals are distinct, so Goudie _et al._ define a _marginal replacement_ procedure, where individual submodel prior marginals are replaced with a common marginal $\pd_{\text{pool}}(\phi) = h(\pd_{1}(\phi), \ldots, \pd_{\Nm}(\phi))$ which is the result of a pooling function $h$ that appropriately summarises all prior marginals (the choice of which is described below).
The result of marginal replacement is
\input{tex-input/orig-markov-melding/0011-marginal-replacement.tex}\noindent
Goudie _et al._ show that $\pd_{\text{repl}, \modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$ minimises the Kullback–Leibler (KL) divergence between a distribution $\q(\phi, \psi_{\modelindex}, Y_{\modelindex})$ and $\pd_{\modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$ under the constraint that $\q(\phi) = \pd_{\text{pool}}(\phi)$, and that marginal replacement is valid when $\phi$ is a deterministic function of the other parameters in submodel $\modelindex$.
Markov melding joins the submodels via the Markov combination of the marginally replaced submodels
\input{tex-input/orig-markov-melding/0012-markov-melding.tex}
### Pooled prior
Goudie _et al._ proposed forming $\pd_{\text{pool}}(\phi)$ using linear or logarithmic prior pooling [@ohagan_uncertain_2006; @genest_characterization_1986]
\input{tex-input/orig-markov-melding/0020-orig-pooling.tex}\noindent
where $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$ are nonnegative weights, which are chosen subjectively to ensure $\pd_{\text{pool}}(\phi)$ appropriately represents prior knowledge about the common quantity.
Two special cases of pooling are of particular interest.
_Product of experts (PoE) pooling_ [@hinton_training_2002] is a special case of logarithmic pooling that occurs when $\lambda_{\modelindex} = 1$ for all $\modelindex$.
_Dictatorial pooling_ is a special case of either pooling method when $\lambda_{\modelindex'} = 1$ and, for all $\modelindex \neq \modelindex'$, $\lambda_{\modelindex} = 0$.
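As a concrete illustration of these two pooling functions, the following R sketch pools two univariate normal prior marginals on a grid; all means, variances, and weights are illustrative rather than drawn from any submodel in this paper. Linear pooling yields a weighted mixture of the marginals, whilst logarithmic pooling yields a renormalised weighted geometric mean.

```{r pooling-sketch, echo = TRUE, eval = FALSE}
## Grid-based linear and logarithmic pooling of two normal prior marginals
## for a shared scalar phi; all parameter values are illustrative.
phi <- seq(-6, 6, length.out = 1024)
h <- phi[2] - phi[1]                       # grid spacing
p_1 <- dnorm(phi, mean = -1, sd = 1)
p_2 <- dnorm(phi, mean = 2, sd = 1.5)
lambda <- c(0.5, 0.5)

## Linear pooling: a mixture, already normalised when the weights sum to one.
pool_lin <- lambda[1] * p_1 + lambda[2] * p_2

## Logarithmic pooling: a weighted geometric mean, renormalised numerically.
pool_log <- p_1^lambda[1] * p_2^lambda[2]
pool_log <- pool_log / (sum(pool_log) * h)

matplot(phi, cbind(p_1, p_2, pool_lin, pool_log), type = "l", lty = 1,
        xlab = expression(phi), ylab = "density")
```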
# Chained model specification
Consider $\modelindex = 1, \ldots, \Nm$ submodels each with data $Y_{\modelindex}$ and parameters $\theta_{\modelindex}$ denoted $\pd_{\modelindex}(\theta_{\modelindex}, Y_{\modelindex})$, with $\Nm \geq 3$.
We assume that the submodels are connected in a manner akin to a chain, and so can be ordered such that only 'adjacent' submodels in the chain have parameters in common.
Specifically we assume that submodels $\modelindex$ and $\modelindex + 1$ have some parameter $\phi_{\modelindex \cap \modelindex + 1}$ in common for $\modelindex = 1, \ldots, \Nm - 1$.
For notational convenience define $\phi_{1} = \phi_{1 \cap 2}, \phi_{\Nm} = \phi_{\Nm-1 \cap \Nm}$ and $\phi_{\modelindex} = (\phi_{\modelindex - 1 \cap \modelindex}, \, \phi_{\modelindex \cap \modelindex + 1})$ for $\modelindex = 2, \ldots, \Nm - 1$, so that $\phi_{\modelindex} \subseteq \theta_{\modelindex}$ denotes the parameters in model $\modelindex$ shared with another submodel.
The submodel-specific parameters of submodel $\modelindex$ are thus $\psi_\modelindex = \theta_\modelindex \setminus \phi_\modelindex$.
Define the vector of all common quantities $\boldsymbol{\phi} = \bigcup_{\modelindex = 1}^{\Nm} \phi_{\modelindex} = (\phi_{1 \cap 2}, \phi_{2 \cap 3}, \ldots, \phi_{\Nm - 1 \cap \Nm})$ so that all elements in $\boldsymbol{\phi}$ are unique.
Further denote by $\boldsymbol{\phi}_{-\modelindex}$ the subvector of $\boldsymbol{\phi}$ excluding the $\modelindex$\textsuperscript{th} element.
It will also be convenient to define $\boldsymbol{\psi} = (\psi_{1}, \ldots, \psi_{\Nm})$ and likewise $\boldsymbol{Y} = (Y_{1}, \ldots, Y_{\Nm})$.
Note that all components of $\boldsymbol{\phi}, \boldsymbol{\psi}$ and $\boldsymbol{Y}$ may themselves be multivariate.
Additionally, because $\phi_{\modelindex \cap \modelindex + 1}$ may be a deterministic function of either $\theta_{\modelindex}$ or $\theta_{\modelindex + 1}$ we refer to $\phi_{\modelindex \cap \modelindex + 1}$ as a common parameter or a common quantity as appropriate.
All submodels, and marginal and conditional distributions thereof, have density functions that are assumed to exist and integrate to one.
When considering conditional distributions we assume that the parameter being conditioned on has support in the relevant region.
We define the $\modelindex$\textsuperscript{th} _subposterior_ as $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex} \mid Y_{\modelindex})$.
## Extending marginal replacement
We now define the chained melded model by extending the marginal replacement procedure to submodels linked in a chain-like way.
The proposed chained marginal replacement operation modifies the submodels to enforce a common prior for $\boldsymbol{\phi}$.
This consistency allows us to employ Markov combination to unite the submodels.
Specifically, the $\modelindex$\textsuperscript{th} marginally replaced submodel is
\input{tex-input/multiple-phi/0001-chained-marginal-replacement.tex}\noindent
where $\pd_{\text{pool}}(\boldsymbol{\phi}) = g(\pd_{1}(\phi_{1}), \pd_{2}(\phi_{2}), \ldots, \pd_{\Nm}(\phi_{\Nm}))$ is a pooling function that appropriately summarises all submodel prior marginals.
The second equality in Equation \eqref{eqn:chained-marginal-replacement} is because of the conditional independence $(\psi_{\modelindex}, Y_{\modelindex} \indep \boldsymbol{\phi}_{-\modelindex}) \mid \phi_{\modelindex}$ that exists due to the chained relationship between submodels.
It is important to note that $\pd_{\text{repl}, \modelindex}(\boldsymbol{\phi}, \psi_{\modelindex}, Y_{\modelindex})$ is defined on a larger parameter space than $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$, as it includes $\boldsymbol{\phi}_{-\modelindex}$.
Define $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex}) = \int \pd_{\text{repl}, \modelindex}(\boldsymbol{\phi}, \psi_{\modelindex}, Y_{\modelindex})\text{d}\boldsymbol{\phi}_{-\modelindex}$.
Each marginally replaced submodel, as defined in Equation \eqref{eqn:chained-marginal-replacement}, minimises the following KL divergence[^kl-proof]
\input{tex-input/multiple-phi/0003-chained-marginal-replacement-kl.tex}\noindent
where $\pd_{\text{pool}}(\phi_{\modelindex}) = \int \pd_{\text{pool}}(\boldsymbol{\phi})\text{d}\boldsymbol{\phi}_{-\modelindex}$.
We can thus interpret $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ as a minimally modified $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ which admits $\pd_{\text{pool}}(\phi_{\modelindex})$ as a marginal.
Note that it is the combination of $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ and $\pd_{\text{pool}}(\boldsymbol{\phi}_{-\modelindex} \mid \phi_{\modelindex})$ that uniquely determines \eqref{eqn:chained-marginal-replacement}.
We form the chained melded model by taking the Markov combination of the marginally replaced submodels
\input{tex-input/multiple-phi/0010-melded-model-cond.tex}
Rewriting \eqref{eqn:melded-model-full} in terms of $\phi_{\modelindex \cap \modelindex + 1}$ for $\modelindex = 1, \ldots, \Nm - 1$ yields
\input{tex-input/multiple-phi/0011-melded-model-general.tex}
Finally, we use _chained melded posterior_ $\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi} \mid \boldsymbol{Y}) \propto \pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi}, \boldsymbol{Y})$ to refer to the posterior of the chained melded model conditioned on all data.
[^kl-proof]: This is shown in Appendix B of the online supplement to @goudie_joining_2019.
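For concreteness, writing out the melded model in the $\Nm = 3$ case used by both of our examples (a restatement of Equation \eqref{eqn:melded-model-full}, introducing nothing new) gives
\[
\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi}, \boldsymbol{Y}) =
\pd_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3})
\frac{\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})}{\pd_{1}(\phi_{1 \cap 2})}
\frac{\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})}{\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})}
\frac{\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})}{\pd_{3}(\phi_{2 \cap 3})},
\]
i.e. each submodel contributes its joint distribution divided by its prior marginal for the quantities it shares, with the pooled prior supplying a single joint prior for $(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.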
## Pooled prior
Specifying \eqref{eqn:melded-model-full} requires a joint prior for $\boldsymbol{\phi}$.
As in Markov melding we form the joint prior by pooling the marginal priors, selecting a pooling function $g$ that appropriately represents prior knowledge about the common quantities.
We define $\pd_{\text{pool}}(\boldsymbol{\phi})$ as a generic function of all prior marginals
\input{tex-input/pooled-prior-discussion/0030-pooled-prior-general-def.tex}
because we do not always wish to assume independence between the components of $\boldsymbol{\phi}$.
Two special cases of Equation \eqref{eqn:pooled-prior-general-def-two} are noteworthy.
Firstly, if all components of $\boldsymbol{\phi}$ are independent, then we can form $\pd_{\text{pool}}(\boldsymbol{\phi})$ as the product of $\Nm - 1$ standard pooling functions $h_{\modelindex}$ defined in Section \ref{pooled-prior}
\input{tex-input/pooled-prior-discussion/0010-bad-alternative-one.tex}\noindent
A second case, in between complete dependence \eqref{eqn:pooled-prior-general-def-two} and independence \eqref{eqn:bad-alternative-one}, is that if $\pd_{\modelindex}(\phi_{\modelindex - 1 \cap \modelindex}, \phi_{\modelindex \cap \modelindex + 1}) = \pd_{\modelindex}(\phi_{\modelindex - 1 \cap \modelindex}) \pd_{\modelindex}(\phi_{\modelindex \cap \modelindex + 1})$ then we can define
\input{tex-input/pooled-prior-discussion/0031-pooled-prior-split-def.tex}\noindent
without any additional assumptions.
That is, if any two consecutive components of $\boldsymbol{\phi}$ are independent in the submodel containing both of them, we can divide the pooled prior specification problem into two pooling functions.
The smaller number of arguments to $g_{1}$ and $g_{2}$ makes it easier to choose appropriate forms for those functions.
Selecting a specific form of $g$ is not trivial given the many choices of functional form and pooling weights (the latter of which we discuss momentarily).
One complication is that standard linear and logarithmic pooling, as defined in Equations \eqref{eqn:orig-pooling-linear} and \eqref{eqn:orig-pooling-logarithmic}, are not immediately applicable when the submodel marginal distributions consider different quantities.
We now propose extensions to logarithmic, linear, and dictatorial pooling for use in the case of chained melding.
### Chained logarithmic pooling
Extending logarithmic pooling for chained Markov melding is straightforward.
We define the logarithmically pooled prior to be
\input{tex-input/multiple-phi/0050-pooled-prior-overall.tex}\noindent
with $K_{\text{log}}(\lambda) = \int \prod_{\modelindex = 1}^{\Nm} \pd_{\modelindex}(\phi_{\modelindex})^{\lambda_{\modelindex}} \text{d}\boldsymbol{\phi}$ for nonnegative weight vector $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$ and $\sum_{\modelindex = 1}^{\Nm} \lambda_{\modelindex} \geq 1$.
Note that \eqref{eqn:pooled-prior-overall} does not imply independence between the elements of $\boldsymbol{\phi}$ because
\input{tex-input/multiple-phi/0051-pooled-prior-log-alt-def.tex}\noindent
When $\lambda_{1} = \lambda_{2} = \ldots = \lambda_{\Nm} = 1$ we obtain a special case which we call product-of-experts (PoE) pooling [@hinton_training_2002].
### Chained linear pooling
Our generalisation of linear pooling to handle marginals of different quantities is a two-step procedure.
The first step forms intermediary pooling densities via standard linear pooling, using appropriate marginals of the relevant quantity
\input{tex-input/multiple-phi/0080-M-model-linear-pooling.tex}\noindent
where $\lambda_{\modelindex} = (\lambda_{\modelindex, 1}, \lambda_{\modelindex, 2})$ are nonnegative pooling weights, and for $\modelindex = 2, \ldots, \Nm - 1$
\input{tex-input/multiple-phi/00801-M-model-linear-pooling-component-def.tex}\noindent
For $\modelindex = 1$ and $\modelindex = \Nm$ the relevant marginals are $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{\Nm}(\phi_{\Nm - 1 \cap \Nm})$.
In step two we form the pooled prior as the product of the intermediaries
\input{tex-input/multiple-phi/0055-silly-linear-overall.tex}\noindent
with $K_{\text{lin}}(\lambda) = \int \prod_{\modelindex = 1}^{\Nm - 1} \pd_{\text{pool}, \modelindex}(\phi_{\modelindex \cap \modelindex + 1}) \text{d}\boldsymbol{\phi}$, for $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$.
Clearly, this assumes prior independence amongst all components of $\boldsymbol{\phi}$ which may be undesirable, particularly if this independence was not present under one or more of the submodel priors.
We discuss extensions to linear pooling that enable prior dependence between the components of $\boldsymbol{\phi}$ in Section \ref{conclusion}.
### Dictatorial pooling
Chained Markov melding does not admit a direct analogue to dictatorial pooling as defined in Section \ref{pooled-prior} because not all submodel prior marginals contain all common quantities.
For example, consider the logarithmically pooled prior of Equation \eqref{eqn:pooled-prior-overall} with, say, the $\modelindex$\textsuperscript{th} entry in $\lambda$ set to $1$ and all others set to $0$.
This choice of $\lambda$ results in $\pd_{\text{pool}}(\boldsymbol{\phi}) = \pd_{\modelindex}(\phi_{\modelindex})$, which is flat with respect to $\boldsymbol{\phi}_{-\modelindex}$.
It seems reasonable to require that any generalisation of dictatorial pooling produces an appropriate prior for all components of $\boldsymbol{\phi}$.
Such a generalisation should also retain the original intention of dictatorial pooling, i.e. '_the authoritative prior for_ $\phi_{\modelindex}$ _is_ $\pd_{\modelindex}(\phi_{\modelindex})$'.
We propose two possible forms of dictatorial pooling that satisfy the aforementioned criteria.
_Partial dictatorial pooling_ enforces a single submodel prior for the relevant components of $\boldsymbol{\phi}$, with no restrictions on the pooling of the remaining components; _complete dictatorial pooling_ requires selecting one of the two possible submodel priors for each component of $\boldsymbol{\phi}$.
Partial dictatorial pooling considers $\pd_{\modelindex}(\phi_{\modelindex})$ as the authoritative prior for $\phi_{\modelindex}= (\phi_{\modelindex - 1 \cap \modelindex}, \phi_{\modelindex \cap \modelindex + 1})$.
This results in,
\input{tex-input/pooled-prior-discussion/0040-dictatorial-one-submodel.tex}\noindent
where $g_{1}$ and $g_{2}$ are linear or logarithmic pooling functions as desired[^end-of-chain].
[^end-of-chain]: Some care is required if the authoritative submodel is $\pd_{\modelindex}$ for $\modelindex \in \{1, 2, M - 1, M\}$. If it is taken to be $\modelindex \in \{1, 2\}$, then $g_{1}$ does not exist, and additionally in the $\modelindex = 1$ case $\pd_{1}(\phi_{0 \cap 1}, \phi_{1 \cap 2}) \coloneqq \pd_{1}(\phi_{1 \cap 2})$. The $\modelindex \in \{\Nm - 1, \Nm\}$ cases have analogous definitions.
Complete dictatorial pooling requires the marginal pooled prior for each component of $\boldsymbol{\phi}$ to be chosen on the basis of only one of the two priors specified for it under the submodels.
For $\modelindex = 1, \ldots, \Nm - 1$, the $\modelindex$\textsuperscript{th} marginal of the pooled prior is either
\input{tex-input/pooled-prior-discussion/0043-dictatorial-def-three.tex}\noindent
If two consecutive marginals are chosen to have the same submodel prior, then we wish to retain the dependence between $\phi_{\modelindex - 1 \cap \modelindex}$ and $\phi_{\modelindex \cap \modelindex + 1}$ present in $\pd_{\modelindex}$.
We thus redefine consecutive terms so that
\input{tex-input/pooled-prior-discussion/0045-dictatorial-def-three-part-three.tex}\noindent
The complete dictatorially pooled prior is thus
\input{tex-input/pooled-prior-discussion/0047-complete-dict-joint-def.tex}\noindent
where, subject to the potential modification in Equation \eqref{eqn:dictatorial-def-three-part-three}, the terms in the product are as defined in Equation \eqref{eqn:dictatorial-def-three}.
For example, if $\Nm = 5$ and we wish to ignore $\pd_{2}$ and $\pd_{4}$ when constructing the pooled prior and instead associate $\phi_{1 \cap 2}$ with $\pd_{1}$, both $\phi_{2 \cap 3}$ and $\phi_{3 \cap 4}$ with $\pd_{3}$, and $\phi_{4 \cap 5}$ with $\pd_{5}$, then
\input{tex-input/pooled-prior-discussion/0046-dictatorial-def-two-example-one.tex}
### Pooling weights
Choosing values for the pooling weights is an important step in specifying the pooled prior [@carvalho_bayesian_2022; @abbas_kullback-leibler_2009; @rufo_log-linear_2012; @rufo_bayesian_2012].
Because appropriate values for the weights depend on the submodels being pooled and the information available _a priori_, universal recommendations are impossible, so we illustrate the impact of different choices in a straightforward example.
It is important that prior predictive visualisations of the pooled prior are produced [@gabry_visualization_2019; @gelman_bayesian_2020] to guide the choice of pooling weights and ensure that the result suitably represents the available information.
Figure \ref{fig:pooled_densities_plot} illustrates how $\lambda$ and the choice of pooling method impact $\pd_{\text{pool}}(\boldsymbol{\phi})$ when pooling normal distributions.
Specifically, we consider $\Nm = 3$ submodels and pool
\input{tex-input/multiple-phi/0061-marginal-gaussian-example.tex}\noindent
where $\text{N}(\phi; \mu, \sigma^{2})$ is the normal density function with mean $\mu$ and variance $\sigma^{2}$ (or covariance matrix where appropriate).
The two dimensional density function $\pd_{2}$ has an additional parameter $\rho$, which controls the intra-submodel marginal correlation.
We set $\mu_{1} = -2.5, \mu_{2} = \left[\mu_{2, 1} \,\, \mu_{2, 2}\right]' = \left[0 \,\, 0\right]', \mu_{3} = 2.5, \sigma_{1}^{2} = \sigma_{2}^{2} = \sigma_{3}^{2} = 1$ and $\rho = 0.8$.
In the logarithmic case we set $\lambda_{1} = \lambda_{3}$ and parameterise $\lambda_{2} = 1 - 2\lambda_{1}$, so that $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$ whilst limiting ourselves to varying only $\lambda_{1}$.
Similarly, in the linear case we set $\lambda_{1, 1} = \lambda_{2, 2} = \lambda_{1}$ and $\lambda_{1, 2} = \lambda_{2, 1} = 1 - 2 \lambda_{1}$.
We consider 5 evenly spaced values of $\lambda_{1} \in [0, 0.5]$.
```{r pooled_densities_plot, out.width = '0.95\\linewidth', fig.cap = "Contour plots of $\\pd_{\\text{pool}}(\\boldsymbol{\\phi})$ (red) under logarithmic and linear pooling (left and right column respectively). The three original densities $\\pd_{1}(\\phi_{1 \\cap 2})$, $\\pd_{3}(\\phi_{2 \\cap 3})$ and $\\pd_{2}(\\phi_{1 \\cap 2}, \\phi_{2 \\cap 3})$ are shown in blue, with the univariate densities shown on the appropriate axis. The pooling weight parameter $\\lambda_{1}$ is indicated in the plot titles."}
knitr::include_graphics("plots/pooling-tests/version-two.pdf")
```
For both pooling methods, as the weight $\lambda_{1}$ associated with models $\pd_{1}$ and $\pd_{3}$ increases, the relative contributions of $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{3}(\phi_{2 \cap 3})$ increase.
Note the lack of correlation in $\pd_{\text{pool}}$ under linear pooling (right column of Figure \ref{fig:pooled_densities_plot}) due to Equation \eqref{eqn:silly-linear-overall}.
A large, near-flat plateau is visible in the $\lambda_{1} = 0.25$ and $\lambda_{1} = 0.375$ cases, which is a result of the mixture of four 2-D normal distributions that linear pooling produces in this example.
The logarithmic pooling process produces a more concentrated prior for small values of $\lambda_{1}$, and does not result in _a priori_ independence between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
Appendix \ref{log-pooling-gaussian-densities} shows analytically that $\lambda_{2}$ controls the quantity of correlation present in $\pd_{\text{pool}}$ in this setting.
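The logarithmic case of this example is also available in closed form: raising a normal density to the power $\lambda$ multiplies its precision by $\lambda$, and a product of normal densities is an unnormalised normal density whose precision is the sum of the component precisions. A minimal R sketch, using the parameter values above (the means are omitted, as they do not affect the pooled covariance):

```{r chained-log-pool-sketch, echo = TRUE, eval = FALSE}
## Correlation of the log-pooled prior for the Gaussian example: the pooled
## precision is lambda_1 / sigma^2 on the diagonal (from p_1 and p_3) plus
## lambda_2 times the precision of the bivariate p_2.
log_pool_correlation <- function(lambda_1, rho = 0.8, sigma2 = 1) {
  lambda_2 <- 1 - 2 * lambda_1
  Sigma_2 <- sigma2 * matrix(c(1, rho, rho, 1), 2, 2)
  prec_pool <- diag(lambda_1 / sigma2, 2) + lambda_2 * solve(Sigma_2)
  cov2cor(solve(prec_pool))[1, 2]
}
## Correlation decays from rho to zero as lambda_1 increases to 0.5:
sapply(c(0, 0.125, 0.25, 0.375, 0.5), log_pool_correlation)
```

This matches the behaviour visible in the left column of Figure \ref{fig:pooled_densities_plot}: as $\lambda_{2}$ shrinks, so does the prior correlation between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.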
# Posterior estimation
We now present a multi-stage MCMC method for generating samples from the melded posterior.
Whilst the melded posterior is a standard Bayesian posterior and so can, in principle, be targeted using any suitable Monte Carlo method, in practice this may be cumbersome or infeasible.
More specifically, it may be feasible to fit each submodel separately using standard methods, but when the submodels are combined -- either through Markov melding, or by expanding the definition of one submodel to include another -- the computation required to estimate the posterior in a single step can pose an insurmountable barrier.
In such settings we can employ multi-stage posterior estimation methods, such as those of @tom:etal:10, @lunn:etal:13, @hooten_making_2019, and @mauff_joint_2020.
We propose a multi-stage strategy that uses the chain-like relationship to both avoid evaluating all submodels simultaneously, and parallelise the computation required in the first stage to produce posterior samples in less time than an equivalent sequential method[^sequential].
Avoiding concurrently evaluating all submodels also enables the reuse of existing software, minimising the need for custom submodel and/or sampler implementations.
We also describe an approximate method, where stage one submodels are summarised by normal distributions for use in stage two.
We consider the $\Nm = 3$ case, as this setting includes both of our examples.
Our approach can be extended to $\Nm > 3$ settings, although we anticipate that it is unlikely to be suitable for large $\Nm$ settings.
We discuss some of the difficulties associated with generic, parallel methodology for efficient posterior sampling in Section \ref{conclusion}.
[^sequential]: For completeness, Appendix \ref{sequential-sampler} describes such a sequential MCMC sampler. We do not use the sequential sampler in this paper.
## Parallel sampler
Our proposed strategy obtains samples from submodels 1 and 3 in parallel in stage one.
Stage two reuses these samples in a Metropolis-within-Gibbs sampler, which targets the full melded posterior.
The stage specific targets are displayed in Figure \ref{fig:parallel-dag}.
\input{tex-input/dc-sampler/0001-parallel-dag.tex}
The parallel sampler assumes that the pooled prior decomposes such that
\input{tex-input/dc-sampler/0002-parallel-decomposition.tex}\noindent
All pooled priors trivially satisfy \eqref{eqn:parallel-decomposition} by assuming $\pd_{\text{pool}, 1}(\phi_{1 \cap 2})$ and $\pd_{\text{pool}, 3}(\phi_{2 \cap 3})$ are improper and/or flat distributions.
Alternatively we may choose $\pd_{\text{pool}, 1}(\phi_{1 \cap 2}) = \pd_{1}(\phi_{1 \cap 2})$ and $\pd_{\text{pool}, 3}(\phi_{2 \cap 3}) = \pd_{3}(\phi_{2 \cap 3})$, with appropriate adjustments to $\pd_{\text{pool}, 2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
This choice targets, in stage one, the subposteriors of $\pd_{1}$ and $\pd_{3}$ under their original prior for $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ respectively.
#### Stage one
Two independent, parallel sampling processes occur in stage one.
Terms from the melded model that pertain to $\pd_{1}$ and $\pd_{3}$ are isolated
\input{tex-input/dc-sampler/0021-stage-one-targets.tex}\noindent
and targeted using standard MCMC methodology.
Assuming that the stage one chains converge, and after discarding warmup iterations -- possibly thinning them, if within-chain correlation is high -- we obtain $N_{1}$ samples $\{(\phi_{1 \cap 2}, \psi_{1})_{n}\}_{n = 1}^{N_{1}}$ from $\pd_{\text{meld}, 1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$, and $N_{3}$ samples $\{(\phi_{2 \cap 3}, \psi_{3})_{n}\}_{n = 1}^{N_{3}}$ from $\pd_{\text{meld}, 3}(\phi_{2 \cap 3}, \psi_{3} \mid Y_{3})$.
For well-mixing stage one Markov chains targeting the correct stationary distribution, and large values of $N_{1}$ and $N_{3}$, the stage one samples accurately approximate the subposteriors.
#### Stage two
Stage two targets the melded posterior of Equation \eqref{eqn:melded-model-full} using a Metropolis-within-Gibbs sampler, where the proposal distributions are
\input{tex-input/dc-sampler/0031-stage-two-proposals.tex}\noindent
where $\q(\psi_{2}^{*} \mid \psi_{2})$ is a generic proposal distribution for $\psi_{2}$.
We draw an index $n_{1}^{*}$ uniformly from $\{1, \ldots, N_{1}\}$ and use the corresponding value $(\phi_{1 \cap 2}^{*}, \psi_{1}^{*})_{n_{1}^{*}}$ as the proposal, doing likewise for $n_{3}^{*}$ and $(\phi_{2 \cap 3}^{*}, \psi_{3}^{*})_{n_{3}^{*}}$.
The acceptance probabilities for these updates are
\input{tex-input/dc-sampler/0032-stage-two-acceptance.tex}\noindent
where $\alpha(x, z)$ denotes the probability associated with a move from $z$ to $x$.
Note that all stage two acceptance probabilities only contain terms from the second submodel and the pooled prior, and thus do not depend on $\psi_{1}$ or $\psi_{3}$.
If a move is accepted then we also store the index, i.e. $n_{1}^{*}$ or $n_{3}^{*}$, associated with the move, otherwise we store the current value of the index.
The stored indices are used to appropriately resample $\psi_{1}$ and $\psi_{3}$ from the stage one samples.
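The following R sketch makes the index-proposal mechanism concrete for the $\phi_{1 \cap 2}$ update. It is schematic: `stage_one_1` is assumed to be an $N_{1}$-row matrix of stage one draws with named columns, and `log_p2` and `log_pool_ratio` are hypothetical placeholders for the second-submodel and pooled-prior terms appearing in the acceptance probability of Equation \eqref{eqn:stage-two-acceptance}.

```{r stage-two-sketch, echo = TRUE, eval = FALSE}
## One Metropolis-within-Gibbs update of phi_{1 cap 2} in stage two,
## proposing uniformly from the stored stage one samples.
update_phi_12 <- function(state, stage_one_1, log_p2, log_pool_ratio) {
  n_star <- sample.int(nrow(stage_one_1), size = 1)   # uniform index proposal
  phi_12_star <- stage_one_1[n_star, "phi_12"]
  ## Only p_2 and pooled prior terms survive in the acceptance ratio.
  log_alpha <- log_p2(phi_12_star, state$phi_23, state$psi_2) -
    log_p2(state$phi_12, state$phi_23, state$psi_2) +
    log_pool_ratio(phi_12_star, state$phi_12, state$phi_23)
  if (log(runif(1)) < log_alpha) {
    state$phi_12 <- phi_12_star
    state$index_1 <- n_star     # stored, to later resample psi_1
  }
  state
}
```

The $\phi_{2 \cap 3}$ update is analogous, and the stored indices recover the corresponding $\psi_{1}$ and $\psi_{3}$ values after sampling, as described above.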
## Normal approximations to submodel components
Normal approximations are commonly employed to summarise submodels for subsequent use in more complex models.
For example, two-stage meta-analyses often use a normal distribution centred on each study's effect estimate [@burke_meta-analysis_2017].
Suppose we employ such an approximation to summarise the prior and posterior of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ under $\pd_{1}$ and $\pd_{3}$ respectively.
In addition, assume that (a) such approximations are appropriate for $\pd_{1}(\phi_{1 \cap 2}), \pd_{1}(\phi_{1 \cap 2} \mid Y_{1}), \pd_{3}(\phi_{2 \cap 3})$, and $\pd_{3}(\phi_{2 \cap 3} \mid Y_{3})$, (b) we are not interested in $\psi_{1}$ and $\psi_{3}$, and can integrate them out of all relevant densities, and (c) we employ our second form of dictatorial pooling and choose $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ as the authoritative prior.
The latter two assumptions imply that the melded posterior of interest is proportional to
\input{tex-input/multiple-normal-approximation/0010-normal-approx-melded-posterior-target.tex}
Denote the normal approximation of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ as $\widehat{\pd}_{1}(\phi_{1 \cap 2} \mid \widehat{\mu}_{1}, \widehat{\Sigma}_{1})$, which is a normal distribution with mean $\widehat{\mu}_{1}$ and covariance matrix $\widehat{\Sigma}_{1}$.
The corresponding normal approximation of the prior $\pd_{1}(\phi_{1 \cap 2})$ is $\widehat{\pd}_{1}(\phi_{1 \cap 2} \mid \widehat{\mu}_{1, 0}, \widehat{\Sigma}_{1, 0})$.
The equivalent approximations for the subposterior and prior of $\pd_{3}$ are $\widehat{\pd}_{3}(\phi_{2 \cap 3} \mid \widehat{\mu}_{3}, \widehat{\Sigma}_{3})$ and $\widehat{\pd}_{3}(\phi_{2 \cap 3} \mid \widehat{\mu}_{3, 0}, \widehat{\Sigma}_{3, 0})$ respectively.
Substituting in the approximations and using standard results for Gaussian density functions (see @bromiley_products_2003 and Appendix \ref{normal-approximation-calculations}) results in
\input{tex-input/multiple-normal-approximation/0040-final-normal-approx.tex}\noindent
where
\input{tex-input/multiple-normal-approximation/0050-in-text-normal-approx.tex}\noindent
Standard MCMC methods can be used to sample from the approximate melded posterior.
If instead we opt for product-of-experts pooling, all $\widehat{\mu}_{\text{de}}$ and $\widehat{\Sigma}_{\text{de}}$ terms disappear from the parameter definitions in Equation \eqref{eqn:in-text-normal-approx}.
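The identity underlying Equation \eqref{eqn:in-text-normal-approx} is that a product of (powers of) normal densities is an unnormalised normal density whose precision is the sum of the signed component precisions, with a precision-weighted mean [@bromiley_products_2003]. A small R sketch of this combination rule follows, with illustrative inputs; a negative power corresponds to dividing out a prior approximation during marginal replacement.

```{r gaussian-product-sketch, echo = TRUE, eval = FALSE}
## Combine normal density factors: each factor contributes power * Sigma^{-1}
## to the pooled precision and power * Sigma^{-1} %*% mu to the linear term.
gaussian_product <- function(means, covs, powers) {
  precs <- Map(function(S, a) a * solve(S), covs, powers)
  prec <- Reduce(`+`, precs)
  Sigma <- solve(prec)
  eta <- Reduce(`+`, Map(`%*%`, precs, means))
  list(mean = drop(Sigma %*% eta), cov = Sigma)
}
## E.g. a subposterior approximation divided by its prior approximation:
gaussian_product(means = list(1.2, 0), covs = list(matrix(0.2), matrix(4)),
                 powers = c(1, -1))
```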
# An integrated population model for little owls
We now return to the integrated population model (IPM) for the little owls introduced in Section \ref{an-integrated-population-model-for-little-owls}.
@finke_efficient_2019 consider a number of variations on the original model of @schaub_local_2006 and @abadi_estimation_2010; here we use only the variant from @finke_efficient_2019 with the highest marginal likelihood (Model 4 of their online supplement).
This example is particularly interesting to us as, for a certain choice of pooling function and pooling weights, the chained Markov melded model and the IPM are identical.
This coincidence allows us to use the posterior from the IPM as a benchmark for our multi-stage sampler.
Before we detail the specifics of each submodel, we must introduce some notation.
Data and parameters are stratified into two age-groups $a \in \{J, A\}$ where $J$ denotes juvenile owls (less than one year old) and $A$ adults, two sexes $s \in \{M, F\}$, and observations occur annually at times $t \in \{1, \ldots, T\}$, with $T = 25$.
The sex- and age-specific probability of an owl surviving from time $t$ to $t + 1$ is $\delta_{a, s, t}$, and the sex-specific probability of a previously captured owl being recaptured at time $t + 1$ is $\pi_{s, t + 1}$ so long as the owl is alive at time $t + 1$.
## Capture recapture: $\pd_{1}$
Capture-recapture data pertain to owls that are released at time $t$ (having been captured and tagged), and then recaptured at time $u = t + 1, \dots, T$, or not recaptured before the conclusion of the study, in which case $u = T + 1$.
Define $M_{a, s, t, u}$ as the number of owls of age-group $a$ and sex $s$ released at time $t$ and recaptured at time $u$.
We aggregate these observations into age- and sex-specific matrices $\boldsymbol{M}_{a, s}$, with $T$ rows, corresponding to release times, and $T + 1$ columns, corresponding to recapture times.
Let $R_{a, s, t} = \sum_{u = 1}^{T + 1} M_{a, s, t, u}$ be the number of owls released at time $t$, i.e. the $t$\textsuperscript{th} row-wise sum of the entries in $\boldsymbol{M}_{a, s}$.
The recapture times for owls released at time $t$ follow an age- and sex-specific multinomial likelihood
\input{tex-input/owls-example/0010-capture-recapture-submodel.tex}\noindent
with probabilities $\boldsymbol{Q}_{a, s, t} = (Q_{a, s, t, 1}, \ldots, Q_{a, s, t, T + 1})$ such that
\input{tex-input/owls-example/0011-multinomial-probabilities.tex}
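For intuition, the following R sketch computes m-array cell probabilities for a single age-sex group under the standard Cormack-Jolly-Seber structure: first recapture at time $u$ requires surviving each intervening interval and evading recapture until $u$. It is illustrative only; the definition actually used, including the juvenile-to-adult age transition, is that of Equation \eqref{eqn:multinomial-probabilities}.

```{r m-array-sketch, echo = TRUE, eval = FALSE}
## CJS m-array cell probabilities for a single group. `delta[t]` is survival
## from t to t + 1 (length T - 1); `pi_re[u]` is the recapture probability at
## time u (length T). Column T + 1 holds the never-recaptured probability.
m_array_probs <- function(delta, pi_re, T) {
  Q <- matrix(0, nrow = T, ncol = T + 1)
  for (t in seq_len(T)) {
    if (t < T) {
      for (u in (t + 1):T) {
        ## Survive and evade recapture over (t, u - 1), then survive to u
        ## and be recaptured at u.
        evade <- if (u > t + 1) {
          prod(delta[t:(u - 2)] * (1 - pi_re[(t + 1):(u - 1)]))
        } else 1
        Q[t, u] <- evade * delta[u - 1] * pi_re[u]
      }
    }
    Q[t, T + 1] <- 1 - sum(Q[t, seq_len(T)])  # never recaptured
  }
  Q
}
```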
## Count data model: $\pd_{2}$
To estimate population abundance, a two-level model is used: the first level models the observed (counted) number of females at each point in time, denoted $y_{t}$, with a second, latent process modelling the total number of females in the population.
The observation model is
\input{tex-input/owls-example/0021-observation-process.tex}\noindent
where we denote the number of juvenile and adult females in the population at time $t$ as $x_{J, t}$ and $x_{A, t}$ respectively, with $x_{t} = x_{J, t} + x_{A, t}$.
If $\text{sur}_{t}$ adult females survive from time $t - 1$ to time $t$, and $\text{imm}_{t}$ adult females immigrate over the same time period, then the latent, population level model is
\input{tex-input/owls-example/0020-count-data-submodel.tex}\noindent
where $\eta_{t}$ is the immigration rate.
The initial population sizes $x_{J, 1}$ and $x_{A, 1}$ have independent discrete uniform priors on $\{0, 1, \ldots, 50\}$.
If $x_{t - 1} = 0$ then we assume that the Poisson and binomial distributions become point masses at zero.
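As a simulation sketch of one step of this latent process: binomial survival and Poisson immigration are assumed here purely for illustration (the exact distributional forms are those of Equation \eqref{eqn:count-data-submodel}); both reduce to point masses at zero when $x_{t - 1} = 0$, matching the convention above.

```{r count-process-sketch, echo = TRUE, eval = FALSE}
## One transition of the latent adult female population: sur_t survivors by
## binomial thinning, imm_t immigrants at per-capita rate eta_t (illustrative
## distributional forms). rbinom(size = 0) and rpois(lambda = 0) both return
## zero, so x_prev = 0 yields the point mass at zero.
step_adults <- function(x_prev, delta_AF_t, eta_t) {
  sur <- rbinom(1, size = x_prev, prob = delta_AF_t)
  imm <- rpois(1, lambda = x_prev * eta_t)
  sur + imm   # x_{A, t}
}
step_adults(x_prev = 40, delta_AF_t = 0.6, eta_t = 0.3)
```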
## Fecundity: $\pd_{3}$
The fecundity submodel considers the number of breeding females at time $t$ denoted $N_{t}$, and the number of chicks produced that survive and leave the nest denoted $n_{t}$.
A Poisson model is employed to estimate fecundity (reproductive rate) $\rho$
\input{tex-input/owls-example/0030-fecundity-submodel.tex}
## Parameterisation and melding quantities
@abadi_estimation_2010 parameterise the time dependent quantities via linear predictors to minimise the number of parameters in the submodels.
The specific parameterisation of @finke_efficient_2019 we employ is
\input{tex-input/owls-example/0040-parameterisation-info.tex}\noindent
thus the quantities in common between the submodels are $\phi_{1 \cap 2} = (\alpha_{0}, \alpha_{2})$ and $\phi_{2 \cap 3} = \rho$.
To align the notation of this example with our chained melding notation we define, for all permitted values of $a, s$ and $t$, $Y_{1} = (\boldsymbol{M}_{a, s})$, $\psi_{1} = \left(\alpha_{1}, \alpha_{4}, (\alpha_{5, u})_{u = 2}^{T}\right)$; $Y_{2} = (y_{t})$, $\psi_{2} = (x_{J, t}, \alpha_{6}, \text{sur}_{t}, \text{imm}_{t})$; and $Y_{3} = (N_{t}, n_{t})$, $\psi_{3} = \varnothing$.
Note that the definition of $\phi_{1 \cap 2}$ does not include $\alpha_{1}$ as it is male specific and does not exist in $\pd_{2}$.
The model variant of @finke_efficient_2019 we consider does not include $\alpha_{3}$, and for comparability we keep the other parameter indices the same.
## Priors
We use the priors of @finke_efficient_2019 for the parameters in each submodel.
Denote $\boldsymbol{\alpha} = (\alpha_{0}, \alpha_{1}, \alpha_{2}, \alpha_{4}, \alpha_{6})$.
In both $\pd_{1}$ and $\pd_{2}$ the elements of $\boldsymbol{\alpha}$ are assigned independent $\text{Normal}(0, 2^2)$ priors truncated to $[-10, 10]$.
The time varying recapture probabilities $\alpha_{5, u}$ also have $\text{Normal}(0, 2^2)$ priors truncated to $[-10, 10]$.
A $\text{Uniform}(0, 10)$ prior is assigned to $\rho$ in $\pd_{2}$ and $\pd_{3}$.
To completely specify $\pd_{\text{meld}}$ we must choose how to form $\pd_\text{pool}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
We form $\pd_\text{pool}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ using three different pooling methods and estimate the melded posterior in each case.
The first pooling method is product-of-experts (PoE) pooling, which is logarithmic pooling with $\lambda = (1, 1, 1)$, and we denote the melded posterior as $\pd_{\text{meld, PoE}}$.
We also use logarithmic pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})$, which is denoted $\pd_{\text{meld, log}}$ and results in the chained melded model being identical to the IPM.
The final pooling method is linear pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2})$, denoted $\pd_{\text{meld, lin}}$.
## Posterior estimation
We estimate the melded posterior -- $\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi} \mid \boldsymbol{Y})$, proportional to Equation \eqref{eqn:melded-model-full} -- using both the parallel sampler (Section \ref{parallel-sampler}) and normal approximation (Section \ref{normal-approximations-to-submodel-components}).
This allows us to use pre-existing implementations of the submodels.
Specifically, the capture-recapture submodel is written in BUGS [@lunn_bugs_2009] and sampled via `rjags` [@plummer_rjags_2019].
The fecundity submodel is written in Stan [@carpenter_stan_2017] and sampled via `rstan` [@stan_development_team_rstan_2021].
The count data submodel is also written in BUGS, and we reuse this implementation in stage two of the multi-stage sampler via `NIMBLE` [@de_valpine_programming_2017] and its `R` interface [@nimble_development_team_nimble_2019].
The approximate melded posterior of Section \ref{normal-approximations-to-submodel-components} is sampled using `rjags`.
Code and data for this example, as well as trace plots and numerical convergence measures [@vehtari_rank-normalization_2020] for both stages of the parallel sampling process, are available in the accompanying online repository[^repo].
## Results
```{r phi_subpost, fig.cap = "Top row: credible intervals for $\\phi_{1 \\cap 2} = (\\alpha_{0}, \\alpha_{2})$ and $\\phi_{2 \\cap 3} = \\rho$ from the posterior of the original integrated population model $\\pd_{\\text{ipm}}$, and the individual subposteriors from submodels $\\pd_{1}, \\pd_{2}$, and $\\pd_{3}$. Bottom row: credible intervals for the same quantities, but with a different x-axis scale, from the original IPM (repeated from top row); the chained melded posteriors using product-of-experts pooling, logarithmic pooling, and linear pooling denoted $\\pd_{\\text{meld}}$, $\\pd_{\\text{meld, log}}$ and $\\pd_{\\text{meld, lin}}$; and the melded posterior using the normal approximation $\\widehat{\\pd}_{\\text{meld}}$. Intervals are 50\\%, 80\\%, 95\\%, and 99\\% wide."}
knitr::include_graphics("plots/owls-example/subposteriors-both-patchwork.pdf")
```
We empirically validate our methodology and sampler by comparing the melded posterior samples to a large sample -- 6 chains, each containing $1 \times 10^5$ post-warmup iterations -- from the original IPM posterior.
Similarity in the posteriors is expected as the IPM is effectively the joint model we wish to approximate with the chained melded model. It is simply fortunate, from a modelling standpoint, that this example's joint model is easy to construct and computationally feasible with standard tools.
Note that under logarithmic pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})$ the melded posterior is identical to the original IPM, so any differences between the two posteriors are attributable to the multi-stage sampler.
Figure \ref{fig:phi_subpost} depicts the posterior credible intervals [@gabry_bayesplot_2021; @kay_tidybayes_2020] for the common quantities from the individual submodels, the melded models, and the original IPM.
The top row in Figure \ref{fig:phi_subpost} indicates that the count data alone ($\pd_{2}$) contain minimal information about $\alpha_{0}, \alpha_{2}$ and $\rho$; incorporating the data from the other submodels is essential for precise estimates.
The multi-stage sampler performs well, producing melded posterior estimates that are generally similar to those from the original IPM, and near identical under logarithmic pooling.
PoE pooling produces the posterior most different from the original IPM, as it yields a prior for $(\alpha_{0}, \alpha_{2})$ that is more concentrated around zero than the other pooling methods.
The lack of large differences between the melded posteriors under the different pooled priors indicates that, in this example, the posterior is largely insensitive to the choice of pooled prior.
The similarity of the approximate approach ($\widehat{\pd}_{\text{meld}}$ - bottom row of Figure \ref{fig:phi_subpost}) to the melding approaches suggests that the normal approximations are good summaries of the subposteriors, and that the approximate melding procedure of Section \ref{normal-approximations-to-submodel-components} is suitable for this example.
# Survival analysis with time varying covariates and uncertain event times
We return now to the respiratory failure example introduced in Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times}.
Our intention is to illustrate the application of chained Markov melding to an example of realistic complexity, and explore empirically the importance of accounting for all sources of uncertainty by comparing chained Markov melding to equivalent analyses which use only a point estimate summary of the uncertainty.
Specifically, event times and indicators are a noninvertible function of other parameters in the first submodel, and are an uncertain response in the survival submodel.
Chained Markov melding enables us to specify a suitable joint model despite these complications.
There are $i = 1, \ldots, N$ individuals in the data set.
Each individual is admitted to the ICU at time $0$, and is discharged or dies at time $C_{i}$.
See Appendix \ref{cohort-selection-criteria} for information on how the $N = 37$ individuals were selected from MIMIC-III [@johnson_mimic-iii_2016].
## P/F ratio submodel (B-spline): $\pd_{1}$
The first submodel fits a B-spline to the \pfratio\ data to determine if and when an individual experiences respiratory failure.
Each individual has \pfratio\ observations $z_{i, j}$ (in units of mmHg) at times $t_{i, j}$, with $j = 1, \ldots, J_{i}$.
For each individual denote the vector of observations $\boldsymbol{z}_{i} = (z_{i, 1}, \ldots, z_{i, J_{i}})$ and observation times $\boldsymbol{t}_{i} = (t_{i, 1}, \ldots, t_{i, J_{i}})$.
To improve computational performance, we standardise the P/F ratio data for each individual such that $z_{i, j} = \frac{\tilde{z}_{i, j} - \overline{z}_{i}}{\hat{s}_{i}}$, where $\tilde{z}_{i, j}$ is the underlying unstandardised observation with mean $\overline{z}_{i}$ and standard deviation $\hat{s}_{i}$.
Similarly we rescale the threshold for respiratory failure: $\tau_{i} = \frac{300 - \overline{z}_{i}}{\hat{s}_{i}}$.
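A minimal sketch of this standardisation (the helper name is ours, and `z_tilde` is assumed to hold one individual's raw observations):

```{r, eval = FALSE}
# Standardise one individual's raw P/F observations and rescale the
# 300 mmHg respiratory failure threshold onto the same scale.
standardise_pf <- function(z_tilde) {
  z_bar <- mean(z_tilde)
  s_hat <- sd(z_tilde)
  list(
    z = (z_tilde - z_bar) / s_hat,  # standardised observations
    tau = (300 - z_bar) / s_hat     # standardised threshold
  )
}
```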
We choose to model the P/F ratio using cubic B-splines and 7 internal knots, and do not include an intercept column in the spline basis [for background on B-splines see: Chapter 2 in @hastie_generalized_1999; and the supplementary material of @wang_shape-restricted_2021].
The internal knots are evenly spaced between two additional boundary knots at $\min(\boldsymbol{t}_{i})$ and $\max(\boldsymbol{t}_{i})$.
These choices result in $k = 1, \ldots, 10$ spline basis terms per individual, with coefficients $\zeta_{i, k}$ where $\boldsymbol{\zeta}_{i} = (\zeta_{i, 1}, \ldots, \zeta_{i, 10})$.
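A sketch of the basis construction using the `splines` package (an assumption on our part; the exact implementation may differ):

```{r, eval = FALSE}
library(splines)

# Cubic B-spline basis with 7 evenly spaced internal knots between
# boundary knots at min(t_i) and max(t_i), and no intercept column,
# giving 7 + 3 = 10 basis terms per individual.
basis_matrix <- function(t_i) {
  boundary <- range(t_i)
  internal <- seq(boundary[1], boundary[2], length.out = 9)[2:8]
  bs(t_i, knots = internal, degree = 3, intercept = FALSE,
     Boundary.knots = boundary)
}
```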
We denote the individual specific B-spline basis evaluated at time $t_{i, j}$ as $B_{i}(t_{i, j}) \in [0, \infty)^{10}$ so that the submodel can be written as
\input{tex-input/mimic-example/0010-pf-model-def.tex}\noindent
We employ a weakly informative prior for the intercept $\beta_{0, i} \sim \text{N}(0, 1^2)$, a heavy tailed distribution for the error term[^pfoutliers] $\varepsilon_{i, j} \sim t_{5}(0, \omega_{i})$, and a weakly informative half-normal prior for the unknown scale parameter $\omega_{i} \sim \text{N}_{+}(0, 1^2)$.
For the spline basis coefficients we set $\zeta_{i, 1} \sim \text{N}(0, 0.1^2)$, and for $k = 2, \ldots, 10$ we employ the random-walk prior $\zeta_{i, k} \sim \text{N}(\zeta_{i, k - 1}, 0.1^2)$ from @kharratzadeh_splines_2017.
[^pfoutliers]: P/F data contain many outliers for, amongst many possible reasons, arterial/venous blood sample mislabelling; incorrectly recorded oxygenation support information; and differences between sample collection time, lab result time, and the observation time as recorded in the EHR.
We identify that a respiratory failure event occurred (which we denote by $d_{i} = 1$) at event time $T_{i}$ if a solution to the following optimisation problem exists
\input{tex-input/mimic-example/0012-event-time-model-def.tex}\noindent
We attempt to solve Equation \ref{eqn:event-time-model-def} using a standard multiple root finder [@soetaert_rootsolve_2020].
If there are no roots then the individual died or was discharged before respiratory failure occurred so we set $T_{i} = C_{i}$ and $d_{i} = 0$.
The relationship between $T_{i}$ and other model coefficients is displayed in the left hand panel of Figure \ref{fig:submodel-schematics}.
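A hedged sketch of this computation, using `rootSolve` [@soetaert_rootsolve_2020] and assuming `pf_fit` is a function evaluating one posterior draw's fitted spline at time `t` (the precise objective is given in Equation \eqref{eqn:event-time-model-def}):

```{r, eval = FALSE}
library(rootSolve)

# Locate all threshold crossings of the fitted P/F spline on [0, C_i];
# the earliest crossing is taken as the respiratory failure time, and
# if no crossing exists the individual is censored at C_i.
event_time <- function(pf_fit, tau_i, C_i) {
  roots <- uniroot.all(function(t) pf_fit(t) - tau_i,
                       interval = c(0, C_i))
  if (length(roots) == 0) {
    list(T_i = C_i, d_i = 0)
  } else {
    list(T_i = min(roots), d_i = 1)
  }
}
```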
\begin{figure}
\begin{minipage}{.499\textwidth}
\input{tex-input/mimic-example/0090-pf-schematic.tex}
\end{minipage}%
\begin{minipage}{.499\textwidth}
\input{tex-input/mimic-example/0091-fluid-schematic.tex}
\end{minipage}
\caption{Parameters and form for the P/F ratio submodel ($\pd_{1}$, left) and cumulative fluid submodel ($\pd_{3}$, right).}
\label{fig:submodel-schematics}
\end{figure}
## Cumulative fluid submodel (piecewise linear) $\pd_{3}$
The rate of fluid administration reflects the clinical management of patients by ICU staff, and hence changes to the rate reflect decisions to change treatment strategy.
We employ a breakpoint regression model to capture the effect of such decisions, and consider only one breakpoint as this appears sufficient to fit the observed data.
Specifically, we model the 8-hourly cumulative fluid balance data $x_{i, l}$ (in litres) at times $u_{i, l}$, $l = 1, \ldots, L_{i}$.
The cumulative data are derived from the raw fluid input/output observations, which we detail in Appendix \ref{calculating-the-cumulative-fluid-balance-from-the-raw-fluid-data}.
We denote the complete vector of observations by $\boldsymbol{x}_{i} = (x_{i, 1}, \ldots, x_{i, L_{i}})$ and times by $\boldsymbol{u}_{i} = (u_{i, 1}, \ldots, u_{i, L_{i}})$.
We assume a piecewise linear model with $\eta_{0, i}$ as the value at the breakpoint at time $\kappa_{i}$, slope $\eta_{1, i}^{b}$ before the breakpoint, and slope $\eta_{1, i}^{a}$ after the breakpoint.
We write this submodel as
\input{tex-input/mimic-example/0020-piecewise-fluid-model.tex}\noindent
It will be useful to refer to the fitted value of this submodel at an arbitrary time $t$ as $m_{i}(t)$.
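A direct R transcription of $m_{i}(t)$ and its derivative, which reappears in the survival submodel (the closed forms are given in Appendix \ref{analytic-form-for-the-survival-function}):

```{r, eval = FALSE}
# Piecewise linear fitted value: value eta_0 at the breakpoint kappa,
# slope eta_b before the breakpoint and slope eta_a after it.
m_i <- function(t, eta_0, eta_b, eta_a, kappa) {
  eta_0 + ifelse(t < kappa, eta_b, eta_a) * (t - kappa)
}

# The derivative of m_i(t) is piecewise constant in t.
dm_i <- function(t, eta_b, eta_a, kappa) {
  ifelse(t < kappa, eta_b, eta_a)
}
```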
We assume a weakly informative prior for the error term $\epsilon_{i, l} \sim \text{N}(0, \sigma^{2}_{x, i})$, with individual-specific error variances $\sigma_{x, i} \sim \text{N}_{+}(0, 5^2)$,
and specific, informative priors for the slope before the breakpoint $\eta^{b}_{1, i} \sim \text{Gamma}(1.53, 0.24)$ and after $\eta^{a}_{1, i} \sim \text{Gamma}(1.53, 0.24)$.
An appropriate prior for $\kappa_{i}$ and $\eta_{0, i}$ is challenging to specify due to the relationship between the two parameters and the individual-specific support for $\kappa_{i}$.
We address both challenges by reparameterisation, resulting in a prior for $\kappa_{i}$ that, in the absence of other information, places the breakpoint in the middle of an individual's ICU stay, and a prior for $\eta_{0, i}$ that captures the diverse pathways into ICU that an individual can experience.
Details and justifications for all the informative priors are available in Appendix \ref{priors-and-justification-for-the-cumulative-fluid-submodel}.
Figure \ref{fig:submodel-schematics} displays the parameters and their relationship to the fitted regression line.
## Survival submodel $\pd_{2}$
The rate at which fluid is administered is thought to influence the time to respiratory failure [@seethala_early_2017], so we explore this relationship using a survival model.
Individuals experience respiratory failure ($d_{i} = 1$) at time $0 < t < C_{i}$, or are censored $(d_{i} = 0, t = C_{i})$.
We assume a Weibull hazard with shape parameter $\gamma$ for the event times.
All individuals have baseline (time invariant) covariates $w_{i, a}$, $a = 1, \ldots, A$, with $\boldsymbol{w}_{i} = (1, w_{i, 1}, \ldots, w_{i, A})$ (i.e. including an intercept term), and common coefficients $\boldsymbol{\theta} = (\theta_{0}, \ldots, \theta_{A})$.
The hazard is assumed to be influenced by these covariates and the rate of increase $\frac{\partial}{\partial t} m_{i}(t)$ in the cumulative fluid balance.
The strength of the latter relationship is captured by $\alpha$.
Hence, the hazard is
\input{tex-input/mimic-example/0030-surv-submodel-def.tex}\noindent
The survival function at an individual's observed event time and status, $(T_{i}, d_{i})$, denoted $S_{i}(T_{i}) = \exp\{-\int_{0}^{T_{i}}h_{i}(u)\text{d}u\}$, has an analytic form which we derive in Appendix \ref{analytic-form-for-the-survival-function}.
Thus, the likelihood for individual $i$ is
\begin{equation}
\pd(T_{i}, d_{i} \mid \gamma, \boldsymbol{\theta}, \alpha, \kappa_{i}, \eta_{1, i}^{b}, \eta_{1, i}^{a}, \boldsymbol{w}_{i}) = h_{i}(T_{i})^{d_{i}} S_{i}(T_{i}),
\end{equation}
where we suppress the dependence on the parameters on the right hand side for brevity.
Our priors, which we justify in Appendix \ref{p2-prior-justification}, for the submodel specific parameters are $\gamma \sim \text{Gamma}(9.05, 8.72)$, $\alpha \sim \text{SkewNormal}(0, 0.5, -2)$, $\theta_{a} \sim \text{SkewNormal}(0, 0.5, -1)$, and $\theta_{0} \sim \text{N}(\hat{E}, 0.5^2)$ where $\hat{E}$ is the log of the crude event rate [@brilleman_bayesian_2020].
We adopt the same priors as the cumulative fluid balance submodel for $\kappa_{i}, \eta_{1, i}^{b}$, and $\eta_{1, i}^{a}$.
## Chained Markov melding details
```{r pf_fit_and_fluid_fit_plot, fig.cap = 'The P/F ratio data ($Y_{1}$, top row); cumulative fluid data ($Y_{3}$, bottom row); subposterior means and 95\\% credible intervals for each of the submodels (black solid lines and grey intervals); and stage one event times ($T_{i}$, red rug in the top row) for individuals $i = 17$ and $29$.'}
knitr::include_graphics('plots/mimic-example/combined-pf-fluid-small-new.pdf')
```
To combine the submodels with chained Markov melding we must define the common quantities $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
We meld $\pd_{1}$ and $\pd_{2}$ by treating the derived event times and indicators $\{(T_{i}, d_{i})\}_{i = 1}^{N}$ under $\pd_{1}$ as the "response", i.e. event times, in $\pd_{2}$.
Care is required when defining $\phi_{1 \cap 2}$ under $\pd_{1}$ as it is a deterministic function of $\beta_{0, i}$ and $\boldsymbol{\zeta}_{i}$.
Define $\chi_{1, i} = (\beta_{0, i}, \boldsymbol{\zeta}_{i})$ and $\phi_{1 \cap 2, i} = f(\chi_{1, i}) = (T_{i}, d_{i})$, where $f$ is the output from attempting to solve Equation \eqref{eqn:event-time-model-def}, so that $\phi_{1 \cap 2} = (f(\chi_{1, 1}), \ldots, f(\chi_{1, N}))$.
The parameters shared by Equations \eqref{eqn:piecewise-fluid-model} and \eqref{eqn:surv-submodel-def-two} constitute $\phi_{2 \cap 3} = (\eta^{b}_{1, i}, \eta^{a}_{1, i}, \kappa_{i})_{i = 1}^{N}$.
To completely align with our chained melding notation we also define, for the P/F submodel, $Y_{1} = (\boldsymbol{z}_{i}, \boldsymbol{t}_{i})_{i = 1}^{N}$ and $\psi_{1} = (\omega_{i})_{i = 1}^{N}$, noting that $\psi_{1}$ and $(\chi_{1, 1}, \ldots, \chi_{1, N})$ have no components in common.
For the cumulative fluid submodel we define $Y_{3} = (\boldsymbol{x}_{i}, \boldsymbol{u}_{i})_{i = 1}^{N}$, and $\psi_{3} = (\eta_{0, i}, \sigma^{2}_{x, i})_{i = 1}^{N}$.
Finally, for the survival submodel we define $Y_{2} = (\boldsymbol{w}_{i})_{i = 1}^{N}$ and $\psi_{2} = (\gamma, \boldsymbol{\theta}, \alpha)$.
## Pooling and estimation
We consider logarithmic pooling with $\lambda = (\frac{4}{5}, \frac{4}{5}, \frac{4}{5})$, since any smaller value of $\lambda$ results in a prior so uninformative that it causes computational problems, and with $\lambda = (1, 1, 1)$, i.e. Product-of-Experts pooling.
Because the correlation between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ in $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ is important, we do not consider linear pooling in this example.
Logarithmic pooling requires estimates of $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
Because these are mixed distributions, with both discrete and continuous components, standard kernel density estimation, as suggested by @goudie_joining_2019, is inappropriate.
Instead we fit, to transformed versions of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$, a mixture containing a discrete component and either a Gaussian or beta distribution, depending on the transformation.
Further details for all the mixture distribution estimates are contained in Appendix \ref{estimating-submodel-prior-marginal-distributions}.
We use the parallel multi-stage sampler with $\pd_{\text{pool}, 1}(\phi_{1 \cap 2}) = \pd_{1}(\phi_{1 \cap 2}), \,\, \pd_{\text{pool}, 3}(\phi_{2 \cap 3}) = \pd_{3}(\phi_{2 \cap 3})$ and $\pd_{\text{pool}, 2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \pd_{\text{pool}}(\boldsymbol{\phi}) \mathop{/} \left(\pd_{1}(\phi_{1 \cap 2}) \pd_{3}(\phi_{2 \cap 3}) \right)$.
That is, in stage one we target the subposteriors $\pd_{1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$ and $\pd_{3}(\phi_{2 \cap 3}, \psi_{3} \mid Y_{3})$; in stage two we target the full melded model.
Targeting $\pd_{1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$ in stage one alleviates the need to solve Equation \eqref{eqn:event-time-model-def} within an MCMC iteration, instead turning the production of $\phi_{1 \cap 2}$ into an embarrassingly parallel, post-stage-one processing step.
Attempting to sample the melded posterior directly would involve solving \eqref{eqn:event-time-model-def} many times within each iteration, presenting a sizeable computational hurdle which we avoid.
It is crucial for the convergence of our multi-stage sampler that the components of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ are updated _individual-at-a-time_ in stage two.
This is possible due to the conditional independence between individuals in the stage one posterior, and Appendix \ref{one-at-a-time} contains the details of this scheme.
The stage one subposteriors are sampled using `Stan`, using 5 chains with $10^{3}$ warm-up iterations and $10^{4}$ post warm-up iterations.
We use `Stan` to sample $\psi_{2}$ where, in every MH-within-Gibbs step, we run `Stan` for 9 warm-up iterations and 1 post warm-up iteration[^initialisation].
We run 5 chains of $10^{4}$ iterations for all stage two targets.
Visual and numerical diagnostics [@vehtari_rank-normalization_2020] are assessed and are available in the repository accompanying this paper[^repo].
[^initialisation]: We also initialise Stan at the previous value of $\psi_{2}$, and disable all adaptive procedures as the default (identity) mass matrix and step size are suitable for this example.
[^repo]: [https://doi.org/10.5281/zenodo.6552714](https://doi.org/10.5281/zenodo.6552714)
## Results
We first inspect the subposterior fitted values for $\pd_{1}$ and $\pd_{3}$.
The top row of Figure \ref{fig:pf_fit_and_fluid_fit_plot} displays the P/F data, the fitted submodel, and derived event times for individuals $i = 17$ and $29$.
The spline appears to fit the raw P/F data well, with the heavy tailed error term accounting for the larger deviations away from the fitted value.
It is interesting to see the relatively wide, multimodal distribution for $(T_{29}, d_{29})$; there is a second mode at $(T_{29} = C_{29}, d_{29} = 0)$, which also occurs for other individuals not shown here.
The bottom row of Figure \ref{fig:pf_fit_and_fluid_fit_plot} displays the cumulative fluid data and the fitted submodel, with the little noise in the data resulting in minimal uncertainty about the fitted value and a concentrated subposterior distribution.
To assess the importance of fully accounting for the uncertainty in $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$, we compare the posterior for $\psi_{2}$ obtained using the chained melding approach with the posterior obtained by fixing $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
Plugging in a point estimate reflects common applied statistical practice when combining submodels, particularly when a distributional approximation is difficult to obtain (as it is for $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$).
Additionally, standard survival models and software typically do not permit uncertainty in event times and indicators, rendering such a plug-in approach necessary.
Specifically, we fix $\phi_{1 \cap 2}$ to the median value[^median] for each individual under $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ and denote it $\widehat{\phi}_{1 \cap 2}$, and use the subposterior mean of $\pd_{3}(\phi_{2 \cap 3} \mid Y_{3})$ denoted $\widehat{\phi}_{2 \cap 3}$.
With these fixed values we sample $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$.
We also compare the melded posterior to the submodel marginal prior $\pd_{2}(\psi_{2})$, but we note that this comparison is difficult to interpret, as the melding process alters the prior for $\psi_{2}$.
Figure \ref{fig:psi_2_comparison_plot} displays the aforementioned densities for $(\theta_{3}, \theta_{17}, \gamma, \alpha) \subset \psi_{2}$, with $(\theta_{3}, \theta_{17})$ chosen as they exhibit the greatest sensitivity to the fixing of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
For the baseline coefficients ($\theta_{3}, \theta_{17}$) the chained melding posterior differs slightly in location from $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$, with a small increase in uncertainty.
A more pronounced change is visible for $\alpha$, where the melding process has added a notable degree of uncertainty and shifted the posterior leftwards.
[^median]: For each individual $i$, the posterior samples of the pair $(T_{i}, d_{i})$ are sorted by $T_{i}$, and the middle tuple $(\widehat{T}_{i}, \widehat{d}_{i})$ is chosen as the median.
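A minimal sketch of this selection for a single individual (vector names are illustrative):

```{r, eval = FALSE}
# Sort the posterior sample (T_i, d_i) pairs by event time and take
# the middle tuple as the "median" point estimate.
median_tuple <- function(T_samples, d_samples) {
  ord <- order(T_samples)
  mid <- ord[floor(length(ord) / 2)]
  c(T_hat = T_samples[mid], d_hat = d_samples[mid])
}
```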
```{r psi_2_comparison_plot, fig.cap = 'Density estimates for a subset of $\\psi_{2}$. The submodel marginal prior $\\pd_{2}(\\psi_{2})$ is shown as the grey dotted line (note that this is not the marginal prior under the melded model). The figure also contains the subposteriors obtained from chained melding using PoE pooling (red, solid line) and logarithmic pooling (blue, solid line), as well as the posterior using the fixed values $\\pd(\\psi_{2} \\mid \\widehat{\\phi}_{1 \\cap 2}, \\widehat{\\phi}_{2 \\cap 3}, Y_{2})$ (black, dashed line).'}
knitr::include_graphics('plots/mimic-example/psi-2-method-comparison-small.pdf')
```
To investigate which part of the melding process causes this change in the posterior of $\alpha$, we consider fixing either one of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ to their respective point estimates.
That is, we employ Markov melding as described in Section \ref{markov-melding}, using either logarithmic or PoE pooling, to obtain $\pd_{\text{meld}}(\alpha \mid \widehat{\phi}_{1 \cap 2}, Y_{2}, Y_{3})$ and $\pd_{\text{meld}}(\alpha \mid \widehat{\phi}_{2 \cap 3}, Y_{1}, Y_{2})$.
Figure \ref{fig:alpha_only_comparision_plot} displays the same distributions for $\alpha$ as Figure \ref{fig:psi_2_comparison_plot}, and adds the posteriors obtained using one fixed value ($\widehat{\phi}_{1 \cap 2}$ or $\widehat{\phi}_{2 \cap 3}$) whilst melding the other non-fixed parameter.
Evident for both choices of pooling is the importance of incorporating the uncertainty in $\phi_{1 \cap 2}$.
This is expected given the large uncertainty and multimodal nature of $\phi_{1 \cap 2}$ compared to $\phi_{2 \cap 3}$ (see Figure \ref{fig:pf_fit_and_fluid_fit_plot}).
We suspect that it is the multimodality in $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ that produces the shift in the posterior mode of $\alpha$, with the width of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ driving the increase in uncertainty.
Because we prefer the chained melded posterior, under either pooling method, for its full accounting of uncertainty, we conclude that $\pd(\alpha \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$ is both overconfident and biased.
```{r alpha_only_comparision_plot, fig.cap = 'Median (vertical line), 50\\%, 80\\%, 95\\%, and 99\\% credible intervals (least transparent to most transparent) for $\\alpha$. The marginal prior (grey, top row) and posterior using fixed $\\widehat{\\phi}_{1 \\cap 2}$ and $\\widehat{\\phi}_{2 \\cap 3}$ (black, bottom row) are as in Figure \\ref{fig:psi_2_comparison_plot}. For the chained melded posteriors (red and blue, rows 2 and 3) and the melded posteriors (red and blue, rows 4 -- 7), the tick label on the y-axis denotes the type of pooling used, and which of $\\phi_{1 \\cap 2}$ and/or $\\phi_{2 \\cap 3}$ are fixed.'}
knitr::include_graphics('plots/mimic-example/psi-2-alpha-only-compare.pdf')
```
The marginal changes to the components of $\psi_{2}$ visible in Figure \ref{fig:psi_2_comparison_plot} appear small; however, the cumulative effect of these changes becomes apparent when inspecting the posterior of the survival function.
Figure \ref{fig:kap_meier_pc_plot} displays the model-based, mean survival function under the melded posterior (using PoE pooling), and corresponding draws of $\phi_{1 \cap 2}$ converted into survival curves using the Kaplan-Meier estimator.
Also shown are the Kaplan-Meier estimate of $\widehat{\phi}_{1 \cap 2}$ and the mean survival function computed using $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$.
The posterior survival functions differ markedly, with the 95\% intervals overlapping only for small values of time.
It is also interesting to see that $\widehat{\phi}_{1 \cap 2}$, despite being a reasonable point estimate of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$, is not very likely under the melded posterior.
Figure \ref{fig:kap_meier_pc_plot} also suggests that the Weibull hazard is insufficiently flexible for this example.
We discuss the complexities of other hazards in Section \ref{conclusion}.
```{r kap_meier_pc_plot, fig.cap = 'Survival curves and mean survival function at time $t$. The red, stepped lines are draws of $\\phi_{1 \\cap 2}$ from the melded posterior using PoE pooling, converted into survival curves via the Kaplan-Meier estimator. The smooth red line and interval (posterior mean and 95\\% credible interval) denote the model-based, mean survival function obtained from the melded posterior (PoE pooling) values of $\\psi_{2}$ and $\\phi_{2 \\cap 3}$. The blue dashed line is the Kaplan-Meier estimate of $\\widehat{\\phi}_{1 \\cap 2}$, and the blue solid line and interval are the corresponding model-based estimate from $\\pd(\\psi_{2} \\mid \\widehat{\\phi}_{1 \\cap 2}, \\widehat{\\phi}_{2 \\cap 3}, Y_{2})$.'}
knitr::include_graphics('plots/mimic-example/kaplan-meier-pc.pdf')
```
# Conclusion
This paper introduces the chained Markov melded model.
In doing so we make explicit the notion of submodels related in a chain-like way, describe a generic methodology for joining together any number of such submodels, and illustrate its application with our examples.
Our examples also demonstrate the importance of quantifying the uncertainty when joining submodels; not doing so can produce biased, over-confident inference.
We also describe the choices that users of chained Markov melding must make, and their impacts, including: the choice of pooling function and, where required, the pooling weights; and the choice and design of the posterior sampler, including the apportionment of the pooled prior over the stages and the stage-specific MCMC techniques.
We have introduced extensions of linear and logarithmic pooling to marginal distributions of different but overlapping quantities.
Linear pooling, introduced in Section \ref{pooled-prior}, could be extended to induce dependence between the components of $\boldsymbol{\phi}$ using multivariate or vine copulas [@kurowicka_dependence_2011; @nelsen_introduction_2006], or other techniques [@lin_recent_2014].
Copula methods are particularly appealing as, depending on the choice of copula, they yield density functions that are computationally cheap to evaluate, are easy to sample from, and can induce correlation between an arbitrary number of marginals.
Our parallel multi-stage sampler currently only considers $\Nm = 3$ submodels, rather than the fully generic definition of chained Markov melding in Equation \eqref{eqn:melded-model-general}.
Whilst we anticipate needing more complex methods in large $\Nm$ settings, the value of $\Nm$ at which the performance of our multi-stage sampler becomes unacceptable will depend on the specific submodels and data under consideration.
A general method would consider a large and arbitrary number of submodels in a chain, and initially split the chain into more/fewer pieces depending on the computational resources available.
Designing such a method is complex, as it would have to:
\begin{itemize}[nosep]
\tightlist
\item avoid requiring the inverse of any component of $\boldsymbol{\phi}$ with a noninvertible definition,
\item estimate the relative cost of sampling each submodel's subposterior, to split the chain of submodels into steps/jobs of approximately the same computational cost,
\item decide the order in which pieces of the chain are combined.
\end{itemize} \noindent
These are substantial challenges.
It may be possible to combine the ideas of @lindsten_divide-and-conquer_2017 and @kuntz_divide-and-conquer_2021, who propose parallel Sequential Monte Carlo methods, with the aforementioned constraints to obtain a generic methodology.
Ideally we would retain the ability to use existing implementations of the submodels; however, the need to recompute the weights of the particles, and hence to re-evaluate previously considered submodels, may make this impossible.
Our current sampler is also sensitive to large differences in location or scale of the target distribution between the stages.
The impact of these differences can be ameliorated using the methodology of @manderson_numerically_2022, and, more generally, Sequential Monte Carlo samplers are likely to perform better in these settings.
Our chained Markov melding methodology is general and permits any form of uncertainty in the common quantities.
In Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1} we use our chained melded model to incorporate uncertainty in the event times and indicators into a survival submodel.
Some specific forms of uncertainty in the event times have been considered in previous work.
These include @wang_integrative_2020, who consider uncertain event times arising from record linkage, where the event time is assumed to be one of a finite number of event times arising from the record linkage; and @giganti_accounting_2020, @oh_considerations_2018, and @oh_raking_2021, who leverage external validation data to account for measurement error in the event time.
However, the general and Bayesian nature of our methodology readily facilitates any form of uncertainty in the event times and the event indicators; uncertainty in the latter is not considered in the cited papers.
The example in Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1} has three more interesting aspects to discuss.
Firstly, the P/F ratio data used in the first submodel are obtained by finding all blood gas measurements from arterial blood samples.
Approximately $20\%$ of the venous/arterial labels are missing.
In these instances a logistic regression model, fit by the MIMIC team[^mimiclr], is used to predict the missing label based on other covariates.
It is theoretically possible to refit the model in a Bayesian framework and use the chained melded model to incorporate the uncertainty in the predicted sample label -- adding another 'link' to the chain.
[^mimiclr]: The coefficients, classification threshold, and the imputation used in the case of missing data are supplied in the `blood-gasses.sql` file in the GitHub repository accompanying this paper. No other information is available about this model (the data used to produce the coefficients, and the performance of the fitted model).
Secondly, the application of our multi-stage sampler to this example is similar to the two-stage approach used for joint longitudinal and time-to-event models [see @mauff_joint_2020 for a description of this approach].
In the two-stage approach, the longitudinal model is fit using standard MCMC methods in stage one, and the samples are reused in stage two when considering the time-to-event data.
This can significantly reduce the computational effort required to fit the joint model.
However, unlike our multi-stage sampler, the typical two-stage approach does not target the full posterior distribution, which can lead to biased estimates (though @mauff_joint_2020 extend the typical two-stage approach to reduce this bias).
Thirdly, we observe a lack of flexibility in the baseline hazard, visible in Figure \ref{fig:kap_meier_pc_plot}.
More complex hazards could be employed, e.g. modelling the (log-)hazard using a (penalised) B-spline [@royston_flexible_2002; @rosenberg_hazard_1995; @rutherford_use_2015].
However, this increased flexibility precludes an analytic form for the survival function.
Whilst numerical integration is possible, it is not trivial, particularly when the hazard is discontinuous, as ours is at the breakpoint.
Splines also have more coefficients than the single parameter of the Weibull hazard.
Identifiability issues arise with a small number of individuals, many of whom are censored, and are compounded when there are a relatively large number of other parameters $(\alpha, \boldsymbol{\theta})$.
Whilst we do not believe these costs are worth incurring for our example, for settings with a larger number of patients and more complicated longitudinal submodels the increased flexibility may be vital.
## Acknowledgements {-}
We thank Sarah L Cowan for assistance in understanding respiratory failure, and Anne Presanis and Brian Tom for many helpful discussions about the methodological aspects of this work.
We also thank Luiz Max Carvalho for comments on an earlier version of this paper.
This work was supported by The Alan Turing Institute under the UK Engineering and Physical Sciences Research Council (EPSRC) [EP/N510129/1] and the UK Medical Research Council [programme code MC\_UU\_00002/2].
<!-- This makes pandoc-citeproc put the references before the end of document. -->
<!-- <div id="refs"></div> -->
<!-- Now switch to alphabetical numbering for the appendix, and reset the counter. -->
\renewcommand{\thesubsection}{\Alph{subsection}}
\setcounter{subsection}{0}
# Appendices {-}
## Log pooling Gaussian densities
We can exactly compute $\pd_{\text{pool}}$ when logarithmically pooling Gaussian densities.
Noting that, in the one dimensional case, $\text{N}(\phi; \mu, \sigma^2)^{\lambda_{\modelindex}} = \text{N}(\phi; \mu, \frac{\sigma^2}{\lambda_{\modelindex}})$, we use the results of @bromiley_products_2003 and write
\input{tex-input/multiple-phi/0070-log-pooling-gaussian.tex}\noindent
hence $\pd_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \text{N}((\phi_{1 \cap 2}, \phi_{2 \cap 3});\, \mu_{\text{log}}, \, \Sigma_{\text{log}})$.
The choice of $\lambda_{2}$ is critical; by controlling the contribution of $\pd_{2}$ to $\pd_{\text{pool}}$, $\lambda_{2}$ controls the degree of correlation present in the latter.
The left hand column of Figure \ref{fig:pooled_densities_plot} illustrates this phenomenon.
When $\lambda_{1} = \lambda_{3} = 0$, and hence $\lambda_{2} = 1$, all of the correlation in $\pd_{2}$ is present in $\pd_{\text{pool}}$.
The correlation decreases as $\lambda_{1}$ increases, until $\lambda_{1} = \lambda_{3} = 0.5$, and hence $\lambda_{2} = 0$, where no correlation persists.
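A sketch of this computation (assuming each density is a full-rank Gaussian on the joint $(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ space):

```{r, eval = FALSE}
# Logarithmic pooling of Gaussians: exponentiation by lambda_m scales
# the precision by lambda_m, and the product of Gaussians is again
# Gaussian, with summed precisions and precision-weighted mean.
log_pool_gaussians <- function(mus, Sigmas, lambdas) {
  precisions <- Map(function(S, l) l * solve(S), Sigmas, lambdas)
  Lambda <- Reduce(`+`, precisions)
  Sigma_log <- solve(Lambda)
  mu_log <- Sigma_log %*% Reduce(`+`, Map(`%*%`, precisions, mus))
  list(mu = drop(mu_log), Sigma = Sigma_log)
}
```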
## Sequential sampler
\input{tex-input/multi-stage-sampler/0001-seq-sampler-dag.tex}
Figure \ref{fig:seq-sampler-dag} depicts graphically the strategy employed by the sequential sampler.
The sequential sampler assumes that the pooled prior decomposes such that
\input{tex-input/multi-stage-sampler/0002-sequential-sampler-decomposition.tex}\noindent
This is necessary to avoid sampling all components of $\boldsymbol{\phi}$ in the first stage.
All pooled priors trivially satisfy \eqref{eqn:sequential-sampler-decomposition}, as we can assume all but $\pd_{\text{pool}, 3}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ are improper, flat distributions.
However, including some portion of the pooled prior in each stage of the sampler can improve performance, and eliminate computational instabilities when submodel likelihoods contain little information.
### Stage one
Stage one of the sequential sampler targets
\input{tex-input/multi-stage-sampler/0020-stage-one-target.tex}\noindent
using a generic proposal kernel for both $\phi_{1 \cap 2}$ and $\psi_{1}$.
The corresponding acceptance probability for a proposed update from $(\phi_{1 \cap 2}, \psi_{1})$ to $(\phi_{1 \cap 2}^{*}, \psi_{1}^{*})$ is
\input{tex-input/multi-stage-sampler/0021-stage-one-acceptance-probability.tex}
### Stage two
The stage two target augments the stage one target by including the second submodel, corresponding prior marginal distribution, and an additional pooled prior term
\input{tex-input/multi-stage-sampler/0030-stage-two-target.tex}\noindent
A Metropolis-within-Gibbs strategy is employed, where the stage one samples are used as a proposal for $\phi_{1 \cap 2}$, whilst a generic proposal kernel is used for $\psi_{2}$ and $\phi_{2 \cap 3}$.
Thus the proposal distributions for $\phi_{1 \cap 2}^{*}$ and $(\phi_{2 \cap 3}^{*}, \psi_{2}^{*})$ are
\input{tex-input/multi-stage-sampler/0031-stage-two-gibbs-updates.tex}\noindent
The acceptance probability for this proposal strategy is
\input{tex-input/multi-stage-sampler/0032-stage-two-acceptance-probabilities.tex}\noindent
Our judicious choice of proposal distribution has resulted in a cancellation in Equation \eqref{eqn:stage-two-acceptance-probabilities-one} which removes all terms related to $\pd_{1}$.
Similarly, all terms related to $\pd_{1}$ are constant -- hence cancel -- in Equation \eqref{eqn:stage-two-acceptance-probabilities-two}.
This eliminates any need to re-evaluate the first submodel.
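As an illustrative sketch of the $\phi_{1 \cap 2}$ update (assuming `log_target_2` evaluates only those terms of the stage two target that do not cancel, per Equation \eqref{eqn:stage-two-acceptance-probabilities-one}; all names are ours):

```{r, eval = FALSE}
# Metropolis-within-Gibbs update for phi_12: propose by resampling an
# index into the stored stage one draws, so all pd_1 terms cancel in
# the acceptance ratio and the first submodel is never re-evaluated.
update_phi_12 <- function(state, stage_one_draws, log_target_2) {
  proposal <- stage_one_draws[sample(nrow(stage_one_draws), 1), ]
  log_ratio <- log_target_2(proposal, state$phi_23, state$psi_2) -
    log_target_2(state$phi_12, state$phi_23, state$psi_2)
  if (log(runif(1)) < log_ratio) state$phi_12 <- proposal
  state
}
```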
### Stage three
In stage three we target the full melded posterior
\input{tex-input/multi-stage-sampler/0044-stage-three-target.tex}\noindent
The target has now been broadened to include terms from the third submodel and the entirety of the pooled prior.
Again, we employ a Metropolis-within-Gibbs sampler, with proposals drawn such that
\input{tex-input/multi-stage-sampler/0045-stage-three-gibbs-updates.tex}\noindent
which leads to acceptance probabilities of
\input{tex-input/multi-stage-sampler/0046-stage-three-acceptance-probabilities.tex}\noindent
The informed choice of proposal distribution for ($\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{1}, \psi_{2}$) has allowed us to target the full melded posterior without needing to evaluate all submodels simultaneously.
## Normal approximation calculations
Substituting the approximations of Section \ref{normal-approximations-to-submodel-components} into Equation \eqref{eqn:normal-approx-melded-posterior-target} yields the approximate melded posterior
\input{tex-input/multiple-normal-approximation/0020-normal-approximation-approximate-target.tex}\noindent
Noting that the product of independent normal densities is an unnormalised multivariate normal density with independent components, we rewrite Equation \eqref{eqn:normal-approximation-approximate-target} as
\input{tex-input/multiple-normal-approximation/0030-normal-approx-nu-de-form.tex}\noindent
The ratio of normal densities is also an unnormalised normal density, and hence Equation \eqref{eqn:normal-approx-nu-de-form} simplifies to
\input{tex-input/multiple-normal-approximation/0041-final-normal-approx-appendix.tex}
## Calculating the cumulative fluid balance from the raw fluid data
In the raw fluid data each patient has $\tilde{l} = 1, \ldots, \tilde{L}_{i}$ observations.
Each observation $\tilde{x}_{i, \tilde{l}}$ is typically a small fluid administration (e.g. an injection of some medicine in saline solution), or a fluid discharge (almost always urine excretion).
The observations have corresponding observation times $\tilde{u}_{i, \tilde{l}}$, with $\tilde{\boldsymbol{u}}_{i} = \{\tilde{u}_{i, 1}, \ldots, \tilde{u}_{i, \tilde{L}_{i}}\}$ and $\tilde{\boldsymbol{x}}_{i} = \{\tilde{x}_{i, 1}, \ldots, \tilde{x}_{i, \tilde{L}_{i}}\}$.
We code the fluid administrations/inputs as positive values, and the excretions/outputs as negative values.
Each patient has an enormous number of raw fluid observations $(L_{i} \ll \tilde{L}_{i})$ and it is computationally infeasible to consider them all at once.
We aggregate the raw fluid observations into 8-hourly changes in fluid balance.
From these 8-hourly changes we calculate the cumulative fluid balance.
Mathematically, we define an ordered vector of boundary values
\begin{equation}
\boldsymbol{v}_{i} = (\lfloor \min\{\tilde{\boldsymbol{u}}_{i}\} \rfloor, \lfloor \min\{\tilde{\boldsymbol{u}}_{i}\} \rfloor + \frac{1}{3}, \ldots, \lceil \max\{\tilde{\boldsymbol{u}}_{i}\} \rceil),
\end{equation}
noting that $\dim(\boldsymbol{v}_{i}) = L_{i} + 1$.
Because the observation times are encoded as _days since ICU admission_ and we are interested in the 8-hourly changes, our floor and ceiling functions round down or up to the nearest third, respectively.
The raw fluid observations are then divided into $L_{i}$ subsets of $\{\tilde{\boldsymbol{x}}_{i}, \tilde{\boldsymbol{u}}_{i}\}$ based on which pair of boundary values the observation time falls between:
\begin{equation}
V_{i, l} = \left\{
\{\tilde{\boldsymbol{x}}_{i}, \tilde{\boldsymbol{u}}_{i}\}
\mid
v_{i, l} \leq \tilde{\boldsymbol{u}}_{i} < v_{i, l + 1}
\right\},
\end{equation}
for $l = 1, \ldots, L_{i}$.
Denote $N^{V}_{i, l} = \lvert V_{i, l} \rvert \mathop{/} 2$ (dividing by two as $V_{i, l}$ contains both the observation and the observation time).
The $l$\textsuperscript{th} 8-hourly fluid change $\Delta_{i, l}$ and corresponding observation time $u_{i, l}$ can then be computed as
\begin{equation}
\Delta_{i, l} = \sum_{s = 1}^{N^{V}_{i, l}} \tilde{x}_{i, s}, \,\, \text{s.t.} \,\, \tilde{x}_{i, s} \in V_{i, l}, \qquad
u_{i, l} = \frac{1}{N^{V}_{i, l}} \sum_{s = 1}^{N^{V}_{i, l}} \tilde{u}_{i, s}, \,\, \text{s.t.} \,\, \tilde{u}_{i, s} \in V_{i, l}.
\end{equation}
Finally, the 8-hourly cumulative fluid balance data are computed by $x_{i, l} = \sum_{s = 1}^{l} \Delta_{i, s}$, and we assume they too are observed at $u_{i, l}$.
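A sketch of this aggregation (we drop empty 8-hour bins, which have no well-defined observation time; an assumption on our part):

```{r, eval = FALSE}
# Aggregate signed raw fluid events (times in days since admission)
# into 8-hourly (1/3 day) balance changes, then cumulate.
cumulative_balance <- function(u_tilde, x_tilde) {
  breaks <- seq(floor(3 * min(u_tilde)) / 3,
                ceiling(3 * max(u_tilde)) / 3, by = 1 / 3)
  bin <- cut(u_tilde, breaks = breaks, right = FALSE,
             include.lowest = TRUE)
  delta <- tapply(x_tilde, bin, sum)  # 8-hourly changes Delta_{i,l}
  u <- tapply(u_tilde, bin, mean)     # mean observation time u_{i,l}
  keep <- !is.na(delta)
  list(u = u[keep], x = cumsum(delta[keep]))
}
```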
## Priors and justification for the cumulative fluid submodel
The parameters for the gamma prior for $\eta^{b}_{1, i}$ and $\eta^{a}_{1, i}$ are obtained by assuming that the 2.5\textsuperscript{th}, 50\textsuperscript{th}, and 97.5\textsuperscript{th} percentiles are at 0.5, 5, and 20 [@belgorodski_rriskdistributions_2017].
A slope of $0.5$ (i.e. the change in cumulative fluid balance per day) is unlikely but possible due to missing data; a slope of $20$ is also unlikely but possible as extremely unwell patients can have very high respiratory rates and thus require large fluid inputs.
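A sketch of how such parameters can be recovered with the cited package (we assume the `get.gamma.par` interface; the result should approximately match the $\text{Gamma}(1.53, 0.24)$ prior above):

```{r, eval = FALSE}
library(rriskDistributions)

# Recover Gamma(shape, rate) parameters matching the three stated
# quantiles of the slope priors.
get.gamma.par(p = c(0.025, 0.5, 0.975), q = c(0.5, 5, 20),
              show.output = FALSE, plot = FALSE)
```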
The prior for the breakpoint $\kappa_{i}$ is derived as follows.
Define $u_{i, (1)} = \min(\boldsymbol{u}_{i})$ and $u_{i, (n)} = \max(\boldsymbol{u}_{i})$, with $r_{i} = u_{i, (n)} - u_{i, (1)}$.
We reparameterise the breakpoint by noting that $\kappa_{i} = \kappa^{\text{raw}}_{i}r_{i} + u_{i, (1)}$, where $\kappa^{\text{raw}}_{i} \in [0, 1]$.
We then set $\kappa^{\text{raw}}_{i} \sim \text{Beta}(5, 5)$ to regularise the breakpoint towards the middle of each individual's stay in ICU.
This is crucial to ensure the submodel is identifiable when there is little evidence of a breakpoint in the data.
Note that this results in the following analytic expression for $\pd_{3}(\phi_{2 \cap 3})$
\begin{equation}
\pd_{3}(\phi_{2 \cap 3}) = \prod_{i = 1}^{N} \pd(\eta^{b}_{1, i}) \pd(\eta^{a}_{1, i}) \pd(\kappa_{i}), \,\, \text{with} \,\,\,
\pd(\kappa_{i}) = \pd_{\kappa^{\text{raw}}_{i}}\left(\frac{\kappa_{i} - u_{i, (1)}}{r_{i}}\right) \frac{1}{r_{i}}
\end{equation}
by the change of variables formula.
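In code, the implied prior density of $\kappa_{i}$ is, for example:

```{r, eval = FALSE}
# Change-of-variables density for kappa_i implied by
# kappa_raw ~ Beta(5, 5) and kappa = kappa_raw * r + u_min.
d_kappa <- function(kappa, u_min, u_max) {
  r <- u_max - u_min
  dbeta((kappa - u_min) / r, shape1 = 5, shape2 = 5) / r
}
```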
Specifying a prior for $\eta_{0, i}$, the cumulative fluid balance at $\kappa_{i}$, is difficult because it too depends on the length of stay.
Instead, we reparameterise so that $\eta_{0, i}$ is a function of the y-intercept $\eta_{0, i}^{\text{raw}}$.
\begin{equation}
\eta_{0, i} =
(\eta_{0, i}^{\text{raw}} + \eta^{b}_{1, i} \kappa_{i}) \boldsymbol{1}_{\{0 < \kappa_{i}\}} +
(\eta_{0, i}^{\text{raw}} + \eta^{a}_{1, i} \kappa_{i}) \boldsymbol{1}_{\{0 \geq \kappa_{i}\}}
\end{equation}
We place a $\text{LogNormal}(1.61, 0.47^2)$ prior on $\eta_{0, i}^{\text{raw}}$.
These values are obtained assuming that, _a priori_, the $2.5\%, 50\%$, and $99\%$ percentiles of $\eta_{0, i}^{\text{raw}}$ are $0.5, 5$, and $15$ respectively [@belgorodski_rriskdistributions_2017].
This is a broad prior that reflects the numerous possible admission routes into the ICU.
We expect those admitted from the wards to have little pre-admission fluid data.
Those admitted from the operating theatre occasionally have their in-theatre fluid input recorded after admission into the ICU, with no easy way to distinguish these records in the data.
## Analytic form for the survival function
The hazard at arbitrary time $t$ is
\begin{gather*}
h_{i}(t) = \gamma t^{\gamma - 1} \exp\left\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \frac{\partial}{\partial t} m_{i}(t)\right\} \\
m_{i}(t) = \eta_{0, i} + \eta^{b}_{1, i}(t - \kappa_{i})\boldsymbol{1}_{\{t < \kappa_{i}\}} + \eta^{a}_{1, i}(t - \kappa_{i})\boldsymbol{1}_{\{t \geq \kappa_{i}\}} \\
\frac{\partial}{\partial t} m_{i}(t) = \eta^{b}_{1, i}\boldsymbol{1}_{\{t < \kappa_{i}\}} + \eta^{a}_{1, i}\boldsymbol{1}_{\{t \geq \kappa_{i}\}}.
\end{gather*}
Then, for $t > \kappa_{i}$, the cumulative hazard is
\begin{align*}
\int_{0}^{t} h_{i}(u) \text{d}u
&= \int_{0}^{t}
\gamma u^{\gamma - 1}
\exp\left\{
\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} +
\alpha \eta^{b}_{1, i}\boldsymbol{1}_{\{u < \kappa_{i}\}} +
\alpha \eta^{a}_{1, i}\boldsymbol{1}_{\{u \geq \kappa_{i}\}}
\right\}
\text{d}u \\
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\int_{0}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}\boldsymbol{1}_{\{u < \kappa_{i}\}} +
\alpha \eta^{a}_{1, i}\boldsymbol{1}_{\{u \geq \kappa_{i}\}}
\right\}
\text{d}u \\
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\int_{0}^{\kappa_{i}}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\text{d}u
+
\int_{\kappa_{i}}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{a}_{1, i}
\right\}
\text{d}u
\right] \\
&= \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\kappa_{i}^{\gamma}
+
\exp\left\{
\alpha \eta^{a}_{1, i}
\right\}
(t^{\gamma} - \kappa_{i}^{\gamma})
\right]
\end{align*}
and for $t < \kappa_{i}$
\begin{align*}
\int_{0}^{t} h_{i}(u) \text{d}u
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\int_{0}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\text{d}u
\right] \\
&= \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
t_{i}^{\gamma}
\right] \\
&= t_{i}^{\gamma} \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \eta^{b}_{1, i}\}.
\end{align*}
The survival functions then have corresponding definitions for $t > \kappa_{i}$ and $t < \kappa_{i}$ as $S_{i}(t) = \exp\{-\int_{0}^{t} h_{i}(u) \text{d}u\}$.
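A direct transcription of these closed forms (with `lin_pred` standing for $\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}$; function names are ours):

```{r, eval = FALSE}
# Cumulative hazard H_i(t) under the Weibull baseline with piecewise
# constant fluid-rate term, split at the breakpoint kappa.
cumulative_hazard <- function(t, gamma, lin_pred, alpha,
                              eta_b, eta_a, kappa) {
  if (t < kappa) {
    t^gamma * exp(lin_pred + alpha * eta_b)
  } else {
    exp(lin_pred) * (exp(alpha * eta_b) * kappa^gamma +
                       exp(alpha * eta_a) * (t^gamma - kappa^gamma))
  }
}

# Survival function S_i(t) = exp(-H_i(t)).
survival_i <- function(t, ...) exp(-cumulative_hazard(t, ...))
```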
## Survival submodel prior justification {#p2-prior-justification}
Our prior for $(\gamma, \alpha, \boldsymbol{\theta})$ must result in a plausible distribution for $\pd_{2, i}(T_{i} \mid d_{i} = 1)$, and a reasonable balance between $d_{i} = 1$ and $d_{i} = 0$ events.
The primary concern is unintentionally specifying a prior for which the bulk of $\pd_{2, i}(T_{i} \mid d_{i} = 1)$ is very close to zero.
In addition, certain extreme configurations of $(\gamma, \alpha, \boldsymbol{\theta})$ cause issues for the methodology of @crowther_simulating_2013, particularly the numerical root finding and numerical integration steps.
We would like to rule out such extreme configurations _a priori_.
Ideally we would encode this information in a joint prior for $(\gamma, \alpha, \boldsymbol{\theta})$, but specifying the appropriate correlation structure for these parameters is prohibitively challenging.
Instead we focus on specifying appropriate marginals for each of $\gamma, \alpha$, and $\boldsymbol{\theta}$, and create visual prior predictive checks [@gabry_visualization_2019; @gelman_bayesian_2020] to ensure the induced prior for $(T_{i}, d_{i})$ is acceptable.
Before justifying our chosen marginal priors, we note that the $\exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \frac{\partial}{\partial t} m_{i}(t)\}$ term implies that the priors for $\boldsymbol{\theta}$ and $\alpha$ are on the log scale.
Hence the magnitude of these parameters must be small, otherwise all event times would be very near zero or at infinity.
The asymmetric effect of the transformation from the log scale also implies that symmetric priors are not obviously sensible.
From these observations we deduce that $\boldsymbol{\theta}$ and $\alpha$ must not be too large in magnitude; however, if they are negative they can be slightly larger.
Hence, we specify the skew-normal priors detailed in Section \ref{survival-submodel-pd_2}, noting that the skewness parameter for $\alpha$ is smaller, because $\frac{\partial}{\partial T_{i}} m_{i}(T_{i})$ is strictly positive and typically between 0.5 and 20, whilst $\boldsymbol{w}_{i}$ is standardised to be approximately standard normal.
Lastly, if $\gamma$ is too far away from $1$ (in either direction), then the event times are very small either because the hazard increases rapidly ($\gamma \gg 1$), or because almost all of the cumulative hazard is in the neighbourhood of 0 ($\gamma \ll 1$).
We specify a gamma distribution for $\gamma$ with the $1$\textsuperscript{st}-, $50$\textsuperscript{th}-, and $99$\textsuperscript{th}-percentiles of $\pd_{2}(\gamma)$ at $0.2, 1$, and $2$, allowing for a wide range of hazard shapes whilst removing many of the extremes.
## Estimating submodel prior marginal distributions
For $\pd_{1}$, we note that $\pd_{1}(\phi_{1 \cap 2}) = \prod_{i = 1}^{N}\pd_{1, i}(T_{i}, d_{i})$, and that $\pd_{1, i}(T_{i}, d_{i})$ conditions on each individual's length of stay (in specifying the location of the knots), as well as the range, mean, and standard deviation of the P/F data (by standardising $\tilde{z}_{i, j}$).
Simple Monte Carlo samples are drawn from $\pd_{1}(\phi_{1 \cap 2})$ and used to estimate $\widehat{\pd}_{1}(\phi_{1 \cap 2})$.
Under the second submodel we obtain samples from $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ using the methodology of @crowther_simulating_2013 as implemented in `simsurv` [@brilleman_simsurv_2021].
These samples are used to estimate $\widehat{\pd}_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
### P/F submodel
We approximate $\pd_{1}(\phi_{1 \cap 2})$ using a mixture of discrete and continuous distributions, with a discrete spike at $C_{i}$ for the censored events and a beta distribution for the (rescaled) event times.
Monte Carlo samples of $T_{i}$ and $d_{i}$ are obtained from $\pd_{1, i}(T_{i}, d_{i})$ by drawing $\beta_{0, i}$ and $\boldsymbol{\zeta}_{i}$ from their respective prior distributions and then solving \eqref{eqn:event-time-model-def}.
Denoting the estimated mixture weight $\widehat{\pi}_{i} \in [0, 1]$, the density estimate is
\begin{equation}
\widehat{\pd}_{1, i}(T_{i}, d_{i}) =
\widehat{\pi}_{i} \, \text{Beta}\left(\frac{T_{i}}{C_{i}}; \widehat{a}_{i}, \widehat{b}_{i}\right) \frac{1}{C_{i}} \boldsymbol{1}_{\{d_{i} = 1\}} +
(1 - \widehat{\pi}_{i}) \boldsymbol{1}_{\{d_{i} = 0, T_{i} = C_{i}\}}
\label{eqn:pf-event-time-prior-dist}
\end{equation}
where $\widehat{\pi}_{i}, \widehat{a}_{i}$ and $\widehat{b}_{i}$ are maximum likelihood estimates obtained using the prior samples.
Examples of $\widehat{\pd}_{1, i}(T_{i}, d_{i})$ for a subset of individuals are displayed in Figure \ref{fig:pf_prior_fit}.
```{r pf_prior_fit, fig.cap = "Fitted distribution (curve) and Monte Carlo samples drawn from $\\pd_{1}(\\phi_{1 \\cap 2})$ (histogram) for a subset of the individuals in the cohort. The height of the atom at $C_{i}$ (red bar and point) has been set to $1 - \\widehat{\\pi}_{i}$."}
knitr::include_graphics("plots/mimic-example/pf-prior-plot-small.pdf")
```
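A sketch of this estimation step for a single individual (we use `fitdistrplus` for the beta maximum likelihood fit; this is an assumption, and any maximum likelihood fitter would work):

```{r, eval = FALSE}
library(fitdistrplus)

# Fit the discrete/continuous mixture: pi_hat from the sampled event
# fraction, (a_hat, b_hat) by maximum likelihood on the rescaled
# uncensored event times.
fit_pf_prior <- function(T_samples, d_samples, C_i) {
  pi_hat <- mean(d_samples == 1)
  beta_fit <- fitdist(T_samples[d_samples == 1] / C_i, distr = "beta")
  c(pi = pi_hat,
    a = unname(beta_fit$estimate["shape1"]),
    b = unname(beta_fit$estimate["shape2"]))
}
```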
### Survival submodel
Our estimate of $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ relies on the fact that
\begin{equation}
\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \prod_{i = 1}^{N}\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i}).
\end{equation}
As such we estimate $\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ for each individual and take the product of these estimates.
Drawing samples from $\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ is challenging: we use the approach proposed by @crowther_simulating_2013 as implemented in @brilleman_simsurv_2021.
Inspecting the samples reveals correlation between $(T_{i}, d_{i})$ and $(\kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ that we would like to capture in our estimate.
To do so, we fit a mixture of multivariate normal distributions to transformations of the continuous parameters with support on $\mathbb{R}$,
\begin{equation}
\begin{gathered}
\tilde{T}_{i} = \text{Logit}\left(\frac{T_{i}}{C_{i}}\right), \quad
\tilde{\kappa}_{i} = \text{Logit}\left(\frac{\kappa_{i} - u_{i, (1)}}{u_{i, (n)} - u_{i, (1)}}\right), \\
\tilde{\eta}^{b}_{1, i} = \log(\eta^{b}_{1, i}), \quad
\tilde{\eta}^{a}_{1, i} = \log(\eta^{a}_{1, i}).
\end{gathered}
\end{equation}
The resulting density estimate, with estimated mixture weight $\widehat{\theta}_{i} \in [0, 1]$, is
\begin{align*}
\widehat{\pd}_{2}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i}) =
\Big[&
(\widehat{\theta}_{i})
\text{N}\left(
\left[\tilde{T}_{i}, \tilde{\kappa}_{i}, \tilde{\eta}^{b}_{1, i}, \tilde{\eta}^{a}_{1, i} \right]^{\mathsf{T}}; \widehat{\mu}_{1, i}, \widehat{\Sigma}_{1, i}
\right)
\boldsymbol{1}_{\{d_{i} = 1\}}
+ \\
& (1 - \widehat{\theta}_{i})
\text{N}\left(\left[\tilde{\kappa}_{i}, \tilde{\eta}^{b}_{1, i}, \tilde{\eta}^{a}_{1, i} \right]^{\mathsf{T}}; \widehat{\mu}_{0, i}, \widehat{\Sigma}_{0, i} \right)
\boldsymbol{1}_{\{d_{i} = 0, T_{i} = C_{i}\}}