---
title: "Combining chains of Bayesian models with Markov melding"
runtitle: "Chained Markov melding"
author:
  - name: "Andrew A."
    surname: "Manderson"
    email: "andrew.manderson@mrc-bsu.cam.ac.uk"
    label: e1
    addressLabel: A
    sepNext: and
  - name: "Robert J. B."
    surname: "Goudie"
    email: "robert.goudie@mrc-bsu.cam.ac.uk"
    label: e2
    addressLabel: B
affiliation:
  - label: A
    name: MRC Biostatistics Unit, University of Cambridge, United Kingdom, and The Alan Turing Institute
    authorsLabels: e1
  - label: B
    name: MRC Biostatistics Unit, University of Cambridge, United Kingdom
    authorsLabels: e2
year: "`r format(Sys.time(), '%Y')`"
abstract: |
  A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets.
  It may be easier to instead specify distinct submodels for each source of data, then join the submodels together.
  We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof.
  We propose _chained Markov melding_, an extension of Markov melding, a generic method to combine chains of submodels into a joint model.
  One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels.
  Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel.
  We demonstrate our methodology using two examples.
  The first example considers an ecological integrated population model, where multiple data sets are required to accurately estimate population immigration and reproduction rates.
  We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times.
  Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.
keywords:
  - Combining models
  - Markov melding
  - Bayesian graphical models
  - Multi-stage estimation
  - Model/data integration
  - Integrated population model
bibliography: bibliography/multi-phi-bib.bib
biblio-style: ba ## alternative: imsart-number
output:
  rticles::ba_article:
    includes:
      in_header:
        tex-input/pre.tex
    keep_tex: true
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  comment = NA,
  fig.align = "center"
)
```
# Introduction
The Bayesian philosophy is appealing in part because the posterior distribution quantifies all sources of uncertainty.
However, a joint model for all data and parameters is a prerequisite to posterior inference, and in situations where multiple, heterogeneous sources of data are available, specifying such a joint model is a formidable task.
Models that consider such data are necessary to describe complex phenomena at a useful precision.
One possible approach begins by specifying individual submodels for each source of data.
These submodels could guide the statistician when directly specifying the joint model, but to use the submodels only informally seems wasteful.
Instead, it may be preferable to construct a joint model by formally joining the individual submodels together.
Some specific forms of combining data are well established.
Meta-analyses and evidence synthesis methods are widely used to summarise data, often using hierarchical models [@ades_multiparameter_2006; @presanis_synthesising_2014].
Outside of the statistical literature, a common name for combining multiple data is _data fusion_ [@lahat_multimodal_2015; @kedem_statistical_2017], though there are many distinct methods that fall under this general name.
Interest in integrating data is not just methodological; applied researchers often collect multiple disparate data sets, or data of different modalities, and wish to combine them.
For example, to estimate SARS-CoV-2 positivity @donnat_bayesian_2020 build an intricate hierarchical model that integrates both testing data and self-reported questionnaire data, and @parsons_bayesian_2021 specify a hierarchical model of similar complexity to estimate the number of injecting drug users in Ukraine.
Both applications specify Bayesian models with data-specific components, which are united in a hierarchical manner.
In conservation ecology, _integrated population models_ (IPMs) [@besbeas_integrating_2002; @brooks_bayesian_2004; @schaub_integrated_2011; @maunder_review_2013; @zipkin_synthesizing_2018] are used to estimate population dynamics, e.g. reproduction and immigration rates, using multiple data on the same population.
Such data have standard models associated with them, such as the Cormack-Jolly-Seber model [@lebreton_modeling_1992] for capture-recapture data, and the IPM serves as the framework in which the standard models are combined.
More generally, the applications we list illustrate the importance of generic, flexible methods for combining data to applied researchers.
_Markov melding_ [@goudie_joining_2019] is a general statistical methodology for combining submodels.
Specifically, it considers $\Nm$ submodels that share some common quantity $\phi$, with each of the $\modelindex = 1, \ldots, \Nm$ submodels possessing distinct parameters $\psi_{\modelindex}$, data $Y_{\modelindex}$, and form $\pd_{\modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$.
@goudie_joining_2019 then propose to combine the submodels into a joint model, denoted $\pd_{\text{meld}}(\phi, \psi_{1}, \ldots, \psi_{\Nm}, Y_{1}, \ldots, Y_{\Nm})$.
However, it is unclear how to integrate models where there is no single quantity $\phi$ common to all submodels, such as for submodels that are linked in a chain structure.
We propose an extension to Markov melding, which we call _chained Markov melding_[^chained], which facilitates the combination of $\Nm$ submodels that are in a chain structure.
For example, when $\Nm = 3$ we address the case in which submodel 1 and 2 share a common quantity $\phi_{1 \cap 2}$, and submodel 2 and 3 share a different quantity $\phi_{2 \cap 3}$.
Our extension addresses previously unconsidered complications including the distinct domains (and possibly supports) of the common quantities, and the desire to capture possible prior correlation between them.
Two examples serve to illustrate our methodology, which we introduce in the following section.
The computational effort required to fit a complex, multi-response model is a burden to the model development process.
We propose a multi-stage posterior estimation method that exploits the properties of our chained melded model to reduce this burden.
We can parallelise aspects of the computation across the submodels, using less computationally expensive techniques for some submodels.
Reusing existing software implementations of submodels, and subposterior samples where available, is also possible.
Multi-stage samplers can aid in understanding the contribution of each submodel to the final posterior, and are used in many applied settings, including hierarchical modelling [@lunn:etal:13] and joint models [@mauff_joint_2020].
One contribution of our work is to clarify the informal process commonly used in applied analyses of summarising and/or approximating submodels for use in subsequent analyses.
The two most common approximation strategies seem to be (i) approximating the subposterior of the common quantity with a normal distribution for use in subsequent models [see, e.g. @jackson_when_2018; @nicholson_interoperability_2021] and (ii) taking only a point estimate of the subposterior, and treating it as a known value in further models.
These strategies may, but do not always, produce acceptable approximations to the chained melded model.
Both the chained melded model and these approximation strategies are examples of 'multi-phase' and 'multi-source' inference [@meng_trio_2014], with the melding approach most comprehensively accounting for uncertainty.
[^chained]: _"Chained graphs"_ were considered by @lauritzen_chain_2002, however they are unrelated to our proposed model. We use "chained" to emphasise the nature of the relationships between submodels.
## Example introduction
In this section we provide a high-level overview of two applications that require integrating a chain of submodels, with more details in Sections \ref{an-integrated-population-model-for-little-owls-1} and \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1}.
Our first example decomposes a joint model into its constituent submodels and rejoins them.
This simple situation allows us to compare the output from the chained melding process to the complete joint model, and is meant to illustrate both the 'chain-of-submodels' notion and the mechanics of chained melding.
The second example is a realistic and complex setting in which it is not obvious how to combine the submodels without chained Markov melding.
Our comparator is the common technique of summarising previously considered submodels with point estimates; the comparison demonstrates the importance of fully accounting for uncertainty.
### An integrated population model for little owls
Integrated population models (IPMs) [@zipkin_synthesizing_2018] combine multiple data to estimate key quantities governing the dynamics of a specific population.
@schaub_local_2006 and @abadi_estimation_2010 used an IPM to estimate fecundity, immigration, and yearly survival rates for a population of little owls.
These authors collect and model three types of data, illustrated in Figure \ref{fig:owls-simple-dag}.
Capture-recapture data $Y_{1}$, and associated capture-recapture submodel $\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})$, are acquired by capturing and tagging owls each year, and then counting the number of tagged individuals recaptured in subsequent years.
Population counts $Y_{2}$ are obtained by observing the number of occupied nesting sites, and are modelled in $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})$.
Finally, nest-record data $Y_{3}$ count both the number of reproductive successes and the number of possible breeding pairs, and are associated with a submodel for fecundity $\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})$.
The population count model $\pd_{2}$ shares the parameter $\phi_{1 \cap 2}$ with the capture-recapture model $\pd_{1}$, and the parameter $\phi_{2 \cap 3}$ with the fecundity model $\pd_{3}$; each of the $\modelindex = 1, 2, 3$ submodels has distinct, submodel-specific parameters $\psi_{\modelindex}$.
No single source of data is sufficient to estimate all quantities of interest, so it is necessary to integrate the three submodels into a single joint model to produce acceptably precise estimates of fecundity and immigration rates.
We will show that the chained Markov melding framework developed in Section \ref{chained-model-specification} encapsulates the process of integrating these submodels, producing results that are concordant with the original joint IPM.
\input{tex-input/owls-example/0001-owls-simple-dag.tex}
### Survival analysis with time varying covariates and uncertain event times
Our second example considers the time to onset of respiratory failure (RF) amongst patients in intensive care units, and factors that influence the onset of RF.
A patient can be said to be experiencing RF if the ratio of the partial pressure of arterial blood oxygen (\paoii) to the fraction of inspired oxygen (\fioii) is less than 300 mmHg [@the_ards_definition_task_force_acute_2012], though this is not the only definition of RF.
Patients' \pfratio\ (P/F) ratios are typically measured only a few times a day.
The relative infrequency of P/F ratio data, combined with the intrinsic variability in each individual's blood oxygen level, results in considerable uncertainty about the time of onset of RF.
Factors that influence the time to onset of RF are both longitudinal and time invariant.
Both types of data can be considered in _joint models_ [@rizopoulos_joint_2012], which are composed of two distinct submodels, one for each data type.
However, existing joint models are not able to incorporate the uncertainty surrounding the event time, which may result in overconfident and/or biased estimates of the parameters in the joint model.
Chained Markov melding offers a conceptually straightforward, Bayesian approach to incorporating uncertain event times into joint models.
Specifically, we consider the event time as a submodel-derived quantity from a hierarchical regression model akin to @lu_using_1993.
We call this submodel the _uncertain event time_ submodel and denote it $\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})$, where $\phi_{1 \cap 2}$ incorporates the event time.
The survival submodel $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})$ uses the event time within $\phi_{1 \cap 2}$, the common quantity, as the response.
We treat the longitudinal submodel, $\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})$, separately from the survival submodel, as is common in two-stage joint modelling [@mauff_joint_2020], and denote the subject-specific parameters that also appear in the survival model as $\phi_{2 \cap 3}$.
Each of the $\modelindex = 1, 2, 3$ submodels has submodel-specific data $Y_{\modelindex}$ and parameters $\psi_{\modelindex}$.
The high level submodel relationships are displayed as a DAG in Figure \ref{fig:surv-simple-dag}.
\input{tex-input/surv-example/0001-surv-simple-dag.tex}
It is in examples such as this one that we foresee the most use for chained Markov melding; a fully Bayesian approach is desired and the submodels are nontrivial in complexity, with no previously existing or obvious joint model.
## Markov melding
We now review Markov melding [@goudie_joining_2019] before detailing our proposed extension.
As noted in the introduction, Markov melding is a method for combining $\Nm$ submodels $\pd_{1}(\phi, \psi_{1}, Y_{1}), \ldots, \pd_{\Nm}(\phi, \psi_{\Nm}, Y_{\Nm})$ which share the same $\phi$.
When the submodel prior marginals $\pd_{\modelindex}(\phi)$ are identical, i.e. $\pd_{\modelindex}(\phi) = \pd(\phi)$ for all $\modelindex$, it is possible to combine the submodels using _Markov combination_ [@dawid_hyper_1993-1; @massa_combining_2010]
\input{tex-input/orig-markov-melding/0010-markov-combination.tex}\noindent
Markov combination is not immediately applicable when submodel prior marginals are distinct, so Goudie _et al._ define a _marginal replacement_ procedure, where individual submodel prior marginals are replaced with a common marginal $\pd_{\text{pool}}(\phi) = h(\pd_{1}(\phi), \ldots, \pd_{\Nm}(\phi))$ which is the result of a pooling function $h$ that appropriately summarises all prior marginals (the choice of which is described below).
The result of marginal replacement is
\input{tex-input/orig-markov-melding/0011-marginal-replacement.tex}\noindent
Goudie _et al._ show that $\pd_{\text{repl}, \modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$ minimises the Kullback–Leibler (KL) divergence between a distribution $\q(\phi, \psi_{\modelindex}, Y_{\modelindex})$ and $\pd_{\modelindex}(\phi, \psi_{\modelindex}, Y_{\modelindex})$ under the constraint that $\q(\phi) = \pd_{\text{pool}}(\phi)$, and that marginal replacement is valid when $\phi$ is a deterministic function of the other parameters in submodel $\modelindex$.
Markov melding joins the submodels via the Markov combination of the marginally replaced submodels
\input{tex-input/orig-markov-melding/0012-markov-melding.tex}
### Pooled prior
Goudie _et al._ proposed forming $\pd_{\text{pool}}(\phi)$ using linear or logarithmic prior pooling [@ohagan_uncertain_2006; @genest_characterization_1986]
\input{tex-input/orig-markov-melding/0020-orig-pooling.tex}\noindent
where $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$ are nonnegative weights, which are chosen subjectively to ensure $\pd_{\text{pool}}(\phi)$ appropriately represents prior knowledge about the common quantity.
Two special cases of pooling are of particular interest.
_Product of experts (PoE) pooling_ [@hinton_training_2002] is a special case of logarithmic pooling that occurs when $\lambda_{\modelindex} = 1$ for all $\modelindex$.
_Dictatorial pooling_ is a special case of either pooling method when $\lambda_{\modelindex'} = 1$ and, for all $\modelindex \neq \modelindex'$, $\lambda_{\modelindex} = 0$.
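As a concrete illustration of these two pooling functions, the following R sketch pools two univariate normal prior marginals on a grid; all means, variances, and weights are illustrative rather than drawn from any submodel in this paper. Linear pooling yields a weighted mixture of the marginals, whilst logarithmic pooling yields a renormalised weighted geometric mean.

```{r pooling-sketch, echo = TRUE, eval = FALSE}
## Grid-based linear and logarithmic pooling of two normal prior marginals
## for a shared scalar phi; all parameter values are illustrative.
phi <- seq(-6, 6, length.out = 1024)
h <- phi[2] - phi[1]                       # grid spacing
p_1 <- dnorm(phi, mean = -1, sd = 1)
p_2 <- dnorm(phi, mean = 2, sd = 1.5)
lambda <- c(0.5, 0.5)

## Linear pooling: a mixture, already normalised when the weights sum to one.
pool_lin <- lambda[1] * p_1 + lambda[2] * p_2

## Logarithmic pooling: a weighted geometric mean, renormalised numerically.
pool_log <- p_1^lambda[1] * p_2^lambda[2]
pool_log <- pool_log / (sum(pool_log) * h)

matplot(phi, cbind(p_1, p_2, pool_lin, pool_log), type = "l", lty = 1,
        xlab = expression(phi), ylab = "density")
```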
# Chained model specification
Consider $\modelindex = 1, \ldots, \Nm$ submodels each with data $Y_{\modelindex}$ and parameters $\theta_{\modelindex}$ denoted $\pd_{\modelindex}(\theta_{\modelindex}, Y_{\modelindex})$, with $\Nm \geq 3$.
We assume that the submodels are connected in a manner akin to a chain, and so can be ordered such that only 'adjacent' submodels in the chain have parameters in common.
Specifically we assume that submodels $\modelindex$ and $\modelindex + 1$ have some parameter $\phi_{\modelindex \cap \modelindex + 1}$ in common for $\modelindex = 1, \ldots, \Nm - 1$.
For notational convenience define $\phi_{1} = \phi_{1 \cap 2}, \phi_{\Nm} = \phi_{\Nm-1 \cap \Nm}$ and $\phi_{\modelindex} = (\phi_{\modelindex - 1 \cap \modelindex}, \, \phi_{\modelindex \cap \modelindex + 1})$ for $\modelindex = 2, \ldots, \Nm - 1$, so that $\phi_{\modelindex} \subseteq \theta_{\modelindex}$ denotes the parameters in model $\modelindex$ shared with another submodel.
The submodel-specific parameters of submodel $\modelindex$ are thus $\psi_\modelindex = \theta_\modelindex \setminus \phi_\modelindex$.
Define the vector of all common quantities $\boldsymbol{\phi} = \bigcup_{\modelindex = 1}^{\Nm} \phi_{\modelindex} = (\phi_{1 \cap 2}, \phi_{2 \cap 3}, \ldots, \phi_{\Nm - 1 \cap \Nm})$ so that all elements in $\boldsymbol{\phi}$ are unique.
Further denote by $\boldsymbol{\phi}_{-\modelindex}$ the subvector of $\boldsymbol{\phi}$ excluding the $\modelindex$\textsuperscript{th} element.
It will also be convenient to define $\boldsymbol{\psi} = (\psi_{1}, \ldots, \psi_{\Nm})$ and likewise $\boldsymbol{Y} = (Y_{1}, \ldots, Y_{\Nm})$.
Note that all components of $\boldsymbol{\phi}, \boldsymbol{\psi}$ and $\boldsymbol{Y}$ may themselves be multivariate.
Additionally, because $\phi_{\modelindex \cap \modelindex + 1}$ may be a deterministic function of either $\theta_{\modelindex}$ or $\theta_{\modelindex + 1}$ we refer to $\phi_{\modelindex \cap \modelindex + 1}$ as a common parameter or a common quantity as appropriate.
All submodels, and marginal and conditional distributions thereof, have density functions that are assumed to exist and integrate to one.
When considering conditional distributions we assume that the parameter being conditioned on has support in the relevant region.
We define the $\modelindex$\textsuperscript{th} _subposterior_ as $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex} \mid Y_{\modelindex})$.
## Extending marginal replacement
We now define the chained melded model by extending the marginal replacement procedure to submodels linked in a chain-like way.
The proposed chained marginal replacement operation modifies the submodels to enforce a common prior for $\boldsymbol{\phi}$.
This consistency allows us to employ Markov combination to unite the submodels.
Specifically, the $\modelindex$\textsuperscript{th} marginally replaced submodel is
\input{tex-input/multiple-phi/0001-chained-marginal-replacement.tex}\noindent
where $\pd_{\text{pool}}(\boldsymbol{\phi}) = g(\pd_{1}(\phi_{1}), \pd_{2}(\phi_{2}), \ldots, \pd_{\Nm}(\phi_{\Nm}))$ is a pooling function that appropriately summarises all submodel prior marginals.
The second equality in Equation \eqref{eqn:chained-marginal-replacement} is because of the conditional independence $(\psi_{\modelindex}, Y_{\modelindex} \indep \boldsymbol{\phi}_{-\modelindex}) \mid \phi_{\modelindex}$ that exists due to the chained relationship between submodels.
It is important to note that $\pd_{\text{repl}, \modelindex}(\boldsymbol{\phi}, \psi_{\modelindex}, Y_{\modelindex})$ is defined on a larger parameter space than $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$, as it includes $\boldsymbol{\phi}_{-\modelindex}$.
Define $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex}) = \int \pd_{\text{repl}, \modelindex}(\boldsymbol{\phi}, \psi_{\modelindex}, Y_{\modelindex})\text{d}\boldsymbol{\phi}_{-\modelindex}$.
Each marginally replaced submodel, as defined in Equation \eqref{eqn:chained-marginal-replacement}, minimises the following KL divergence[^kl-proof]
\input{tex-input/multiple-phi/0003-chained-marginal-replacement-kl.tex}\noindent
where $\pd_{\text{pool}}(\phi_{\modelindex}) = \int \pd_{\text{pool}}(\boldsymbol{\phi})\text{d}\boldsymbol{\phi}_{-\modelindex}$.
We can thus interpret $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ as a minimally modified $\pd_{\modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ which admits $\pd_{\text{pool}}(\phi_{\modelindex})$ as a marginal.
Note that it is the combination of $\pd_{\text{repl}, \modelindex}(\phi_{\modelindex}, \psi_{\modelindex}, Y_{\modelindex})$ and $\pd_{\text{pool}}(\boldsymbol{\phi}_{-\modelindex} \mid \phi_{\modelindex})$ that uniquely determines \eqref{eqn:chained-marginal-replacement}.
We form the chained melded model by taking the Markov combination of the marginally replaced submodels
\input{tex-input/multiple-phi/0010-melded-model-cond.tex}
Rewriting \eqref{eqn:melded-model-full} in terms of $\phi_{\modelindex \cap \modelindex + 1}$ for $\modelindex = 1, \ldots, \Nm - 1$ yields
\input{tex-input/multiple-phi/0011-melded-model-general.tex}
Finally, we use _chained melded posterior_ $\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi} \mid \boldsymbol{Y}) \propto \pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi}, \boldsymbol{Y})$ to refer to the posterior of the chained melded model conditioned on all data.
[^kl-proof]: This is shown in Appendix B of the online supplement to @goudie_joining_2019.
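For concreteness, writing out the melded model in the $\Nm = 3$ case used by both of our examples (a restatement of Equation \eqref{eqn:melded-model-full}, introducing nothing new) gives
\[
\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi}, \boldsymbol{Y}) =
\pd_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3})
\frac{\pd_{1}(\phi_{1 \cap 2}, \psi_{1}, Y_{1})}{\pd_{1}(\phi_{1 \cap 2})}
\frac{\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{2}, Y_{2})}{\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})}
\frac{\pd_{3}(\phi_{2 \cap 3}, \psi_{3}, Y_{3})}{\pd_{3}(\phi_{2 \cap 3})},
\]
i.e. each submodel contributes its joint distribution divided by its prior marginal for the quantities it shares, with the pooled prior supplying a single joint prior for $(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.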
## Pooled prior
Specifying \eqref{eqn:melded-model-full} requires a joint prior for $\boldsymbol{\phi}$.
As in Markov melding we form the joint prior by pooling the marginal priors, selecting a pooling function $g$ that appropriately represents prior knowledge about the common quantities.
We define $\pd_{\text{pool}}(\boldsymbol{\phi})$ as a generic function of all prior marginals
\input{tex-input/pooled-prior-discussion/0030-pooled-prior-general-def.tex}
because we do not always wish to assume independence between the components of $\boldsymbol{\phi}$.
Two special cases of Equation \eqref{eqn:pooled-prior-general-def-two} are noteworthy.
Firstly, if all components of $\boldsymbol{\phi}$ are independent, then we can form $\pd_{\text{pool}}(\boldsymbol{\phi})$ as the product of $\Nm - 1$ standard pooling functions $h_{\modelindex}$ defined in Section \ref{pooled-prior}
\input{tex-input/pooled-prior-discussion/0010-bad-alternative-one.tex}\noindent
A second case, in between complete dependence \eqref{eqn:pooled-prior-general-def-two} and independence \eqref{eqn:bad-alternative-one}, is that if $\pd_{\modelindex}(\phi_{\modelindex - 1 \cap \modelindex}, \phi_{\modelindex \cap \modelindex + 1}) = \pd_{\modelindex}(\phi_{\modelindex - 1 \cap \modelindex}) \pd_{\modelindex}(\phi_{\modelindex \cap \modelindex + 1})$ then we can define
\input{tex-input/pooled-prior-discussion/0031-pooled-prior-split-def.tex}\noindent
without any additional assumptions.
That is, if any two consecutive components of $\boldsymbol{\phi}$ are independent in the submodel containing both of them, we can divide the pooled prior specification problem into two pooling functions.
The smaller number of arguments to $g_{1}$ and $g_{2}$ makes it easier to choose appropriate forms for those functions.
Selecting a specific form of $g$ is not trivial given the many choices of functional form and pooling weights (the latter of which we discuss momentarily).
One complication is that standard linear and logarithmic pooling, as defined in Equations \eqref{eqn:orig-pooling-linear} and \eqref{eqn:orig-pooling-logarithmic}, are not immediately applicable when the submodel marginal distributions consider different quantities.
We now propose extensions to logarithmic, linear, and dictatorial pooling for use in the case of chained melding.
### Chained logarithmic pooling
Extending logarithmic pooling for chained Markov melding is straightforward.
We define the logarithmically pooled prior to be
\input{tex-input/multiple-phi/0050-pooled-prior-overall.tex}\noindent
with $K_{\text{log}}(\lambda) = \int \prod_{\modelindex = 1}^{\Nm} \pd_{\modelindex}(\phi_{\modelindex})^{\lambda_{\modelindex}} \text{d}\boldsymbol{\phi}$ for nonnegative weight vector $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$ and $\sum_{\modelindex = 1}^{\Nm} \lambda_{\modelindex} \geq 1$.
Note that \eqref{eqn:pooled-prior-overall} does not imply independence between the elements of $\boldsymbol{\phi}$ because
\input{tex-input/multiple-phi/0051-pooled-prior-log-alt-def.tex}\noindent
When $\lambda_{1} = \lambda_{2} = \ldots = \lambda_{\Nm} = 1$ we obtain a special case which we call product-of-experts (PoE) pooling [@hinton_training_2002].
### Chained linear pooling
Our generalisation of linear pooling to handle marginals of different quantities is a two-step procedure.
The first step forms intermediary pooling densities via standard linear pooling, using appropriate marginals of the relevant quantity
\input{tex-input/multiple-phi/0080-M-model-linear-pooling.tex}\noindent
where $\lambda_{\modelindex} = (\lambda_{\modelindex, 1}, \lambda_{\modelindex, 2})$ are nonnegative pooling weights, and for $\modelindex = 2, \ldots, \Nm - 1$
\input{tex-input/multiple-phi/00801-M-model-linear-pooling-component-def.tex}\noindent
For $\modelindex = 1$ and $\modelindex = \Nm$ the relevant marginals are $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{\Nm}(\phi_{\Nm - 1 \cap \Nm})$.
In step two we form the pooled prior as the product of the intermediaries
\input{tex-input/multiple-phi/0055-silly-linear-overall.tex}\noindent
with $K_{\text{lin}}(\lambda) = \int \prod_{\modelindex = 1}^{\Nm - 1} \pd_{\text{pool}, \modelindex}(\phi_{\modelindex \cap \modelindex + 1}) \text{d}\boldsymbol{\phi}$, for $\lambda = (\lambda_{1}, \ldots, \lambda_{\Nm})$.
Clearly, this assumes prior independence amongst all components of $\boldsymbol{\phi}$ which may be undesirable, particularly if this independence was not present under one or more of the submodel priors.
We discuss extensions to linear pooling that enable prior dependence between the components of $\boldsymbol{\phi}$ in Section \ref{conclusion}.
### Dictatorial pooling
Chained Markov melding does not admit a direct analogue to dictatorial pooling as defined in Section \ref{pooled-prior} because not all submodel prior marginals contain all common quantities.
For example, consider the logarithmically pooled prior of Equation \eqref{eqn:pooled-prior-overall} with, say, the $\modelindex$\textsuperscript{th} entry in $\lambda$ set to $1$ and all others set to $0$.
This choice of $\lambda$ results in $\pd_{\text{pool}}(\boldsymbol{\phi}) = \pd_{\modelindex}(\phi_{\modelindex})$, which is flat with respect to $\boldsymbol{\phi}_{-\modelindex}$.
It seems reasonable to require that any generalisation of dictatorial pooling produces an appropriate prior for all components of $\boldsymbol{\phi}$.
Such a generalisation should also retain the original intention of dictatorial pooling, i.e. '_the authoritative prior for_ $\phi_{\modelindex}$ _is_ $\pd_{\modelindex}(\phi_{\modelindex})$'.
We propose two possible forms of dictatorial pooling that satisfy the aforementioned criteria.
_Partial dictatorial pooling_ enforces a single submodel prior for the relevant components of $\boldsymbol{\phi}$, with no restrictions on the pooling of the remaining components; _complete dictatorial pooling_ requires selecting one of the two possible submodel priors for each component of $\boldsymbol{\phi}$.
Partial dictatorial pooling considers $\pd_{\modelindex}(\phi_{\modelindex})$ as the authoritative prior for $\phi_{\modelindex}= (\phi_{\modelindex - 1 \cap \modelindex}, \phi_{\modelindex \cap \modelindex + 1})$.
This results in,
\input{tex-input/pooled-prior-discussion/0040-dictatorial-one-submodel.tex}\noindent
where $g_{1}$ and $g_{2}$ are linear or logarithmic pooling functions as desired[^end-of-chain].
[^end-of-chain]: Some care is required if the authoritative submodel is $\pd_{\modelindex}$ for $\modelindex \in \{1, 2, M - 1, M\}$. If it is taken to be $\modelindex \in \{1, 2\}$, then $g_{1}$ does not exist, and additionally in the $\modelindex = 1$ case $\pd_{1}(\phi_{0 \cap 1}, \phi_{1 \cap 2}) \coloneqq \pd_{1}(\phi_{1 \cap 2})$. The $\modelindex \in \{\Nm - 1, \Nm\}$ cases have analogous definitions.
Complete dictatorial pooling requires the marginal pooled prior for each component of $\boldsymbol{\phi}$ to be chosen on the basis of only one of the two priors specified for it under the submodels.
For $\modelindex = 1, \ldots, \Nm - 1$, the $\modelindex$\textsuperscript{th} marginal of the pooled prior is either
\input{tex-input/pooled-prior-discussion/0043-dictatorial-def-three.tex}\noindent
If two consecutive marginals are chosen to have the same submodel prior, then we wish to retain the dependence between $\phi_{\modelindex - 1 \cap \modelindex}$ and $\phi_{\modelindex \cap \modelindex + 1}$ present in $\pd_{\modelindex}$.
We thus redefine consecutive terms so that
\input{tex-input/pooled-prior-discussion/0045-dictatorial-def-three-part-three.tex}\noindent
The complete dictatorially pooled prior is thus
\input{tex-input/pooled-prior-discussion/0047-complete-dict-joint-def.tex}\noindent
where, subject to the potential modification in Equation \eqref{eqn:dictatorial-def-three-part-three}, the terms in the product are as defined in Equation \eqref{eqn:dictatorial-def-three}.
For example, if $\Nm = 5$ and we wish to ignore $\pd_{2}$ and $\pd_{4}$ when constructing the pooled prior and instead associate $\phi_{1 \cap 2}$ with $\pd_{1}$, both $\phi_{2 \cap 3}$ and $\phi_{3 \cap 4}$ with $\pd_{3}$, and $\phi_{4 \cap 5}$ with $\pd_{5}$, then
\input{tex-input/pooled-prior-discussion/0046-dictatorial-def-two-example-one.tex}
### Pooling weights
Choosing values for the pooling weights is an important step in specifying the pooled prior [@carvalho_bayesian_2022; @abbas_kullback-leibler_2009; @rufo_log-linear_2012; @rufo_bayesian_2012].
Because appropriate values for the weights depend on the submodels being pooled and the information available _a priori_, universal recommendations are impossible, so we illustrate the impact of different choices in a straightforward example.
It is important that prior predictive visualisations of the pooled prior are produced [@gabry_visualization_2019; @gelman_bayesian_2020] to guide the choice of pooling weights and ensure that the result suitably represents the available information.
Figure \ref{fig:pooled_densities_plot} illustrates how $\lambda$ and the choice of pooling method impact $\pd_{\text{pool}}(\boldsymbol{\phi})$ when pooling normal distributions.
Specifically, we consider $\Nm = 3$ submodels and pool
\input{tex-input/multiple-phi/0061-marginal-gaussian-example.tex}\noindent
where $\text{N}(\phi; \mu, \sigma^{2})$ is the normal density function with mean $\mu$ and variance $\sigma^{2}$ (or covariance matrix where appropriate).
The two dimensional density function $\pd_{2}$ has an additional parameter $\rho$, which controls the intra-submodel marginal correlation.
We set $\mu_{1} = -2.5, \mu_{2} = \left[\mu_{2, 1} \,\, \mu_{2, 2}\right]' = \left[0 \,\, 0\right]', \mu_{3} = 2.5, \sigma_{1}^{2} = \sigma_{2}^{2} = \sigma_{3}^{2} = 1$ and $\rho = 0.8$.
In the logarithmic case we set $\lambda_{1} = \lambda_{3}$ and parameterise $\lambda_{2} = 1 - 2\lambda_{1}$, so that $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$ whilst limiting ourselves to varying only $\lambda_{1}$.
Similarly, in the linear case we set $\lambda_{1, 1} = \lambda_{2, 2} = \lambda_{1}$ and $\lambda_{1, 2} = \lambda_{2, 1} = 1 - 2 \lambda_{1}$.
We consider 5 evenly spaced values of $\lambda_{1} \in [0, 0.5]$.
```{r pooled_densities_plot, out.width = '0.95\\linewidth', fig.cap = "Contour plots of $\\pd_{\\text{pool}}(\\boldsymbol{\\phi})$ (red) under logarithmic and linear pooling (left and right column respectively). The three original densities $\\pd_{1}(\\phi_{1 \\cap 2})$, $\\pd_{3}(\\phi_{2 \\cap 3})$ and $\\pd_{2}(\\phi_{1 \\cap 2}, \\phi_{2 \\cap 3})$ are shown in blue, with the univariate densities shown on the appropriate axis. The pooling weight parameter $\\lambda_{1}$ is indicated in the plot titles."}
knitr::include_graphics("plots/pooling-tests/version-two.pdf")
```
For both pooling methods, as the weight $\lambda_{1}$ associated with models $\pd_{1}$ and $\pd_{3}$ increases, the relative contributions of $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{3}(\phi_{2 \cap 3})$ increase.
Note the lack of correlation in $\pd_{\text{pool}}$ under linear pooling (right column of Figure \ref{fig:pooled_densities_plot}) due to Equation \eqref{eqn:silly-linear-overall}.
A large, near-flat plateau is visible in the $\lambda_{1} = 0.25$ and $\lambda_{1} = 0.375$ cases, which is a result of the mixture of four 2-D normal distributions that linear pooling produces in this example.
The logarithmic pooling process produces a more concentrated prior for small values of $\lambda_{1}$, and does not result in _a priori_ independence between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
Appendix \ref{log-pooling-gaussian-densities} shows analytically that $\lambda_{2}$ controls the quantity of correlation present in $\pd_{\text{pool}}$ in this setting.
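The logarithmic case of this example is also available in closed form: raising a normal density to the power $\lambda$ multiplies its precision by $\lambda$, and a product of normal densities is an unnormalised normal density whose precision is the sum of the component precisions. A minimal R sketch, using the parameter values above (the means are omitted, as they do not affect the pooled covariance):

```{r chained-log-pool-sketch, echo = TRUE, eval = FALSE}
## Correlation of the log-pooled prior for the Gaussian example: the pooled
## precision is lambda_1 / sigma^2 on the diagonal (from p_1 and p_3) plus
## lambda_2 times the precision of the bivariate p_2.
log_pool_correlation <- function(lambda_1, rho = 0.8, sigma2 = 1) {
  lambda_2 <- 1 - 2 * lambda_1
  Sigma_2 <- sigma2 * matrix(c(1, rho, rho, 1), 2, 2)
  prec_pool <- diag(lambda_1 / sigma2, 2) + lambda_2 * solve(Sigma_2)
  cov2cor(solve(prec_pool))[1, 2]
}
## Correlation decays from rho to zero as lambda_1 increases to 0.5:
sapply(c(0, 0.125, 0.25, 0.375, 0.5), log_pool_correlation)
```

This matches the behaviour visible in the left column of Figure \ref{fig:pooled_densities_plot}: as $\lambda_{2}$ shrinks, so does the prior correlation between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.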
# Posterior estimation
We now present a multi-stage MCMC method for generating samples from the melded posterior.
Whilst the melded posterior is a standard Bayesian posterior and so can, in principle, be targeted using any suitable Monte Carlo method, in practice this may be cumbersome or infeasible.
More specifically, it may be feasible to fit each submodel separately using standard methods, but when the submodels are combined -- either through Markov melding, or by expanding the definition of one submodel to include another -- the computation required to estimate the posterior in a single step can pose an insurmountable barrier.
In such settings we can employ multi-stage posterior estimation methods, such as those of @tom:etal:10, @lunn:etal:13, @hooten_making_2019, and @mauff_joint_2020.
We propose a multi-stage strategy that uses the chain-like relationship to both avoid evaluating all submodels simultaneously, and parallelise the computation required in the first stage to produce posterior samples in less time than an equivalent sequential method[^sequential].
Avoiding concurrently evaluating all submodels also enables the reuse of existing software, minimising the need for custom submodel and/or sampler implementations.
We also describe an approximate method, where stage one submodels are summarised by normal distributions for use in stage two.
We consider the $\Nm = 3$ case, as this setting includes both of our examples.
Our approach can be extended to $\Nm > 3$ settings, although we anticipate that it is unlikely to be suitable for large $\Nm$ settings.
We discuss some of the difficulties associated with generic, parallel methodology for efficient posterior sampling in Section \ref{conclusion}.
[^sequential]: For completeness, Appendix \ref{sequential-sampler} describes such a sequential MCMC sampler. We do not use the sequential sampler in this paper.
## Parallel sampler
Our proposed strategy obtains samples from submodels 1 and 3 in parallel in stage one.
Stage two reuses these samples in a Metropolis-within-Gibbs sampler, which targets the full melded posterior.
The stage specific targets are displayed in Figure \ref{fig:parallel-dag}.
\input{tex-input/dc-sampler/0001-parallel-dag.tex}
The parallel sampler assumes that the pooled prior decomposes such that
\input{tex-input/dc-sampler/0002-parallel-decomposition.tex}\noindent
All pooled priors trivially satisfy \eqref{eqn:parallel-decomposition} by assuming $\pd_{\text{pool}, 1}(\phi_{1 \cap 2})$ and $\pd_{\text{pool}, 3}(\phi_{2 \cap 3})$ are improper and/or flat distributions.
Alternatively we may choose $\pd_{\text{pool}, 1}(\phi_{1 \cap 2}) = \pd_{1}(\phi_{1 \cap 2})$ and $\pd_{\text{pool}, 3}(\phi_{2 \cap 3}) = \pd_{3}(\phi_{2 \cap 3})$, with appropriate adjustments to $\pd_{\text{pool}, 2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
This choice targets, in stage one, the subposteriors of $\pd_{1}$ and $\pd_{3}$ under their original prior for $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ respectively.
#### Stage one
Two independent, parallel sampling processes occur in stage one.
Terms from the melded model that pertain to $\pd_{1}$ and $\pd_{3}$ are isolated
\input{tex-input/dc-sampler/0021-stage-one-targets.tex}\noindent
and targeted using standard MCMC methodology.
Assuming that the stage one chains converge, and after discarding warmup iterations -- possibly thinning them, if within-chain correlation is high -- we obtain $N_{1}$ samples $\{(\phi_{1 \cap 2}, \psi_{1})_{n}\}_{n = 1}^{N_{1}}$ from $\pd_{\text{meld}, 1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$, and $N_{3}$ samples $\{(\phi_{2 \cap 3}, \psi_{3})_{n}\}_{n = 1}^{N_{3}}$ from $\pd_{\text{meld}, 3}(\phi_{2 \cap 3}, \psi_{3} \mid Y_{3})$.
For well-mixing stage one Markov chains targeting the correct stationary distribution, and large values of $N_{1}$ and $N_{3}$, the stage one samples accurately approximate the subposteriors.
#### Stage two
Stage two targets the melded posterior of Equation \eqref{eqn:melded-model-full} using a Metropolis-within-Gibbs sampler, where the proposal distributions are
\input{tex-input/dc-sampler/0031-stage-two-proposals.tex}\noindent
where $\q(\psi_{2}^{*} \mid \psi_{2})$ is a generic proposal distribution for $\psi_{2}$.
We draw an index $n_{1}^{*}$ uniformly from $\{1, \ldots, N_{1}\}$ and use the corresponding value $(\phi_{1 \cap 2}^{*}, \psi_{1}^{*})_{n_{1}^{*}}$ as the proposal, doing likewise for $n_{3}^{*}$ and $(\phi_{2 \cap 3}^{*}, \psi_{3}^{*})_{n_{3}^{*}}$.
The acceptance probabilities for these updates are
\input{tex-input/dc-sampler/0032-stage-two-acceptance.tex}\noindent
where $\alpha(x, z)$ denotes the probability associated with a move from $z$ to $x$.
Note that all stage two acceptance probabilities only contain terms from the second submodel and the pooled prior, and thus do not depend on $\psi_{1}$ or $\psi_{3}$.
If a move is accepted then we also store the index, i.e. $n_{1}^{*}$ or $n_{3}^{*}$, associated with the move, otherwise we store the current value of the index.
The stored indices are used to appropriately resample $\psi_{1}$ and $\psi_{3}$ from the stage one samples.
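The following R sketch makes the index-proposal mechanism concrete for the $\phi_{1 \cap 2}$ update. It is schematic: `stage_one_1` is assumed to be an $N_{1}$-row matrix of stage one draws with named columns, and `log_p2` and `log_pool_ratio` are hypothetical placeholders for the second-submodel and pooled-prior terms appearing in the acceptance probability of Equation \eqref{eqn:stage-two-acceptance}.

```{r stage-two-sketch, echo = TRUE, eval = FALSE}
## One Metropolis-within-Gibbs update of phi_{1 cap 2} in stage two,
## proposing uniformly from the stored stage one samples.
update_phi_12 <- function(state, stage_one_1, log_p2, log_pool_ratio) {
  n_star <- sample.int(nrow(stage_one_1), size = 1)   # uniform index proposal
  phi_12_star <- stage_one_1[n_star, "phi_12"]
  ## Only p_2 and pooled prior terms survive in the acceptance ratio.
  log_alpha <- log_p2(phi_12_star, state$phi_23, state$psi_2) -
    log_p2(state$phi_12, state$phi_23, state$psi_2) +
    log_pool_ratio(phi_12_star, state$phi_12, state$phi_23)
  if (log(runif(1)) < log_alpha) {
    state$phi_12 <- phi_12_star
    state$index_1 <- n_star     # stored, to later resample psi_1
  }
  state
}
```

The $\phi_{2 \cap 3}$ update is analogous, and the stored indices recover the corresponding $\psi_{1}$ and $\psi_{3}$ values after sampling, as described above.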
## Normal approximations to submodel components
Normal approximations are commonly employed to summarise submodels for subsequent use in more complex models.
For example, two-stage meta-analyses often use a normal distribution centred on each study's effect estimate [@burke_meta-analysis_2017].
Suppose we employ such an approximation to summarise the prior and posterior of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ under $\pd_{1}$ and $\pd_{3}$ respectively.
In addition, assume that (a) such approximations are appropriate for $\pd_{1}(\phi_{1 \cap 2}), \pd_{1}(\phi_{1 \cap 2} \mid Y_{1}), \pd_{3}(\phi_{2 \cap 3})$, and $\pd_{3}(\phi_{2 \cap 3} \mid Y_{3})$, (b) we are not interested in $\psi_{1}$ and $\psi_{3}$, and can integrate them out of all relevant densities, and (c) we employ our second form of dictatorial pooling and choose $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ as the authoritative prior.
The latter two assumptions imply that the melded posterior of interest is proportional to
\input{tex-input/multiple-normal-approximation/0010-normal-approx-melded-posterior-target.tex}
Denote the normal approximation of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ as $\widehat{\pd}_{1}(\phi_{1 \cap 2} \mid \widehat{\mu}_{1}, \widehat{\Sigma}_{1})$, which is a normal distribution with mean $\widehat{\mu}_{1}$ and covariance matrix $\widehat{\Sigma}_{1}$.
The corresponding normal approximation of the prior $\pd_{1}(\phi_{1 \cap 2})$ is $\widehat{\pd}_{1}(\phi_{1 \cap 2} \mid \widehat{\mu}_{1, 0}, \widehat{\Sigma}_{1, 0})$.
The equivalent approximations for the subposterior and prior of $\pd_{3}$ are $\widehat{\pd}_{3}(\phi_{2 \cap 3} \mid \widehat{\mu}_{3}, \widehat{\Sigma}_{3})$ and $\widehat{\pd}_{3}(\phi_{2 \cap 3} \mid \widehat{\mu}_{3, 0}, \widehat{\Sigma}_{3, 0})$ respectively.
Substituting in the approximations and using standard results for Gaussian density functions (see @bromiley_products_2003 and Appendix \ref{normal-approximation-calculations}) results in
\input{tex-input/multiple-normal-approximation/0040-final-normal-approx.tex}\noindent
where
\input{tex-input/multiple-normal-approximation/0050-in-text-normal-approx.tex}\noindent
Standard MCMC methods can be used to sample from the approximate melded posterior.
If instead we opt for product-of-experts pooling, all $\widehat{\mu}_{\text{de}}$ and $\widehat{\Sigma}_{\text{de}}$ terms disappear from the parameter definitions in Equation \eqref{eqn:in-text-normal-approx}.
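The identity underlying Equation \eqref{eqn:in-text-normal-approx} is that a product of (powers of) normal densities is an unnormalised normal density whose precision is the sum of the signed component precisions, with a precision-weighted mean [@bromiley_products_2003]. A small R sketch of this combination rule follows, with illustrative inputs; a negative power corresponds to dividing out a prior approximation during marginal replacement.

```{r gaussian-product-sketch, echo = TRUE, eval = FALSE}
## Combine normal density factors: each factor contributes power * Sigma^{-1}
## to the pooled precision and power * Sigma^{-1} %*% mu to the linear term.
gaussian_product <- function(means, covs, powers) {
  precs <- Map(function(S, a) a * solve(S), covs, powers)
  prec <- Reduce(`+`, precs)
  Sigma <- solve(prec)
  eta <- Reduce(`+`, Map(`%*%`, precs, means))
  list(mean = drop(Sigma %*% eta), cov = Sigma)
}
## E.g. a subposterior approximation divided by its prior approximation:
gaussian_product(means = list(1.2, 0), covs = list(matrix(0.2), matrix(4)),
                 powers = c(1, -1))
```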
# An integrated population model for little owls
We now return to the integrated population model (IPM) for the little owls introduced in Section \ref{an-integrated-population-model-for-little-owls}.
@finke_efficient_2019 consider a number of variations on the original model of @schaub_local_2006 and @abadi_estimation_2010; here we use only the variant from @finke_efficient_2019 with the highest marginal likelihood (Model 4 of their online supplement).
This example is particularly interesting to us as, for a certain choice of pooling function and pooling weights, the chained Markov melded model and the IPM are identical.
This coincidence allows us to use the posterior from the IPM as a benchmark for our multi-stage sampler.
Before we detail the specifics of each submodel, we must introduce some notation.
Data and parameters are stratified into two age-groups $a \in \{J, A\}$ where $J$ denotes juvenile owls (less than one year old) and $A$ adults, two sexes $s \in \{M, F\}$, and observations occur annually at times $t \in \{1, \ldots, T\}$, with $T = 25$.
The sex- and age-specific probability of an owl surviving from time $t$ to $t + 1$ is $\delta_{a, s, t}$, and the sex-specific probability of a previously captured owl being recaptured at time $t + 1$ is $\pi_{s, t + 1}$ so long as the owl is alive at time $t + 1$.
## Capture recapture: $\pd_{1}$
Capture-recapture data pertain to owls that are released at time $t$ (having been captured and tagged), and then recaptured at time $u = t + 1, \dots, T$, or not recaptured before the conclusion of the study, in which case $u = T + 1$.
Define $M_{a, s, t, u}$ as the number of owls of age-group $a$ and sex $s$ released at time $t$ and recaptured at time $u$.
We aggregate these observations into age- and sex-specific matrices $\boldsymbol{M}_{a, s}$, with $T$ rows, corresponding to release times, and $T + 1$ columns, corresponding to recapture times.
Let $R_{a, s, t} = \sum_{u = 1}^{T + 1} M_{a, s, t, u}$ be the number of owls released at time $t$, i.e. the $t$\textsuperscript{th} row-wise sum of the entries in $\boldsymbol{M}_{a, s}$.
The recapture times for owls released at time $t$ follow an age- and sex-specific multinomial likelihood
\input{tex-input/owls-example/0010-capture-recapture-submodel.tex}\noindent
with probabilities $\boldsymbol{Q}_{a, s, t} = (Q_{a, s, t, 1}, \ldots, Q_{a, s, t, T + 1})$ such that
\input{tex-input/owls-example/0011-multinomial-probabilities.tex}
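For intuition, the following R sketch computes m-array cell probabilities for a single age-sex group under the standard Cormack-Jolly-Seber structure: first recapture at time $u$ requires surviving each intervening interval and evading recapture until $u$. It is illustrative only; the definition actually used, including the juvenile-to-adult age transition, is that of Equation \eqref{eqn:multinomial-probabilities}.

```{r m-array-sketch, echo = TRUE, eval = FALSE}
## CJS m-array cell probabilities for a single group. `delta[t]` is survival
## from t to t + 1 (length T - 1); `pi_re[u]` is the recapture probability at
## time u (length T). Column T + 1 holds the never-recaptured probability.
m_array_probs <- function(delta, pi_re, T) {
  Q <- matrix(0, nrow = T, ncol = T + 1)
  for (t in seq_len(T)) {
    if (t < T) {
      for (u in (t + 1):T) {
        ## Survive and evade recapture over (t, u - 1), then survive to u
        ## and be recaptured at u.
        evade <- if (u > t + 1) {
          prod(delta[t:(u - 2)] * (1 - pi_re[(t + 1):(u - 1)]))
        } else 1
        Q[t, u] <- evade * delta[u - 1] * pi_re[u]
      }
    }
    Q[t, T + 1] <- 1 - sum(Q[t, seq_len(T)])  # never recaptured
  }
  Q
}
```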
## Count data model: $\pd_{2}$
To estimate population abundance, a two-level model is used: the first level models the observed (counted) number of females at each point in time, denoted $y_{t}$, with a second, latent process modelling the total number of females in the population.
The observation model is
\input{tex-input/owls-example/0021-observation-process.tex}\noindent
where we denote the number of juvenile and adult females in the population at time $t$ as $x_{J, t}$ and $x_{A, t}$ respectively, with $x_{t} = x_{J, t} + x_{A, t}$.
If $\text{sur}_{t}$ adult females survive from time $t - 1$ to time $t$, and $\text{imm}_{t}$ adult females immigrate over the same time period, then the latent, population level model is
\input{tex-input/owls-example/0020-count-data-submodel.tex}\noindent
where $\eta_{t}$ is the immigration rate.
The initial population sizes $x_{J, 1}$ and $x_{A, 1}$ have independent discrete uniform priors on $\{0, 1, \ldots, 50\}$.
If $x_{t - 1} = 0$ then we assume that the Poisson and binomial distributions become point masses at zero.
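As a simulation sketch of one step of this latent process: binomial survival and Poisson immigration are assumed here purely for illustration (the exact distributional forms are those of Equation \eqref{eqn:count-data-submodel}); both reduce to point masses at zero when $x_{t - 1} = 0$, matching the convention above.

```{r count-process-sketch, echo = TRUE, eval = FALSE}
## One transition of the latent adult female population: sur_t survivors by
## binomial thinning, imm_t immigrants at per-capita rate eta_t (illustrative
## distributional forms). rbinom(size = 0) and rpois(lambda = 0) both return
## zero, so x_prev = 0 yields the point mass at zero.
step_adults <- function(x_prev, delta_AF_t, eta_t) {
  sur <- rbinom(1, size = x_prev, prob = delta_AF_t)
  imm <- rpois(1, lambda = x_prev * eta_t)
  sur + imm   # x_{A, t}
}
step_adults(x_prev = 40, delta_AF_t = 0.6, eta_t = 0.3)
```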
## Fecundity: $\pd_{3}$
The fecundity submodel considers the number of breeding females at time $t$ denoted $N_{t}$, and the number of chicks produced that survive and leave the nest denoted $n_{t}$.
A Poisson model is employed to estimate fecundity (reproductive rate) $\rho$
\input{tex-input/owls-example/0030-fecundity-submodel.tex}
## Parameterisation and melding quantities
@abadi_estimation_2010 parameterise the time dependent quantities via linear predictors to minimise the number of parameters in the submodels.
The specific parameterisation of @finke_efficient_2019 we employ is
\input{tex-input/owls-example/0040-parameterisation-info.tex}\noindent
thus the quantities in common between the submodels are $\phi_{1 \cap 2} = (\alpha_{0}, \alpha_{2})$ and $\phi_{2 \cap 3} = \rho$.
To align the notation of this example with our chained melding notation we define, for all permitted values of $a, s$ and $t$, $Y_{1} = (\boldsymbol{M}_{a, s})$, $\psi_{1} = \left(\alpha_{1}, \alpha_{4}, (\alpha_{5, u})_{u = 2}^{T}\right)$; $Y_{2} = (y_{t})$, $\psi_{2} = (x_{J, t}, \alpha_{6}, \text{sur}_{t}, \text{imm}_{t})$; and $Y_{3} = (N_{t}, n_{t})$, $\psi_{3} = \varnothing$.
Note that the definition of $\phi_{1 \cap 2}$ does not include $\alpha_{1}$ as it is male specific and does not exist in $\pd_{2}$.
The model variant of @finke_efficient_2019 we consider does not include $\alpha_{3}$, and for comparability we keep the other parameter indices the same.
## Priors
We use the priors of @finke_efficient_2019 for the parameters in each submodel.
Denote $\boldsymbol{\alpha} = (\alpha_{0}, \alpha_{1}, \alpha_{2}, \alpha_{4}, \alpha_{6})$.
In both $\pd_{1}$ and $\pd_{2}$ the elements of $\boldsymbol{\alpha}$ are assigned independent $\text{Normal}(0, 2^2)$ priors truncated to $[-10, 10]$.
The time varying recapture probabilities $\alpha_{5, u}$ also have $\text{Normal}(0, 2^2)$ priors truncated to $[-10, 10]$.
A $\text{Uniform}(0, 10)$ prior is assigned to $\rho$ in $\pd_{2}$ and $\pd_{3}$.
To completely specify $\pd_{\text{meld}}$ we must choose how to form $\pd_\text{pool}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
We form $\pd_\text{pool}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ using three different pooling methods and estimate the melded posterior in each case.
The first pooling method is product-of-experts (PoE) pooling, which is logarithmic pooling with $\lambda = (1, 1, 1)$, and we denote the melded posterior as $\pd_{\text{meld, PoE}}$.
We also use logarithmic pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})$, which is denoted $\pd_{\text{meld, log}}$ and results in the chained melded model being identical to the IPM.
The final pooling method is linear pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{1}{2})$, denoted $\pd_{\text{meld, lin}}$.
## Posterior estimation
We estimate the melded posterior -- $\pd_{\text{meld}}(\boldsymbol{\phi}, \boldsymbol{\psi} \mid \boldsymbol{Y})$, proportional to Equation \eqref{eqn:melded-model-full} -- using both the parallel sampler (Section \ref{parallel-sampler}) and normal approximation (Section \ref{normal-approximations-to-submodel-components}).
This allows us to use pre-existing implementations of the submodels.
Specifically, the capture-recapture submodel is written in BUGS [@lunn_bugs_2009] and sampled via `rjags` [@plummer_rjags_2019].
The fecundity submodel is written in Stan [@carpenter_stan_2017] and sampled via `rstan` [@stan_development_team_rstan_2021].
The count data submodel is also written in BUGS, and we reuse this implementation in stage two of the multi-stage sampler via `NIMBLE` [@de_valpine_programming_2017] and its `R` interface [@nimble_development_team_nimble_2019].
The approximate melded posterior of Section \ref{normal-approximations-to-submodel-components} is sampled using `rjags`.
Code and data for this example, as well as trace plots and numerical convergence measures [@vehtari_rank-normalization_2020] for both stages of the parallel sampling process, are available in the accompanying online repository[^repo].
## Results
```{r phi_subpost, fig.cap = "Top row: credible intervals for $\\phi_{1 \\cap 2} = (\\alpha_{0}, \\alpha_{2})$ and $\\phi_{2 \\cap 3} = \\rho$ from the posterior of the original integrated population model $\\pd_{\\text{ipm}}$, and the individual subposteriors from submodels $\\pd_{1}, \\pd_{2}$, and $\\pd_{3}$. Bottom row: credible intervals for the same quantities, but with a different x-axis scale, from the original IPM (repeated from top row); the chained melded posteriors using product-of-experts pooling, logarithmic pooling, and linear pooling denoted $\\pd_{\\text{meld}}$, $\\pd_{\\text{meld, log}}$ and $\\pd_{\\text{meld, lin}}$; and the melded posterior using the normal approximation $\\widehat{\\pd}_{\\text{meld}}$. Intervals are 50\\%, 80\\%, 95\\%, and 99\\% wide."}
knitr::include_graphics("plots/owls-example/subposteriors-both-patchwork.pdf")
```
We empirically validate our methodology and sampler by comparing the melded posterior samples to a large sample -- 6 chains, each containing $1 \times 10^5$ post-warmup iterations -- from the original IPM posterior.
Similarity in the posteriors is expected as the IPM is effectively the joint model we wish to approximate with the chained melded model. It is simply fortunate, from a modelling standpoint, that this example's joint model is easy to construct and computationally feasible with standard tools.
Note that under logarithmic pooling with $\lambda = (\frac{1}{2}, \frac{1}{2}, \frac{1}{2})$ the melded posterior is identical to the original IPM, so any differences between the two posteriors are attributable to the multi-stage sampler.
Figure \ref{fig:phi_subpost} depicts the posterior credible intervals [@gabry_bayesplot_2021; @kay_tidybayes_2020] for the common quantities from the individual submodels, the melded models, and the original IPM.
The top row in Figure \ref{fig:phi_subpost} indicates that the count data alone ($\pd_{2}$) contain minimal information about $\alpha_{0}, \alpha_{2}$ and $\rho$; incorporating the data from the other submodels is essential for precise estimates.
The multi-stage sampler performs well, producing melded posterior estimates that are generally similar to those from the original IPM, and near identical under logarithmic pooling.
PoE pooling produces the posterior most different from the original IPM, as it yields a prior for $(\alpha_{0}, \alpha_{2})$ that is more concentrated around zero than the other pooling methods.
The lack of large differences between the melded posteriors under the different pooled priors indicates that, in this example, the posterior is largely insensitive to the choice of pooled prior.
The similarity of the approximate approach ($\widehat{\pd}_{\text{meld}}$ - bottom row of Figure \ref{fig:phi_subpost}) to the melding approaches suggests that the normal approximations are good summaries of the subposteriors, and that the approximate melding procedure of Section \ref{normal-approximations-to-submodel-components} is suitable for this example.
# Survival analysis with time varying covariates and uncertain event times
We return now to the respiratory failure example introduced in Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times}.
Our intention is to illustrate the application of chained Markov melding to an example of realistic complexity, and explore empirically the importance of accounting for all sources of uncertainty by comparing chained Markov melding to equivalent analyses which use only a point estimate summary of the uncertainty.
Specifically, event times and indicators are a noninvertible function of other parameters in the first submodel, and are an uncertain response in the survival submodel.
Chained Markov melding enables us to specify a suitable joint model despite these complications.
There are $i = 1, \ldots, N$ individuals in the data set.
Each individual is admitted to the ICU at time $0$, and is discharged or dies at time $C_{i}$.
See Appendix \ref{cohort-selection-criteria} for information on how the $N = 37$ individuals were selected from MIMIC-III [@johnson_mimic-iii_2016].
## P/F ratio submodel (B-spline): $\pd_{1}$
The first submodel fits a B-spline to the \pfratio\ data to determine if and when an individual experiences respiratory failure.
Each individual has \pfratio\ observations $z_{i, j}$ (in units of mmHg) at times $t_{i, j}$, with $j = 1, \ldots, J_{i}$.
For each individual denote the vector of observations $\boldsymbol{z}_{i} = (z_{i, 1}, \ldots, z_{i, J_{i}})$ and observation times $\boldsymbol{t}_{i} = (t_{i, 1}, \ldots, t_{i, J_{i}})$.
To improve computational performance, we standardise the P/F ratio data for each individual such that $z_{i, j} = \frac{\tilde{z}_{i, j} - \overline{z}_{i}}{\hat{s}_{i}}$, where $\tilde{z}_{i, j}$ is the underlying unstandardised observation with mean $\overline{z}_{i}$ and standard deviation $\hat{s}_{i}$.
Similarly we rescale the threshold for respiratory failure: $\tau_{i} = \frac{300 - \overline{z}_{i}}{\hat{s}_{i}}$.
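A minimal sketch of this standardisation (the helper name is ours, and `z_tilde` is assumed to hold one individual's raw observations):

```{r, eval = FALSE}
# Standardise one individual's raw P/F observations and rescale the
# 300 mmHg respiratory failure threshold onto the same scale.
standardise_pf <- function(z_tilde) {
  z_bar <- mean(z_tilde)
  s_hat <- sd(z_tilde)
  list(
    z = (z_tilde - z_bar) / s_hat,  # standardised observations
    tau = (300 - z_bar) / s_hat     # standardised threshold
  )
}
```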
We choose to model the P/F ratio using cubic B-splines and 7 internal knots, and do not include an intercept column in the spline basis [for background on B-splines see: Chapter 2 in @hastie_generalized_1999; and the supplementary material of @wang_shape-restricted_2021].
The internal knots are evenly spaced between two additional boundary knots at $\min(\boldsymbol{t}_{i})$ and $\max(\boldsymbol{t}_{i})$.
These choices result in $k = 1, \ldots, 10$ spline basis terms per individual, with coefficients $\zeta_{i, k}$ where $\boldsymbol{\zeta}_{i} = (\zeta_{i, 1}, \ldots, \zeta_{i, 10})$.
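A sketch of the basis construction using the `splines` package (an assumption on our part; the exact implementation may differ):

```{r, eval = FALSE}
library(splines)

# Cubic B-spline basis with 7 evenly spaced internal knots between
# boundary knots at min(t_i) and max(t_i), and no intercept column,
# giving 7 + 3 = 10 basis terms per individual.
basis_matrix <- function(t_i) {
  boundary <- range(t_i)
  internal <- seq(boundary[1], boundary[2], length.out = 9)[2:8]
  bs(t_i, knots = internal, degree = 3, intercept = FALSE,
     Boundary.knots = boundary)
}
```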
We denote the individual specific B-spline basis evaluated at time $t_{i, j}$ as $B_{i}(t_{i, j}) \in [0, \infty)^{10}$ so that the submodel can be written as
\input{tex-input/mimic-example/0010-pf-model-def.tex}\noindent
We employ a weakly informative prior for the intercept $\beta_{0, i} \sim \text{N}(0, 1^2)$, a heavy tailed distribution for the error term[^pfoutliers] $\varepsilon_{i, j} \sim t_{5}(0, \omega_{i})$, and a weakly informative half-normal prior for the unknown scale parameter $\omega_{i} \sim \text{N}_{+}(0, 1^2)$.
For the spline basis coefficients we set $\zeta_{i, 1} \sim \text{N}(0, 0.1^2)$, and for $k = 2, \ldots, 10$ we employ the random-walk prior $\zeta_{i, k} \sim \text{N}(\zeta_{i, k - 1}, 0.1^2)$ from @kharratzadeh_splines_2017.
[^pfoutliers]: P/F data contain many outliers for, amongst many possible reasons, arterial/venous blood sample mislabelling; incorrectly recorded oxygenation support information; and differences between sample collection time, lab result time, and the observation time as recorded in the EHR.
We identify that a respiratory failure event occurred (which we denote by $d_{i} = 1$) at event time $T_{i}$ if a solution to the following optimisation problem exists
\input{tex-input/mimic-example/0012-event-time-model-def.tex}\noindent
We attempt to solve Equation \ref{eqn:event-time-model-def} using a standard multiple root finder [@soetaert_rootsolve_2020].
If there are no roots then the individual died or was discharged before respiratory failure occurred so we set $T_{i} = C_{i}$ and $d_{i} = 0$.
The relationship between $T_{i}$ and other model coefficients is displayed in the left hand panel of Figure \ref{fig:submodel-schematics}.
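A hedged sketch of this computation, using `rootSolve` [@soetaert_rootsolve_2020] and assuming `pf_fit` is a function evaluating one posterior draw's fitted spline at time `t` (the precise objective is given in Equation \eqref{eqn:event-time-model-def}):

```{r, eval = FALSE}
library(rootSolve)

# Locate all threshold crossings of the fitted P/F spline on [0, C_i];
# the earliest crossing is taken as the respiratory failure time, and
# if no crossing exists the individual is censored at C_i.
event_time <- function(pf_fit, tau_i, C_i) {
  roots <- uniroot.all(function(t) pf_fit(t) - tau_i,
                       interval = c(0, C_i))
  if (length(roots) == 0) {
    list(T_i = C_i, d_i = 0)
  } else {
    list(T_i = min(roots), d_i = 1)
  }
}
```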
\begin{figure}
\begin{minipage}{.499\textwidth}
\input{tex-input/mimic-example/0090-pf-schematic.tex}
\end{minipage}%
\begin{minipage}{.499\textwidth}
\input{tex-input/mimic-example/0091-fluid-schematic.tex}
\end{minipage}
\caption{Parameters and form for the P/F ratio submodel ($\pd_{1}$, left) and cumulative fluid submodel ($\pd_{3}$, right).}
\label{fig:submodel-schematics}
\end{figure}
## Cumulative fluid submodel (piecewise linear) $\pd_{3}$
The rate of fluid administration reflects the clinical management of patients by ICU staff, and hence changes to the rate reflect decisions to change treatment strategy.
We employ a breakpoint regression model to capture the effect of such decisions, and consider only one breakpoint as this appears sufficient to fit the observed data.
Specifically, we model the 8-hourly cumulative fluid balance data $x_{i, l}$ (in litres) at times $u_{i, l}$, $l = 1, \ldots, L_{i}$.
The cumulative data are derived from the raw fluid input/output observations, which we detail in Appendix \ref{calculating-the-cumulative-fluid-balance-from-the-raw-fluid-data}.
We denote the complete vector of observations by $\boldsymbol{x}_{i} = (x_{i, 1}, \ldots, x_{i, L_{i}})$ and times by $\boldsymbol{u}_{i} = (u_{i, 1}, \ldots, u_{i, L_{i}})$.
We assume a piecewise linear model with $\eta_{0, i}$ as the value at the breakpoint at time $\kappa_{i}$, slope $\eta_{1, i}^{b}$ before the breakpoint, and slope $\eta_{1, i}^{a}$ after the breakpoint.
We write this submodel as
\input{tex-input/mimic-example/0020-piecewise-fluid-model.tex}\noindent
It will be useful to refer to the fitted value of this submodel at an arbitrary time $t$ as $m_{i}(t)$.
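A direct R transcription of $m_{i}(t)$ and its derivative, which reappears in the survival submodel (the closed forms are given in Appendix \ref{analytic-form-for-the-survival-function}):

```{r, eval = FALSE}
# Piecewise linear fitted value: value eta_0 at the breakpoint kappa,
# slope eta_b before the breakpoint and slope eta_a after it.
m_i <- function(t, eta_0, eta_b, eta_a, kappa) {
  eta_0 + ifelse(t < kappa, eta_b, eta_a) * (t - kappa)
}

# The derivative of m_i(t) is piecewise constant in t.
dm_i <- function(t, eta_b, eta_a, kappa) {
  ifelse(t < kappa, eta_b, eta_a)
}
```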
We assume a weakly informative prior for the error term $\epsilon_{i, l} \sim \text{N}(0, \sigma^{2}_{x, i})$, with individual-specific error variances $\sigma_{x, i} \sim \text{N}_{+}(0, 5^2)$,
and specific, informative priors for the slope before the breakpoint $\eta^{b}_{1, i} \sim \text{Gamma}(1.53, 0.24)$ and after $\eta^{a}_{1, i} \sim \text{Gamma}(1.53, 0.24)$.
An appropriate prior for $\kappa_{i}$ and $\eta_{0, i}$ is challenging to specify due to the relationship between the two parameters and the individual-specific support for $\kappa_{i}$.
We address both challenges by reparameterisation, resulting in a prior for $\kappa_{i}$ that, in the absence of other information, places the breakpoint in the middle of an individual's ICU stay, and a prior for $\eta_{0, i}$ that captures the diverse pathways into ICU that an individual can experience.
Details and justifications for all the informative priors are available in Appendix \ref{priors-and-justification-for-the-cumulative-fluid-submodel}.
Figure \ref{fig:submodel-schematics} displays the parameters and their relationship to the fitted regression line.
## Survival submodel $\pd_{2}$
The rate at which fluid is administered is thought to influence the time to respiratory failure [@seethala_early_2017], so we explore this relationship using a survival model.
Individuals experience respiratory failure ($d_{i} = 1$) at time $0 < t < C_{i}$, or are censored $(d_{i} = 0, t = C_{i})$.
We assume a Weibull hazard with shape parameter $\gamma$ for the event times.
All individuals have baseline (time invariant) covariates $w_{i, a}$, $a = 1, \ldots, A$, with $\boldsymbol{w}_{i} = (1, w_{i, 1}, \ldots, w_{i, A})$ (i.e. including an intercept term), and common coefficients $\boldsymbol{\theta} = (\theta_{0}, \ldots, \theta_{A})$.
The hazard is assumed to be influenced by these covariates and the rate of increase $\frac{\partial}{\partial t} m_{i}(t)$ in the cumulative fluid balance.
The strength of the latter relationship is captured by $\alpha$.
Hence, the hazard is
\input{tex-input/mimic-example/0030-surv-submodel-def.tex}\noindent
The survival function at an individual's observed event time and status, $(T_{i}, d_{i})$, denoted $S_{i}(T_{i}) = \exp\{-\int_{0}^{T_{i}}h_{i}(u)\text{d}u\}$, has an analytic form which we derive in Appendix \ref{analytic-form-for-the-survival-function}.
Thus, the likelihood for individual $i$ is
\begin{equation}
\pd(T_{i}, d_{i} \mid \gamma, \boldsymbol{\theta}, \alpha, \kappa_{i}, \eta_{1, i}^{b}, \eta_{1, i}^{a}, \boldsymbol{w}_{i}) = h_{i}(T_{i})^{d_{i}} S_{i}(T_{i}),
\end{equation}
where we suppress the dependence on the parameters on the right hand side for brevity.
Our priors, which we justify in Appendix \ref{p2-prior-justification}, for the submodel specific parameters are $\gamma \sim \text{Gamma}(9.05, 8.72)$, $\alpha \sim \text{SkewNormal}(0, 0.5, -2)$, $\theta_{a} \sim \text{SkewNormal}(0, 0.5, -1)$, and $\theta_{0} \sim \text{N}(\hat{E}, 0.5^2)$ where $\hat{E}$ is the log of the crude event rate [@brilleman_bayesian_2020].
We adopt the same priors as the cumulative fluid balance submodel for $\kappa_{i}, \eta_{1, i}^{b}$, and $\eta_{1, i}^{a}$.
## Chained Markov melding details
```{r pf_fit_and_fluid_fit_plot, fig.cap = 'The P/F ratio data ($Y_{1}$, top row); cumulative fluid data ($Y_{3}$, bottom row); subposterior means and 95\\% credible intervals for each of the submodels (black solid lines and grey intervals); and stage one event times ($T_{i}$, red rug in the top row) for individuals $i = 17$ and $29$.'}
knitr::include_graphics('plots/mimic-example/combined-pf-fluid-small-new.pdf')
```
To combine the submodels with chained Markov melding we must define the common quantities $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
We meld $\pd_{1}$ and $\pd_{2}$ by treating the derived event times and indicators $\{(T_{i}, d_{i})\}_{i = 1}^{N}$ under $\pd_{1}$ as the "response", i.e. event times, in $\pd_{2}$.
Care is required when defining $\phi_{1 \cap 2}$ under $\pd_{1}$ as it is a deterministic function of $\beta_{0, i}$ and $\boldsymbol{\zeta}_{i}$.
Define $\chi_{1, i} = (\beta_{0, i}, \boldsymbol{\zeta}_{i})$ and $\phi_{1 \cap 2, i} = f(\chi_{1, i}) = (T_{i}, d_{i})$, where $f$ is the output from attempting to solve Equation \eqref{eqn:event-time-model-def}, so that $\phi_{1 \cap 2} = (f(\chi_{1, 1}), \ldots, f(\chi_{1, N}))$.
The parameters shared by Equations \eqref{eqn:piecewise-fluid-model} and \eqref{eqn:surv-submodel-def-two} constitute $\phi_{2 \cap 3} = (\eta^{b}_{1, i}, \eta^{a}_{1, i}, \kappa_{i})_{i = 1}^{N}$.
To completely align with our chained melding notation we also define, for the P/F submodel, $Y_{1} = (\boldsymbol{z}_{i}, \boldsymbol{t}_{i})_{i = 1}^{N}$ and $\psi_{1} = (\omega_{i})_{i = 1}^{N}$, noting that $\psi_{1}$ and $(\chi_{1, 1}, \ldots, \chi_{1, N})$ have no components in common.
For the cumulative fluid submodel we define $Y_{3} = (\boldsymbol{x}_{i}, \boldsymbol{u}_{i})_{i = 1}^{N}$, and $\psi_{3} = (\eta_{0, i}, \sigma^{2}_{x, i})_{i = 1}^{N}$.
Finally, for the survival submodel we define $Y_{2} = (\boldsymbol{w}_{i})_{i = 1}^{N}$ and $\psi_{2} = (\gamma, \boldsymbol{\theta}, \alpha)$.
## Pooling and estimation
We consider logarithmic pooling with $\lambda = (\frac{4}{5}, \frac{4}{5}, \frac{4}{5})$, since any smaller value of $\lambda$ results in a prior so uninformative that it causes computational problems, and with $\lambda = (1, 1, 1)$, i.e. Product-of-Experts pooling.
Because the correlation between $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ in $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ is important, we do not consider linear pooling in this example.
Logarithmic pooling requires estimates of $\pd_{1}(\phi_{1 \cap 2})$ and $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
Because these are mixed distributions, with both discrete and continuous components, standard kernel density estimation, as suggested by @goudie_joining_2019, is inappropriate.
Instead we fit, to transformed versions of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$, a mixture containing a discrete component and either a Gaussian or beta distribution, depending on the transformation.
Further details for all the mixture distribution estimates are contained in Appendix \ref{estimating-submodel-prior-marginal-distributions}.
We use the parallel multi-stage sampler with $\pd_{\text{pool}, 1}(\phi_{1 \cap 2}) = \pd_{1}(\phi_{1 \cap 2}), \,\, \pd_{\text{pool}, 3}(\phi_{2 \cap 3}) = \pd_{3}(\phi_{2 \cap 3})$ and $\pd_{\text{pool}, 2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \pd_{\text{pool}}(\boldsymbol{\phi}) \mathop{/} \left(\pd_{1}(\phi_{1 \cap 2}) \pd_{3}(\phi_{2 \cap 3}) \right)$.
That is, in stage one we target the subposteriors $\pd_{1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$ and $\pd_{3}(\phi_{2 \cap 3}, \psi_{3} \mid Y_{3})$; in stage two we target the full melded model.
Targeting $\pd_{1}(\phi_{1 \cap 2}, \psi_{1} \mid Y_{1})$ in stage one alleviates the need to solve Equation \eqref{eqn:event-time-model-def} within an MCMC iteration, instead turning the production of $\phi_{1 \cap 2}$ into an embarrassingly parallel, post-stage-one processing step.
Attempting to sample the melded posterior directly would involve solving \eqref{eqn:event-time-model-def} many times within each iteration, presenting a sizeable computational hurdle which we avoid.
It is crucial for the convergence of our multi-stage sampler that the components of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ are updated _individual-at-a-time_ in stage two.
This is possible due to the conditional independence between individuals in the stage one posterior, and Appendix \ref{one-at-a-time} contains the details of this scheme.
The stage one subposteriors are sampled using `Stan`, using 5 chains with $10^{3}$ warm-up iterations and $10^{4}$ post warm-up iterations.
We use `Stan` to sample $\psi_{2}$ where, in every MH-within-Gibbs step, we run `Stan` for 9 warm-up iterations and 1 post warm-up iteration[^initialisation].
We run 5 chains of $10^{4}$ iterations for all stage two targets.
Visual and numerical diagnostics [@vehtari_rank-normalization_2020] are assessed and are available in the repository accompanying this paper[^repo].
[^initialisation]: We also initialise Stan at the previous value of $\psi_{2}$, and disable all adaptive procedures as the default (identity) mass matrix and step size are suitable for this example.
[^repo]: [https://doi.org/10.5281/zenodo.6552714](https://doi.org/10.5281/zenodo.6552714)
## Results
We first inspect the subposterior fitted values for $\pd_{1}$ and $\pd_{3}$.
The top row of Figure \ref{fig:pf_fit_and_fluid_fit_plot} displays the P/F data, the fitted submodel, and derived event times for individuals $i = 17$ and $29$.
The spline appears to fit the raw P/F data well, with the heavy tailed error term accounting for the larger deviations away from the fitted value.
It is interesting to see the relatively wide, multimodal distribution for $(T_{29}, d_{29})$; there is a second mode at $(T_{29} = C_{29}, d_{29} = 0)$, which also occurs for other individuals not shown here.
The bottom row of Figure \ref{fig:pf_fit_and_fluid_fit_plot} displays the cumulative fluid data and the fitted submodel, with the little noise in the data resulting in minimal uncertainty about the fitted value and a concentrated subposterior distribution.
To assess the importance of fully accounting for the uncertainty in $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$, we compare the posterior for $\psi_{2}$ obtained using the chained melding approach with the posterior obtained by fixing $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
Plugging in a point estimate reflects common applied statistical practice when combining submodels, particularly when a distributional approximation is difficult to obtain (as it is for $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$).
Additionally, standard survival models and software typically do not permit uncertainty in event times and indicators, rendering such a plug-in approach necessary.
Specifically, we fix $\phi_{1 \cap 2}$ to the median value[^median] for each individual under $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ and denote it $\widehat{\phi}_{1 \cap 2}$, and use the subposterior mean of $\pd_{3}(\phi_{2 \cap 3} \mid Y_{3})$ denoted $\widehat{\phi}_{2 \cap 3}$.
With these fixed values we sample $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$.
We also compare the melded posterior to the submodel marginal prior $\pd_{2}(\psi_{2})$, but we note that this comparison is difficult to interpret, as the melding process alters the prior for $\psi_{2}$.
Figure \ref{fig:psi_2_comparison_plot} displays the aforementioned densities for $(\theta_{3}, \theta_{17}, \gamma, \alpha) \subset \psi_{2}$, with $(\theta_{3}, \theta_{17})$ chosen as they exhibit the greatest sensitivity to the fixing of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$.
For the baseline coefficients ($\theta_{3}, \theta_{17}$) the chained melding posterior differs slightly in location from $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$, with a small increase in uncertainty.
A more pronounced change is visible for $\alpha$, where the melding process has added a notable degree of uncertainty and shifted the posterior leftwards.
[^median]: For each individual $i$, the posterior samples of the pair $(T_{i}, d_{i})$ are sorted by $T_{i}$, and the middle tuple $(\widehat{T}_{i}, \widehat{d}_{i})$ is chosen as the median.
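A minimal sketch of this selection for a single individual (vector names are illustrative):

```{r, eval = FALSE}
# Sort the posterior sample (T_i, d_i) pairs by event time and take
# the middle tuple as the "median" point estimate.
median_tuple <- function(T_samples, d_samples) {
  ord <- order(T_samples)
  mid <- ord[floor(length(ord) / 2)]
  c(T_hat = T_samples[mid], d_hat = d_samples[mid])
}
```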
```{r psi_2_comparison_plot, fig.cap = 'Density estimates for a subset of $\\psi_{2}$. The submodel marginal prior $\\pd_{2}(\\psi_{2})$ is shown as the grey dotted line (note that this is not the marginal prior under the melded model). The figure also contains the subposteriors obtained from chained melding using PoE pooling (red, solid line) and logarithmic pooling (blue, solid line), as well as the posterior using the fixed values $\\pd(\\psi_{2} \\mid \\widehat{\\phi}_{1 \\cap 2}, \\widehat{\\phi}_{2 \\cap 3}, Y_{2})$ (black, dashed line).'}
knitr::include_graphics('plots/mimic-example/psi-2-method-comparison-small.pdf')
```
To investigate which part of the melding process causes this change in the posterior of $\alpha$, we consider fixing either one of $\phi_{1 \cap 2}$ and $\phi_{2 \cap 3}$ to their respective point estimates.
That is, we employ Markov melding as described in Section \ref{markov-melding}, using either logarithmic or PoE pooling, to obtain $\pd_{\text{meld}}(\alpha \mid \widehat{\phi}_{1 \cap 2}, Y_{2}, Y_{3})$ and $\pd_{\text{meld}}(\alpha \mid \widehat{\phi}_{2 \cap 3}, Y_{1}, Y_{2})$.
Figure \ref{fig:alpha_only_comparision_plot} displays the same distributions for $\alpha$ as Figure \ref{fig:psi_2_comparison_plot}, and adds the posteriors obtained using one fixed value ($\widehat{\phi}_{1 \cap 2}$ or $\widehat{\phi}_{2 \cap 3}$) whilst melding the other non-fixed parameter.
Evident for both choices of pooling is the importance of incorporating the uncertainty in $\phi_{1 \cap 2}$.
This is expected given the large uncertainty and multimodal nature of $\phi_{1 \cap 2}$ compared to $\phi_{2 \cap 3}$ (see Figure \ref{fig:pf_fit_and_fluid_fit_plot}).
We suspect that it is the multimodality in $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ that produces the shift in the posterior mode of $\alpha$, with the width of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$ driving the increase in uncertainty.
Because we prefer the chained melded posterior, under either pooling method, for its full accounting of uncertainty, we conclude that $\pd(\alpha \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$ is both overconfident and biased.
```{r alpha_only_comparision_plot, fig.cap = 'Median (vertical line), 50\\%, 80\\%, 95\\%, and 99\\% credible intervals (least transparent to most transparent) for $\\alpha$. The marginal prior (grey, top row) and posterior using fixed $\\widehat{\\phi}_{1 \\cap 2}$ and $\\widehat{\\phi}_{2 \\cap 3}$ (black, bottom row) are as in Figure \\ref{fig:psi_2_comparison_plot}. For the chained melded posteriors (red and blue, rows 2 and 3) and the melded posteriors (red and blue, rows 4 -- 7), the tick label on the y-axis denotes the type of pooling used, and which of $\\phi_{1 \\cap 2}$ and/or $\\phi_{2 \\cap 3}$ are fixed.'}
knitr::include_graphics('plots/mimic-example/psi-2-alpha-only-compare.pdf')
```
The marginal changes to the components of $\psi_{2}$ visible in Figure \ref{fig:psi_2_comparison_plot} appear small; however, the cumulative effect of these changes becomes apparent when inspecting the posterior of the survival function.
Figure \ref{fig:kap_meier_pc_plot} displays the model-based, mean survival function under the melded posterior (using PoE pooling), and corresponding draws of $\phi_{1 \cap 2}$ converted into survival curves using the Kaplan-Meier estimator.
Also shown are the Kaplan-Meier estimate of $\widehat{\phi}_{1 \cap 2}$ and the mean survival function computed using $\pd(\psi_{2} \mid \widehat{\phi}_{1 \cap 2}, \widehat{\phi}_{2 \cap 3}, Y_{2})$.
The posterior survival functions differ markedly, with the 95\% intervals overlapping only for small values of time.
It is also interesting to see that $\widehat{\phi}_{1 \cap 2}$, despite being a reasonable point estimate of $\pd_{1}(\phi_{1 \cap 2} \mid Y_{1})$, is not very likely under the melded posterior.
Figure \ref{fig:kap_meier_pc_plot} also suggests that the Weibull hazard is insufficiently flexible for this example.
We discuss the complexities of other hazards in Section \ref{conclusion}.
```{r kap_meier_pc_plot, fig.cap = 'Survival curves and mean survival function at time $t$. The red, stepped lines are draws of $\\phi_{1 \\cap 2}$ from the melded posterior using PoE pooling, converted into survival curves via the Kaplan-Meier estimator. The smooth red line and interval (posterior mean and 95\\% credible interval) denote the model-based, mean survival function obtained from the melded posterior (PoE pooling) values of $\\psi_{2}$ and $\\phi_{2 \\cap 3}$. The blue dashed line is the Kaplan-Meier estimate of $\\widehat{\\phi}_{1 \\cap 2}$, and the blue solid line and interval are the corresponding model-based estimate from $\\pd(\\psi_{2} \\mid \\widehat{\\phi}_{1 \\cap 2}, \\widehat{\\phi}_{2 \\cap 3}, Y_{2})$.'}
knitr::include_graphics('plots/mimic-example/kaplan-meier-pc.pdf')
```
# Conclusion
This paper introduces the chained Markov melded model.
In doing so we make explicit the notion of submodels related in a chain-like way, describe a generic methodology for joining together any number of such submodels, and illustrate its application with our examples.
Our examples also demonstrate the importance of quantifying the uncertainty when joining submodels; not doing so can produce biased, over-confident inference.
We also describe the choices that users of chained Markov melding must make, and their impacts, including: the choice of pooling function and, where required, the pooling weights; and the choice and design of the posterior sampler, including the apportionment of the pooled prior over the stages and the stage-specific MCMC techniques.
We have introduced extensions of linear and logarithmic pooling to marginal distributions of different but overlapping quantities.
Linear pooling, introduced in Section \ref{pooled-prior}, could be extended to induce dependence between the components of $\boldsymbol{\phi}$ using multivariate or vine copulas [@kurowicka_dependence_2011; @nelsen_introduction_2006], or other techniques [@lin_recent_2014].
Copula methods are particularly appealing as, depending on the choice of copula, they yield density functions that are computationally cheap to evaluate, are easy to sample from, and can induce correlation between an arbitrary number of marginals.
Our parallel multi-stage sampler currently only considers $\Nm = 3$ submodels, rather than the fully generic definition of chained Markov melding in Equation \eqref{eqn:melded-model-general}.
Whilst we anticipate needing more complex methods in large $\Nm$ settings, the value of $\Nm$ at which the performance of our multi-stage sampler becomes unacceptable will depend on the specific submodels and data under consideration.
A general method would consider a large and arbitrary number of submodels in a chain, and initially split the chain into more/fewer pieces depending on the computational resources available.
Designing such a method is complex, as it would have to:
\begin{itemize}[nosep]
\tightlist
\item avoid requiring the inverse of any component of $\boldsymbol{\phi}$ with a noninvertible definition,
\item estimate the relative cost of sampling each submodel's subposterior, to split the chain of submodels into steps/jobs of approximately the same computational cost,
\item decide the order in which pieces of the chain are combined.
\end{itemize} \noindent
These are substantial challenges.
It may be possible to combine the ideas of @lindsten_divide-and-conquer_2017 and @kuntz_divide-and-conquer_2021, who propose parallel Sequential Monte Carlo methods, with the aforementioned constraints to obtain a generic methodology.
Ideally we would retain the ability to use existing implementations of the submodels; however, the need to recompute the weights of the particles, and hence to re-evaluate previously considered submodels, may make this impossible.
Our current sampler is also sensitive to large differences in location or scale of the target distribution between the stages.
The impact of these differences can be ameliorated using the methodology of @manderson_numerically_2022, and, more generally, Sequential Monte Carlo samplers are likely to perform better in these settings.
Our chained Markov melding methodology is general and permits any form of uncertainty in the common quantities.
In Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1} we use our chained melded model to incorporate uncertainty in the event times and indicators into a survival submodel.
Some specific forms of uncertainty in the event times have been considered in previous work.
These include @wang_integrative_2020, who consider uncertain event times arising from record linkage, where the event time is assumed to be one of a finite number of event times arising from the record linkage; and @giganti_accounting_2020, @oh_considerations_2018, and @oh_raking_2021, who leverage external validation data to account for measurement error in the event time.
However, the general and Bayesian nature of our methodology readily facilitates any form of uncertainty in the event times and the event indicators; uncertainty in the latter is not considered in the cited papers.
The example in Section \ref{survival-analysis-with-time-varying-covariates-and-uncertain-event-times-1} has three more interesting aspects to discuss.
Firstly, the P/F ratio data used in the first submodel are obtained by finding all blood gas measurements from arterial blood samples.
Approximately $20\%$ of the venous/arterial labels are missing.
In these instances a logistic regression model, fit by the MIMIC team[^mimiclr], is used to predict the missing label based on other covariates.
It is theoretically possible to refit the model in a Bayesian framework and use the chained melded model to incorporate the uncertainty in the predicted sample label -- adding another 'link' to the chain.
[^mimiclr]: The coefficients, classification threshold, and the imputation used in the case of missing data are supplied in the `blood-gasses.sql` file in the GitHub repository accompanying this paper. No other information is available about this model (the data used to produce the coefficients, and the performance of the fitted model).
Secondly, the application of our multi-stage sampler to this example is similar to the two-stage approach used for joint longitudinal and time-to-event models [see @mauff_joint_2020 for a description of this approach].
In the two-stage approach, the longitudinal model is fit using standard MCMC methods in stage one, and the samples are reused in stage two when considering the time-to-event data.
This can significantly reduce the computational effort required to fit the joint model.
However, unlike our multi-stage sampler, the typical two-stage approach does not target the full posterior distribution, which can lead to biased estimates (though @mauff_joint_2020 extend the typical two-stage approach to reduce this bias).
Thirdly, we observe a lack of flexibility in the baseline hazard, visible in Figure \ref{fig:kap_meier_pc_plot}.
More complex hazards could be employed, e.g. modelling the (log-)hazard using a (penalised) B-spline [@royston_flexible_2002; @rosenberg_hazard_1995; @rutherford_use_2015].
However, this increased flexibility precludes an analytic form for the survival function.
Whilst numerical integration is possible, it is not trivial, particularly when the hazard is discontinuous, as ours is at the breakpoint.
Splines also have more coefficients than the single parameter of the Weibull hazard.
Identifiability issues arise with a small number of individuals, many of whom are censored, and are compounded when there are a relatively large number of other parameters $(\alpha, \boldsymbol{\theta})$.
Whilst we do not believe these costs are worth incurring for our example, for settings with a larger number of patients and more complicated longitudinal submodels the increased flexibility may be vital.
## Acknowledgements {-}
We thank Sarah L Cowan for assistance in understanding respiratory failure, and Anne Presanis and Brian Tom for many helpful discussions about the methodological aspects of this work.
We also thank Luiz Max Carvalho for comments on an earlier version of this paper.
This work was supported by The Alan Turing Institute under the UK Engineering and Physical Sciences Research Council (EPSRC) [EP/N510129/1] and the UK Medical Research Council [programme code MC\_UU\_00002/2].
<!-- This makes pandoc-citeproc put the references before the end of document. -->
<!-- <div id="refs"></div> -->
<!-- Now switch to alphabetical numbering for the appendix, and reset the counter. -->
\renewcommand{\thesubsection}{\Alph{subsection}}
\setcounter{subsection}{0}
# Appendices {-}
## Log pooling Gaussian densities
We can exactly compute $\pd_{\text{pool}}$ when logarithmically pooling Gaussian densities.
Noting that, in the one dimensional case, $\text{N}(\phi; \mu, \sigma^2)^{\lambda_{\modelindex}} = \text{N}(\phi; \mu, \frac{\sigma^2}{\lambda_{\modelindex}})$, we use the results of @bromiley_products_2003 and write
\input{tex-input/multiple-phi/0070-log-pooling-gaussian.tex}\noindent
hence $\pd_{\text{pool}}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \text{N}((\phi_{1 \cap 2}, \phi_{2 \cap 3});\, \mu_{\text{log}}, \, \Sigma_{\text{log}})$.
The choice of $\lambda_{2}$ is critical; by controlling the contribution of $\pd_{2}$ to $\pd_{\text{pool}}$, $\lambda_{2}$ controls the degree of correlation present in the latter.
The left hand column of Figure \ref{fig:pooled_densities_plot} illustrates this phenomenon.
When $\lambda_{1} = \lambda_{3} = 0$, and hence $\lambda_{2} = 1$, all of the correlation in $\pd_{2}$ is present in $\pd_{\text{pool}}$.
The correlation decreases as $\lambda_{1}$ increases, until $\lambda_{1} = \lambda_{3} = 0.5$, and hence $\lambda_{2} = 0$, where no correlation persists.
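A sketch of this computation (assuming each density is a full-rank Gaussian on the joint $(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ space):

```{r, eval = FALSE}
# Logarithmic pooling of Gaussians: exponentiation by lambda_m scales
# the precision by lambda_m, and the product of Gaussians is again
# Gaussian, with summed precisions and precision-weighted mean.
log_pool_gaussians <- function(mus, Sigmas, lambdas) {
  precisions <- Map(function(S, l) l * solve(S), Sigmas, lambdas)
  Lambda <- Reduce(`+`, precisions)
  Sigma_log <- solve(Lambda)
  mu_log <- Sigma_log %*% Reduce(`+`, Map(`%*%`, precisions, mus))
  list(mu = drop(mu_log), Sigma = Sigma_log)
}
```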
## Sequential sampler
\input{tex-input/multi-stage-sampler/0001-seq-sampler-dag.tex}
Figure \ref{fig:seq-sampler-dag} depicts graphically the strategy employed by the sequential sampler.
The sequential sampler assumes that the pooled prior decomposes such that
\input{tex-input/multi-stage-sampler/0002-sequential-sampler-decomposition.tex}\noindent
This is necessary to avoid sampling all components of $\boldsymbol{\phi}$ in the first stage.
All pooled priors trivially satisfy \eqref{eqn:sequential-sampler-decomposition}, as we can assume all but $\pd_{\text{pool}, 3}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ are improper, flat distributions.
However, including some portion of the pooled prior in each stage of the sampler can improve performance, and eliminate computational instabilities when submodel likelihoods contain little information.
### Stage one
Stage one of the sequential sampler targets
\input{tex-input/multi-stage-sampler/0020-stage-one-target.tex}\noindent
using a generic proposal kernel for both $\phi_{1 \cap 2}$ and $\psi_{1}$.
The corresponding acceptance probability for a proposed update from $(\phi_{1 \cap 2}, \psi_{1})$ to $(\phi_{1 \cap 2}^{*}, \psi_{1}^{*})$ is
\input{tex-input/multi-stage-sampler/0021-stage-one-acceptance-probability.tex}
### Stage two
The stage two target augments the stage one target by including the second submodel, corresponding prior marginal distribution, and an additional pooled prior term
\input{tex-input/multi-stage-sampler/0030-stage-two-target.tex}\noindent
A Metropolis-within-Gibbs strategy is employed, where the stage one samples are used as a proposal for $\phi_{1 \cap 2}$, whilst a generic proposal kernel is used for $\psi_{2}$ and $\phi_{2 \cap 3}$.
Thus the proposal distributions for $\phi_{1 \cap 2}^{*}$ and $(\phi_{2 \cap 3}^{*}, \psi_{2}^{*})$ are
\input{tex-input/multi-stage-sampler/0031-stage-two-gibbs-updates.tex}\noindent
The acceptance probability for this proposal strategy is
\input{tex-input/multi-stage-sampler/0032-stage-two-acceptance-probabilities.tex}\noindent
Our judicious choice of proposal distribution has resulted in a cancellation in Equation \eqref{eqn:stage-two-acceptance-probabilities-one} which removes all terms related to $\pd_{1}$.
Similarly, all terms related to $\pd_{1}$ are constant -- hence cancel -- in Equation \eqref{eqn:stage-two-acceptance-probabilities-two}.
This eliminates any need to re-evaluate the first submodel.
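As an illustrative sketch of the $\phi_{1 \cap 2}$ update (assuming `log_target_2` evaluates only those terms of the stage two target that do not cancel, per Equation \eqref{eqn:stage-two-acceptance-probabilities-one}; all names are ours):

```{r, eval = FALSE}
# Metropolis-within-Gibbs update for phi_12: propose by resampling an
# index into the stored stage one draws, so all pd_1 terms cancel in
# the acceptance ratio and the first submodel is never re-evaluated.
update_phi_12 <- function(state, stage_one_draws, log_target_2) {
  proposal <- stage_one_draws[sample(nrow(stage_one_draws), 1), ]
  log_ratio <- log_target_2(proposal, state$phi_23, state$psi_2) -
    log_target_2(state$phi_12, state$phi_23, state$psi_2)
  if (log(runif(1)) < log_ratio) state$phi_12 <- proposal
  state
}
```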
### Stage three
In stage three we target the full melded posterior
\input{tex-input/multi-stage-sampler/0044-stage-three-target.tex}\noindent
The target has now been broadened to include terms from the third submodel and the entirety of the pooled prior.
Again, we employ a Metropolis-within-Gibbs sampler, with proposals drawn such that
\input{tex-input/multi-stage-sampler/0045-stage-three-gibbs-updates.tex}\noindent
which leads to acceptance probabilities of
\input{tex-input/multi-stage-sampler/0046-stage-three-acceptance-probabilities.tex}\noindent
The informed choice of proposal distribution for ($\phi_{1 \cap 2}, \phi_{2 \cap 3}, \psi_{1}, \psi_{2}$) has allowed us to target the full melded posterior without needing to evaluate all submodels simultaneously.
## Normal approximation calculations
Substituting the approximations of Section \ref{normal-approximations-to-submodel-components} into Equation \eqref{eqn:normal-approx-melded-posterior-target} yields the approximate melded posterior
\input{tex-input/multiple-normal-approximation/0020-normal-approximation-approximate-target.tex}\noindent
Noting that the product of independent normal densities is an unnormalised multivariate normal density with independent components, we rewrite Equation \eqref{eqn:normal-approximation-approximate-target} as
\input{tex-input/multiple-normal-approximation/0030-normal-approx-nu-de-form.tex}\noindent
The ratio of normal densities is also an unnormalised normal density, and hence Equation \eqref{eqn:normal-approx-nu-de-form} simplifies to
\input{tex-input/multiple-normal-approximation/0041-final-normal-approx-appendix.tex}
## Calculating the cumulative fluid balance from the raw fluid data
In the raw fluid data each patient has $\tilde{l} = 1, \ldots, \tilde{L}_{i}$ observations.
Each observation $\tilde{x}_{i, \tilde{l}}$ is typically a small fluid administration (e.g. an injection of some medicine in saline solution), or a fluid discharge (almost always urine excretion).
The observations have corresponding observation times $\tilde{u}_{i, \tilde{l}}$, with $\tilde{\boldsymbol{u}}_{i} = \{\tilde{u}_{i, 1}, \ldots, \tilde{u}_{i, \tilde{L}_{i}}\}$ and $\tilde{\boldsymbol{x}}_{i} = \{\tilde{x}_{i, 1}, \ldots, \tilde{x}_{i, \tilde{L}_{i}}\}$.
We code the fluid administrations/inputs as positive values, and the excretions/outputs as negative values.
Each patient has an enormous number of raw fluid observations $(L_{i} \ll \tilde{L}_{i})$ and it is computationally infeasible to consider them all at once.
We aggregate the raw fluid observations into 8-hourly changes in fluid balance.
From these 8-hourly changes we calculate the cumulative fluid balance.
Mathematically, we define an ordered vector of boundary values
\begin{equation}
\boldsymbol{v}_{i} = (\lfloor \min\{\tilde{\boldsymbol{u}}_{i}\} \rfloor, \lfloor \min\{\tilde{\boldsymbol{u}}_{i}\} \rfloor + \frac{1}{3}, \ldots, \lceil \max\{\tilde{\boldsymbol{u}}_{i}\} \rceil),
\end{equation}
noting that $\dim(\boldsymbol{v}_{i}) = L_{i} + 1$.
Because the observation times are encoded as _days since ICU admission_ and we are interested in the 8-hourly changes, our floor and ceiling functions round down or up to the nearest third, respectively.
The raw fluid observations are then divided into $L_{i}$ subsets of $\{\tilde{\boldsymbol{x}}_{i}, \tilde{\boldsymbol{u}}_{i}\}$ based on which pair of boundary values the observation time falls between:
\begin{equation}
V_{i, l} = \left\{
\{\tilde{\boldsymbol{x}}_{i}, \tilde{\boldsymbol{u}}_{i}\}
\mid
v_{i, l} \leq \tilde{\boldsymbol{u}}_{i} < v_{i, l + 1}
\right\},
\end{equation}
for $l = 1, \ldots, L_{i}$.
Denote $N^{V}_{i, l} = \lvert V_{i, l} \rvert \mathop{/} 2$ (dividing by two as $V_{i, l}$ contains both the observation and the observation time).
The $l$\textsuperscript{th} 8-hourly fluid change $\Delta_{i, l}$ and corresponding observation time $u_{i, l}$ can then be computed as
\begin{equation}
\Delta_{i, l} = \sum_{s = 1}^{N^{V}_{i, l}} \tilde{x}_{i, s}, \,\, \text{s.t.} \,\, \tilde{x}_{i, s} \in V_{i, l}, \qquad
u_{i, l} = \frac{1}{N^{V}_{i, l}} \sum_{s = 1}^{N^{V}_{i, l}} \tilde{u}_{i, s}, \,\, \text{s.t.} \,\, \tilde{u}_{i, s} \in V_{i, l}.
\end{equation}
Finally, the 8-hourly cumulative fluid balance data are computed by $x_{i, l} = \sum_{s = 1}^{l} \Delta_{i, s}$, and we assume they too are observed at $u_{i, l}$.
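A sketch of this aggregation (we drop empty 8-hour bins, which have no well-defined observation time; an assumption on our part):

```{r, eval = FALSE}
# Aggregate signed raw fluid events (times in days since admission)
# into 8-hourly (1/3 day) balance changes, then cumulate.
cumulative_balance <- function(u_tilde, x_tilde) {
  breaks <- seq(floor(3 * min(u_tilde)) / 3,
                ceiling(3 * max(u_tilde)) / 3, by = 1 / 3)
  bin <- cut(u_tilde, breaks = breaks, right = FALSE,
             include.lowest = TRUE)
  delta <- tapply(x_tilde, bin, sum)  # 8-hourly changes Delta_{i,l}
  u <- tapply(u_tilde, bin, mean)     # mean observation time u_{i,l}
  keep <- !is.na(delta)
  list(u = u[keep], x = cumsum(delta[keep]))
}
```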
## Priors and justification for the cumulative fluid submodel
The parameters for the gamma prior for $\eta^{b}_{1, i}$ and $\eta^{a}_{1, i}$ are obtained by assuming that the 2.5\textsuperscript{th}, 50\textsuperscript{th}, and 97.5\textsuperscript{th} percentiles are at 0.5, 5, and 20 [@belgorodski_rriskdistributions_2017].
A slope of $0.5$ (i.e. the change in cumulative fluid balance per day) is unlikely but possible due to missing data; a slope of $20$ is also unlikely but possible as extremely unwell patients can have very high respiratory rates and thus require large fluid inputs.
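A sketch of how such parameters can be recovered with the cited package (we assume the `get.gamma.par` interface; the result should approximately match the $\text{Gamma}(1.53, 0.24)$ prior above):

```{r, eval = FALSE}
library(rriskDistributions)

# Recover Gamma(shape, rate) parameters matching the three stated
# quantiles of the slope priors.
get.gamma.par(p = c(0.025, 0.5, 0.975), q = c(0.5, 5, 20),
              show.output = FALSE, plot = FALSE)
```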
The prior for the breakpoint $\kappa_{i}$ is derived as follows.
Define $u_{i, (1)} = \min(\boldsymbol{u}_{i})$ and $u_{i, (n)} = \max(\boldsymbol{u}_{i})$, with $r_{i} = u_{i, (n)} - u_{i, (1)}$.
We reparameterise the breakpoint by noting that $\kappa_{i} = \kappa^{\text{raw}}_{i}r_{i} + u_{i, (1)}$, where $\kappa^{\text{raw}}_{i} \in [0, 1]$.
We then set $\kappa^{\text{raw}}_{i} \sim \text{Beta}(5, 5)$ to regularise the breakpoint towards the middle of each individual's stay in ICU.
This is crucial to ensure the submodel is identifiable when there is little evidence of a breakpoint in the data.
Note that this results in the following analytic expression for $\pd_{3}(\phi_{2 \cap 3})$
\begin{equation}
\pd_{3}(\phi_{2 \cap 3}) = \prod_{i = 1}^{N} \pd(\eta^{b}_{1, i}) \pd(\eta^{a}_{1, i}) \pd(\kappa_{i}), \,\, \text{with} \,\,\,
\pd(\kappa_{i}) = \pd_{\kappa^{\text{raw}}_{i}}\left(\frac{\kappa_{i} - u_{i, (1)}}{r_{i}}\right) \frac{1}{r_{i}}
\end{equation}
by the change of variables formula.
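In code, the implied prior density of $\kappa_{i}$ is, for example:

```{r, eval = FALSE}
# Change-of-variables density for kappa_i implied by
# kappa_raw ~ Beta(5, 5) and kappa = kappa_raw * r + u_min.
d_kappa <- function(kappa, u_min, u_max) {
  r <- u_max - u_min
  dbeta((kappa - u_min) / r, shape1 = 5, shape2 = 5) / r
}
```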
Specifying a prior for $\eta_{0, i}$, the cumulative fluid balance at $\kappa_{i}$, is difficult because it too depends on the length of stay.
Instead, we reparameterise so that $\eta_{0, i}$ is a function of the y-intercept $\eta_{0, i}^{\text{raw}}$.
\begin{equation}
\eta_{0, i} =
(\eta_{0, i}^{\text{raw}} + \eta^{b}_{1, i} \kappa_{i}) \boldsymbol{1}_{\{0 < \kappa_{i}\}} +
(\eta_{0, i}^{\text{raw}} + \eta^{a}_{1, i} \kappa_{i}) \boldsymbol{1}_{\{0 \geq \kappa_{i}\}}
\end{equation}
We place a $\text{LogNormal}(1.61, 0.47^2)$ prior on $\eta_{0, i}^{\text{raw}}$.
These values are obtained assuming that, _a priori_, the $2.5\%, 50\%$, and $99\%$ percentiles of $\eta_{0, i}^{\text{raw}}$ are $0.5, 5$, and $15$ respectively [@belgorodski_rriskdistributions_2017].
This is a broad prior that reflects the numerous possible admission routes into the ICU.
We expect those admitted from the wards to have little pre-admission fluid data.
Those admitted from the operating theatre occasionally have their in-theatre fluid input recorded after admission into the ICU, with no easy way to distinguish these records in the data.
## Analytic form for the survival function
The hazard at arbitrary time $t$ is
\begin{gather*}
h_{i}(t) = \gamma t^{\gamma - 1} \exp\left\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \frac{\partial}{\partial t} m_{i}(t)\right\} \\
m_{i}(t) = \eta_{0, i} + \eta^{b}_{1, i}(t - \kappa_{i})\boldsymbol{1}_{\{t < \kappa_{i}\}} + \eta^{a}_{1, i}(t - \kappa_{i})\boldsymbol{1}_{\{t \geq \kappa_{i}\}} \\
\frac{\partial}{\partial t} m_{i}(t) = \eta^{b}_{1, i}\boldsymbol{1}_{\{t < \kappa_{i}\}} + \eta^{a}_{1, i}\boldsymbol{1}_{\{t \geq \kappa_{i}\}}.
\end{gather*}
Then, for $t > \kappa_{i}$, the cumulative hazard is
\begin{align*}
\int_{0}^{t} h_{i}(u) \text{d}u
&= \int_{0}^{t}
\gamma u^{\gamma - 1}
\exp\left\{
\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} +
\alpha \eta^{b}_{1, i}\boldsymbol{1}_{\{u < \kappa_{i}\}} +
\alpha \eta^{a}_{1, i}\boldsymbol{1}_{\{u \geq \kappa_{i}\}}
\right\}
\text{d}u \\
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\int_{0}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}\boldsymbol{1}_{\{u < \kappa_{i}\}} +
\alpha \eta^{a}_{1, i}\boldsymbol{1}_{\{u \geq \kappa_{i}\}}
\right\}
\text{d}u \\
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\int_{0}^{\kappa_{i}}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\text{d}u
+
\int_{\kappa_{i}}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{a}_{1, i}
\right\}
\text{d}u
\right] \\
&= \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\kappa_{i}^{\gamma}
+
\exp\left\{
\alpha \eta^{a}_{1, i}
\right\}
(t^{\gamma} - \kappa_{i}^{\gamma})
\right]
\end{align*}
and for $t < \kappa_{i}$
\begin{align*}
\int_{0}^{t} h_{i}(u) \text{d}u
&= \gamma \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\int_{0}^{t}
u^{\gamma - 1}
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
\text{d}u
\right] \\
&= \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}\}
\left[
\exp\left\{
\alpha \eta^{b}_{1, i}
\right\}
t_{i}^{\gamma}
\right] \\
&= t_{i}^{\gamma} \exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \eta^{b}_{1, i}\}.
\end{align*}
The survival functions then have corresponding definitions for $t > \kappa_{i}$ and $t < \kappa_{i}$ as $S_{i}(t) = \exp\{-\int_{0}^{t} h_{i}(u) \text{d}u\}$.
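A direct transcription of these closed forms (with `lin_pred` standing for $\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta}$; function names are ours):

```{r, eval = FALSE}
# Cumulative hazard H_i(t) under the Weibull baseline with piecewise
# constant fluid-rate term, split at the breakpoint kappa.
cumulative_hazard <- function(t, gamma, lin_pred, alpha,
                              eta_b, eta_a, kappa) {
  if (t < kappa) {
    t^gamma * exp(lin_pred + alpha * eta_b)
  } else {
    exp(lin_pred) * (exp(alpha * eta_b) * kappa^gamma +
                       exp(alpha * eta_a) * (t^gamma - kappa^gamma))
  }
}

# Survival function S_i(t) = exp(-H_i(t)).
survival_i <- function(t, ...) exp(-cumulative_hazard(t, ...))
```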
## Survival submodel prior justification {#p2-prior-justification}
Our prior for $(\gamma, \alpha, \boldsymbol{\theta})$ must result in a plausible distribution for $\pd_{2, i}(T_{i} \mid d_{i} = 1)$, and a reasonable balance between $d_{i} = 1$ and $d_{i} = 0$ events.
The primary concern is unintentionally specifying a prior for which the bulk of $\pd_{2, i}(T_{i} \mid d_{i} = 1)$ is very close to zero.
In addition, certain extreme configurations of $(\gamma, \alpha, \boldsymbol{\theta})$ cause issues for the methodology of @crowther_simulating_2013, particularly the numerical root finding and numerical integration steps.
We would like to rule out such extreme configurations _a priori_.
Ideally we would encode this information in a joint prior for $(\gamma, \alpha, \boldsymbol{\theta})$, but specifying the appropriate correlation structure for these parameters is prohibitively challenging.
Instead we focus on specifying appropriate marginals for each of $\gamma, \alpha$, and $\boldsymbol{\theta}$, and create visual prior predictive checks [@gabry_visualization_2019; @gelman_bayesian_2020] to ensure the induced prior for $(T_{i}, d_{i})$ is acceptable.
Before justifying our chosen marginal priors, we note that the $\exp\{\boldsymbol{w}_{i}^{\top}\boldsymbol{\theta} + \alpha \frac{\partial}{\partial t} m_{i}(t)\}$ term implies that the priors for $\boldsymbol{\theta}$ and $\alpha$ are on the log scale.
Hence the magnitude of these parameters must be small, otherwise all event times would be very near zero or at infinity.
The asymmetric effect of the transformation from the log scale also implies that symmetric priors are not obviously sensible.
From these observations we deduce that $\boldsymbol{\theta}$ and $\alpha$ must not be too large in magnitude; however, if they are negative they can be slightly larger.
Hence, we specify the skew-normal priors detailed in Section \ref{survival-submodel-pd_2}, noting that the skewness parameter for $\alpha$ is smaller, because $\frac{\partial}{\partial T_{i}} m_{i}(T_{i})$ is strictly positive and typically between 0.5 and 20, whilst $\boldsymbol{w}_{i}$ is standardised to be approximately standard normal.
Lastly, if $\gamma$ is too far away from $1$ (in either direction), then the event times are very small either because the hazard increases rapidly ($\gamma \gg 1$), or because almost all of the cumulative hazard is in the neighbourhood of 0 ($\gamma \ll 1$).
We specify a gamma distribution for $\gamma$ with the $1$\textsuperscript{st}-, $50$\textsuperscript{th}-, and $99$\textsuperscript{th}-percentiles of $\pd_{2}(\gamma)$ at $0.2, 1$, and $2$, allowing for a wide range of hazard shapes whilst removing many of the extremes.
## Estimating submodel prior marginal distributions
For $\pd_{1}$, we note that $\pd_{1}(\phi_{1 \cap 2}) = \prod_{i = 1}^{N}\pd_{1, i}(T_{i}, d_{i})$, and that $\pd_{1, i}(T_{i}, d_{i})$ conditions on each individual's length of stay (in specifying the location of the knots), as well as the range, mean, and standard deviation of the P/F data (by standardising $\tilde{z}_{i, j}$).
Simple Monte Carlo samples are drawn from $\pd_{1}(\phi_{1 \cap 2})$ and used to estimate $\widehat{\pd}_{1}(\phi_{1 \cap 2})$.
Under the second submodel we obtain samples from $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ using the methodology of @crowther_simulating_2013 as implemented in `simsurv` [@brilleman_simsurv_2021].
These samples are used to estimate $\widehat{\pd}_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$.
### P/F submodel
We approximate $\pd_{1}(\phi_{1 \cap 2})$ using a mixture of discrete and continuous distributions, with a discrete spike at $C_{i}$ for the censored events and a beta distribution for the (rescaled) event times.
Monte Carlo samples of $T_{i}$ and $d_{i}$ are obtained from $\pd_{1, i}(T_{i}, d_{i})$ by drawing $\beta_{0, i}$ and $\boldsymbol{\zeta}_{i}$ from their respective prior distributions and then solving \eqref{eqn:event-time-model-def}.
Denoting the estimated mixture weight $\widehat{\pi}_{i} \in [0, 1]$, the density estimate is
\begin{equation}
\widehat{\pd}_{1, i}(T_{i}, d_{i}) =
\widehat{\pi}_{i} \, \text{Beta}\left(\frac{T_{i}}{C_{i}}; \widehat{a}_{i}, \widehat{b}_{i}\right) \frac{1}{C_{i}} \boldsymbol{1}_{\{d_{i} = 1\}} +
(1 - \widehat{\pi}_{i}) \boldsymbol{1}_{\{d_{i} = 0, T_{i} = C_{i}\}}
\label{eqn:pf-event-time-prior-dist}
\end{equation}
where $\widehat{\pi}_{i}, \widehat{a}_{i}$ and $\widehat{b}_{i}$ are maximum likelihood estimates obtained using the prior samples.
Examples of $\widehat{\pd}_{1, i}(T_{i}, d_{i})$ for a subset of individuals are displayed in Figure \ref{fig:pf_prior_fit}.
```{r pf_prior_fit, fig.cap = "Fitted distribution (curve) and Monte Carlo samples drawn from $\\pd_{1}(\\phi_{1 \\cap 2})$ (histogram) for a subset of the individuals in the cohort. The height of the atom at $C_{i}$ (red bar and point) has been set to $1 - \\widehat{\\pi}_{i}$."}
knitr::include_graphics("plots/mimic-example/pf-prior-plot-small.pdf")
```
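A sketch of this estimation step for a single individual (we use `fitdistrplus` for the beta maximum likelihood fit; this is an assumption, and any maximum likelihood fitter would work):

```{r, eval = FALSE}
library(fitdistrplus)

# Fit the discrete/continuous mixture: pi_hat from the sampled event
# fraction, (a_hat, b_hat) by maximum likelihood on the rescaled
# uncensored event times.
fit_pf_prior <- function(T_samples, d_samples, C_i) {
  pi_hat <- mean(d_samples == 1)
  beta_fit <- fitdist(T_samples[d_samples == 1] / C_i, distr = "beta")
  c(pi = pi_hat,
    a = unname(beta_fit$estimate["shape1"]),
    b = unname(beta_fit$estimate["shape2"]))
}
```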
### Survival submodel
Our estimate of $\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3})$ relies on the fact that
\begin{equation}
\pd_{2}(\phi_{1 \cap 2}, \phi_{2 \cap 3}) = \prod_{i = 1}^{N}\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i}).
\end{equation}
As such we estimate $\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ for each individual and take the product of these estimates.
Drawing samples from $\pd_{2, i}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ is challenging: we use the approach proposed by @crowther_simulating_2013 as implemented in @brilleman_simsurv_2021.
Inspecting the samples reveals correlation between $(T_{i}, d_{i})$ and $(\kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i})$ that we would like to capture in our estimate.
To do so, we fit a mixture of multivariate normal distributions to transformations of the continuous parameters with support on $\mathbb{R}$,
\begin{equation}
\begin{gathered}
\tilde{T}_{i} = \text{Logit}\left(\frac{T_{i}}{C_{i}}\right), \quad
\tilde{\kappa}_{i} = \text{Logit}\left(\frac{\kappa_{i} - u_{i, (1)}}{u_{i, (n)} - u_{i, (1)}}\right), \\
\tilde{\eta}^{b}_{1, i} = \log(\eta^{b}_{1, i}), \quad
\tilde{\eta}^{a}_{1, i} = \log(\eta^{a}_{1, i}).
\end{gathered}
\end{equation}
The resulting density estimate, with estimated mixture weight $\widehat{\theta}_{i} \in [0, 1]$, is
\begin{align*}
\widehat{\pd}_{2}(T_{i}, d_{i}, \kappa_{i}, \eta^{b}_{1, i}, \eta^{a}_{1, i}) =
\Big[&
(\widehat{\theta}_{i})
\text{N}\left(
\left[\tilde{T}_{i}, \tilde{\kappa}_{i}, \tilde{\eta}^{b}_{1, i}, \tilde{\eta}^{a}_{1, i} \right]^{\mathsf{T}}; \widehat{\mu}_{1, i}, \widehat{\Sigma}_{1, i}
\right)
\boldsymbol{1}_{\{d_{i} = 1\}}
+ \\
& (1 - \widehat{\theta}_{i})
\text{N}\left(\left[\tilde{\kappa}_{i}, \tilde{\eta}^{b}_{1, i}, \tilde{\eta}^{a}_{1, i} \right]^{\mathsf{T}}; \widehat{\mu}_{0, i}, \widehat{\Sigma}_{0, i} \right)
\boldsymbol{1}_{\{d_{i} = 0, T_{i} = C_{i}\}}