Abstract

The INLA package supports several kinds of data likelihoods, including inla.surv, a generalization of Surv from the survival package, and mdata. It also supports joint modeling of different kinds of outcomes. In this vignette we illustrate the case of three outcomes of different types, modeled jointly.

Introduction

Usually correlations between outcomes are modeled by sharing common model terms in the linear predictor. These terms can be covariates, random effects or sums of these. In some cases, the likelihood requires more information than just the vector of observations. As an example, the likelihood for survival models requires both observed times and information about censoring.

In this vignette we consider a joint model for three outcomes, simulate data from this model and illustrate how to fit the joint model accounting for shared terms. The linear predictor for each outcome includes covariates and a spatial random effect modeled using the SPDE approach.

The first outcome is known only at an aggregated level; we model it with a likelihood designed in INLA to account for all the information available about the aggregation. The second outcome is assumed to be censored, so we have to consider both the observed value and the censoring information. The third outcome is the usual kind of data: one scalar per observation unit.

The spatial domain

The SPDE models available in INLA can be applied to processes in $\mathbb{R}^d$ or $\mathbb{S}^d$, for $d=1,2$, and the theory in Lindgren, Rue, and Lindström (2011) includes higher-dimensional domains. We consider a spatial domain $D \subset \mathbb{R}^2$ to keep the example simple. Without loss of generality, we take $D$ to be a rectangle defined as

bb <- rbind(c(0, 10), c(0, 7))

Next, we store the length of each side and its average for later use

rxy <- apply(bb, 1, diff)
rr <- mean(rxy)

We sample a set of $n$ locations, $l_1, ..., l_n$, in this domain, $l_i \in D$, completely at random as follows:

n <- 3000
loc <- cbind(
    runif(n, bb[1, 1], bb[1, 2]),
    runif(n, bb[2, 1], bb[2, 2]))

A spatial field

Instead of simulating from a random field model, we define a smooth function on this domain to play the role of the spatial smooth term:

sfn <- function(a, b)
    sin(a - b) + cos(a + b)

This function is evaluated at the $n$ locations and visualized with

u <- sfn(2 * pi * loc[, 1] / rr,
         2 * pi * loc[, 2] / rr) 
summary(u)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> -1.99813 -0.77879 -0.02521 -0.04554  0.65591  1.99664
par(mar = c(2, 2, 0, 0), mgp = c(1.5, 0.5, 0), las = 1)
plot(loc, asp = 1, pch = 19, cex = 0.5 + (2 + u)/4, xlab = "", ylab = "")

Spatial sub-domains

In order to define the first outcome, let us partition our domain into a set of sub-regions, $r_1, r_2, ..., r_g$, so that $r_i \cap r_j = \emptyset$ for $i \neq j$ and $\cup_{j=1}^g r_j = D$.

For simplicity of coding, but without loss of generality, we assume these small sub-regions to be pixels based in the following set of centers:

h <- 1 ## size of the side of each sub-region
group.ce <- as.matrix(expand.grid(
    seq(bb[1, 1] + h/2, bb[1, 2], h),
    seq(bb[2, 1] + h/2, bb[2, 2], h)))
(ng <- nrow(group.ce))
#> [1] 70

We can avoid dealing with the general spatial objects and operations available in R by assigning each location $l_i$ to its sub-region $r_j$ with an integer vector:

group.id <- ceiling(loc[,1]/h) + 
  ceiling(rxy[1]/h) * (ceiling(loc[,2]/h)-1)

This will be used as a grouping index, so that each group is made of the observations falling in each of the pixels. We will visualize the locations with color depending on this grouping later.
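As a sanity check on this index, the center of pixel $j$ should map to group id $j$. The following is a self-contained sketch that repeats the definitions from above (the helper id.of is introduced here for illustration only):

```r
## Self-contained check, repeating the definitions from above:
## the center of pixel j should be assigned group id j.
bb <- rbind(c(0, 10), c(0, 7))
rxy <- apply(bb, 1, diff)
h <- 1
group.ce <- as.matrix(expand.grid(
    seq(bb[1, 1] + h/2, bb[1, 2], h),
    seq(bb[2, 1] + h/2, bb[2, 2], h)))
id.of <- function(xy) ## the grouping rule used for 'group.id'
    ceiling(xy[, 1]/h) + ceiling(rxy[1]/h) * (ceiling(xy[, 2]/h) - 1)
stopifnot(all(id.of(group.ce) == seq_len(nrow(group.ce))))
```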

Covariates

Let us consider a set of three covariates for each location as

xxx <- cbind(x1 = runif(n), x2 = runif(n), x3 = runif(n))

We average the covariates by group, as we will assume that the data for the first outcome are only available at the group level.

xxx.g <- aggregate(xxx, list(g = group.id), mean)
str(xxx.g)
#> 'data.frame':    70 obs. of  4 variables:
#>  $ g : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ x1: num  0.478 0.535 0.399 0.583 0.467 ...
#>  $ x2: num  0.424 0.567 0.483 0.532 0.508 ...
#>  $ x3: num  0.492 0.497 0.483 0.496 0.48 ...

Outcomes

We will consider three outcomes.

First outcome: aggregated Gaussian

The first outcome is Gaussian distributed, with unknown mean and variance, and a known scaling factor, $s_i$, acting on the precision, such that $$y_i \sim \textrm{N}(\mu_i, \sigma^2/s_i).$$

We assume $\boldsymbol{\eta}_y = \{\mu_1, ..., \mu_n\}$ and $$\boldsymbol{\eta}_y = \mathbf{X}\boldsymbol{\beta}_y + \mathbf{u}.$$ This is evaluated considering the covariates and the random effect averaged per group:

beta.y <- c(5, 0, -3, 3) 
u.g <- tapply(u, group.id, mean)
eta.y.g <- drop(cbind(1, as.matrix(xxx.g[, 2:ncol(xxx.g)])) %*% beta.y) + u.g

We will simulate the scaling factor for each observation with

s <- rgamma(n, 7, 3)
summary(s)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.209   1.673   2.215   2.327   2.857   7.060
sigma.y <- 1
y <- rnorm(n, eta.y.g[group.id], sqrt(sigma.y/s))
summary(y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.566   4.143   4.944   4.958   5.764  10.431
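Note that in the call above the scaling enters the precision: the standard deviation is sqrt(sigma.y/s), so a larger $s_i$ means a more precise observation. A quick deterministic density check, with arbitrary illustrative values for mu, s0 and y0:

```r
## The scaled-Gaussian density N(mu, sigma^2/s) written out explicitly
## should match dnorm with sd = sqrt(sigma^2/s).
mu <- 2; sigma2 <- 1; s0 <- 3; y0 <- 2.4   # illustrative values
stopifnot(all.equal(
    dnorm(y0, mu, sqrt(sigma2/s0)),
    sqrt(s0/(2 * pi * sigma2)) * exp(-s0 * (y0 - mu)^2 / (2 * sigma2))))
```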

We simulated $y_i$ for each location $l_i$. However, we consider that observations of this variable are available only at the aggregated level, for each of the sub-regions $r_j$. Furthermore, for each sub-region $r_j$, we know

  • how many observations were taken in each sub-region, $n_j = \sum_{i=1}^n I(l_i \in r_j)$,
  • the weighted average, $\overline{y}_j = n_j^{-1}\sum_{i: l_i\in r_j} s_i y_i$,
  • the sum of the scaling factors, $m_j = \sum_{i: l_i\in r_j} s_i$,
  • half of the sum of the log of the scaling factors, $\frac{1}{2}\sum_{i: l_i\in r_j} \log(s_i)$,
  • and the sample variance within each unit, defined as $v_j = \frac{1}{n_j}\sum_{i: l_i\in r_j}(s_i y_i^2-\overline{y}_j^2)$.
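To make the definitions concrete, here is a sketch that computes the five summaries by hand for a toy group, following the formulas as listed above (the toy values are assumptions; inla.agaussian is the authoritative implementation):

```r
## Toy group of 5 observations: compute the five statistics as defined.
set.seed(1)
yi <- rnorm(5, mean = 3)            # observations in one sub-region
si <- rgamma(5, 7, 3)               # their scaling factors
nj <- length(yi)                    # number of observations, n_j
ybar <- sum(si * yi) / nj           # weighted average
mj <- sum(si)                       # sum of the scaling factors
hls <- sum(log(si)) / 2             # half the sum of the log scalings
vj <- sum(si * yi^2 - ybar^2) / nj  # within-unit sample variance
c(nj = nj, ybar = ybar, mj = mj, hls = hls, vj = vj)
```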

The function inla.agaussian computes these five statistics, given $y_i$ and $s_i$ for each group. We use it as follows

library(INLA)
agg <- lapply(1:ng, function(g) {
    ig <- which(group.id==g)
    if(length(ig)>0) 
        return(inla.agaussian(y[ig], s[ig]))
    return(inla.mdata(list(NA, 0, 1, 1, NA))) ### to deal with empty groups
})
str(YY <- Reduce('rbind', lapply(agg, unlist))) ## five columns matrix
#>  num [1:70, 1:5] 0.577 0.34 0.495 0.476 0.415 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:70] "init" "" "" "" ...
#>   ..$ : chr [1:5] "Y1" "Y2" "Y3" "Y4" ...

Second outcome: Survival

For the second outcome we choose a Weibull distribution with the following parametrization: $$f_W(w_i) = \alpha w_i^{\alpha - 1} \lambda_i^\alpha \textrm{e}^{-(\lambda_i w_i)^\alpha}.$$

We define the linear predictor as $\boldsymbol{\eta}_w = \{\log(\lambda_1), ..., \log(\lambda_n)\}$ with $$\boldsymbol{\eta}_w = \mathbf{X}\boldsymbol{\beta}_w + \gamma_w\mathbf{u}$$ and we consider the following code to sample from this Weibull distribution

beta.w <- c(1, -1, 0, 1)
alpha <- 2
gamma.w <- 0.5
lambdaw <- exp(cbind(1, xxx) %*% beta.w + gamma.w * u)
we0 <- rweibull(n, shape = alpha, scale = 1/lambdaw)
summary(we0)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> 0.003639 0.160146 0.298233 0.410024 0.531580 3.814975
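The parametrization above corresponds to R's dweibull/rweibull with shape = alpha and scale = 1/lambda, which is why the code uses scale = 1/lambdaw. A quick deterministic check of the stated density at a few points (the alpha and lambda values here are illustrative):

```r
## The stated density f_W(w) = a w^(a-1) l^a exp(-(l w)^a)
## should match dweibull(w, shape = a, scale = 1/l).
a <- 2; l <- 1.5               # illustrative alpha and lambda
w <- c(0.1, 0.5, 1, 2)
f.stated <- a * w^(a - 1) * l^a * exp(-(l * w)^a)
stopifnot(isTRUE(all.equal(f.stated, dweibull(w, shape = a, scale = 1/l))))
```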

However, we suppose that some observations are censored

summary(u.ev <- runif(n, 0.3, 1)) ## censoring factor
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.3001  0.4823  0.6595  0.6559  0.8347  0.9998
table(event <- rbinom(n, 1, 0.5)) ## censored (=0) or event (=1)
#> 
#>    0    1 
#> 1507 1493
summary(we <- ifelse(event == 1, we0, we0 * u.ev)) ## censored outcome
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> 0.002706 0.121893 0.238876 0.338448 0.439904 3.814975

Third outcome: Count under exposure

We consider that we have a different exposure at each location:

summary(ee <- rgamma(n, 10, 2))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.012   3.893   4.858   4.998   5.966  11.994

We assume an outcome following a Poisson distribution, $$O_i \sim \textrm{Poisson}(\delta_i E_i).$$

We assume the linear predictor as $\boldsymbol{\eta}_p = \{\log(\delta_1), ..., \log(\delta_n)\}$ with $$\boldsymbol{\eta}_p = \mathbf{X}\boldsymbol{\beta}_p + \gamma_p\mathbf{u}.$$

This outcome is simulated with

beta.p <- c(2, 1, -1, 0)
gamma.p <- -0.3
delta.p <- exp(cbind(1, xxx) %*% beta.p + gamma.p * u)
po <- rpois(n, delta.p * ee)
summary(po)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    3.00   23.00   36.00   42.94   55.00  234.00
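The exposure enters multiplicatively on the rate, which is equivalent to adding an offset log(E_i) to the linear predictor; in the model fit below this is handled through the E argument of inla() rather than an explicit offset. A small deterministic illustration (the values are arbitrary):

```r
## Poisson(delta * E) is equivalent to a log-linear model with
## offset log(E): delta * E = exp(eta + log(E)) when delta = exp(eta).
eta <- 0.7; E0 <- 4.2; k <- 3   # arbitrary illustrative values
stopifnot(all.equal(exp(eta) * E0, exp(eta + log(E0))))
stopifnot(all.equal(dpois(k, exp(eta) * E0), dpois(k, exp(eta + log(E0)))))
```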

Data model setup

We now have to setup the objects in order to fit the model.

SPDE model setup

We have to define a mesh over the domain, which we do with the following code

mesh <- inla.mesh.2d(
    loc.domain = matrix(bb[c(1,3,3,1,1, 2,2,4,4,2)], ncol = 2), 
    offset = c(.0001, .5) * rr, ## extensions size
    max.edge = c(.05, .3) * rr, 
    cutoff = 0.025 * rr)
#> Warning: `inla.mesh.2d()` was deprecated in INLA 23.08.18.
#>  Please use `fmesher::fm_mesh_2d_inla()` instead.
#>  For more information, see
#>   https://inlabru-org.github.io/fmesher/articles/inla_conversion.html
#>  To silence these deprecation messages in old legacy code, set
#>   `inla.setOption(fmesher.evolution.warn = FALSE)`.
#>  To ensure visibility of these messages in package tests, also set
#>   `inla.setOption(fmesher.evolution.verbosity = 'warn')`.
#> This warning is displayed once per session.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
mesh$n
#> [1] 1152

This mesh and the locations colored by the group defined for the first outcome can be visualized with

par(mar = c(0, 0, 1, 0), mgp = c(1.5, 0.5, 0))
plot(mesh, asp = 1, edge.color = gray(0.5))
points(loc, pch = 19, col = group.id, cex = 0.3 + (2 + u)/4)

The SPDE model is defined with

spde <- inla.spde2.pcmatern(
    mesh,
    prior.range=c(1, 0.1),
    prior.sigma=c(1, 0.1))

Linear predictor components

We will fit a joint model with linear predictor terms matching how the data were simulated. The inla.stack() function is meant to be used when working with SPDE models. However, it also helps when dealing with multiple likelihood data, as detailed below.

One way to deal with multiple likelihoods in INLA is to define a matrix outcome, where each column is used for one likelihood. However, in this vignette we are dealing with more complex outcomes. Therefore, we need to use a list on the left-hand side of the formula. This is the key point in implementing this example.

The right-hand side of the formula includes each term of the linear predictor. When working with multiple likelihoods, each term in each likelihood should be named uniquely. For example, if two likelihoods include the same covariate but with different coefficients, each one has to have its own name. Therefore, we pass the covariate x1 as x1y for the first outcome, x1w for the second outcome, and x1p for the third outcome. Similarly for the other covariates, including the intercept.

In our example, the SPDE model is in the linear predictor of the first outcome, a copy of this random effect is in the linear predictor of the second outcome, and another copy is in the linear predictor of the third outcome.

The model formula considering all of these details is defined as

fff <- list(
  inla.mdata(Y),
  inla.surv(w, ev),
  o) ~ 0 +
  a1 + x1y + x2y + x3y + f(s, model = spde) +
  a2 + x1w + x2w + x3w + f(sc1, copy = "s", fixed = FALSE) +
  a3 + x1p + x2p + x3p + f(sc2, copy = "s", fixed = FALSE)

Building data stacks

The data have to be organized to contain all the information needed: all the outcomes, covariates and random effect indices. For the SPDE models, the projector matrices must also be included.

The main work of the inla.stack() function is to check and organize outcomes, effects and projector matrices in order to guarantee that the dimensions match. Additionally, inla.stack() deals with arranging the outcomes to work properly with multiple likelihoods.

For the first outcome, we need to supply a five-column matrix, one column for each of the five statistics previously computed for the ng groups. For the effects we define an intercept, covariates named according to what the formula specifies, and the index for the SPDE model. The projector matrices are one for the fixed effects, which is just 1, and one for the SPDE model.

We include a different tag for each stack to keep track of them. This is useful for collecting the results associated with each outcome. For each outcome we can also define an indicator vector to identify it, which is useful in the case of missing data: the indicator can then be used to identify which link function to use when making predictions.

The stack for the first outcome can be defined as

stackY <- inla.stack(
    tag = 'y',
    data = list(Y = YY),
    effects= list(
        data.frame(a1 = 1,
                   x1y = xxx.g$x1, 
                   x2y = xxx.g$x2, 
                   x3y = xxx.g$x3),
        s = 1:mesh$n),
    A=list(1,
           inla.spde.make.A(mesh, group.ce))
)

For the second outcome, we have to supply two vectors (observed time and censoring status) to build the inla.surv data, the covariates, named according to the terms in the formula, and the projector matrices: the single 1 for the covariates and the projector for the shared SPDE term.

The stack for the second outcome can be defined as

stackW <- inla.stack(
    tag = 'w', 
    data = list(w = we,
                ev = event), 
    effects = list(
        data.frame(a2 = 1,
                   x1w = xxx[, 1], 
                   x2w = xxx[, 2], 
                   x3w = xxx[, 3]),
        sc1 = 1:mesh$n),
    A = list(1,
             inla.spde.make.A(mesh, loc))
)

For the third outcome we have to supply the vector of counts and the vector of exposures. The effects and projectors are similar to the previous ones; we just have to account for the different names.

The stack for the third outcome can be defined as

stackP <- inla.stack(
    tag = 'p', 
    data = list(o = po,
                E = ee),
    effects = list(
        data.frame(a3 = 1,
                   x1p = xxx[, 1], 
                   x2p = xxx[, 2], 
                   x3p = xxx[, 3]),
        sc2 = 1:mesh$n),
    A = list(1,
             inla.spde.make.A(mesh, loc))
)

Fitting the model and some results

We now fit the model and look at some results.

Model fitting

To fit the model we start by joining all the data into one data stack

stacked <- inla.stack(stackY, stackW, stackP)

We now supply this to the inla() function in order to fit the model

result <- inla(
    formula = fff,
    family = c("agaussian", "weibullsurv", "poisson"),
    data = inla.stack.data(stacked),
    E = E, 
    control.predictor = list(
        A=inla.stack.A(stacked)),
    control.family = list(
        list(),
        list(variant = 1),
        list()), 
    control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE)
)
result$cpu
#> NULL

Summaries

We now compare the true value for each fixed effect with its posterior summary.

round(cbind(
    true = c(beta.y, beta.w, beta.p),
    result$summary.fix), 2)
#>     true  mean   sd 0.025quant 0.5quant 0.975quant  mode kld
#> a1     5  5.70 1.96       1.64     5.70       9.76  5.70   0
#> x1y    0  0.17 0.35      -0.51     0.17       0.85  0.17   0
#> x2y   -3 -3.11 0.37      -3.84    -3.11      -2.38 -3.11   0
#> x3y    3  2.53 0.43       1.69     2.53       3.37  2.53   0
#> a2     1  1.03 1.02      -1.09     1.02       3.15  1.02   0
#> x1w   -1 -0.99 0.04      -1.07    -0.99      -0.91 -0.99   0
#> x2w    0 -0.01 0.04      -0.09    -0.01       0.07 -0.01   0
#> x3w    1  1.07 0.04       1.00     1.07       1.15  1.07   0
#> a3     2  1.84 0.60       0.59     1.85       3.09  1.85   0
#> x1p    1  1.00 0.01       0.98     1.00       1.02  1.00   0
#> x2p   -1 -0.98 0.01      -1.00    -0.98      -0.96 -0.98   0
#> x3p    0  0.01 0.01      -0.01     0.01       0.03  0.01   0

For the hyperparameters, we have one for the agaussian likelihood, one for the weibullsurv, two for the SPDE model, and two that define the scaling of the shared SPDE terms in the second and third outcomes.

round(cbind(true = c(1/sigma.y^2, alpha, NA, NA, gamma.w, gamma.p), 
            result$summary.hy), 2)
#>                                            true  mean   sd 0.025quant 0.5quant
#> Precision for the AggGaussian observations  1.0  1.01 0.03       0.96     1.01
#> alpha parameter for weibullsurv[2]          2.0  2.27 0.06       2.18     2.27
#> Range for s                                  NA 15.38 2.96      10.73    15.00
#> Stdev for s                                  NA  1.72 0.27       1.28     1.69
#> Beta for sc1                                0.5  0.54 0.02       0.50     0.54
#> Beta for sc2                               -0.3 -0.31 0.01      -0.32    -0.31
#>                                            0.975quant  mode
#> Precision for the AggGaussian observations       1.06  1.01
#> alpha parameter for weibullsurv[2]               2.41  2.24
#> Range for s                                     22.31 14.07
#> Stdev for s                                      2.34  1.60
#> Beta for sc1                                     0.57  0.54
#> Beta for sc2                                    -0.30 -0.31

Results considering each outcome

To collect the outputs associated with each outcome, we use the tag from each data stack to get the data index from the joint stack. For example, the data index for the third outcome is

idx1 <- inla.stack.index(stacked, tag = "p")$data

We can see the summary for the fitted values for the first few observations of this outcome with

head(cbind(true = delta.p, 
           result$summary.fitted.values[idx1, ]), 3)
#>                             true      mean        sd 0.025quant  0.5quant
#> fitted.APredictor.3071  5.920626  5.889987 0.1674931   5.567970  5.887713
#> fitted.APredictor.3072  4.326089  4.242086 0.1142313   4.022118  4.240658
#> fitted.APredictor.3073 10.401938 10.690993 0.3198927  10.077774 10.686015
#>                        0.975quant      mode
#> fitted.APredictor.3071   6.224950  5.883205
#> fitted.APredictor.3072   4.470182  4.237828
#> fitted.APredictor.3073  11.332578 10.676140

To access the DIC, WAIC and log-score for each outcome, we can use the family indicator stored in the DIC output, as follows

str(result$dic)
#> List of 14
#>  $ dic              : num 25497
#>  $ p.eff            : num 173
#>  $ mean.deviance    : num 25324
#>  $ deviance.mean    : num 25151
#>  $ dic.sat          : num 6359
#>  $ mean.deviance.sat: num 6186
#>  $ deviance.mean.sat: num 6010
#>  $ family.dic       : num [1:3] 6216.5 -53.1 19333.4
#>  $ family.dic.sat   : num [1:3] 94.8 3153.2 3113.8
#>  $ family.p.eff     : num [1:3] 37.8 23.8 111.3
#>  $ family           : num [1:6070] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ local.dic        : num [1:6070] 103.5 92.1 115.5 101.7 108.9 ...
#>  $ local.dic.sat    : num [1:6070] 1.44 1.25 1.14 1.92 1.24 ...
#>  $ local.p.eff      : num [1:6070] 0.676 0.543 0.627 0.527 0.643 ...
str(result$waic)
#> List of 4
#>  $ waic       : num 24277
#>  $ p.eff      : num 148
#>  $ local.waic : num [1:6070] 72.9 72.5 72.5 73.2 72.5 ...
#>  $ local.p.eff: num [1:6070] 0.382 0.229 0.185 0.574 0.193 ...
str(result$cpo)
#> List of 3
#>  $ cpo    : num [1:6070] 3.27e-23 1.05e-20 9.66e-26 6.48e-23 2.57e-24 ...
#>  $ pit    : num [1:6070] 0.882 0.279 0.347 0.973 0.499 ...
#>  $ failure: num [1:6070] 0 0 0 0 0 0 0 0 0 0 ...
result$dic$family.dic ### DIC for each outcome
#> [1]  6216.45015   -53.14955 19333.39448
tapply(result$waic$local.waic, 
       result$dic$family, ## taken from dic family id
       sum) ## WAIC for each outcome
#>           1           2           3 
#>  5000.23765   -54.04986 19330.61060
tapply(result$cpo$cpo, 
       result$dic$family, ## taken from dic family id
        function(x) 
         c(nlCPO = -sum(log(x)))) ## -sum log CPO for each outcome
#>          1          2          3 
#> 3109.47483  -26.98744 9665.64534

References

Lindgren, F., H. Rue, and J. Lindström. 2011. “An Explicit Link Between Gaussian Fields and Gaussian Markov Random Fields: The Stochastic Partial Differential Equation Approach (with Discussion).” J. R. Statist. Soc. B 73 (4): 423–98.