Intrinsic Gaussian Markov random fields (IGMRFs) are widely used as priors to model latent dependency structures in R-INLA. Examples of this type of model include the rw1, rw2 and besag models, among others; see the R-INLA documentation for further descriptions. All of these models can be described in terms of having an improper Gaussian density with a scaled precision matrix that reflects the neighbourhood structure of the model. The scaling is represented as a random precision parameter, and when applying these models the user has to define a hyperprior for this parameter. This is a delicate issue, and such hyperpriors are often selected in a rather ad-hoc and subjective manner. One problem is that a specific fixed hyperprior for the precision parameter does not have the same interpretation for different IGMRFs.

The aim of this tutorial is to introduce and demonstrate a recently implemented option in R-INLA named scale.model, which can be added when specifying the inla formula call for the different IGMRFs. For example, we might specify the formula as

formula = y ~  f(idx, model = "rw2", scale.model = TRUE, hyper=..)

By defining scale.model = TRUE, the rw2-model is scaled to have a generalized variance equal to 1; see Section 2 for further explanation. The new option is also available using control.group,

f(idx, model = "..", group = g, control.group = list(model = "rw2", scale.model = TRUE))

which is relevant for the rw1-, rw2- and besag-models. Note that scale.model = FALSE by default for all models, to make the option backwards-compatible.

The purpose of scaling the models is to ensure that a fixed hyperprior for the precision parameter has a similar interpretation for different types of IGMRFs. In general, IGMRFs can be seen as models that reflect local deviation from an unspecified overall level. More specifically, the models considered here penalize local deviation from a constant level (e.g. rw1 and besag), a line (rw2) or a plane (rw2d). A priori, the hyperprior chosen for the precision parameter will influence how large we allow this local deviation to be. It seems reasonable to select hyperpriors such that different random effects in a regression model are assumed to have a similar range for this deviation. By scaling the models, this is achieved by assigning the same fixed hyperprior to the precisions of all IGMRFs in the regression model. If we do this using unscaled IGMRFs (as often done in the literature), the influence of different hyperprior choices is less controllable, as the variances of the IGMRFs differ.

The hyperpriors for scaled models will also be invariant to the shape and size of the graph for a specific IGMRF and to different scalings of covariates. By mapping the random precision to the marginal variance of an IGMRF, we also get an alternative interpretation of different parameter choices for a hyperprior, as these can be viewed in terms of the local deviation, see Section 2 for further details.

The new option is based on ideas introduced in Sørbye and Rue (2014) and these ideas have also been applied in Papoila et al. (2014). Note that the implementation in Sørbye and Rue (2014) is slightly different as there the hyperprior is assigned to scaled precision parameters, while the current option scales the IGMRFs directly. The results are the same, but the new option makes the ideas in Sørbye and Rue (2014) very easy to apply. We demonstrate this in Section 3, analysing two semiparametric regression models. Concluding remarks are given in Section 4.

An IGMRF can be defined as a vector ${x}=(x_1,\ldots , x_n)$ having an improper Gaussian density with a sparse precision matrix which is not of full rank. In Rue and Held (2005), the order of an IGMRF is defined as the rank deficiency of its precision matrix. This implies that a zero-mean IGMRF of $k$th order has density $$\begin{equation*} \pi({x}) = (2\pi)^{-(n-k)/2}(|{Q}|^*)^{1/2} \exp\left\{-\frac{1}{2}{x}^T{Q}{x}\right\}, \end{equation*}$$ where the precision matrix ${Q}$ is assumed to be positive semi-definite. $|{Q}|^*$ denotes the generalized determinant, equal to the product of the $n-k$ non-zero eigenvalues of ${Q}$. The marginal variances of ${x}$ are given by the diagonal elements of the generalized inverse matrix ${Q}^-$, that is $\sigma^2(x_i) = Q^-_{ii}$, $i=1,\ldots, n$. Different IGMRFs can be described by expressing the precision matrix as ${Q}=\tau{R}$. Here, $\tau$ denotes the random precision parameter, while the matrix ${R}$ reflects the specific neighbourhood structure of the model, which can be represented as a labelled graph with nodes and edges. This implies that the marginal variances/standard deviations of the IGMRF will depend on both the structure and size of ${R}$. For example, we can consider the marginal standard deviations of the rw1- and rw2-models when the precision is fixed at $\tau= 1$. Due to the different structure matrices of these two models, the levels of the marginal standard deviations are quite different; see Figure 1. This illustrates that it seems unreasonable to assign the exact same hyperprior to the precision parameters of these two models when these parameters are random.
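As a small illustration of these definitions, the marginal variances for $\tau=1$ can be computed directly from the structure matrix. The sketch below builds the rw1 structure matrix for n = 5 nodes and obtains its generalized inverse from the eigendecomposition; this uses base R only (no INLA needed), and the variable names are ours:

```r
n = 5
D = diff(diag(n))            # (n-1) x n first-difference matrix
R = t(D) %*% D               # rw1 structure matrix, rank deficiency 1
e = eigen(R, symmetric = TRUE)
pos = e$values > 1e-9        # drop the single zero eigenvalue
Qinv = e$vectors[, pos] %*% diag(1/e$values[pos]) %*% t(e$vectors[, pos])
sigma2 = diag(Qinv)          # marginal variances sigma^2_{tau=1}(x_i)
```

The generalized determinant in the density above is, correspondingly, `prod(e$values[pos])`.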

Notice that for a fixed precision $\tau$, the marginal variances of the components of ${x}$ can be expressed as a function of $\tau$ by $$\begin{equation} \sigma^2_{\tau}(x_i)=\frac{\sigma^2_{\{\tau=1\}}(x_i)}{\tau},\quad i=1,\ldots , n.\label{eq:sigma.xi} \end{equation}$$ To characterize the level of the marginal variances when $\tau=1$, we define the generalized variance of ${x}$ as the geometric mean $$\begin{equation*} \sigma^2_{\mbox{\scriptsize{GV}}}({x}) = \exp\left(\frac{1}{n}\sum_{i=1}^n \log(\sigma^2_{\{\tau=1\}}(x_i))\right). \label{eq:sigma.gv} \end{equation*}$$ When scale.model = TRUE, IGMRFs are scaled such that $\sigma^2_{\mbox{\scriptsize{GV}}}({x}) =1$. This implies that the precision parameters of different models have a similar interpretation, which makes it reasonable to assign the same fixed hyperprior to the precision parameters of different IGMRFs in a regression model. Note that in Sørbye and Rue (2014), the generalized variance was referred to as the reference variance $\sigma^2_{\mbox{\scriptsize{ref}}}({x})$. The hyperprior was then assigned to the scaled precision $\tau/\sigma^2_{\mbox{\scriptsize{ref}}}({x})$ to achieve the same interpretation for different models.
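The scaling itself is straightforward once the marginal variances for $\tau=1$ are known. A minimal base-R sketch, where the numbers are purely illustrative:

```r
# Illustrative marginal variances sigma^2_{tau=1}(x_i) of some IGMRF
sigma2 = c(0.5, 1.2, 2.0, 1.5)

# Generalized variance: the geometric mean of the marginal variances
sigma2.gv = exp(mean(log(sigma2)))

# Scaling the model, i.e. replacing R by sigma2.gv * R, divides every
# marginal variance by sigma2.gv, so the scaled model has generalized
# variance exactly 1
sigma2.scaled = sigma2 / sigma2.gv
```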

# Functions to calculate the marginal standard deviations of rw1 and rw2
ss.rw1.sdref = function(values, tau) {
    stopifnot(!missing(values), !missing(tau))
    n = length(values)
    stopifnot(n > 3)
    y = NA  # no data; we only want the prior marginal standard deviations

    formula = y ~ -1 + f(values, model = "rw1", constr = TRUE,
                         values = values, diagonal = 1e-10,
                         hyper = list(prec = list(initial = log(tau), fixed = TRUE)))
    r = inla(formula, data = data.frame(y, values), family = "gaussian",
             control.family = list(fixed = TRUE),
             control.compute = list(return.marginals = FALSE), verbose = FALSE)
    sd = r$summary.random$values$sd
    return (sd)
}

ss.rw2.sdref = function(values, tau) {
    stopifnot(!missing(values), !missing(tau))
    n = length(values)
    stopifnot(n > 3)
    y = NA  # no data; we only want the prior marginal standard deviations

    # Constrain the rw2 to be orthogonal to both a constant level and a
    # linear trend, instead of using the default sum-to-zero constraint
    A1 = matrix(1, n, 1)      # constant
    A2 = matrix(1:n, n, 1)    # linear trend
    A = rbind(c(A1), c(A2))
    e = rep(0, 2)
    extraconstr = list(A = A, e = e)

    formula = y ~ -1 + f(values, model = "rw2", constr = FALSE,
                         values = values, diagonal = 1e-10, extraconstr = extraconstr,
                         hyper = list(prec = list(initial = log(tau), fixed = TRUE)))

    r = inla(formula, data = data.frame(y, values), family = "gaussian",
             control.family = list(fixed = TRUE),
             control.compute = list(return.marginals = FALSE), verbose = FALSE)

    sd = r$summary.random$values$sd
    return (sd)
}

# The marginal standard deviations for n nodes
n = 100 
sd.rw1 = ss.rw1.sdref(1:n, 1) 
sd.rw2 = ss.rw2.sdref(1:n, 1)

As already mentioned, IGMRFs penalize local deviation from an unspecified overall level. The marginal standard deviations of the models, integrating out the random precision, give information on how large we allow this local deviation to be. Following Sørbye and Rue (2014), we apply the identity above also when $\tau$ is random, and define an upper limit $U$ for the marginal standard deviation of a model by $$\begin{equation} P(\sigma(x_i)>U)\approx P\left(\tau<\frac{\sigma^2_{\mbox{\scriptsize{GV}}}({x})}{U^2}\right)=\alpha, \label{eq:tau} \end{equation}$$ where $\alpha$ is a fixed small probability. For scaled models, this implies that the parameter choices for the hyperprior of $\tau$ have the same interpretation in terms of $U$ for all models ${x}$.

In R-INLA, precision parameters are assigned Gamma distributions by default (but any other distribution can be specified by the user). Assume that $\tau\sim \mbox{Gamma}(a,b)$, where $a$ denotes the shape parameter and $b$ the inverse-scale parameter. Then $b\tau\sim\mbox{Gamma}(a,1)$, and the definition of $U$ implies that $$\begin{equation} \frac{b\sigma^2_{\mbox{\scriptsize{GV}}}({x})}{U^2} = F^{-1}(\alpha,a,1), \label{eq:upper.limit} \end{equation}$$ where $F^{-1}(\cdot)$ denotes the inverse cumulative distribution function of the Gamma distribution. For scaled models, we again notice that the interpretation of $a$ and $b$ stays the same for all ${x}$, and the upper limit $U$ is easily calculated as

U = sqrt(b/qgamma(alpha, a, 1)) 

For example, if we choose the default parameter values in R-INLA, $a = 1$ and $b = 5\cdot 10^{-5}$, the upper limit equals $U\approx 0.22$ when $\alpha =0.001$. Notice that the explicit value of $U$ has to be interpreted with care, as it depends on the selected value of $\alpha$. Also, $U$ is not uniquely defined: different combinations of $a$ and $b$ giving the same value of $U$ do not in general give the same estimated results.
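As a quick check of the numbers just quoted, the upper limit for the default parameter values can be computed directly in base R:

```r
# Default R-INLA Gamma prior parameters and the chosen tail probability
a = 1
b = 5e-5
alpha = 0.001

# Upper limit U for a scaled model (generalized variance 1)
U = sqrt(b / qgamma(alpha, a, 1))
round(U, 2)   # approximately 0.22
```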

The new option makes it very easy to compare results using unscaled and scaled models. Here we consider two examples, analysing semiparametric regression models. Other examples are given in Sørbye and Rue (2014) and Papoila et al. (2014).

We first consider a simple semiparametric regression model in which $n$ observations are generated by $$\begin{equation} y_i = f(x_i) + \epsilon_i,\quad i=1,\ldots, n. \label{eq:regr} \end{equation}$$ The noise terms $\epsilon_i$ are assumed independent and normally distributed with mean 0 and a fixed standard deviation $\sigma$. The aim is to estimate the underlying smooth function $f(\cdot)$.

Applying the rw2-model as a prior, the formula call using the scaled model is defined as

formula = y ~  f(x, model = "rw2", scale.model = TRUE, hyper=...)

where the hyperprior assigned to the precision parameter is specified in the hyper argument. Here, we just select the default hyperprior, that is $\tau \sim \mbox{Gamma}(1,5\cdot 10^{-5})$.

Let the unknown true function $f(\cdot)$ be defined as $$f(x)=\frac{x}{t}+\sin\left(\frac{2\pi x}{t}\right)-0.5,\quad x\in(0,t).$$ We generate $n = 101$ observations from the regression model above, in which $\sigma=0.5$, and call inla:

result = inla(formula, family = "gaussian", data = data.frame(y, x))
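For completeness, the data passed to inla above can be generated along the following lines; the interval length t and the random seed are our own illustrative choices:

```r
set.seed(1)                        # any seed will do; chosen here for reproducibility
n = 101
t = 1                              # length of the interval (0, t); illustrative choice
sigma = 0.5
x = seq(0, t, length.out = n)
f.true = x/t + sin(2*pi*x/t) - 0.5
y = f.true + rnorm(n, sd = sigma)  # observations from the regression model
```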

We then plot the estimated function

result$summary.random$x$mean
both using the unscaled and scaled models for three different intervals of $x$; see Figure 2.

Using the scaled model, the results using a fixed hyperprior are invariant to different scalings of the covariate, and we get exactly the same estimated curve in all three cases. For the unscaled model, the marginal variance will decrease/increase as the distance between nodes decreases/increases. By using the same fixed hyperprior in all three cases, we implicitly assume very different values for the upper limit $U$; see Table 1. In the extreme case of assuming a very low value of $U$, the resulting estimate is just a straight line.

As noted in Akerkar, Martino, and Rue (2010) and Sørbye and Rue (2014), a given hyperprior for the precision of unscaled models can be adjusted to account for different scalings of the covariate by modifying the inverse-scale parameter of the Gamma distribution. However, by scaling the models we don't have to think about this at all, as a given fixed prior will give the same estimated curve.

The linear predictor of a latent Gaussian model often consists of a combination of random effects modelled by different IGMRFs. As an example, we take a look at the Munich rental data analysed in Section 4.2.2 of Rue and Held (2005), also given as an example in Volume I of the R-INLA case studies. The response variable in this dataset is rent per square metre (in Euros) for a flat, and the covariates include spatial location, floor space, year of construction and various indicator variables. Smooth effects of the covariates are modelled using the rw2-model, while the spatially structured effect of location is modelled using the besag-model. Previously, these data have been analysed using a fixed Gamma$(1,0.01)$ prior for all the precision parameters in the model. The inverse-scale parameter of the Gamma distribution can be adjusted to account for different types of IGMRFs by scaling this parameter with the generalized variance, see Sørbye and Rue (2014), but it is easier to just use the new scale.model option, set equal to TRUE or FALSE.

To simplify notation let

hyper.prec = list(prec = list(param = c(1, 0.01)))

The formula call is then specified by

data(Munich)
g = system.file("demodata/munich.graph", package="INLA")

## Note that we want an estimate of the effect of year also for the
## years where we have no observations; therefore we supply a vector
## with all possible values of the covariate year, that is
## seq(1918, 2001)

option = TRUE   # scale the IGMRFs; set FALSE for the unscaled models

formula = rent ~ -1 +
   f(location, model = "besag", graph = g, initial = 1, hyper = hyper.prec, 
     scale.model = option) +
   f(year, model = "rw2", values = seq(1918, 2001), hyper = hyper.prec, 
     scale.model = option) +
   f(floor.size, model = "rw2", hyper = hyper.prec, scale.model = option) +
   Gute.Wohnlage + Beste.Wohnlage + Keine.Wwv + Keine.Zh +
   Kein.Badkach  + Besond.Bad + Gehobene.Kueche +
   zim1 + zim2 + zim3 + zim4 + zim5 + zim6 

Further, the call to inla is given as

mod = inla(formula, data = Munich, control.fixed=list(prec=1))

Figure 3 illustrates the resulting estimated effects of the covariates ‘Year of construction’ and ‘Floor size’ using the scaled and unscaled models. Applying the given hyperprior to the scaled models, we have assumed an upper limit of $U\approx3.2$. This seems to give quite reasonable results, while the estimated curves are more wiggly using the unscaled models. The results for the spatially structured effect are very similar using the scaled and unscaled models (results not shown).

Using the scaled models, we can now study sensitivity to the parameters of the Gamma prior simultaneously for the different random effects modelled by IGMRFs, instead of doing this separately for each of the effects. For example, we can keep the shape parameter fixed at $a=1$, while the inverse-scale parameter $b$ is calculated to give specific values of $U$. The estimated functions for the effects of the two covariates ‘Year of construction’ and ‘Floor size’ are given in Figure 4. Here, the value of $U$ ranges from $0.001$ to $30$, which corresponds to values of $b$ in the range $(10^{-9},0.9)$. We conclude that the results are not very sensitive to changes in the inverse-scale parameter when $a=1$. This also applies to the spatially structured effect, which does not change significantly for these different parameter choices (results not shown). Naturally, we could also consider other values of $a$.
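The mapping from a desired upper limit $U$ to the inverse-scale parameter $b$ used in this sensitivity analysis is simply the inverse of the earlier formula for $U$; for $a=1$ and $\alpha=0.001$, the quoted range of $b$ can be reproduced in base R as:

```r
a = 1
alpha = 0.001
U = c(0.001, 30)                 # smallest and largest upper limits considered
b = U^2 * qgamma(alpha, a, 1)    # inverts U = sqrt(b / qgamma(alpha, a, 1))
signif(b, 2)                     # approximately 1e-09 and 0.90
```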

By scaling all IGMRFs to a unit generalized variance, we avoid the precision parameters of these models having different interpretations. This makes it reasonable to assign the same hyperprior to the precision parameters of different IGMRF-models, which greatly simplifies the issue of selecting hyperpriors and performing sensitivity analysis. These ideas were introduced in Sørbye and Rue (2014), in which the precision parameter was scaled by the generalized variance. The new option in R-INLA makes these ideas easy to apply, as we don't have to calculate the generalized variance for each model. We simply set the scale.model option equal to TRUE in the inla formula call for the relevant terms.

If you are now convinced that all IGMRF-models should in fact be scaled, you can do this in an even simpler way than already described. By specifying the following global option

inla.setOption(scale.model.default = TRUE)

all the IGMRF-models are scaled automatically. This also includes the situation using the control.group argument mentioned in the introduction. Hyperpriors still have to be selected with care for scaled models, but the issue of selecting them is simplified, and we believe that hyperpriors can be chosen in a more meaningful and controlled way.

Akerkar, R., S. Martino, and H. Rue. 2010. “Implementing Approximate Bayesian Inference for Survival Analysis Using Integrated Nested Laplace Approximations.” Preprint Statistics, Norwegian University of Science and Technology 1: 1–38.
Papoila, A. L., A. Riebler, A. Amaral-Turkman, R. São-João, C. Ribeiro, C. Geraldes, and A. Miranda. 2014. “Stomach Cancer Incidence in Southern Portugal 1998-2006: A Spatio-Temporal Analysis.” Biometrical Journal 56: 403–15.
Rue, H., and L. Held. 2005. Gaussian Markov Random Fields. Chapman & Hall/CRC, Boca Raton.
Sørbye, S. H., and H. Rue. 2014. “Scaling Intrinsic Gaussian Markov Random Field Priors in Spatial Modelling.” Spatial Statistics 8: 39–51. https://doi.org/10.1016/j.spasta.2013.06.004.