# Bayesian Estimation of Signal Detection Models, Part 4

This post is the fourth part in a series of blog posts on Signal Detection models (Macmillan and Creelman 2005; McNicol 2005). In the first part, I described how to estimate the equal variance Gaussian SDT (EVSDT) model for a single participant, using Bayesian (generalized linear and nonlinear) modeling techniques. In the second part, I described how to estimate the equal variance Gaussian SDT model for multiple participants simultaneously, using hierarchical Bayesian models. In the third post, I showed how to esimate equal- and unequal variances SDT models for confidence rating data for a single participant.

However, we almost always want to discuss our inference about the population, not individual subjects. Further, if we wish to discuss individual subjects, we should place them in the context of other subjects. A multilevel (aka hierarchical, mixed) model accomplishes these goals by including population- and subject-level parameters.

I will again describe the software implementation in R using the brms package (Bürkner 2017; R Core Team 2017). This blog post will be shorter than the previous installments; I assume you’re familiar with the material covered in those posts.

# Hierarchical UVSDT model

## Example data set

We’ll use a data set of 48 subjects’ confidence ratings on a 6 point scale: 1 = “sure new”, …, 6 = “sure old” (Koen et al. 2013). This data set is included in the R package MPTinR (Singmann and Kellen 2013).

In this experiment (Koen et al. 2013), participants completed a study phase, and were then tested under full attention, or while doing a second task. Here, we focus on the rating data provided in the full attention condition. Below, I reproduce the aggregate rating counts for old and new items from the Table in the article’s appendix. (It is useful to ensure that we are indeed using the same data.)

For complete R code, including pre-processing the data, please refer to the source code of this blog post. I have omitted some of the less important code from the blog post for clarity.

## Model syntax

Here’s the brms syntax we used for estimating the model for a single participant:

uvsdt_m <- bf(y ~ isold, disc ~ 0 + isold)

With the above syntax we specifed seven parameters: Five intercepts (aka ‘thresholds’ in the cumulative probit model) on y1; the effect of isold on y; and the effect of isold on the discrimination parameter disc2. There are five intercepts (thresholds), because there are six response categories.

We extend the code to a hierarchical model by specifying that all these parameters vary across participants (variable id in the data).

uvsdt_h <- bf(y ~ isold + (isold |s| id),
disc ~ 0 + isold + (0 + isold |s| id))

Recall from earlier posts that using |s| leads to estimating correlations among the varying effects. There will only be one standard deviation associated with the thresholds; that is, the model assumes that subjects vary around the mean threshold similarly for all thresholds.

## Prior distributions

I set a N(1, 3) prior on dprime, just because I know that in these tasks performance is usually pretty good. Perhaps this prior is also influenced by my reading of the paper! I also set a N(0, 1) prior on a: Usually this parameter is found to be around $$-\frac{1}{4}$$, but I’m ignoring that information.

The t(7, 0, .33) priors on the between-subject standard deviations reflect my assumption that the subjects should be moderately similar to one another, but also allows larger deviations. (They are t-distributions with seven degrees of freedom, zero mean, and .33 standard deviation.)

Prior <- c(prior(normal(1, 3), class = "b", coef = "isold"),
prior(normal(0, 1), class = "b", coef = "isold", dpar = "disc"),
prior(student_t(7, 0, .33), class = "sd"),
prior(student_t(7, 0, .33), class = "sd", dpar = "disc"),
prior(lkj(2), class = "cor"))

## Estimate and summarise parameters

We can then estimate the model as before. Be aware that this model takes quite a bit longer to estimate (note that I have reduced the iterations to 500 from the default 2000).

fit <- brm(uvsdt_h,
data = d,
prior = Prior,
control = list(adapt_delta = .9), inits = 0,
cores = 4, iter = 500,
file = here::here("static/data/sdtmodel4-1"))

We then display numerical summaries of the model’s parameters. Note that the effective sample sizes are modest, and Rhats indicate that we would benefit from drawing more samples from the posterior. For real applications, I recommend more than 500 iterations per chain.

summary(fit)
##  Family: cumulative
##   Links: mu = probit; disc = log
## Formula: y ~ isold + (isold | s | id)
##          disc ~ 0 + isold + (0 + isold | s | id)
##    Data: d (Number of observations: 9502)
## Samples: 4 chains, each with iter = 500; warmup = 250; thin = 1;
##          total post-warmup samples = 1000
##
## Group-Level Effects:
## ~id (Number of levels: 48)
##                           Estimate Est.Error l-95% CI u-95% CI Eff.Sample
## sd(Intercept)                 0.35      0.04     0.28     0.43        210
## sd(isold)                     0.79      0.10     0.62     1.01        321
## sd(disc_isold)                0.46      0.05     0.38     0.55        319
## cor(Intercept,isold)         -0.47      0.12    -0.68    -0.22        121
## cor(Intercept,disc_isold)     0.35      0.13     0.08     0.58        273
## cor(isold,disc_isold)        -0.76      0.07    -0.87    -0.61        455
##                           Rhat
## sd(Intercept)             1.03
## sd(isold)                 1.00
## sd(disc_isold)            1.01
## cor(Intercept,isold)      1.03
## cor(Intercept,disc_isold) 1.02
## cor(isold,disc_isold)     1.00
##
## Population-Level Effects:
##              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept[1]    -0.60      0.05    -0.70    -0.49        200 1.00
## Intercept[2]     0.20      0.05     0.09     0.30        197 1.00
## Intercept[3]     0.69      0.05     0.60     0.80        203 1.00
## Intercept[4]     1.04      0.05     0.94     1.14        212 1.01
## Intercept[5]     1.49      0.05     1.39     1.59        230 1.00
## isold            1.86      0.12     1.63     2.10        119 1.04
## disc_isold      -0.38      0.07    -0.52    -0.25        112 1.05
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s first focus on the “Population-level Effects”: The effects for the “average person”. isold is d’, and is very close to the one reported in the paper (eyeballing Figure 3 in Koen et al. (2013); this d’ is not numerically reported in the paper). disc_isold is, because of the model’s parameterization, $$-\mbox{log}(\sigma_{signal}) = -a$$. The paper discusses $$V_o = \sigma_{signal}$$, and therefore we transform each posterior sample of our -a to obtain samples from $$V_o$$’s posterior distribution.

samples <- posterior_samples(fit, "b_") %>%
mutate(Vo = exp(-b_disc_isold))

We can then plot density curves (Gabry 2017) for each of the Population-level Effects in our model, including $$V_o$$. Figure 1 shows that our estimate of $$V_o$$ corresponds very closely to the one reported in the paper (Figure 3 in Koen et al. (2013)).

library(bayesplot)
mcmc_areas(samples, point_est = "mean", prob = .8)

### Heterogeneity parameters

Although the “population-level estimates”, which perhaps should be called “average effects”, are usually the main target of inference, they are not the whole story, nor are they necessarily the most interesting part of it. It has been firmly established that, when allowed to vary, the standard deviation of the noise distribution is greater than 1. However, the between-subject variability of this parameter has received less interest. Figure 2 reveals that the between-subject heterogeneity of a is quite large: The subject-specific effects have a standard deviation around .5.

samples_h <- posterior_samples(fit, c("sd_", "cor_"))
mcmc_areas(samples_h, point_est = "mean", prob = .8)

Figure 2 also tells us that the subject-specific d’s and as are correlated ("cor_id__isold__disc_isold"). We can further investigate this relationship by plotting the subject specific signal-SDs and d’s side by side:

As can be seen in the ridgeline plots (Wilke 2017) in Figure 3, participants with greater $$\sigma_{signal}$$ tend to have greater d’: Increase in recognition sensitivity is accompanied with an increase in the signal distribution’s variability. The density plots also make it clear that we are much less certain about individuals whose values (either one) are greater, as shown by the spread out posterior distributions. Yet another way to visualize this relationship is with a scatterplot of the posterior means Figure 4.

# Conclusion

Estimating EVSDT and UVSDT models in the Bayesian framework with the brms package (Bürkner 2017) is both easy (relatively speaking) and informative. In this post, we estimated a hierarchical nonlinear cognitive model using no more than a few lines of code. Previous literature on the topic (e.g. Rouder et al. (2007)) has focused on simpler (EVSDT) models with more complicated implementations–hopefully in this post I have shown that these models are within the reach of a greater audience, provided that they have some familiarity with R.

Another point worth making is a more general one about hierarchical models: We know that participants introduce (random) variation in our models. Ignoring this variation is clearly not good (Estes 1956). It is more appropriate to model this variability, and use the resulting parameters to draw inference about the heterogeneity in parameters (and more generally, cognitive strategies) across individuals. Although maximum likelihood methods provide (noisy) point estimates of what I’ve here called between-subject heterogeneity parameters, the Bayesian method allows drawing firm conclusions about these parameters.

# References

Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

Estes, W.K. 1956. “The Problem of Inference from Curves Based on Group Data.” Psychological Bulletin 53 (2): 134–40. https://doi.org/10.1037/h0045156.

Gabry, Jonah. 2017. Bayesplot: Plotting for Bayesian Models. http://mc-stan.org/.

Koen, Joshua D., Mariam Aly, Wei-Chun Wang, and Andrew P. Yonelinas. 2013. “Examining the Causes of Memory Strength Variability: Recollection, Attention Failure, or Encoding Variability?” Journal of Experimental Psychology: Learning, Memory, and Cognition 39 (6): 1726–41. https://doi.org/10.1037/a0033671.

Macmillan, Neil A., and C. Douglas Creelman. 2005. Detection Theory: A User’s Guide. 2nd ed. Mahwah, N.J: Lawrence Erlbaum Associates.

McNicol, Don. 2005. A Primer of Signal Detection Theory. Psychology Press.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rouder, Jeffrey N., Jun Lu, Dongchu Sun, Paul Speckman, Richard D. Morey, and Moshe Naveh-Benjamin. 2007. “Signal Detection Models with Random Participant and Item Effects.” Psychometrika 72 (4): 621–42. https://doi.org/10.1007/s11336-005-1350-6.

Singmann, Henrik, and David Kellen. 2013. “MPTinR: Analysis of Multinomial Processing Tree Models in R.” Behavior Research Methods 45 (2): 560–75. https://doi.org/10.3758/s13428-012-0259-0.

Wilke, Claus O. 2017. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.

1. Recall that intercepts are automatically included, but can be explicitly included by adding 1 to the formula’s right hand side.

2. 0 + ... removes the model’s intercept.

##### Matti Vuorre
###### Postdoctoral Researcher

Postdoctoral Researcher at the Oxford Internet Institute