How to Compare Two Groups with Robust Bayesian Estimation Using R, Stan and brms

Happy New Year 2017 everybody! 2017 will be the year when social scientists finally decide to diversify their applied statistics toolbox and stop relying 100% on null hypothesis significance testing (NHST). We now recognize that different scientific questions may require different statistical tools, and are ready to adopt new and innovative methods. A very appealing alternative to NHST is Bayesian statistics, which in itself contains many approaches to statistical inference. In this post, I provide an introductory and practical tutorial to Bayesian parameter estimation in the context of comparing two independent groups’ data.

More specifically, we’ll focus on the t-test. Everyone knows about it, everyone uses it. Yet, there are (arguably!) better methods for drawing inferences from two independent groups’ metric data (Kruschke, 2013; Morey & Rouder, 2015). Let’s talk about how “Bayesian estimation supersedes the t-test” (Kruschke, 2013).

Kruschke (2013, p.573) writes:

“When data are interpreted in terms of meaningful parameters in a mathematical description, such as the difference of mean parameters in two groups, it is Bayesian analysis that provides complete information about the credible parameter values. Bayesian analysis is also more intuitive than traditional methods of null hypothesis significance testing (e.g., Dienes, 2011).”

In that article (Bayesian estimation supersedes the t-test), Kruschke (2013) provided clear and well-reasoned arguments favoring Bayesian parameter estimation over null hypothesis significance testing in the context of comparing two groups, a situation that is usually handled with a t-test. The article also introduced a robust model for comparing two groups, which models the data as t-distributed instead of Gaussian. It provided R code for running the estimation procedures, which could be downloaded from the author’s website or as an R package (Kruschke & Meredith, 2015).

The R code and programs work well for this specific application (estimating the robust model for one or two groups’ metric data). However, modifying the code to handle more complicated situations is not easy, and the underlying estimation algorithms don’t necessarily scale up to handle more complicated situations. Therefore, today I’ll introduce easy to use, free, open-source, state-of-the-art computer programs for Bayesian estimation, in the context of comparing two groups’ metric (continuous) data. The programs are available for the R programming language–so make sure you are familiar with R basics (e.g. Vuorre, 2016). I provide R code (it’s super easy, don’t worry!) for t-tests and Bayesian estimation in R using the R package brms (Buerkner, 2016), which uses the powerful Stan MCMC program (Stan Development Team, 2016) under the hood.

These programs supersede older Bayesian inference programs because they are easy to use (brms is an interface to Stan, which is actually a programming language in itself), fast, and are able to handle models with thousands of parameters. Learning to implement basic analyses such as t-tests, and Kruschke’s robust model, with these programs is very useful because (obviously) you’ll then be able to do Bayesian statistics in practice, and will be prepared to understand and implement more complex models.

Understanding the results of Bayesian estimation requires understanding some basics of Bayesian statistics, which I won’t describe here at all. If you are not familiar with Bayesian statistics, please read Kruschke’s excellent article (his book is also very good, Kruschke, 2014; see also McElreath, 2016). In fact, you should read the paper anyway; it’s very good.

First, I’ll introduce the basics of t-tests in some detail, and then focus on understanding them as specific instantiations of linear models. If that sounds familiar, skip ahead to Bayesian Estimation of the t-test, where I introduce the brms package for estimating models using Bayesian methods. Following that, we’ll use “distributional regression” to obtain Bayesian estimates of the unequal variances t-test model. Finally, we’ll learn how to estimate Kruschke’s (2013) BEST model using brms.

The t in a t-test

We’ll begin with t-tests, using example data from Kruschke’s paper (p. 577):

“Consider data from two groups of people who take an IQ test. Group 1 (N1=47) consumes a “smart drug,” and Group 2 (N2=42) is a control group that consumes a placebo.”

I’ve decided to call the control group “Group 0”, and the treatment group “Group 1”, because this coding makes it natural to think of the control group as a “reference group”, and any “effect” we’ll estimate will be associated with the treatment group. These data are visualized as histograms, below:

Figure 1: Histograms of the two groups’ IQ scores.

Equal variances t-test

These two groups’ IQ scores could be compared with a simple equal variances t-test (which you shouldn’t use; Lakens, 2015), also known as Student’s t-test. I have the two groups’ IQ scores in R as two vectors called group_0 and group_1, so doing a t-test is as easy as

t.test(group_0, group_1, var.equal = TRUE)
##  Two Sample t-test
## data:  group_0 and group_1
## t = -1.5587, df = 87, p-value = 0.1227
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.544155  0.428653
## sample estimates:
## mean of x mean of y 
##  100.3571  101.9149

We interpret the t-test in terms of the observed t-value, and whether it exceeds the critical t-value. The critical t-value, in turn, is defined as the extreme \(\alpha / 2\) percentiles of a t-distribution with the given degrees of freedom. The current situation is illustrated below:

Figure 2: t distribution with 87 degrees of freedom, and observed t-value. The dashed vertical lines indicate the extreme 2.5 percentiles. We would reject the null hypothesis of no difference if the observed t-value exceeded these percentiles.

The test results in an observed t-value of -1.56, which is not far enough in the tails of a t-distribution with 87 degrees of freedom to warrant rejecting the null hypothesis (given that we are using \(\alpha\) = .05, which may or may not be an entirely brilliant idea (e.g. Rouder, Morey, Verhagen, Province, & Wagenmakers, 2016)). Note that R also reports a 95% CI for the estimated difference between the two groups.
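The critical value and the p-value can be reproduced directly from the t-distribution. The sketch below uses only the numbers printed in the t.test() output above (t = -1.5587, df = 87); the variable names are my own.

```r
# Values copied from the t.test() output above
t_obs <- -1.5587
df <- 87
alpha <- 0.05

# Critical value: the upper alpha/2 quantile of the t-distribution
t_crit <- qt(1 - alpha / 2, df)

# Two-sided p-value: probability mass beyond |t_obs| in both tails
p_two_sided <- 2 * pt(-abs(t_obs), df)

# Reject the null only if |t_obs| exceeds the critical value
reject_null <- abs(t_obs) > t_crit

round(c(t_crit = t_crit, p = p_two_sided), 4)
reject_null
```

Because the observed t-value was rounded in the printed output, the recomputed p-value agrees with t.test() only up to rounding.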

Unequal variances t-test

Next, we’ll run the more appropriate, unequal variances t-test (also known as Welch’s t-test), which R gives by default:

t.test(group_0, group_1)
##  Welch Two Sample t-test
## data:  group_0 and group_1
## t = -1.6222, df = 63.039, p-value = 0.1098
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.4766863  0.3611848
## sample estimates:
## mean of x mean of y 
##  100.3571  101.9149

Note that while R gives Welch’s t-test by default, SPSS gives both. If you’re using SPSS, make sure to report Welch’s test results instead of the equal variances test. Here, the conclusion with respect to rejecting the null hypothesis of equal means is the same. However, notice that the results are numerically different, as they should be, because these two t-tests refer to different models!

As a side note, I recently learned that this problem (estimating and testing the difference between two means when the variances are not assumed equal) is unsolved: only approximate solutions are known.
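To make the "approximate solution" concrete, here is a minimal sketch of what Welch’s test computes: a t statistic without a pooled standard deviation, and the Welch-Satterthwaite approximation to the degrees of freedom. The data are simulated stand-ins (not the IQ scores), and all variable names are assumptions made for illustration.

```r
set.seed(2017)
# Hypothetical data, NOT the IQ scores from this post
x <- rnorm(42, mean = 100, sd = 2.5)
y <- rnorm(47, mean = 102, sd = 6)

# Squared standard error of each group's mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch's t statistic: each group keeps its own variance
t_welch <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

welch <- t.test(x, y)  # Welch's test is R's default
c(by_hand = t_welch, t.test = unname(welch$statistic))
c(by_hand = df_welch, t.test = unname(welch$parameter))
```

The non-integer degrees of freedom in the Welch output above (63.039) come from this approximation.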

It is of course up to you, as a researcher, to decide whether you assume equal variances or not. But note that we almost always allow the means to be different (that’s the whole point of the test, really), while many treatments may just as well have an effect on the standard deviations.

The first take-home message from today is that there are actually two t-tests, each associated with a different statistical model. And to make clear what the difference is, we must acquaint ourselves with the models.

Describing the model(s) underlying the t-test(s)

We don’t usually think of t-tests (and ANOVAs) as models, but it turns out that they are just linear models disguised as tests. Recently, there has been a tremendous push for model/parameter estimation, instead of null hypothesis significance testing (e.g. Cumming, 2014; Kruschke, 2014; see also the brilliant commentary by Gigerenzer, 2004), so we will benefit from thinking about t-tests as linear models. Doing so will facilitate “[interpreting data] in terms of meaningful parameters in a mathematical description” (Kruschke, 2013), and seamlessly expanding our models to handle more complicated situations.

The equal variances t-test models metric data with three parameters: Mean for group A, mean for group B, and one shared standard deviation (i.e. the assumption that the standard deviations [we usually refer to variances, but whatever] are equal between the two groups.)

We call the metric data (IQ scores in our example) \(y_{ik}\), where \(i\) is a subscript indicating the \(i^{th}\) datum, and \(k\) indicates the \(k^{th}\) group. So \(y_{19, 1}\) would be the 19th datum, belonging to group 1. Then we specify that \(y_{ik}\) are Normally distributed, \(N(\mu_{ik}, \sigma)\), where \(\mu_{ik}\) indicates the mean of group \(k\), and \(\sigma\) the common standard deviation.

\[y_{ik} \sim N(\mu_{ik}, \sigma)\]

Read the formula as “Y is normally distributed with mean \(\mu_{ik}\) (mu), and standard deviation \(\sigma\) (sigma)”. Note that the standard deviation \(\sigma\) doesn’t have any subscripts: we assume it is the same for the \(k\) groups. Note also that you’ll often see the second parameter in the parentheses as \(\sigma^2\), referring to the variance.

The means for groups 0 and 1 are simply \(\mu_0\) and \(\mu_1\), respectively, and their difference (let’s call it \(d\)) is \(d = \mu_0 - \mu_1\). The 95% CI for \(d\) is given in the t-test output, and we can tell that it differs from the one given by Welch’s t-test.

It is unsurprising, then, that if we use a different model (the more appropriate unequal variances model; Lakens, 2015), our inferences may be different. Welch’s t-test is the same as Student’s, except that now we assume (and subsequently estimate) a separate standard deviation \(\sigma_{ik}\) for each group.

\[y_{ik} \sim N(\mu_{ik}, \sigma_{ik})\]

This model makes a lot of sense, because rarely are we in a situation to a priori decide that the variance of scores in Group A is equal to the variance of scores in Group B. If you use the equal variances t-test, you should be prepared to justify and defend this assumption. (Deciding between models–such as between these two t-tests–is one way in which our prior information enters and influences data analysis. This fact should make you less suspicious about priors in Bayesian analyses.)

Armed with this knowledge, we can now see that “conducting a t-test” can be understood as estimating one of these two models. By estimating the model, we obtain t-values, degrees of freedom, and consequently, p-values.

However, to focus on modeling and estimation, it is easier to think of the t-test as a specific type of the general linear model (aka linear regression). We can rewrite the t-test in an equivalent way, but with a specific parameter for the difference in means, by writing it as a linear model. (For simplicity, I’ll only write the equal variances model):

\[y_{ik} \sim N(\mu_{ik}, \sigma)\] \[\mu_{ik} = \beta_0 + \beta_1 Group_{ik}\]

Here, \(\sigma\) is just as before, but we now model the mean with an intercept (Group 0’s mean, \(\beta_0\)) and the effect of Group 1 (\(\beta_1\)). To understand what’s going on, let’s look at the data. Group is an indicator variable in the data: for each row of Group 0’s data Group is zero, and for each row of Group 1’s data Group is one.

##     Group  IQ
## 1       0  99
## 2       0 101
## 3       0 100
## 4       0 101
## ...   ... ...
## 86      1 101
## 87      1 104
## 88      1 100
## 89      1 101

With this model, \(\beta_1\) directly tells us the estimated difference in the two groups. And because it is a parameter in the model, it has an associated standard error, t-value, degrees of freedom, and a p-value. This linear model can be estimated in R with the following line of code:

olsmod <- lm(IQ ~ Group, data = d)

The key input here is a model formula, which in R is specified as outcome ~ predictor (DV ~ IV). Using the lm() function, we estimated a linear model predicting IQ from an intercept (automatically included) and a Group parameter Group, which is the effect of group 1. I called this object olsmod for Ordinary Least Squares Model.

R has its own model formula syntax, which is well worth learning. The formula in the previous model, IQ ~ Group, means that we want to regress IQ on an intercept (which is implicitly included), and group (Group). Besides the formula, we only need to provide the data, which is contained in the object I’ve conveniently called d.

You can verify that the results are identical to the equal variances t-test above by printing the model summary:

summary(olsmod)

## Call:
## lm(formula = IQ ~ Group, data = d)
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.9149  -0.9149   0.0851   1.0851  22.0851 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 100.3571     0.7263 138.184   <2e-16 ***
## Group         1.5578     0.9994   1.559    0.123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 4.707 on 87 degrees of freedom
## Multiple R-squared:  0.02717,    Adjusted R-squared:  0.01599 
## F-statistic:  2.43 on 1 and 87 DF,  p-value: 0.1227

Focus on the Group row in the estimated coefficients. Estimate is the point estimate (best guess) of the difference in means (\(\beta_1 = 101.9149 - 100.3571 = 1.5578\)). Note that this is group 1 minus group 0, so its sign is the opposite of the difference reported by t.test(group_0, group_1). t value is the observed t-value (identical in magnitude to what t.test() reported), and the p-value (Pr(>|t|)) matches as well. The (Intercept) row refers to \(\beta_0\), which is group 0’s mean.

This way of thinking about the model, where we have parameters for one group’s mean, and the effect of the other group, facilitates focusing on the important parameter, the difference, instead of individual means. However, you can of course compute the difference from the means, or the means from one mean and a difference.
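The equivalence between the equal variances t-test and the linear model can be checked on simulated data. The sketch below draws data from the linear model written out earlier, with made-up parameter values (beta_0, beta_1 and sigma are hypothetical, as is the data frame sim), and then fits it both ways.

```r
set.seed(1)
# Hypothetical parameter values, chosen only for illustration
beta_0 <- 100  # mean of group 0
beta_1 <- 2    # difference: group 1 minus group 0
sigma  <- 5    # shared standard deviation

# Simulate from y ~ N(beta_0 + beta_1 * Group, sigma)
sim <- data.frame(Group = rep(c(0, 1), times = c(42, 47)))
sim$IQ <- rnorm(nrow(sim), mean = beta_0 + beta_1 * sim$Group, sd = sigma)

fit_lm <- lm(IQ ~ Group, data = sim)
fit_t <- t.test(sim$IQ[sim$Group == 0], sim$IQ[sim$Group == 1],
                var.equal = TRUE)

# The Group coefficient is the difference in sample means (group 1 - group 0)
coef(fit_lm)["Group"]
diff(tapply(sim$IQ, sim$Group, mean))

# Same t-value up to sign, because t.test() computes group 0 - group 1
coef(summary(fit_lm))["Group", "t value"]
-unname(fit_t$statistic)
```

The p-values agree as well, since both procedures estimate the same three-parameter model.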

Bayesian estimation of the t-test

Equal variances model

Next, I’ll illustrate how to estimate the equal variances t-test using Bayesian methods. We use brms (Buerkner, 2016), and the familiar R formula syntax which we used with the OLS model.

Estimating this model with R, thanks to the Stan and brms teams (Stan Development Team, 2016; Buerkner, 2016), is as easy as the linear regression model we ran above. If you haven’t yet installed brms, you need to install it first by running install.packages("brms"). Then, to access its functions, load the brms package to the current R session.

library(brms)


The most important function in the brms package is brm(), for Bayesian Regression Model(ing). The user needs only to input a model formula, just as above, and a data frame that contains the variables specified in the formula. brm() then translates the model into Stan language, and asks Stan to compile the model into C++ and estimate it (see Kruschke, 2014; McElreath, 2016 for details about estimation). The result is an R object with the estimated results (and much more). We run the model and save the results to mod_eqvar for equal variances model:

mod_eqvar <- brm(
  IQ ~ Group, 
  data = d,
  file = here::here("static/data/iqgroup")
)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: IQ ~ Group 
##    Data: d (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept   100.36      0.73    98.93   101.77       3934 1.00
## Group         1.56      0.98    -0.41     3.51       3688 1.00
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     4.76      0.37     4.10     5.53       4280 1.00
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Notice that the model contains three parameters, one of which is the shared standard deviation sigma. Compare the output of the Bayesian model to the one estimated with lm() (OLS):


The point estimates (posterior means in the Bayesian model) and standard errors (SD of the respective posterior distribution) are pretty much identical.

We now know the models behind t-tests, and how to estimate the equal variances t-test using the t.test(), lm(), and brm() functions. We also know how to run Welch’s t-test using t.test(). However, estimating the general linear model version of the unequal variances t-test model is slightly more complicated, because it involves specifying predictors for \(\sigma\), the standard deviation parameter.

Unequal variances model

We only need a small adjustment to the equal variances model to specify the unequal variances model:

\[y_{ik} \sim N(\mu_{ik}, \sigma_{ik})\] \[\mu_{ik} = \beta_0 + \beta_1 Group_{ik}\]

Notice that we now have subscripts for \(\sigma\), denoting that it varies between groups. In fact, we’ll write out a linear model for the standard deviation parameter!

\[\sigma_{ik} = \gamma_0 + \gamma_1 Group_{ik}\]

The model now includes, instead of a common \(\sigma\), one parameter for Group 0’s standard deviation \(\gamma_0\) (gamma), and one for the effect of Group 1 on the standard deviation \(\gamma_1\), such that group 1’s standard deviation is \(\gamma_0 + \gamma_1\). Therefore, we have 4 free parameters, two means and two standard deviations. (The full specification would include prior distributions for all the parameters, but that topic is outside of the scope of this post.) Let’s estimate!

brm() takes more complicated models by wrapping them inside bf() (short for brmsformula()), which is subsequently entered as the first argument to brm().

uneq_var_frm <- bf(IQ ~ Group, sigma ~ Group)

You can see that the formula regresses IQ on Group, such that we’ll have an intercept (implicitly included), and an effect of Group 1. Remarkably, we are also able to model the standard deviation sigma, and we regress it on Group (it will also have an intercept and effect of group).

mod_uneqvar <- brm(
  uneq_var_frm,
  data = d, 
  file = here::here("static/data/iqgroup-uv")
)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: IQ ~ Group 
##          sigma ~ Group
##    Data: d (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept         100.36      0.38    99.59   101.10       4895 1.00
## sigma_Intercept     0.93      0.11     0.72     1.16       4293 1.00
## Group               1.54      1.03    -0.49     3.53       2273 1.00
## sigma_Group         0.88      0.15     0.57     1.16       4036 1.00
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The model’s output contains our 4 parameters. Intercept is the mean for group 0, and Group is the “effect of group 1”. The sigma_Intercept is the standard deviation of Group 0, and sigma_Group is the effect of group 1 on the standard deviation. Both are on the log scale, because the sigmas are implicitly modeled through a log-link (they must be positive). To convert them back to the scale of the data, they need to be exponentiated: the SD of Group 0 is exp(sigma_Intercept), and the SD of Group 1 is exp(sigma_Intercept + sigma_Group). After taking the exponents of the sigmas, the results look like this:

parameter  Estimate  Est.Error  l-95% CI  u-95% CI

For comparison, here is the “observed SD” of group 0:

## [1] 2.52
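The back-transformation is best applied to each posterior draw before summarising, rather than to the summary statistics. Here is a minimal sketch of that step. The draws below are simulated stand-ins centred on the log-scale estimates printed above (and drawn independently, which real posterior draws are not); the column names b_sigma_Intercept and b_sigma_Group are my assumption about how brms labels these parameters, so check them against your own fitted object.

```r
set.seed(42)
# Stand-in posterior draws (simulated here, NOT taken from the fitted model)
draws <- data.frame(
  b_sigma_Intercept = rnorm(4000, mean = 0.93, sd = 0.11),
  b_sigma_Group     = rnorm(4000, mean = 0.88, sd = 0.15)
)

# Back-transform each draw from the log scale to the data scale
sd_group0 <- exp(draws$b_sigma_Intercept)
sd_group1 <- exp(draws$b_sigma_Intercept + draws$b_sigma_Group)

# Posterior mean and 95% quantile interval of a vector of draws
summarise_sd <- function(x) {
  c(Estimate = mean(x), quantile(x, probs = c(0.025, 0.975)))
}

round(rbind(sd_group0 = summarise_sd(sd_group0),
            sd_group1 = summarise_sd(sd_group1)), 2)
```

With these stand-in draws, the estimate for Group 0 lands close to the observed SD of 2.52 shown above.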

Keep in mind that the parameters refer to Group 0’s mean (Intercept) and SD (sigma_Intercept), and the difference between groups in those values (Group and sigma_Group). We now have fully Bayesian estimates of the 4 parameters of the unequal variances t-test model. Because p-values have no place in Bayesian inference, they are not reported in the output. However, you can calculate a quantity that is comparable to a one-sided p-value from the posterior distribution: Take the proportion of posterior density (MCMC samples) above/below a reference value (0). This is definitely not the most useful thing you can do with a posterior distribution, but the fact that it closely approximates a one-sided p-value is quite interesting:

# Posterior distribution of Group effect
x <- as.data.frame(mod_uneqvar, pars = "b_Group")[,1]
# Proportion of MCMC samples below zero
round((sum(x < 0) / length(x)), 3)
## [1] 0.068
# One sided p-value from t-test
round(t.test(group_0, group_1, alternative = "less")$p.value, 3)
## [1] 0.055

I’m showing this remarkable fact (Marsman & Wagenmakers, no date) not to persuade you to stick with p-values, but to alleviate fears that these methods would always produce discrepant results.
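The correspondence is exact in a simpler setting, which can be checked without Stan. In the equal variances model with the reference prior \(p(\beta, \sigma) \propto 1/\sigma\) (an assumption of this sketch, and not the default prior that brms uses), the marginal posterior of the Group effect is a t-distribution centred on the OLS estimate and scaled by its standard error. The data and names below are hypothetical.

```r
set.seed(3)
# Hypothetical two-group data
sim <- data.frame(Group = rep(c(0, 1), each = 40))
sim$y <- rnorm(80, mean = 100 + 2 * sim$Group, sd = 5)

fit <- lm(y ~ Group, data = sim)
est <- coef(summary(fit))["Group", "Estimate"]
se  <- coef(summary(fit))["Group", "Std. Error"]
df  <- fit$df.residual

# Draws from the marginal posterior of the Group effect under the
# reference prior: a shifted and scaled t-distribution
post_draws <- est + se * rt(1e5, df)
prop_below_zero <- mean(post_draws < 0)

# One-sided p-value for the alternative that the Group effect is positive
p_one_sided <- pt(est / se, df, lower.tail = FALSE)

round(c(posterior = prop_below_zero, p_value = p_one_sided), 3)
```

The two numbers differ only by Monte Carlo error. With other priors or models, such as the unequal variances model above, the match is approximate.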

Although this model is super easy to estimate with brm() (which, I should emphasize, uses Stan for the estimation procedures), the model seems, frankly speaking, strange. I am just not used to modeling variances, and I’ll bet a quarter that neither are you. Nevertheless, there it is!

Finally, let’s move on to Kruschke’s (2013) “Robust Bayesian Estimation” model.

Robust Bayesian Estimation

Kruschke’s robust model is a comparison of two groups, using five parameters: One mean for each group, one standard deviation for each group, just as in the unequal variances model above. The fifth parameter is a “normality” parameter, \(\nu\) (nu), which means that we are now using a t-distribution to model the data. Using a t-distribution to model the data, instead of a Gaussian, means that the model (and therefore our inferences) are less sensitive to extreme values (outliers). Here’s what the model looks like:

\[y_{ik} \sim T(\nu, \mu_{ik}, \sigma_{ik})\]

Read the above formula as “Y are random draws from a t-distribution with ‘normality’ parameter \(\nu\), mean \(\mu_{ik}\), and standard deviation \(\sigma_{ik}\)”. We have a linear model for the means and standard deviations:

\[\mu_{ik} = \beta_0 + \beta_1 Group_{ik}\]


\[\sigma_{ik} = \gamma_0 + \gamma_1 Group_{ik}\]

This model, as you can see, is almost identical to the unequal variances t-test, but instead uses a t-distribution (we assume data are t-distributed), and includes the normality parameter. Using brm() we can still use the unequal variances model, but have to specify the t-distribution. We do this by specifying the family argument to be student (as in Student’s t):

mod_robust <- brm(
  bf(IQ ~ Group, sigma ~ Group),
  family = student,
  data = d, 
  file = here::here("static/data/iqgroup-robust")
)
##  Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: IQ ~ Group 
##          sigma ~ Group
##    Data: d (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample  Rhat
## Intercept        100.521     0.206  100.140  100.925       6459 0.999
## sigma_Intercept    0.002     0.194   -0.380    0.380       4289 1.001
## Group              1.033     0.430    0.198    1.885       2701 1.001
## sigma_Group        0.674     0.252    0.185    1.183       4111 1.000
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Eff.Sample  Rhat
## nu    1.867     0.481    1.170    3.061       3110 1.001
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The effect of Group is about one unit, with a 95% Credible Interval from 0.2 to 1.9.
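The small estimate of nu is what makes this model robust. As a rough illustration, the sketch below compares how much probability a standard normal and a standard t-distribution place more than four scale units from the centre. The value 1.87 is simply the posterior mean of nu rounded from the output above, and the cutoff of four units is an arbitrary choice of mine.

```r
# Probability of a value more than 4 scale units from the centre
tail_normal <- 2 * pnorm(-4)
tail_t <- 2 * pt(-4, df = 1.87)  # nu close to the posterior mean above

c(normal = tail_normal, student_t = tail_t)

# How many times more probable such a value is under the t model
tail_t / tail_normal
```

Because extreme scores are far less surprising under the t-distribution, they pull the estimated means and standard deviations around much less than they do in the Gaussian models.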

Finally, let’s compare the results to those in Kruschke’s paper (2013, p.578). Before we do this, I’ll convert the estimated parameters to means and standard deviations (instead of the “regression effects” produced by default). Recall that I recoded the group labels used by Kruschke in the paper: what he calls group 2 is group 0 (control group) in our analyses, but group 1 is still group 1. In the following I transform the results and compute HDIs to obtain results most compatible with Kruschke:


Notice that Kruschke reports modes (2013, p. 578), but our point estimates are means. The results with respect to the group means are identical to two decimal places; the standard deviations are slightly more discrepant, because the paper reports modes, but we focus on posterior means.
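For readers who have not met HDIs before: a highest density interval is the narrowest interval that contains a given share of the posterior draws, whereas the intervals in the brms summaries are based on quantiles. The function below is a minimal sketch of one common way to compute an HDI from draws. It is not the code used for this post or by the BEST package, and the name hdi() and the example draws are hypothetical.

```r
# Shortest interval containing `prob` of the draws
hdi <- function(draws, prob = 0.95) {
  sorted <- sort(draws)
  n <- length(sorted)
  n_in <- ceiling(prob * n)              # draws the interval must contain
  starts <- seq_len(n - n_in + 1)        # candidate lower endpoints
  widths <- sorted[starts + n_in - 1] - sorted[starts]
  best <- which.min(widths)              # narrowest candidate wins
  c(lower = sorted[best], upper = sorted[best + n_in - 1])
}

set.seed(7)
# Hypothetical skewed "posterior" draws, used only to show the difference
draws <- rexp(1e4)

hdi_95 <- hdi(draws)
eti_95 <- quantile(draws, probs = c(0.025, 0.975))  # equal-tailed interval
round(rbind(hdi = hdi_95, quantile = unname(eti_95)), 2)
```

For symmetric posteriors the two kinds of interval nearly coincide; they differ for skewed ones, such as posteriors of standard deviations.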

Finally, here is how to estimate the model using the original code (Kruschke & Meredith, 2015):

library(BEST)
BEST <- BESTmcmc(group_0, group_1)
## MCMC fit results for BEST analysis:
## 100002 simulations saved.
##           mean     sd  median    HDIlo   HDIup  Rhat n.eff
## mu1    100.526 0.2143 100.525 100.1061 100.951 1.000 58842
## mu2    101.548 0.3785 101.550 100.8077 102.302 1.000 60761
## nu       1.836 0.4754   1.760   1.0111   2.725 1.001 24508
## sigma1   1.051 0.2075   1.031   0.6702   1.462 1.000 36411
## sigma2   2.059 0.4342   2.021   1.2565   2.917 1.000 29389
## 'HDIlo' and 'HDIup' are the limits of a 95% HDI credible interval.
## 'Rhat' is the potential scale reduction factor (at convergence, Rhat=1).
## 'n.eff' is a crude measure of effective sample size.

This output reports posterior means and HDI limits, which we reported above. You can verify that the two sets of results match very closely. The BESTmcmc() function is great, but with brms you are able to estimate a vast variety of models.


Well, that ended up much longer than what I intended. The aim was both to illustrate the ease of Bayesian modeling in R using brms (Buerkner, 2016) and Stan (Stan Development Team, 2016), and highlight the fact that we can easily move from simple t-tests to more complex (and possibly better) models.

If you’ve followed through, you should be able to conduct Student’s (equal variances) and Welch’s (unequal variances) t-tests in R, and to think about those tests as instantiations of general linear models. Further, you should be able to estimate these models using Bayesian methods.

You should now also be familiar with Kruschke’s robust model for comparing two groups’ metric data, and be able to implement it in one line of R code. This model was able to find credible differences between two groups, although the frequentist t-tests and models reported p-values well above .05. That should be motivation enough to try robust (Bayesian) models on your own data.

Further reading

I didn’t take any space here to discuss the interpretation of Bayesian statistics. For this, I recommend Kruschke (2014) and McElreath (2016). See also Etz, Gronau, Dablander, Edelsbrunner, & Baribault (2016) for an introduction to Bayesian statistics.


Buerkner, P.-C. (2016). brms: Bayesian Regression Models using Stan. Retrieved from
Cumming, G. (2014). The New Statistics Why and How. Psychological Science, 25(1), 7–29.
Dienes, Z. (2011). Bayesian Versus Orthodox Statistics: Which Side Are You On? Perspectives on Psychological Science, 6(3), 274–290.
Etz, A., Gronau, Q. F., Dablander, F., Edelsbrunner, P. A., & Baribault, B. (2016). How to become a Bayesian in eight easy steps: An annotated reading list. ResearchGate. Retrieved from
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2), 573–603.
Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial Introduction with R (2nd Edition). Burlington, MA: Academic Press.
Lakens, D. (2015, January 26). The 20% Statistician: Always use Welch’s t-test instead of Student’s t-test. Retrieved from
Marsman, M., & Wagenmakers, E.-J. (no date). Three Insights from a Bayesian Interpretation of the One-Sided P Value. Retrieved from
McElreath, R. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.
Morey, R. D., & Rouder, J. (2015). BayesFactor: Computation of Bayes Factors for Common Designs. Retrieved from
Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is There a Free Lunch in Inference? Topics in Cognitive Science, 8(3), 520–547.
Stan Development Team. (2016). Stan: A C++ Library for Probability and Sampling, Version 2.14.1. Retrieved from
Vuorre, M. (2016, December 5). Introduction to Data Analysis using R. Retrieved from

Matti Vuorre
Postdoctoral Researcher at the Oxford Internet Institute