In this example, we examine the ZOIB model in the context of one binary predictor variable (Group A vs B, a “between subjects” manipulation).

## The model

We will model the data as ZOIB, and use `group` as a predictor of the mean and precision of the beta distribution, the zero-one inflation probability (zoi), and the conditional one-inflation probability (coi). In other words, in this model `group` may affect the mean and/or precision of the assumed beta distribution of the continuous ratings (0, 1), and/or the probability with which a binary rating is given, and/or the probability that a binary rating is 1. How do we estimate this model?
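To make the four parameters concrete, here is a minimal sketch of how data could be generated from a ZOIB model. This is my own illustration, not brms code: the function name `rzoib_sketch` and the parameter values are made up for this example.

```r
# Minimal ZOIB generator (illustration only): with probability zoi the
# response is binary (1 with probability coi, else 0); otherwise it is a
# draw from a beta distribution with mean mu and precision phi.
rzoib_sketch <- function(n, mu, phi, zoi, coi) {
  y <- rbeta(n, mu * phi, (1 - mu) * phi)   # continuous (0, 1) part
  binary <- rbinom(n, 1, zoi) == 1          # which responses are 0 or 1?
  y[binary] <- rbinom(sum(binary), 1, coi)  # the binary part
  y
}
set.seed(1)
y <- rzoib_sketch(1e4, mu = 0.6, phi = 5, zoi = 0.3, coi = 0.65)
mean(y %in% 0:1)  # close to zoi = 0.3
```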

It might not come as a surprise that we estimate the model with Bayesian methods, using the R package brms (Bürkner 2017). Previously, I have discussed how to estimate signal detection theoretic models, “robust models”, and other multilevel models using this package. I’m a big fan of brms because of its modeling flexibility and post-processing functions: with concise syntax, you can fit a wide variety of possibly nonlinear, multivariate, and multilevel models, and analyze and visualize the models’ results.

Let’s load the package, and start building our model.
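A minimal setup, assuming brms and the tidyverse are installed (the tidyverse is loaded here because the post-processing code further down uses its functions):

```r
library(brms)       # model fitting
library(tidyverse)  # %>%, mutate_at(), and friends, used below
```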

The R formula syntax allows a concise representation of regression models in the form `response ~ predictors`. For a simple normal (i.e. Gaussian) model of the mean of `Rating` as a function of `group`, you could write `Rating ~ group, family = gaussian`. However, we want to predict the four parameters of the ZOIB model, and so will need to expand this notation.

The brms package allows modeling more than one parameter of an outcome distribution. Specifically, we want to predict so-called “distributional parameters”, and `bf()` allows predicting them in their own formulas. Implicitly, `Rating ~ group` means that you want to model the *mean* of `Rating` on `group`. Therefore, to model the precision (`phi`), zero-one inflation (`zoi`), and conditional one-inflation (`coi`), we will give them their own regression formulas within a call to `bf()`:

```
zoib_model <- bf(
  Rating ~ group,
  phi ~ group,
  zoi ~ group,
  coi ~ group,
  family = zero_one_inflated_beta()
)
```

The four sub-models of our model are, in order of appearance:

1. The model of the beta distribution’s mean (read, “predict `Rating`’s mean from `group`”).
2. The model of `phi`, the beta distribution’s precision.
3. `zoi` is the zero-one inflation; that is, we model the probability of a binary rating as a function of `group`.
4. `coi` is the conditional one-inflation: given that a response was {0, 1}, the probability of it being 1 is modelled on `group`.

As is usual in R’s formula syntax, the intercepts of each of these formulas are implicitly included. (To make intercepts explicit, use e.g. `Rating ~ 1 + group`.) Therefore, this model will have 8 parameters; the intercepts are Group A’s mean, `phi`, `zoi`, and `coi`. Then, there will be a Group B parameter for each of them, indicating the extent to which the parameters differ for Group B versus Group A.

If `group` has a positive effect on (the mean of) `Rating`, we may conclude that the continuous ratings’ mean differs as a function of Group. On the other hand, if `coi` is affected by `group`, Group has an effect on the binary {0, 1} ratings. If `group` has no effects on any of the parameters, we throw up our hands and design a new study.

Finally, we specified `family = zero_one_inflated_beta()`. Just like logistic regression, ZOIB regression is a type of generalized linear model, so each distributional parameter is modeled through a link function. The mean, zoi, and coi parameters are modeled through a logit link function; phi is modeled through a log link function. These link functions can be changed by giving named arguments to `zero_one_inflated_beta()`. It is important to keep these specific link functions in mind, because we will need them when interpreting the model’s parameters.

To estimate this model, we pass the resulting `zoib_model` to `brm()`, with a data frame from the current R environment, 4 CPU cores for speed, and a `file` argument to save the resulting model to disk. The last two arguments are optional.

```
fit <- brm(
  formula = zoib_model,
  data = dat,
  cores = 4,
  file = "brm-zoib"
)
```

brms estimates the regression model using Bayesian methods: it returns random draws from the parameters’ posterior distribution. Drawing samples from this model takes less than a minute. Let’s then interpret the estimated parameters (i.e. the numerical summaries of the posterior distribution):

```
summary(fit)
## Family: zero_one_inflated_beta
## Links: mu = logit; phi = log; zoi = logit; coi = logit
## Formula: Rating ~ group
## phi ~ group
## zoi ~ group
## coi ~ group
## Data: dat (Number of observations: 100)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.33 0.16 0.03 0.64 1.00 7221 2600
## phi_Intercept 1.50 0.24 1.00 1.94 1.00 6306 3106
## zoi_Intercept -0.80 0.32 -1.44 -0.18 1.00 7341 2983
## coi_Intercept 0.62 0.56 -0.40 1.75 1.00 6324 3166
## groupB 0.91 0.21 0.50 1.32 1.00 6770 2984
## phi_groupB 0.48 0.33 -0.15 1.14 1.00 5750 2654
## zoi_groupB 0.08 0.43 -0.75 0.91 1.00 7812 3178
## coi_groupB -0.87 0.75 -2.35 0.52 1.00 6093 2866
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

First, the summary of this model prints a paragraph of information about the model, such as the outcome family (ZOIB), link functions, etc. The regression coefficients are found under the “Population-Level Effects:” header. The columns of this section are “Estimate”, the posterior mean or point estimate of the parameter; “Est.Error”, the posterior standard deviation (the parameter’s so-called standard error); and then the lower and upper limits of the 95% credible interval. The last two columns are diagnostics of the model fitting procedure.

The first four rows describe the parameters for the baseline group (Group A). `Intercept` is the logit-transformed mean of the beta distribution for Group A’s ratings (the subset of ratings that were (0, 1)). Next, `phi_Intercept` describes the precision of the beta distribution fitted to Group A’s slider responses, on the scale of the (log) link function. `zoi_Intercept` is the zero-one inflation of Group A’s data, on the logit scale. `coi_Intercept` is the conditional one-inflation: out of the 0 or 1 ratings in Group A’s data, the (logit-scale) proportion of ones.

These parameters are described on the link scale, so for each of them we can use the inverse link function to transform them to the response scale. Precision (`phi_Intercept`) was modeled on the log scale, so we can convert it back to the original scale by exponentiating. The other parameters were modeled on the logit scale, so we use the inverse logit, `plogis()`.

However, before converting the parameters, it is important to note that the estimates displayed above are summaries (means, quantiles) of the posterior draws of the parameters on the link function scale. Therefore, we cannot simply convert the summaries. Instead, we must transform each of the posterior samples, and then re-calculate the summaries. The following code accomplishes this “transform-then-summarize” procedure for each of the four parameters:

```
posterior_samples(fit, pars = "b_")[,1:4] %>%
mutate_at(c("b_phi_Intercept"), exp) %>%
mutate_at(vars(-"b_phi_Intercept"), plogis) %>%
posterior_summary() %>%
as.data.frame() %>%
rownames_to_column("Parameter") %>%
kable(digits = 2)
```

| Parameter       | Estimate | Est.Error | Q2.5 | Q97.5 |
|-----------------|---------:|----------:|-----:|------:|
| b_Intercept     | 0.58     | 0.04      | 0.51 | 0.66  |
| b_phi_Intercept | 4.59     | 1.10      | 2.72 | 6.99  |
| b_zoi_Intercept | 0.31     | 0.07      | 0.19 | 0.45  |
| b_coi_Intercept | 0.64     | 0.12      | 0.40 | 0.85  |

We can then interpret these summaries, beginning with `b_Intercept`. This is the estimated mean of the beta distribution fitted to Group A’s (0, 1) rating scale responses (with its standard error and the lower and upper limits of the 95% CI). Then, `b_phi_Intercept` is the precision of the beta distribution, `zoi` is the zero-one inflation, and `coi` the conditional one-inflation.

To make `b_zoi_Intercept` concrete, we can compare its posterior mean to the observed proportion of 0/1 values in the data:

```
mean(dat$Rating[dat$group=="A"] %in% 0:1) %>% round(3)
## [1] 0.311
```

Above we calculated the proportion of zeros and ones in Group A’s data, and found that it matches the estimated value. Similarly, for `coi`, we can find the corresponding value from the data:

```
mean(dat$Rating[dat$group=="A" & dat$Rating %in% 0:1] == 1) %>%
round(3)
## [1] 0.643
```

Let’s get back to the model summary output. The remaining four parameters are the effects of being in group B on these parameters. Most importantly, `groupB` is the effect of group B (versus group A) on the mean of the ratings’ assumed beta distribution, on the logit scale. Immediately, we can see that this parameter’s 95% credible interval does not include zero. Traditionally, this parameter would be called “significant”: group B’s (0, 1) ratings are on average greater than group A’s.

To transform this effect back to the data scale, we can again use `plogis()`. However, it is important to keep in mind that the effect’s size on the original scale depends on the intercept, getting smaller as the intercept increases (just like in any other generalized linear model). The following bit of code transforms this effect and its uncertainty back to the original scale.

```
h <- c("B - A" = "plogis(Intercept + groupB) = plogis(Intercept)")
hypothesis(fit, h)
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 B - A 0.19 0.05 0.1 0.28 NA NA *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
```

The data were simulated with the `rzoib()` function, with a true group effect on the beta distribution’s mean. Therefore, the results of the t-tests and nonparametric tests were misses: a true effect went undetected. The ZOIB regression model, on the other hand, detected the true effect of group on the beta distribution’s mean.

Finally, let’s visualize this key finding using the `conditional_effects()` function from brms.

```
plot(
conditional_effects(fit, dpar = "mu"),
points = TRUE,
point_args = list(width = .05, shape = 1)
)
```

Comparing Figure 5 to Figure 4 reveals the fundamental difference between the normal t-test model and the ZOIB model: the ZOIB regression (Figure 5) found a large difference between the groups’ means of the continuous part of the slider ratings because it treated the data with an appropriate model. By conflating the continuous and binary data, the t-test did not detect this difference.

In conclusion, this example showed that ZOIB regression yields more informative, and potentially more accurate, inferences from analog scale (“slider”) data. Of course, in this simulation we had the benefit of knowing the true state of affairs: the data were simulated from a ZOIB model. Nevertheless, we have reasoned that by respecting the major features of slider scale data, the ZOIB is a more accurate representation of it, and was therefore able to detect a difference where the t-test did not. Next, I put this conjecture to a test by conducting a small simulation study.