- What is *Analysis of Variance* and why is it used?
- Types of ANOVA
- One-way ANOVA
- Two-way/factorial ANOVA
- Exercises
- References

- Analysis of Variance (ANOVA) is a group of methods used to establish whether or not the means of three or more groups are reliably different
- In this sense, ANOVA is closely related to `t-tests` (see Topic 4):
  - To find out if three groups `A`, `B`, `C` have different means, we could use t-tests to compare A to B, A to C, and C to B.
  - However, such a procedure would lead to an issue that is known in statistics as the `multiple comparisons problem`

`Multiple comparisons problem`:
- The Type I error rate (alpha) of a hypothesis test is only valid under the assumption that it is the only hypothesis test that is being computed
- With each additional t-test that we perform on the same data, we increase the probability of finding a significant result where there is no real difference (i.e., a `Type I` error):
  - For each test, there’s a 95% chance that we do not commit a Type I error (since alpha = 0.05)
  - If the tests are independent, we can multiply the probabilities to obtain the overall probability that none of our three tests yields a Type I error: `0.95 * 0.95 * 0.95 = 0.86`
  - By doing three tests, the probability of at least one Type I error (the `familywise error rate`) has increased from 0.05 on each individual test to 0.14 on the group of tests. Since 0.14 > 0.05, a significant result in this group of tests no longer meets the conventional 0.05 threshold for rejecting the null hypothesis
  - The `familywise error rate` can be calculated as follows: `familywise error = 1 - (0.95)^(number of tests)`:
    - 4 tests: 0.19
    - 10 tests: 0.40
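The familywise error rate formula can be tried out with a few lines of R. This is only a minimal sketch: the helper name `familywise_error` is made up for illustration, and the calculation assumes that the tests are independent.

```r
# Hypothetical helper: probability of at least one Type I error
# across n_tests independent tests, each run at level alpha
familywise_error <- function(n_tests, alpha = 0.05) {
  1 - (1 - alpha)^n_tests
}

# Compare a single test with groups of 3, 4, and 10 tests
n_tests <- c(1, 3, 4, 10)
fwer <- familywise_error(n_tests)
print(data.frame(n_tests = n_tests, familywise_error = round(fwer, 3)))
```

The error rate grows quickly with the number of tests, which is why running many pairwise t-tests on the same data is discouraged.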

- ANOVA does away with the need for multiple t-testing: the group means are compared simultaneously (Field et al., 2012: 399)

- Historically, ANOVA has been used heavily in psychology, psycholinguistics, and in experimental studies more generally
- Linear regression has been used more heavily in corpus-based linguistics, and more generally, in fields that rely on real-world observations rather than experiments
- ANOVA is closely related to linear regression. As a matter of fact, we can use `lm` with only categorical variables to perform certain types of ANOVA

- ANOVA is a method of measuring the overall significance of differences in means. On its own, it provides no insights into:
  - The sizes of the effects
  - Which groups caused the differences to be significant

- Typically, ANOVA is used to compare several groups that undergo different experimental conditions
- In almost every case, ANOVA could (and perhaps should) be replaced by linear regression:
  - Linear regression is a more flexible, widely applicable, and general method
  - Linear regression provides insights into the significance and the effect sizes of the different experimental conditions, shows which groups caused the result to be significant, and integrates this information into a single model
  - Linear regression allows us to make predictions

- ANOVA appears to be slowly giving way to linear regression (especially mixed-effects linear regression) in current-day psycholinguistics
- Our goal here will be to gain a basic high-level understanding of the technique
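The relationship between ANOVA and linear regression can be illustrated with a short sketch. It uses R's built-in `PlantGrowth` data (plant weights in three treatment groups) as a stand-in, which is an assumption made here purely for illustration and not part of the course data.

```r
# Stand-in data: 30 plant weights in three groups (ctrl, trt1, trt2)
data(PlantGrowth)

# One linear model with a single categorical predictor
fit <- lm(weight ~ group, data = PlantGrowth)

# ANOVA view: one overall F test for the factor
print(anova(fit))

# Regression view: estimated differences from the reference group
print(coef(summary(fit)))

# Predictions for each group; with one factor these are the group means
new_groups <- data.frame(
  group = factor(levels(PlantGrowth$group), levels = levels(PlantGrowth$group))
)
predicted <- predict(fit, newdata = new_groups)
print(predicted)
```

The same fitted object answers both questions: `anova` gives the overall test, while the coefficient table and `predict` give the per-group information.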

- ANOVA is not a single method, but rather a family of methods:
  - Independent one-way ANOVA: used to compare the means of three or more groups. Its use is similar to that of the t-test
  - Independent two-way/factorial ANOVA: used to compare the means of groups created by two or more factors, as well as their interactions
  - Repeated-measures and mixed ANOVA: used when the observations are not independent
    - These techniques will not be covered here, as mixed-effects regression is much more appropriate for these kinds of settings. See Levshina (2015: Chap. 8)

- Which type of ANOVA is applied depends on the overall design of the study:
  - `Within-subject design`: The same subjects are tested in several experimental conditions:
    - E.g., subjects perform a task, they receive a priming stimulus, and they perform another task. We have data on the subjects in our two conditions: primed vs. unprimed
    - You need a `repeated-measures` or a `mixed` ANOVA
  - `Between-group/between-subject design`: Different subjects are assigned to different experimental conditions:
    - E.g., two groups of subjects perform the same task, but one group is presented with a priming stimulus before they perform the task
    - We can analyze the data with a `one-way` or a `two-way/factorial` ANOVA

- We will again be working with the extended version of the dataset by Balota et al. (2007) that we have seen before (data provided by Levshina, 2015):
  - `Word`
  - `Length`: word length
  - `SUBTLWF`: normalized word frequency
  - `POS`: part-of-speech
  - `Mean_RT`: mean reaction time

- Research question: *Does POS affect Mean_RT?*

```
library(readr)
library(dplyr)
library(ggplot2)
dataSet <- read_csv("http://www.jeroenclaes.be/statistics_for_linguistics/datasets/class7_Balota_et_al_2007.csv")
glimpse(dataSet)
```

```
## Observations: 880
## Variables: 5
## $ Word <chr> "rackets", "stepmother", "delineated", "swimmers", "um...
## $ Length <int> 7, 10, 10, 8, 6, 5, 5, 8, 8, 6, 8, 12, 8, 6, 7, 3, 3, ...
## $ SUBTLWF <dbl> 0.96, 4.24, 0.04, 1.49, 1.06, 3.33, 0.10, 0.06, 0.43, ...
## $ POS <chr> "NN", "NN", "VB", "NN", "NN", "NN", "VB", "NN", "NN", ...
## $ Mean_RT <dbl> 790.87, 692.55, 960.45, 771.13, 882.50, 645.85, 760.29...
```

- To obtain by-POS means, we can use `dplyr` and its `group_by` and `summarise` functions

```
# Mean reaction time per part-of-speech category
dataSet %>%
  group_by(POS) %>%
  summarise(means = mean(Mean_RT))
```

```
## # A tibble: 3 x 2
## POS means
## <chr> <dbl>
## 1 JJ 822.9145
## 2 NN 787.5959
## 3 VB 754.3316
```

- With `ggplot2` we can easily draw up a boxplot to compare the groups visually

```
# One box per part-of-speech category
ggplot(dataSet, aes(x = POS, y = Mean_RT)) +
  geom_boxplot()
```

- The null hypothesis of a one-way ANOVA is that all group means are equal
- The alternative hypothesis states that at least two group means differ, so the test is non-directional
- ANOVA is quite robust, but it is advisable to check the following assumptions:
  - The observations are independent
  - The response variable is ratio- or interval-scaled
  - The scores in the groups are normally distributed
  - The variance is homogeneous, i.e., the variances of the different groups should be equal

- If the assumptions are not met, you can either opt to use a linear regression model (if the data satisfy the assumptions of linear regression modeling) or you can use other types of ANOVA techniques (see Levshina, 2015: Chap. 8), which we will not cover here

- To ensure that our observations are independent, we should exclude `Mean_RT` scores that were obtained for the same word

`dataSet <- dataSet[!duplicated(dataSet$Word),]`
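How `duplicated` behaves is easiest to see on a tiny example. The data frame below is invented for illustration; only the column names mirror the real dataset.

```r
# Toy data (made up): the word "cat" occurs twice
words <- data.frame(
  Word = c("cat", "dog", "cat", "bird"),
  Mean_RT = c(600, 650, 620, 700)
)

# duplicated() flags the second and later occurrences of a value
print(duplicated(words$Word))

# Negating the flags keeps only the first row for each word
deduplicated <- words[!duplicated(words$Word), ]
print(deduplicated)
```

Note that this keeps the first occurrence and silently drops the later ones, so the row order of the data determines which score survives.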

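- To compute by-POS Shapiro-Wilk tests (`shapiro.test`), we can use the `group_by` and `summarise` functions from the `dplyr` package
- Note that we extract the p-value from the test with `$p.value`. A p-value below 0.05 indicates that the scores deviate significantly from a normal distribution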

```
# Shapiro-Wilk p-value per part-of-speech category
dataSet %>%
  group_by(POS) %>%
  summarise(shapiro.test = shapiro.test(Mean_RT)$p.value)
```

```
## # A tibble: 3 x 2
## POS shapiro.test
## <chr> <dbl>
## 1 JJ 1.603195e-05
## 2 NN 1.823641e-12
## 3 VB 3.473464e-05
```

- By specifying `POS` as the `group`, `fill`, and `color` argument in our `ggplot` call, we can obtain by-POS density plots

`ggplot(dataSet, aes(x=Mean_RT, group=POS, color=POS, fill=POS)) + geom_line(stat="density")`

- It is obvious that our data contains quite a few outliers (see the boxplot we drew earlier as well)
- To remove outliers within our POS groups, we use `filter` in conjunction with `group_by` to remove atypical observations based on MAD-scores
- We set the cutoff at `2` to be on the strict side

```
# Keep observations within 2 MADs of their POS group's median
dataSet <- dataSet %>%
  group_by(POS) %>%
  filter(abs((Mean_RT - median(Mean_RT)) / mad(Mean_RT)) <= 2)
```
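To see what the MAD-score filter does, it can help to apply it to a handful of numbers. The reaction times below are invented for illustration; the cutoff of 2 is the one used in the text.

```r
# Toy reaction times (made up), with one clearly atypical value
rt <- c(610, 640, 655, 670, 690, 700, 720, 1450)

# MAD-score: distance from the median, in units of the
# median absolute deviation (R's mad() rescales by 1.4826 by default)
mad_score <- abs(rt - median(rt)) / mad(rt)
print(round(mad_score, 2))

# Keep only observations with a MAD-score of at most 2
keep <- mad_score <= 2
print(rt[keep])
```

Because both the median and the MAD are barely affected by extreme values, the atypical observation does not distort the yardstick that is used to detect it.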

- The result looks much better, but it is not perfect. However, as the sample size is large, this should not cause too much trouble.

```
# Re-check normality per group after removing outliers
dataSet %>%
  group_by(POS) %>%
  summarise(shapiro.test = shapiro.test(Mean_RT)$p.value)
```

```
## # A tibble: 3 x 2
## POS shapiro.test
## <chr> <dbl>
## 1 JJ 0.049109616
## 2 NN 0.005275898
## 3 VB 0.031028559
```

```
ggplot(dataSet, aes(x = Mean_RT, group = POS, color = POS, fill = POS)) +
  geom_line(stat = "density")
```

- To evaluate whether the variances are homogeneous across the groups, we can use the `leveneTest` function from the `car` package
- If p < 0.05, we conclude that the groups have non-constant variances

```
library(car)
leveneTest(Mean_RT ~ POS, data=dataSet)
```

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.8434 0.05881 .
## 807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
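The logic behind Levene's test with `center = median` can be sketched in base R: it amounts to a one-way ANOVA on the absolute deviations of the scores from their group medians. The sketch below uses R's built-in `PlantGrowth` data as a stand-in (an assumption for illustration) and is not meant as a replacement for `leveneTest`.

```r
# Stand-in data: 30 plant weights in three groups
data(PlantGrowth)

# Step 1: absolute deviation of every observation from its own group median
group_median <- ave(PlantGrowth$weight, PlantGrowth$group, FUN = median)
deviations <- data.frame(
  abs_dev = abs(PlantGrowth$weight - group_median),
  group = PlantGrowth$group
)

# Step 2: one-way ANOVA on those deviations; a significant F value
# suggests that the spread differs between the groups
levene_by_hand <- anova(lm(abs_dev ~ group, data = deviations))
print(levene_by_hand)
```

Groups with larger variances have larger average deviations, so a test of equal mean deviations doubles as a test of equal spread.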

- One-way independent ANOVAs can be performed by calling `lm` with only one factor predictor
- Rather than calling `summary` on the model to inspect it, we can use `anova` to get an analysis of variance table
- The test statistic we are after here is the F-value. In the context of ANOVA, this statistic is interpreted as the ratio of the average between-group variability to the average within-group variability
- This is what ANOVA measures: is there significantly more variation between groups than there is within groups?

```
anv<-lm(Mean_RT ~ POS, dataSet)
anova(anv)
```

```
## Analysis of Variance Table
##
## Response: Mean_RT
## Df Sum Sq Mean Sq F value Pr(>F)
## POS 2 293004 146502 16.717 7.698e-08 ***
## Residuals 807 7072398 8764
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

- Another way of generating the same result is by calling `summary` on the result of the `aov` function

```
anv<-aov(Mean_RT ~ POS, dataSet)
summary(anv)
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## POS 2 293004 146502 16.72 7.7e-08 ***
## Residuals 807 7072398 8764
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
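The F-value reported in an analysis of variance table is the ratio of between-group to within-group variability, and it can be reproduced by hand. The sketch below does so for R's built-in `PlantGrowth` data, which serves as a stand-in here (an assumption for illustration); all variable names are ad hoc.

```r
# Stand-in data: 30 plant weights in three groups
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

# Between-group variability: how far the group means lie from the grand mean
group_means <- as.numeric(tapply(y, g, mean))
group_sizes <- as.numeric(tapply(y, g, length))
ss_between <- sum(group_sizes * (group_means - mean(y))^2)
df_between <- nlevels(g) - 1

# Within-group variability: how far observations lie from their own group mean
ss_within <- sum((y - ave(y, g))^2)
df_within <- length(y) - nlevels(g)

# F is the ratio of the two mean squares
f_manual <- (ss_between / df_between) / (ss_within / df_within)

# The same value as reported by anova() on a linear model
f_lm <- anova(lm(weight ~ group, data = PlantGrowth))$`F value`[1]
print(c(manual = f_manual, lm = f_lm))
```

Dividing each sum of squares by its degrees of freedom gives the `Mean Sq` column of the table, and the ratio of the two mean squares is the `F value`.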

- Since an ANOVA will only provide information on the overall significance of the differences between groups, we have to perform additional tests (called `post-hoc tests` in the ANOVA literature) to find out which pairwise differences are significant
- One such test is the `Tukey Honest Significant Difference test`, which requires an `aov` object
- It provides the differences between the means (`diff`), their 95% confidence intervals (`lwr` and `upr`), as well as the p-value of the difference. Here we find that the differences between all groups are significant

`TukeyHSD(anv, "POS")`

```
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Mean_RT ~ POS, data = dataSet)
##
## $POS
## diff lwr upr p adj
## NN-JJ -32.39262 -53.11334 -11.671907 0.0007530
## VB-JJ -60.82503 -85.52794 -36.122123 0.0000000
## VB-NN -28.43241 -47.86687 -8.997948 0.0017998
```

- When reporting a one-way ANOVA, include:
  - Full name of the test: `one-way independent ANOVA`
  - F-statistic and its degrees of freedom, usually written as `F(2, 807) = 16.72` (the values from the table above)
  - P-value
  - Differences in the group means and their confidence intervals
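The numbers needed for such a report can be pulled straight from the analysis of variance table. The following sketch again uses the built-in `PlantGrowth` data as a stand-in (an assumption for illustration), and the formatting of the string is only one possible convention.

```r
# Stand-in model: one-way ANOVA on R's built-in PlantGrowth data
data(PlantGrowth)
tab <- anova(lm(weight ~ group, data = PlantGrowth))

# Row 1 holds the factor, row 2 the residuals
report <- sprintf(
  "F(%d, %d) = %.2f, p = %.3f",
  tab$Df[1], tab$Df[2], tab$`F value`[1], tab$`Pr(>F)`[1]
)
print(report)
```

The first number in the parentheses is the degrees of freedom of the factor, the second that of the residuals.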

- A two-way/factorial ANOVA can be used to simultaneously measure the influence of two or more factors on the group means
- Let us first prepare some data

```
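# Split word length into two equally sized bins (1 = shorter, 2 = longer)
dataSet <- dataSet %>%
  mutate(Length_groups = as.factor(ntile(Length, 2)))
# Mean reaction time for every POS x length combination
dataSet %>%
  group_by(POS, Length_groups) %>%
  summarise(mean = mean(Mean_RT))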
```

```
## # A tibble: 6 x 3
## # Groups: POS [?]
## POS Length_groups mean
## <chr> <fctr> <dbl>
## 1 JJ 1 752.4481
## 2 JJ 2 838.3771
## 3 NN 1 736.1165
## 4 NN 2 790.0333
## 5 VB 1 702.1491
## 6 VB 2 767.4033
```

- The assumptions of the independent two-way ANOVA are identical to those of the one-way ANOVA:
  - The observations are independent
  - The response variable is ratio- or interval-scaled
  - The scores in the groups are normally distributed
  - The variance is homogeneous, i.e., the variances of the different groups should be equal

- The scores for adjectives and verbs appear to be normally distributed, but those for nouns have a somewhat different shape
- Still, the fact that we have a large sample means that we do not have to worry about this too much, at least if the other assumptions are met

```
# Shapiro-Wilk p-value for every POS x length combination
dataSet %>%
  group_by(POS, Length_groups) %>%
  summarise(shapiro.test = shapiro.test(Mean_RT)$p.value)
```

```
## # A tibble: 6 x 3
## # Groups: POS [?]
## POS Length_groups shapiro.test
## <chr> <fctr> <dbl>
## 1 JJ 1 0.31835465
## 2 JJ 2 0.06440925
## 3 NN 1 0.05135718
## 4 NN 2 0.03095467
## 5 VB 1 0.21072007
## 6 VB 2 0.04178584
```

```
ggplot(dataSet, aes(x = Mean_RT, group = Length_groups, color = Length_groups, fill = Length_groups)) +
  geom_line(stat = "density") +
  facet_wrap(~POS)
```

```
library(car)
leveneTest(Mean_RT ~ Length_groups, data=dataSet)
```

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.537 0.4639
## 808
```

`leveneTest(Mean_RT ~ POS, data=dataSet)`

```
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.8434 0.05881 .
## 807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

- To perform an independent two-way ANOVA, we need to specify `sum` contrasts on our factor variables
- Then we can use the `aov` function to compute the ANOVA
- Just like `lm` models, `aov` models can (but do not have to) include interactions

```
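library(car)
# The option is called `contrasts`; it expects one contrast type for
# unordered factors and one for ordered factors
options(contrasts = c("contr.Sum", "contr.poly"))
mod <- aov(Mean_RT ~ POS * Length_groups, dataSet)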
```
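What "sum contrasts" means becomes clearer when the coding matrices are printed side by side. The sketch below uses base R's `contr.treatment` and `contr.sum` for a factor with three levels; as far as the coding itself is concerned, `contr.Sum` from `car` is assumed to differ only in how the resulting coefficients are labelled.

```r
# Treatment (dummy) coding, R's default for unordered factors:
# each column compares one level with the first (reference) level
treatment_coding <- contr.treatment(3)
print(treatment_coding)

# Sum (deviation) coding: the last level is coded -1 in every column,
# so each column sums to zero
sum_coding <- contr.sum(3)
print(sum_coding)
```

With sum coding the coefficients are deviations from the overall mean of the group means rather than from a reference level, which is what makes main effects interpretable when an interaction is in the model.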

- Next we should specify how the sums of squares should be computed. This is especially important for unbalanced samples (i.e., when the different combinations of categories do not have the same number of observations)
- In the ANOVA literature the different options are known as `Type I`, `Type II`, and `Type III` `sum of squares`:
  - `Type I`: For factors A, B we test the main effect of A first, then the main effect of B after accounting for A, followed by the effect of their interaction. The results depend on the order of the factors
  - `Type II`: For factors A, B we test the main effect of each factor while controlling for the other factor, but not for their interaction
  - `Type III`: For factors A, B we test the main effects of A and B, while controlling for the other variables and their interactions

- `Type III` is the most useful in many cases. You can compute the ANOVA with `Type III` sum of squares by calling the `Anova` function from the `car` package on an `aov` model
- Here we find that the interaction is not significant

```
library(car)
Anova(mod, type="III")
```

```
## Anova Table (Type III tests)
##
## Response: Mean_RT
## Sum Sq Df F value Pr(>F)
## (Intercept) 41331002 1 5306.7189 < 2.2e-16 ***
## POS 112195 2 7.2026 0.0007936 ***
## Length_groups 269509 1 34.6037 5.926e-09 ***
## POS:Length_groups 29376 2 1.8858 0.1523706
## Residuals 6261897 804
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
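Why the type of sum of squares matters for unbalanced samples can be shown with a small simulation in base R. Everything below is invented for illustration: two correlated factors `A` and `B`, unequal cell sizes, and a response `y` that depends on both. The sequential (Type I) sum of squares for `A` then changes with the position of `A` in the formula.

```r
set.seed(1)

# Made-up unbalanced design: most a1 rows are b1, most a2 rows are b2
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(30, 10))),
  B = factor(c(rep(c("b1", "b2"), times = c(25, 5)),
               rep(c("b1", "b2"), times = c(2, 8))))
)
d$y <- 10 + 2 * (d$A == "a2") + 3 * (d$B == "b2") + rnorm(nrow(d))

# anova() on an lm uses sequential (Type I) sums of squares
a_first <- anova(lm(y ~ A + B, data = d))
a_second <- anova(lm(y ~ B + A, data = d))

ss_A_first <- a_first["A", "Sum Sq"]    # A tested before B is considered
ss_A_second <- a_second["A", "Sum Sq"]  # A tested after controlling for B
print(c(A_first = ss_A_first, A_second = ss_A_second))
```

In this main-effects model, the value obtained when `A` is entered last is what a Type II analysis would report for `A`; the residual sum of squares is the same in both orderings.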

- As we did for the one-way independent ANOVA, we can also perform the `Tukey Honest Significant Difference test` for the two-way independent ANOVA
- In this case we will want to specify the variable that is in focus
- Keep in mind that the p-values that are reported by the `TukeyHSD` test are based on `Type I` sum of squares, so you may get results that look different from your `Anova` output

`TukeyHSD(mod, "POS")`

```
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Mean_RT ~ POS * Length_groups, data = dataSet)
##
## $POS
## diff lwr upr p adj
## NN-JJ -32.39262 -51.92639 -12.85885 0.0003145
## VB-JJ -60.82503 -84.11288 -37.53718 0.0000000
## VB-NN -28.43241 -46.75360 -10.11121 0.0008338
```

- When reporting a two-way ANOVA, include:
  - Full name of the test: `two-way independent ANOVA`
  - F-statistics and their degrees of freedom, usually written as, e.g., `F(2, 804) = 7.20` for `POS` (the values from the table above)
  - P-values
  - Differences in the group means and their confidence intervals

- Please go to http://www.jeroenclaes.be/statistics_for_linguistics/class9.html and perform the exercises

- Field, A., Miles, J., & Field, Z. (2012). *Discovering statistics using R*. New York, NY/London: SAGE.
- Levshina, N. (2015). *How to do Linguistics with R: Data exploration and statistical analysis*. Amsterdam/Philadelphia, PA: John Benjamins.