Upcoming Assignments/Quizzes
Assignments | Open Time | Due Time |
---|---|---|
ANOVA Article Analysis Activity | October 22nd (1:00 am EST) | October 28th (11:55 pm EST) |
Module 8 Data Quiz | October 26th (1:00 am EST) | October 28th (11:55 pm EST) |
Module 9 Conceptual Quiz | October 26th (1:00 am EST) | October 28th (11:55 pm EST) |
Notes from Discussion Board/Office Hours
Relationship between the \(F\)-statistic, p-value, and null hypothesis
In sub-module 9.3, Dr. Baiser covers how to test hypotheses using ANOVA. To do this, we calculate our observed \(F\)-statistic using the mean square among groups and mean square within group from our observed data, and compare that to the distribution of possible \(F\)-statistics (i.e. the \(F\) distribution) based on the degrees of freedom (df) in the numeration and denominator of our \(F\)-statistic to determine how significance of our observed value.
Let’s make some plots to visualize this comparison step-by-step. I’ll use the same example from the sub-module 9.3 lecture. Let’s start from when we calculate our observed \(F\)-statistics (pg. 15 from 9.3 notes), which I’ll call f_obs
. Based on our calculations of the mean squares we determined that \(F_{obs} = 5.11\).
Now let’s draw our \(F\)-distribution. Recall that this is determined by the dfs in the numerator (\(df_{num}\)) and the denominator (\(df_{den}\)) of our \(F\)-statistic. If we have \(a\) number of treatments and \(n\) number of replicates, than \(df_{num} = a - 1\) and \(df_{den} = n(a-1)\). In our example, \(a=3\) and \(n=4\) (pg. 8), therefore \(df_{num} = 2\) and \(df_{num} = 9\). With this information we can draw our \(F\)-distribution by creating a vector of possible values of \(F\) and passing those into the df()
function in .
library(tidyverse)
library(ggpubr)
# Possible values of F-stat:
x = seq(from = 0, to = 10, by = 0.01)
# Probability of possible values of F-stat
y = df(x = x, df1 = 2, df2 = 9)
ggplot() +
geom_line(aes(x, y)) +
labs(x = "F-Statistic", y = "Probability") +
theme_pubclean()
This curve shows the possible values for the \(F\)-statistic (shown on the x-axis) and the probability of observing those values (y-axis) if the null hypothesis were true (based on the dfs we specified). We can use this to determine if we should reject or fail to reject the null hypothesis by comparing f_obs
to a theoretical \(F\)-statistic based on a critical value \(\alpha\), which you’ll recall is often set to \(\alpha = 0.05\). This \(F\)-statistic, which we will call f_crit
, will correspond to having a p-value of exactly 0.05.
It is important to note that we working with a density function, which means that we are interested in the area under the curve. We can not simply draw a line with a y-intercept of 0.05 to find f_crit
. Instead we need to find the “quantile” of our area of interest (5% or 0.05). Luckily the qf()
can calculate quantile for the \(F\)-distribution:
f_crit <- qf(p = 0.05, df1 = 2, df2 = 9, lower.tail = F)
Which determines that f_crit
is equal to 4.26. Note that we set lower.tail = F
because were are using a one-way test on the high end. Now we can draw the area under the curve that represents the “rejection region”:
ggplot(data.frame(x,y)) +
geom_line(aes(x, y)) +
stat_function(fun = df,
args = list(df1 = 2, df2 = 9),
xlim = c(f_crit, 10),
geom = "area",
fill = "red",
alpha = 0.6) +
labs(x = "F-Statistic", y = "Probability") +
theme_pubclean()
Finally, let’s add f_obs
to our plot:
f_obs = 5.11
ggplot(data.frame(x,y)) +
geom_line(aes(x, y)) +
stat_function(fun = df,
args = list(df1 = 2, df2 = 9),
xlim = c(f_crit, 10),
geom = "area",
fill = "red",
alpha = 0.6) +
geom_vline(aes(xintercept = f_obs), color = "darkblue", linetype = 2) +
labs(x = "F-Statistic", y = "Probability") +
theme_pubclean()
As you can see, f_obs
falls in the rejection region, and therefore we will reject the null hypothesis that there is no difference between our treatments. As a final note, we can also calculate the p-value associated with f_obs
using the pf()
function:
p_value <- pf(f_obs, df1 = 2, df2 = 9, lower.tail = F)
round(p_value, 3)
## [1] 0.033