Normality test in R
Today I learned the hard way that the Shapiro-Wilk test for normality in R (shapiro.test) only works on sample sizes between 3 and 5,000. The fact I’m just learning this today does make me a little sad about the sample sizes I’ve worked with so far. :(
However, the Anderson-Darling test from the “nortest” package seems like a good drop-in replacement for larger samples. It needs more than 7 observations, but it has no upper limit.
library(nortest)
ad.test(ModelData)$p.value
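To see the limit in action, here is a minimal sketch on simulated data. The vector `x` and the names `shapiro_result` and `ad_p` are made up for illustration; only `shapiro.test()` and `nortest::ad.test()` come from the post.

```r
library(nortest)

set.seed(1)          # reproducible simulated data
x <- rnorm(10000)    # more observations than shapiro.test() accepts

# shapiro.test() stops with an error outside 3..5000 observations,
# so capture the message instead of letting the script die
shapiro_result <- tryCatch(
  shapiro.test(x)$p.value,
  error = function(e) conditionMessage(e)
)
print(shapiro_result)

# ad.test() runs on the same vector and returns an ordinary p-value
ad_p <- ad.test(x)$p.value
print(ad_p)
```

Wrapping the call in `tryCatch()` is just one way to make the failure visible without halting a longer script.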
Specifically, I had been using the Shapiro-Wilk test in a handy function I wrote that helps decide whether a t-test or a Wilcoxon test is appropriate given distributional assumptions. It runs both tests and reports whether each group passes the normality check. The full (and recently updated) function is below.
# Compare a numeric variable between two groups coded 0 (control) and 1 (positive)
library(ggpubr)   # not used below; note that this function masks ggpubr::compare_means
library(nortest)

compare_means = function(binary_group, var, data){
  # split the variable by group once, so every test uses the same vectors
  control  = data[[var]][which(data[[binary_group]] == 0)]
  positive = data[[var]][which(data[[binary_group]] == 1)]

  # check t-test assumption
  # Anderson-Darling normality test for control
  control_assumption = ad.test(control)$p.value > 0.05
  # Anderson-Darling normality test for positive
  positive_assumption = ad.test(positive)$p.value > 0.05

  # Welch t-test (unequal variances) and Wilcoxon rank-sum test
  t = t.test(control, positive, var.equal = FALSE)
  w = wilcox.test(control, positive, conf.int = TRUE)

  output = data.frame(
    variable = var
    , control_mean = t$estimate[1]
    , positive_mean = t$estimate[2]
    , ttest_different = t$p.value < 0.05
    , wilcox_different = w$p.value < 0.05
    , ttest_p = t$p.value
    , wilcox_p = w$p.value
    , control_normality_assumption = control_assumption
    , positive_normality_assumption = positive_assumption
    , t_difference_estimate_low95 = t$conf.int[1]
    , t_difference_estimate_high95 = t$conf.int[2]
    , wilcox_difference_estimate_low95 = w$conf.int[1]
    , wilcox_difference_estimate_high95 = w$conf.int[2]
  )
  rownames(output) <- NULL
  return(output)
}
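The function above reports both tests side by side and leaves the final choice to whoever reads the output. If you want the choice made for you, a minimal sketch could look like the following. `pick_test` is a hypothetical helper that is not part of the original function, and the 0.05 cutoff and the simulated exponential data are assumptions for illustration.

```r
library(nortest)

# Hypothetical helper: returns the p-value of the test suggested by
# the Anderson-Darling normality checks on both groups.
pick_test <- function(x, y, alpha = 0.05) {
  both_normal <- ad.test(x)$p.value > alpha && ad.test(y)$p.value > alpha
  if (both_normal) {
    # neither group rejects normality: use the Welch t-test
    list(test = "welch_t", p_value = t.test(x, y, var.equal = FALSE)$p.value)
  } else {
    # at least one group rejects normality: fall back to the Wilcoxon rank-sum test
    list(test = "wilcoxon", p_value = wilcox.test(x, y)$p.value)
  }
}

set.seed(7)                                          # reproducible simulated data
skewed <- pick_test(rexp(200), rexp(200, rate = 0.5)) # two strongly skewed groups
print(skewed)
```

One caveat with this design: on very large samples a normality test tends to flag even tiny departures from normality, so it is still worth looking at both p-values.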