Normality test in R

Sep 1

Today I learned the hard way that the Shapiro-Wilk test for normality in R (shapiro.test) only accepts sample sizes between 3 and 5,000. The fact that I’m just learning this today does make me a little sad about the sample sizes I’ve worked with so far. :(

However, the Anderson-Darling test from the “nortest” package seems like a good drop-in replacement for larger samples. It has no upper limit on sample size and only needs more than 7 observations.

library(nortest)
ad.test(ModelData)$p.value
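
To see the limit in action, here is a minimal sketch on simulated data. The vector x and the seed are made up for illustration, and the nortest call is skipped if the package isn't installed.

```r
set.seed(1)
x <- rnorm(6000)  # simulated sample, larger than shapiro.test() accepts

# shapiro.test() stops with an error for more than 5000 values,
# so catch the error and keep its message instead of a p-value
shapiro_result <- tryCatch(
  shapiro.test(x)$p.value,
  error = function(e) conditionMessage(e)
)
print(shapiro_result)

# ad.test() has no upper limit on sample size (it needs more than 7 values)
if (requireNamespace("nortest", quietly = TRUE)) {
  ad_p <- nortest::ad.test(x)$p.value
  print(ad_p)
}
```

On a sample this size, shapiro_result ends up holding the error message rather than a number, while ad.test returns an ordinary p-value.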

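Specifically, I had been using the Shapiro-Wilk test in a handy function I wrote that helps decide whether a t-test or a Wilcoxon test is appropriate given distributional assumptions. It runs both tests on a variable split by a binary group (coded 0 and 1) and reports the normality check for each group alongside the results. The full (and recently updated) function is below.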

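# Compare a variable between two groups (coded 0 = control, 1 = positive)
library(ggpubr)  # not used below; ggpubr also exports a compare_means(), which this definition masks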
library(nortest)

compare_means = function(binary_group, var, data){

  # check t-test assumption: TRUE means normality was not rejected at the 0.05 level
  # Anderson-Darling normality test for control (group == 0)
  control_assumption = ad.test(data[[var]][which(data[[binary_group]]==0)])$p.value>0.05
  # Anderson-Darling normality test for positive (group == 1)
  positive_assumption = ad.test(data[[var]][which(data[[binary_group]]==1)])$p.value>0.05
  
  
  t = t.test(data[[var]][which(data[[binary_group]]==0)]
             , data[[var]][which(data[[binary_group]]==1)]
             , var.equal = FALSE
  )
  
  
  w = wilcox.test(data[[var]][which(data[[binary_group]]==0)]
                  , data[[var]][which(data[[binary_group]]==1)]
                  , conf.int = TRUE)
  
  output=data.frame(
    variable = var
    , control_mean = t$estimate[1]
    , positive_mean = t$estimate[2]
    , ttest_different = t$p.value<0.05
    , wilcox_different = w$p.value<0.05
    , ttest_p = t$p.value
    , wilcox_p = w$p.value
    , control_normality_assumption = control_assumption
    , positive_normality_assumption = positive_assumption
    , t_difference_estimate_low95 = t$conf.int[1]
    , t_difference_estimate_high95 = t$conf.int[2]
    , wilcox_difference_estimate_low95 = w$conf.int[1]
    , wilcox_difference_estimate_high95 = w$conf.int[2]
  )
  rownames(output) <- NULL
  return(output)
}
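
The function reports both tests and leaves the final choice to the reader. One way to make that choice automatic is sketched below. choose_test is a hypothetical helper that is not part of the function above, and example_row is a made-up row that only borrows the output's column names.

```r
# Hypothetical helper: pick which p-value to report from compare_means() output.
# Uses the t-test only when normality was not rejected in both groups.
choose_test <- function(result) {
  both_normal <- result$control_normality_assumption &
    result$positive_normality_assumption
  data.frame(
    variable = result$variable,
    test_used = ifelse(both_normal, "t-test", "wilcoxon"),
    p_value = ifelse(both_normal, result$ttest_p, result$wilcox_p)
  )
}

# Made-up row with the same column names as the function's output
example_row <- data.frame(
  variable = "score",
  ttest_p = 0.03,
  wilcox_p = 0.08,
  control_normality_assumption = TRUE,
  positive_normality_assumption = FALSE
)

chosen <- choose_test(example_row)
print(chosen)
```

Here the positive group fails the normality check, so the helper falls back to the Wilcoxon p-value. Because it uses ifelse, it also works on several rows stacked together with rbind.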

Ryan Melvin
