Missing Data Imputation in R
There are many, many ways to impute missing data in R. CRAN has a Task View listing them and the accompanying packages: https://cran.r-project.org/web/views/MissingData.html. Today, I wanted to do some rapid prototyping of ideas on a dataset with about 16,000 observations and missing values scattered across multiple columns. I started trying things from the CRAN list. It was a good reminder that R packages are written for and by statisticians.
I just wanted something that would take in a data frame with missing values and output a data frame with those missing values filled in. For the ideas I wanted to test, I didn't really care how it was done. I wanted to worry about those details later. Remember, I was after rapid prototyping. But just about every package I tried didn't want to do it that way. Oh no, I needed to be made to think about every step in the imputation process. Most packages enforce this by giving you functions that walk you through the process one stage at a time, forcing you to make many choices along the way.
Finally, I found two packages that seem to do what I wanted to do, rather than making me do what the pedantic package programmer thought I should do. Sheesh! VIM (the R package, not the text editor) offers the kNN() function, which promises to return a data frame with no missing values. And mi offers similar functionality (once you get the data into its missing_data.frame format).
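For the curious, here is roughly what the VIM route looks like. Treat it as a minimal sketch, not a recommendation: it assumes VIM is installed, it uses the built-in airquality dataset as a stand-in for my real data, and k = 5 is an arbitrary choice I made for illustration.

```r
# Assumes the VIM package is installed: install.packages("VIM")
library(VIM)

# airquality ships with R and has NAs in the Ozone and Solar.R columns;
# it is only a stand-in for the real dataset
df <- airquality
colSums(is.na(df))   # count of missing values per column

# kNN() fills each missing value using the k most similar rows.
# imp_var = FALSE drops the extra logical "*_imp" indicator columns
# that kNN() would otherwise append to the result.
df_filled <- kNN(df, k = 5, imp_var = FALSE)

anyNA(df_filled)     # check whether any NAs remain
dim(df_filled)       # same shape as the input
```

Data frame in, data frame out, which is all I was after. Leaving imp_var at its default keeps the indicator columns, which is handy later if you want to know which cells were imputed.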