Software for multiple imputation

The list below updates Appendix A of Van Buuren, S. (2012), Flexible Imputation of Missing Data, pp. 263-267.

R

Amelia by James Honaker, Gary King and Matthew Blackwell creates multiple imputations under the multivariate normal model. Special features include overimputation (treating observed values as missing and imputing them) and imputation of time series data.
BaBooN by Florian Meinfelder generates multiple imputations by chained equations. The package specializes in predictive mean matching for categorical data, and in imputation for data fusion, where many records share the same missing-data pattern.
cat by Joseph L. Schafer implements multiple imputation of categorical data according to the log-linear model as described in Chapters 7 and 8 of Schafer (1997).
Hmisc by Frank E. Harrell Jr. contains several functions to diagnose, create and analyze multiple imputations. The main imputation functions are transcan() and aregImpute(), both of which can transform the data automatically. The function fit.mult.impute() combines analysis and pooling and can read mids objects created by mice.
kmi by Arthur Allignol performs a Kaplan-Meier multiple imputation, specifically designed to impute missing censoring times.
mi by Andrew Gelman, Jennifer Hill, Yu-Sung Su, Masanao Yajima and Maria Grazia Pittau implements a chained equations approach based on Bayesian regression methods. The software allows detailed examination of the fitted imputation model.
mice by Stef van Buuren and Karin Groothuis-Oudshoorn implements multiple imputation by chained equations (the MICE algorithm). The package allows a flexible setup of the imputation model through a predictor matrix and passive imputation.
MImix by Russell Steele, Naisyin Wang and Adrian Raftery implements a special pooling method using a mixture of normal distributions.
mitools by Thomas Lumley provides tools for analyzing and combining results from multiply imputed data.
MissingDataGUI by Xiaoyue Cheng, Dianne Cook and Heike Hofmann provides numeric and graphical summaries of the missing values in both discrete and continuous variables. It has been removed from CRAN.
missMDA by Francois Husson and Julie Josse contains the function MIPCA() that draws multiple imputations from principal components analysis.
miP by Paul Brix can read imputed data created by Amelia, mi and mice to visualize several aspects of the missing data.
mirf by Yimin Wu, B. Aletta S. Nonyane and Andrea S. Foulkes provides a function mirf() that creates multiple imputations using random forests. It has been removed from CRAN.
mix by Joseph L. Schafer implements the imputation methods based on the general location model as described in Chapter 9 of Schafer (1997).
norm by Joseph L. Schafer implements multiple imputation based on the multivariate normal model as described in Chapters 5 and 6 of Schafer (1997).
pan by Joseph L. Schafer implements multiple imputation for multivariate panel or clustered data using the linear mixed model.
VIM by Matthias Templ, Andreas Alfons and Alexander Kowarik provides tools to visualize missing data before imputation. Imputation functions include hotdeck() and irmi(), both loosely based on a chained equations approach.
Zelig by Kosuke Imai, Gary King and Olivia Lau comes with a general zelig() function that supports analysis and pooling of multiply imputed data.
Many R packages implement single imputation, including arrayImpute, ForImp, imputation, impute, imputeMDR, mtsdi, missForest, robCompositions, rrcovNA, sbgcop, SeqKnn and yaImpute. Functions in these packages typically fill in an estimate of each missing value rather than a random draw.
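
The difference between estimating a missing value and drawing it is easiest to see in a minimal chained-equations sketch. The Python/NumPy code below is not the mice implementation; it is a simplified illustration under stated assumptions: missing cells start at the column means, every incomplete column is imputed by Bayesian normal linear regression on all other columns (similar in spirit to the "norm" method in mice), and the function names draw_norm and chained_equations are hypothetical. Each imputation draws the regression parameters and adds residual noise, which is what separates multiple imputation from the single-imputation packages above.

```python
import numpy as np

# Hypothetical helper names; not part of mice or any other package listed here.


def draw_norm(y_obs, x_obs, x_mis, rng):
    """Impute by Bayesian normal linear regression: draw (beta, sigma), then add noise."""
    n, p = x_obs.shape
    v = np.linalg.inv(x_obs.T @ x_obs)
    v = (v + v.T) / 2  # guard against tiny asymmetries from the inverse
    beta_hat = v @ x_obs.T @ y_obs
    resid = y_obs - x_obs @ beta_hat
    # sigma^2 drawn as SS_res / chi^2(n - p): reflects parameter uncertainty
    sigma2 = resid @ resid / rng.chisquare(n - p)
    beta = beta_hat + np.linalg.cholesky(sigma2 * v) @ rng.standard_normal(p)
    # a random draw, not the regression prediction alone
    return x_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), size=x_mis.shape[0])


def chained_equations(data, m=5, n_iter=10, seed=0):
    """Return m completed copies of `data` (2-D float array, NaN = missing)."""
    data = np.asarray(data, dtype=float)
    miss = np.isnan(data)
    rng = np.random.default_rng(seed)
    col_means = np.nanmean(data, axis=0)
    completed = []
    for _ in range(m):
        x = data.copy()
        x[miss] = col_means[np.nonzero(miss)[1]]  # crude starting values
        for _ in range(n_iter):
            for j in range(x.shape[1]):
                rows = miss[:, j]
                if not rows.any():
                    continue
                # predictors: intercept plus current values of all other columns
                design = np.column_stack([np.ones(len(x)), np.delete(x, j, axis=1)])
                x[rows, j] = draw_norm(x[~rows, j], design[~rows], design[rows], rng)
        completed.append(x)
    return completed
```

Each of the m completed datasets would then be analyzed separately and the results pooled. Features such as predictive mean matching, a predictor matrix and passive imputation are not modeled in this sketch.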

S-PLUS

The S-PLUS library S+MissingData is the most extensive implementation of the techniques described in Schafer (1997). It has functions to fit the multivariate Gaussian, log-linear and general location models using the EM algorithm and data augmentation (DA) algorithms; the DA algorithms also produce multiple imputations. The library builds on Schafer's code, but in some cases uses different algorithms. For example, the EM algorithm for the Gaussian model uses a Cholesky decomposition of the covariance matrix rather than the sweep operator used in Schafer's imp.norm() function.
Hmisc by Frank E. Harrell Jr. was traditionally included as one of the standard libraries in S-PLUS.
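
For each missing-data pattern, the E-step of EM under the multivariate normal model needs the regression of the missing variables on the observed ones: the coefficients $\Sigma_{OO}^{-1}\Sigma_{OM}$ and the residual covariance $\Sigma_{MM} - \Sigma_{MO}\Sigma_{OO}^{-1}\Sigma_{OM}$. The Python/NumPy sketch below computes these quantities both by sweeping and by a Cholesky factorization of $\Sigma_{OO}$, to show that the two routes give the same result. It is an illustration of the linear algebra only, not the code of S+MissingData or of Schafer's norm library, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; not from S+MissingData or Schafer's norm code.


def sweep(g, k):
    """Sweep the symmetric matrix g on index k (Little & Rubin's SWP convention)."""
    g = np.asarray(g, dtype=float)
    d = g[k, k]
    h = g - np.outer(g[:, k], g[k, :]) / d
    h[k, :] = g[k, :] / d
    h[:, k] = g[:, k] / d
    h[k, k] = -1.0 / d
    return h


def conditional_by_sweep(sigma, obs, mis):
    """Coefficients and residual covariance of mis on obs, via sweeps on obs."""
    h = np.asarray(sigma, dtype=float)
    for k in obs:
        h = sweep(h, k)
    return h[np.ix_(obs, mis)], h[np.ix_(mis, mis)]


def conditional_by_cholesky(sigma, obs, mis):
    """Same quantities from a Cholesky factor of the observed block."""
    sigma = np.asarray(sigma, dtype=float)
    s_oo = sigma[np.ix_(obs, obs)]
    s_om = sigma[np.ix_(obs, mis)]
    chol = np.linalg.cholesky(s_oo)  # s_oo = chol @ chol.T
    # two triangular systems: chol @ y = s_om, then chol.T @ coef = y
    coef = np.linalg.solve(chol.T, np.linalg.solve(chol, s_om))
    resid_cov = sigma[np.ix_(mis, mis)] - s_om.T @ coef
    return coef, resid_cov
```

The sweep version updates the whole matrix once per observed variable, whereas the Cholesky version factors only the observed block and solves two triangular systems.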

Stata

The ice package by Patrick Royston is a user-contributed Stata package that provides an elegant implementation of multiple imputation by chained equations.
Stata 11 introduced the new multiple imputation command mi. This is a rich implementation of multiple imputation, including useful options for data manipulation.
Stata 12 extends the functionality of mi with the mi impute chained command, which essentially brings the functionality of the ice package into the Stata mi framework.

SAS

Since V8.2, PROC MI implements multiple imputation under the multivariate normal model. V9.3, released in the fall of 2011, added fully conditional specification (FCS).
Since V8.2, PROC MIANALYZE pools the results of the complete-data analyses performed on each imputed dataset (e.g., by PROC LOGISTIC with a BY _IMPUTATION_ statement). V9.0 added the specification of model effects and custom hypotheses about the parameters.
IVEware is a SAS callable software application that implements multiple imputation using chained equations, called sequential regressions in IVEware. The software allows for flexible imputation models, and includes the ability to specify bounds and data transformations. It has a dedicated command for creating fully synthetic data sets, and routines for regression under complex sampling designs.
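
Pooling of the kind done by PROC MIANALYZE and by mitools follows Rubin's rules. The Python/NumPy sketch below pools a single scalar parameter from m complete-data analyses; the function name pool_rubin is hypothetical, and the degrees of freedom are Rubin's original large-sample formula. It is not the SAS implementation.

```python
import math

import numpy as np

# Hypothetical helper name; not a SAS or mitools function.


def pool_rubin(estimates, variances):
    """Pool one scalar parameter from m >= 2 analyses with Rubin's rules.

    estimates: the m complete-data point estimates
    variances: their m squared standard errors
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    u_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = u_bar + (1 + 1 / m) * b      # total variance
    r = (1 + 1 / m) * b / u_bar      # relative increase in variance due to missingness
    df = math.inf if r == 0 else (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df
```

For example, estimates of 1.0, 1.2 and 0.8 with squared standard errors 0.04, 0.05 and 0.03 pool to an estimate of 1.0 with total variance of about 0.093 and about 6.1 degrees of freedom. Some software also offers a small-sample adjustment of the degrees of freedom, which this sketch omits.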

SPSS

Since SPSS 17, the MULTIPLE IMPUTATION procedure, part of the Missing Values module, supports multiple imputation by chained equations. Imputation and analysis can be done largely automatically and are well integrated with the software for complete-data analysis. See the SPSS documentation for details of the algorithms.
AMOS can generate multiple imputations under the wide range of models it supports.
tw.sps is an SPSS macro by Joost van Ginkel that implements two-way imputation. This macro can also generate multiple imputations, and is geared towards imputing missing questionnaire items.
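
Two-way imputation fills a missing item score with person mean + item mean - overall mean, all computed from the observed scores. The Python/NumPy sketch below illustrates this for a persons-by-items matrix. It is not the tw.sps macro: the option of adding normal noise with standard deviation equal to that of the observed-cell residuals (to obtain multiple imputations) is an assumption of this sketch, the function name two_way_impute is hypothetical, and every person is assumed to have at least one observed item.

```python
import numpy as np

# Hypothetical helper name; not part of the tw.sps macro.


def two_way_impute(items, m=1, add_error=True, seed=None):
    """Two-way imputation of a persons x items matrix (NaN = missing)."""
    x = np.asarray(items, dtype=float)
    miss = np.isnan(x)
    pm = np.nanmean(x, axis=1, keepdims=True)  # person means
    im = np.nanmean(x, axis=0, keepdims=True)  # item means
    om = np.nanmean(x)                         # overall mean
    tw = pm + im - om                          # two-way value for every cell
    # assumption: noise SD taken from residuals of observed cells around tw
    sd = (x - tw)[~miss].std(ddof=1)
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(m):
        filled = x.copy()
        noise = rng.normal(0.0, sd, size=miss.sum()) if add_error else 0.0
        filled[miss] = tw[miss] + noise
        completed.append(filled)
    return completed
```

With add_error=False the result is a single deterministic imputation; with the noise switched on, repeated calls with m > 1 yield multiple imputations that differ from one another.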

Other software

SOLAS 4.0 is a stand-alone package dedicated to multiple imputation. It implements five univariate methods to generate multiple imputations and is the only package in this list that implements propensity score matching. The software implements a non-iterative chained equations method: it first imputes the cells that break the monotone pattern, and then imputes the remaining missing data in the monotone part. Complete-data statistics are pooled automatically, and several pre- and post-imputation diagnostic plots are available.
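
The first step of that two-stage scheme, finding the missing cells that break a monotone pattern, can be sketched as follows. A pattern is monotone when the variables can be ordered so that, within each record, every variable after the first missing one is also missing. The Python/NumPy sketch assumes the variables are ordered by increasing amount of missingness (SOLAS's own ordering rule is not described here) and uses the hypothetical name split_monotone; it flags missing cells that precede a record's last observed variable, so that what remains after imputing them is monotone.

```python
import numpy as np

# Hypothetical helper name; the column ordering rule is an assumption.


def split_monotone(miss):
    """Return (column order, mask of missing cells that break monotonicity)."""
    miss = np.asarray(miss, dtype=bool)
    order = np.argsort(miss.sum(axis=0), kind="stable")  # least missing first
    m = miss[:, order]
    n, p = m.shape
    observed = ~m
    # position of the last observed column in each row (-1 if none observed)
    last_obs = np.where(observed.any(axis=1),
                        p - 1 - np.argmax(observed[:, ::-1], axis=1), -1)
    breaking = m & (np.arange(p)[None, :] < last_obs[:, None])
    out = np.zeros_like(miss)
    out[:, order] = breaking  # back to the original column order
    return order, out
```

Cells flagged by the mask would be imputed first; the remaining missing cells form a monotone pattern that can be imputed variable by variable in the returned order.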

Mplus Version 6 implements routines to generate, analyze and pool multiply imputed data. Multivariate imputations can be created under a joint model based on the variance-covariance matrix (the default) or by a form of conditional specification. By default, Mplus imputes under an unrestricted imputation model that is specified behind the scenes (called H1 imputation). Alternatively, a custom imputation model can be specified in conjunction with the Bayesian estimator (called H0 imputation).
NORM is a freeware program by Joseph L. Schafer that imputes missing data under the multivariate normal distribution. The latest version is V2.03 for Windows. The program contains routines to pool parameter estimates.
REALCOM-IMPUTE is an MLwiN 2.15 macro that generates imputations for data with two-level structures. The imputation model extends the joint modeling approach to mixed numerical and categorical data with a multilevel structure.
WinMICE is a freeware Windows program by Gert Jacobusse that implements multiple imputation under the linear mixed model for two-level data using chained equations. It also contains some features from mice. The version is dated 2005.