
Multiple imputation for mixed data by factorial analysis (Imputation multiple pour données mixtes par analyse factorielle)

Abstract: The growing volume of collected data makes its analysis increasingly complex. This complexity takes the form of variables of different types, missing values, and a large number of variables and/or observations, a setting in which applying statistical methods is often delicate. The purpose of this presentation is to propose a new multiple imputation method based on factorial analysis of mixed data (FAMD). FAMD is a factor analysis method suited to datasets containing both quantitative and qualitative variables, whether or not the number of variables exceeds the number of observations. Owing to these properties, a multiple imputation method built on FAMD allows inference from incomplete quantitative and qualitative variables, in both high- and low-dimensional settings. The proposed method uses a bootstrap approach to reflect the uncertainty in the principal components and eigenvectors of FAMD, which are used here to predict (impute) the missing values. Each bootstrap replicate provides a prediction for the missing entries of the dataset. Next, noise is added to these predictions to reflect the distribution of the data. We thus obtain as many imputed tables as bootstrap replicates. After recalling the principles of multiple imputation, we will present our methodology. The proposed method will be evaluated by simulation and compared with the reference multiple imputation methods: sequential imputation by generalized linear models, imputation by a non-parametric Bayesian joint model, and imputation by the general location model. The proposed method provides unbiased point estimates of various parameters of interest, as well as confidence intervals with the expected coverage. In addition, it can be applied to datasets of various types and sizes, in particular when the number of observations is smaller than the number of variables.
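The pipeline described in the abstract (bootstrap, factor-analysis prediction of the missing cells, added noise, one imputed table per replicate) can be sketched as below. This is a simplified illustration, not the authors' implementation: it handles quantitative variables only (for which FAMD reduces to PCA on standardized variables) and omits the qualitative part of FAMD, regularization of the singular values, and the choice of the number of components. The row-weighted form of the bootstrap, the `1e-3` weight floor, the per-column Gaussian noise, the pooling with Rubin's rules, and all function names (`iterative_pca_impute`, `bootstrap_multiple_imputation`, `pool_mean`) are assumptions of this sketch.

```python
import numpy as np


def iterative_pca_impute(X, w, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaNs in X by iterative weighted PCA of the standardized columns.

    Stand-in for the FAMD step, valid only for quantitative variables
    (simplifying assumption). Returns the completed table and the
    rank-`n_components` fit of every cell.
    """
    miss = np.isnan(X)
    w = w / w.sum()
    X0 = np.where(miss, 0.0, X)
    # Start every missing cell at the weighted mean of its observed column values.
    col_mean = (w[:, None] * X0).sum(axis=0) / (w[:, None] * ~miss).sum(axis=0)
    X_fill = np.where(miss, col_mean, X)
    for _ in range(n_iter):
        mu = w @ X_fill
        sd = np.sqrt(w @ (X_fill - mu) ** 2)
        Z = (X_fill - mu) / sd
        # Principal axes of the row-weighted standardized table.
        _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * Z, full_matrices=False)
        V = Vt[:n_components].T
        # Low-rank fit: project every row onto the axes, then undo the scaling.
        X_hat = (Z @ V @ V.T) * sd + mu
        # Only missing cells are updated; observed values are kept as-is.
        X_new = np.where(miss, X_hat, X)
        delta = np.sum((X_new - X_fill) ** 2)
        X_fill = X_new
        if delta < tol:
            break
    return X_fill, X_hat


def bootstrap_multiple_imputation(X, n_imputations=5, n_components=2, seed=0):
    """Return one noisy imputed table per bootstrap replicate (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    miss = np.isnan(X)
    tables = []
    for _ in range(n_imputations):
        # Bootstrap encoded as row weights (multinomial counts) plus a small floor,
        # so rows left out of the replicate still receive a prediction (assumption).
        counts = rng.multinomial(n, np.full(n, 1.0 / n))
        X_fill, X_hat = iterative_pca_impute(X, counts + 1e-3, n_components)
        # Per-column spread of the residuals on observed cells sets the noise level.
        sigma = np.nanstd(np.where(miss, np.nan, X - X_hat), axis=0)
        noise = rng.normal(0.0, 1.0, size=X.shape) * sigma
        tables.append(np.where(miss, X_fill + noise, X))
    return tables


def pool_mean(tables, j):
    """Pool the mean of column j over the imputed tables with Rubin's rules."""
    M = len(tables)
    est = np.array([tab[:, j].mean() for tab in tables])
    within = np.array([tab[:, j].var(ddof=1) / tab.shape[0] for tab in tables])
    q_bar = est.mean()
    between = est.var(ddof=1)
    total = within.mean() + (1 + 1 / M) * between
    return q_bar, total


# Usage on simulated data with one latent factor and ~15% of cells missing at random.
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 1))
X = latent @ rng.normal(size=(1, 4)) + 0.3 * rng.normal(size=(n, 4))
X[rng.random(X.shape) < 0.15] = np.nan
tables = bootstrap_multiple_imputation(X, n_imputations=5, n_components=1)
q_bar, total_var = pool_mean(tables, 0)
print(f"pooled mean of column 0: {q_bar:.3f} (SE {np.sqrt(total_var):.3f})")
```

Encoding the bootstrap as row weights rather than duplicated rows keeps every original row in each fit, so each replicate yields a prediction for every missing cell. Extending the sketch to mixed data would require the indicator coding of qualitative variables used by FAMD and a rule for turning imputed indicator values into categories.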
Document type: Conference papers

Cited literature: 8 references
Contributor: Catherine Cliquet
Submitted on: Friday, November 8, 2019 - 2:20:41 PM
Last modification on: Wednesday, September 28, 2022 - 5:54:07 AM
Long-term archiving on: Monday, February 10, 2020 - 4:45:29 AM


Files produced by the author(s)


  • HAL Id: hal-02355840, version 1


Vincent Audigier, François Husson, Julie Josse, Matthieu Resche-Rigon. Imputation multiple pour données mixtes par analyse factorielle. JdS2019 - 51es Journées de Statistique de la Société Française de Statistique, Société Française de Statistique, Jun 2019, Vandœuvre-lès-Nancy, France. ⟨hal-02355840⟩


