A Comparison of Six Methods for Missing Data Imputation

Peter Schmitt, Jonas Mandel and Mickael Guedj

Abstract

Missing data occur in almost all research and introduce an element of ambiguity into data analysis. They must therefore be handled appropriately in order to provide an efficient and valid analysis. In the present study, we compare six imputation methods: mean imputation, K-nearest neighbors (KNN), fuzzy K-means (FKM), singular value decomposition (SVD), Bayesian principal component analysis (bPCA) and multiple imputation by chained equations (MICE). The comparison was performed on four real datasets of various sizes (from 4 to 65 variables), under a missing completely at random (MCAR) assumption, and was based on four evaluation criteria: root mean squared error (RMSE), unsupervised classification error (UCE), supervised classification error (SCE) and execution time. Our results suggest that bPCA and FKM are two imputation methods of interest which deserve further consideration in practice.
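To make the evaluation idea concrete, the following is a minimal sketch of how an MCAR experiment scored by RMSE might be set up. It is not the study's code: the data are synthetic, the 10% missing rate is an arbitrary assumption, only mean imputation (the simplest of the six methods) is shown, and the function names mask_mcar, impute_mean and rmse_on_masked are hypothetical.

```python
import numpy as np

def mask_mcar(X, missing_rate, rng):
    """Hide entries completely at random: each cell is masked independently
    with the same probability, regardless of its value (MCAR)."""
    mask = rng.random(X.shape) < missing_rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def impute_mean(X_missing):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X_missing, axis=0)
    X_imputed = X_missing.copy()
    rows, cols = np.where(np.isnan(X_missing))
    X_imputed[rows, cols] = col_means[cols]
    return X_imputed

def rmse_on_masked(X_true, X_imputed, mask):
    """RMSE computed only over the entries that were artificially hidden."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in for a complete dataset (assumption: 200 rows, 4 variables).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))

# Assumed missing rate of 10%; the abstract does not state the rates used.
X_missing, mask = mask_mcar(X_true, 0.10, rng)
X_imputed = impute_mean(X_missing)
score = rmse_on_masked(X_true, X_imputed, mask)
print(f"RMSE of mean imputation on masked entries: {score:.3f}")
```

Because the true values of the hidden entries are known, any other imputation method can be swapped in for impute_mean and scored with the same rmse_on_masked call, which is what makes a side-by-side comparison possible.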



Journal of Biometrics & Biostatistics