k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data

Victor Chukwudi Osamor; Ezekiel Femi Adebiyi; Ebere Hezekiah Enekwa

Abstract

Because data analysis with computational models strongly influences how the final results are interpreted, target users of such tools need a basic understanding of the underlying model in order to design their experiments optimally. Despite the wide variety of clustering techniques, cluster analysis has become a generic term in bioinformatics for methods that discover the natural grouping(s) of a set of patterns, points or sequences. The aim of this paper is to analyze k-means through a step-by-step "k-means walk", a graphics-guided analysis that gives a clear understanding of the operational mechanism of the k-means algorithm. A scatter graph was created from theoretical microarray gene expression data, which is a simplified view of typical microarray experiment data. We designated the first three data points as the initial centroids and applied the Euclidean distance metric in the k-means algorithm, so that these three points served as the reference points for forming each cluster. Before each subsequent iteration, a test determines whether the centroids have shifted. After convergence, we were able to trace the data points that belong to the same cluster. We observed that, as both the data dimension and the gene list of the microarray hybridization matrix grow, the computational implementation of the k-means algorithm becomes more demanding. Understanding this approach should stimulate new ideas for the further development and improvement of the k-means clustering algorithm, especially within the biology of diseases and beyond. The major expected advantage is improved cluster output for interpreting microarray experimental results, together with a better understanding that helps bioinformaticians and algorithm experts tweak the k-means algorithm for faster clustering.
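The procedure summarized above (first k points as initial centroids, Euclidean assignment, a centroid-shift test before the next iteration, and tracing cluster members after convergence) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `euclidean` and `k_means_walk` are hypothetical, the eight two-condition expression values in `genes` are invented toy data (not the paper's theoretical dataset), and convergence is taken to mean that no centroid moves between iterations.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means_walk(points, k, max_iter=100):
    """Plain k-means seeded with the first k points (hypothetical helper).

    Returns (labels, centroids, iterations).
    """
    # Seeding: the first k data points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its member points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Shift test: if no centroid moved, the algorithm has converged.
        if new_centroids == centroids:
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter


# Hypothetical toy data: expression of eight genes under two conditions.
genes = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.5, 2.0),
    (5.5, 4.5), (8.5, 1.5), (1.0, 0.5), (4.5, 5.5),
]

labels, centroids, iterations = k_means_walk(genes, k=3)

# Trace which genes ended up in the same cluster after convergence.
for j, c in enumerate(centroids):
    members = [f"g{i + 1}" for i, lab in enumerate(labels) if lab == j]
    print(f"cluster {j}: centroid={c}, genes={members}")
print(f"converged after {iterations} iterations")
```

Seeding with the first k points mirrors the abstract's description, but it makes the result depend on input order; random or k-means++ seeding are common alternatives and are not shown here. Each iteration computes one distance per point-centroid pair, so cost grows with the number of genes, the number of conditions (dimensions) and k, which is consistent with the observation that larger hybridization matrices make the computation more demanding.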


Journal of Computer Science & Systems Biology