Identifying Extreme Observations, Outliers and Noise in Clinical and Genetic Data

Author(s): Concepcion Arenas, Claudio Toma, Bru Cormand, Itziar Irigoien.

Journal Name: Current Bioinformatics

Volume 12, Issue 2, 2017



Abstract:

Background: A major current challenge is the treatment and interpretation of real data. Data sets are often high-dimensional, have a small number of observations and are noisy. Furthermore, many approaches have been proposed in recent years for integrating continuous data with categorical or ordinal data, in order to capture information that is lost when each type of data is studied independently.

Objective: The aim of this paper is to develop a statistical tool for detecting outliers that can handle features of any type as well as high-dimensional data.

Method: The data form an n × p matrix (n ≪ p) whose rows correspond to observations and whose columns correspond to features of any type. The new procedure is based on the distances between all pairs of observations and produces a ranking by assigning each observation a value that reflects its degree of outlyingness. It was evaluated by simulation and on real data from clinical and genetic studies.
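The abstract describes the procedure only as being "based on the distances between all the observations". To illustrate how a distance-based outlyingness ranking can work when continuous and categorical features are mixed, the sketch below makes two assumptions that are not taken from the paper: it uses Gower's similarity coefficient s (with d² = 1 − s) as the distance for mixed features, and it scores each observation with a generic proximity-style statistic built from squared distances. The function names `gower_sq_distance` and `outlyingness` are hypothetical, and the score is not necessarily the statistic defined by the authors.

```python
import numpy as np


def gower_sq_distance(X_num, X_cat):
    """Hypothetical helper: squared distances d^2 = 1 - s from Gower's coefficient s.

    X_num: (n, p1) array of continuous (or rank-coded ordinal) features.
    X_cat: (n, p2) array of categorical codes.
    Returns an (n, n) symmetric matrix with zero diagonal and entries in [0, 1].
    """
    X_num = np.asarray(X_num, dtype=float)
    X_cat = np.asarray(X_cat)
    # Range-normalised absolute differences for continuous features.
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute zero dissimilarity
    num_diff = np.abs(X_num[:, None, :] - X_num[None, :, :]) / ranges
    # Simple matching for categorical features: 0 if equal, 1 otherwise.
    cat_diff = (X_cat[:, None, :] != X_cat[None, :, :]).astype(float)
    p = X_num.shape[1] + X_cat.shape[1]
    # 1 - s is the average per-feature dissimilarity.
    return (num_diff.sum(axis=2) + cat_diff.sum(axis=2)) / p


def outlyingness(D2):
    """Hypothetical score from a matrix of squared distances D2.

    V    = sum_ij D2[i, j] / (2 n^2)   (geometric variability of the sample)
    phi2 = mean_j D2[i, j] - V         (proximity of observation i to the sample)
    Returns phi2 / V, so the score does not depend on the distance scale.
    Assumes not all observations are identical (otherwise V = 0).
    """
    D2 = np.asarray(D2, dtype=float)
    n = D2.shape[0]
    V = D2.sum() / (2.0 * n ** 2)
    phi2 = D2.mean(axis=1) - V
    return phi2 / V


# Toy mixed data: 20 observations, 2 continuous features, 1 categorical feature.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(20, 2))
X_cat = np.zeros((20, 1), dtype=int)
X_num[19] = [8.0, 8.0]  # planted outlier in the continuous features
X_cat[19] = 1           # ... and in the categorical feature

score = outlyingness(gower_sq_distance(X_num, X_cat))
ranking = np.argsort(-score)  # most outlying observation first
print(ranking[0])  # prints 19
```

When the squared distances are ordinary squared Euclidean distances, phi2 for an observation reduces to its squared distance from the sample centroid, so the score generalizes "distance from the centre" to any distance that can be computed between observations, including ones defined on mixed feature types.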

Results: The simulation studies showed that the procedure correctly identified the outliers, was robust to the masking effect and was useful for detecting noise. With simulated two-sample microarray data sets, it correctly detected outliers, especially when many genes showed increased expression in only a small number of samples. The method was applied to data sets on adult lymphoid malignancies, human liver cancer and autism multiplex families, yielding useful results.

Conclusion: The real-data and simulation studies show the efficiency of the procedure, which offers a useful tool in applications where the detection of outliers or noise is relevant.

Keywords: Biomedical data, data depth, gene expression, microarray, noise, outlier, robust estimation.

Article Details

Pages: 101-117 (17 pages)
DOI: 10.2174/1574893611666160606161031