Abstract
Classifying cardiac arrhythmia from electrocardiogram (ECG) signals with machine
learning methods is difficult. ECG datasets typically contain many missing values,
which are caused by faults or distortion.
In data mining, missing value imputation is one of the most important data
preprocessing tasks. Simply removing the cases with missing values from the original
database leaves an incomplete medical dataset, which can itself cause problems. A
suitable missing value imputation method is therefore needed to produce a good-quality
dataset for better analysis of clinical trials. In this paper, we explore different
machine learning techniques for imputing missing values in an electrocardiogram
dataset. The collected data consist of feature dimensions with their attributes, which
are used to estimate the missing values. The experiments to compute the missing values in the
dataset are carried out using the four feature selection methods and imputation
methods. Results are reported for features selected with IG (information gain) and
GA (genetic algorithm), combined with different machine learning classifiers: NB
(naïve Bayes), KNN (k-nearest neighbor), MLP (multilayer perceptron), and RF (random
forest). GA and IG are the most suitable methods for obtaining results on
lower-dimensional datasets, as measured by RMSE (root mean square error), and they
yield the best estimates of the missing values. The four classifiers are used to
analyze the impact of the imputation methods. The best results for missing rates from
10% to 40% are obtained by NB, with RMSE values of 0.657, 0.6541, 0.66, 0.657, and
0.657. This indicates that the naïve Bayes classifier reduces the imputation error
most effectively.
Keywords: Datasets, GA (genetic algorithm), Feature selection, IG (information gain), Missing value imputation, RMSE (root mean square error).
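
The evaluation protocol summarized in the abstract (remove values at a given missing rate, impute them, and score the imputation by RMSE) can be illustrated with a minimal sketch. It is not the implementation used in this paper: column-mean imputation stands in for the classifier-based imputers (NB, KNN, MLP, RF), the data are synthetic, and the helper names mask_values, impute_column_mean, and rmse are hypothetical.

```python
import math
import random


def mask_values(rows, missing_rate, seed=0):
    """Copy rows, set a random fraction of cells to None, and return the masked positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = int(round(missing_rate * len(cells)))
    masked = rng.sample(cells, n_missing)
    damaged = [list(r) for r in rows]
    for i, j in masked:
        damaged[i][j] = None
    return damaged, masked


def impute_column_mean(rows):
    """Fill each None with the mean of the observed values in its column.

    Assumption: a simple stand-in for the classifier-based imputers in the paper.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def rmse(truth, imputed, positions):
    """Root mean square error over the masked positions only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Synthetic data, used only to exercise the protocol (not ECG data).
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
    for rate in (0.1, 0.2, 0.3, 0.4):
        damaged, masked = mask_values(data, rate)
        filled = impute_column_mean(damaged)
        print(f"missing rate {rate:.0%}: RMSE = {rmse(data, filled, masked):.3f}")
```

Scoring only the masked positions keeps the RMSE a measure of imputation quality, since the observed cells are left unchanged by the imputer.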