Background: Code smells are symptoms of possible problems in a software system
that can complicate the maintenance of software quality. The literature describes many
code smells, and identifying them is far from trivial. Several techniques have therefore been proposed
to automate code smell detection and thereby improve software quality.
Objective: This paper presents an up-to-date review of simple and hybrid machine learning-based
code smell detection techniques and tools.
Methods: We collected all the relevant research published in this field up to 2020. We extracted
data from those articles and classified them into two major categories. In addition, we compared the
selected studies on several aspects, such as code smells, machine learning techniques, datasets,
programming languages used by datasets, dataset size, evaluation approach, and statistical testing.
Results: A majority of empirical studies have proposed machine learning-based code smell detection
tools. Researchers frequently use support vector machine and decision tree algorithms.
In addition, a large proportion of the research is conducted on open source software (OSS)
such as Xerces, GanttProject, and ArgoUML. Furthermore, researchers pay more attention to the Feature
Envy and Long Method code smells than to others.
Conclusion: We identified several areas of open research, such as the need for code smell detection
techniques that use hybrid approaches and the need to employ valid industrial datasets.