Background: Code smells are symptoms of possible problems in a software system that can complicate the maintenance of software quality. The literature describes many code smells, and identifying them is far from trivial. Several techniques have therefore been proposed to automate code smell detection and thereby improve software quality.
Objective: This paper presents an up-to-date review of simple and hybrid machine learning-based code smell detection techniques and tools.
Methods: We collected all the relevant research published in this field up to 2020. We extracted data from those articles and classified them into two major categories. In addition, we compared the selected studies on several aspects: the code smells addressed, machine learning techniques, datasets, programming languages used by the datasets, dataset size, evaluation approach, and statistical testing.
Results: A majority of the empirical studies propose machine learning-based code smell detection tools. Support vector machine and decision tree algorithms are the ones researchers use most frequently. A major proportion of the research is conducted on open source software (OSS) such as Xerces, GanttProject, and ArgoUML. Furthermore, researchers pay more attention to the Feature Envy and Long Method code smells than to others.
Conclusion: We identified several areas of open research, such as the need for code smell detection techniques that use hybrid approaches and the need to employ valid industrial datasets.