Background: The easy availability on the Internet of Virus Creation Kits, which can generate variants
of an original malware in no time, has led to an exponential growth of advanced malware.
In recent years, detecting advanced malicious programs such as Metamorphic and Polymorphic
variants has become a major issue for Anti-Virus companies, because these variants conceal themselves either by
mutating their code or by using obfuscation techniques, as described in recent patents. Because of this mutation
property of morphed malware, detection methods based on signatures and heuristics appear to
be ineffective solutions.
Methods: In this paper, K-means clustering is used to identify variants of known malicious programs.
In the proposed method, the K-means algorithm is applied to a dataset consisting of variants generated with
Virus Creation Kits such as MPCGEN/G2, together with normal malicious files downloaded from the Internet.
Opcode sequence pattern matching is used to compute the similarity score, which is based on the Euclidean distance.
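The abstract does not state how opcode sequences are turned into vectors. A common choice, assumed here, is to represent each file by the relative frequency of each opcode and compare files with the Euclidean distance. The function names, the sample opcode sequences and the `similarity_score` mapping 1 / (1 + d) are hypothetical illustrations, not the authors' exact formulation.

```python
import math
from collections import Counter

def opcode_vector(opcodes, vocabulary):
    """Relative frequency of each opcode in `vocabulary` within one file's opcode sequence."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero for an empty sequence
    return [counts[op] / total for op in vocabulary]

def euclidean_distance(u, v):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_score(u, v):
    """Hypothetical mapping of distance into (0, 1]: identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(u, v))

# Hypothetical opcode sequences, e.g. obtained by disassembling two files.
file_a = ["push", "mov", "call", "mov", "pop", "ret"]
file_b = ["push", "mov", "nop", "call", "mov", "pop", "ret"]  # same code plus a junk "nop"
vocab = sorted(set(file_a) | set(file_b))

vec_a = opcode_vector(file_a, vocab)
vec_b = opcode_vector(file_b, vocab)
print(euclidean_distance(vec_a, vec_b), similarity_score(vec_a, vec_b))
```

Inserting junk instructions such as `nop` is a typical metamorphic trick; with frequency vectors it shifts the vector only slightly, so the two files remain close in Euclidean distance.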
Results: Based on the similarity score from opcode sequence pattern matching, the files in the dataset
are grouped as normal malware or Polymorphic/Metamorphic malware with a promising accuracy rate.
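Grouping files into two classes can be sketched with a plain implementation of Lloyd's K-means (alternating assignment and centroid-update steps) using Euclidean distance. This is a minimal sketch under stated assumptions: k = 2 mirrors the two groups named above, the 2-D feature vectors are hypothetical, and the deterministic farthest-point seeding is a swapped-in initialization, not necessarily the one the authors used.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def farthest_point_init(points, k):
    """Deterministic seeding (assumption): start from the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until assignments stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each file joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical feature vectors (e.g. reduced opcode-frequency features per file).
files = [
    [0.10, 0.20], [0.12, 0.18], [0.09, 0.22],  # one group of similar files
    [0.80, 0.75], [0.78, 0.80], [0.83, 0.77],  # a second, distinct group
]
labels, centroids = kmeans(files, k=2)
print(labels)
```

K-means only returns cluster indices; deciding which cluster stands for normal malware and which for Polymorphic/Metamorphic malware needs an extra step, for example checking which cluster holds the kit-generated variants.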
Conclusions: Due to the availability of Virus Creation Kits on the Internet and the concealing property
of morphed malware, there has been an exponential growth of morphed malicious programs that are
complex and hard for Anti-Virus tools to detect. The proposed method shows that K-means is very
effective for clustering malware variants mainly because it intuitively fits in solving the opcode sequence