Malware Detection Using Ensemble N-gram Opcode Sequences
Dublin Core
Title
Malware Detection Using Ensemble N-gram Opcode Sequences
Subject
Malware Detection
N-Gram
Opcode
Machine Learning
Ensemble
Grid Search
Description
Conventional approaches to malware defense have proven ineffective at detecting never-before-seen (zero-day) malware. Research, however, has shown that zero-day malicious files are mostly semantics-preserving variants of existing malware, generated via obfuscation methods. In this paper we propose and evaluate a machine-learning-based malware detection model using an ensemble approach. We employ an ensemble strategy in which multiple feature sets, generated from different n-gram sizes of opcode sequences, are each used to train a model with a single classifier type. The predictions of these models are weighted and averaged to reach a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble, we applied a grid search over a set of pre-defined weights in the range 0 to 1. On a balanced dataset of 2000 samples, an ensemble of opcode n-grams of sizes 1 and 2, with respective weights 0.3 and 0.7, yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. The ensemble of n-gram sizes 2 and 3 obtained the best precision, 99.7%, using a weight of 0.5 for both models.
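The weighting scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the 0.1 grid step, and the restriction to complementary weight pairs (w, 1 - w) are all assumptions made for illustration. The sketch takes each model's predicted probability of the malicious class as given, so classifier training is left out.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count the contiguous n-grams in a list of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def weighted_average(probs_a, probs_b, w_a, w_b):
    """Combine two models' malicious-class probabilities by weighted average."""
    total = w_a + w_b
    return [(w_a * a + w_b * b) / total for a, b in zip(probs_a, probs_b)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def grid_search_weights(probs_a, probs_b, labels, step=0.1):
    """Try weight pairs (w, 1 - w) on a grid; return (best_accuracy, (w_a, w_b)).

    Assumption: the two weights are complementary. The first pair that
    reaches the best accuracy is kept.
    """
    best = (-1.0, (0.0, 1.0))
    steps = round(1 / step)
    for k in range(steps + 1):
        w_a = round(k * step, 10)
        w_b = round(1 - w_a, 10)
        score = accuracy(weighted_average(probs_a, probs_b, w_a, w_b), labels)
        if score > best[0]:
            best = (score, (w_a, w_b))
    return best


# Toy, made-up values (not data from the paper): 1 = malicious, 0 = benign.
labels = [1, 0, 1, 0]
probs_1gram = [0.9, 0.6, 0.4, 0.1]  # hypothetical output of the 1-gram model
probs_2gram = [0.4, 0.3, 0.8, 0.6]  # hypothetical output of the 2-gram model

bigrams = opcode_ngrams(["push", "mov", "call", "mov", "call"], 2)
best_score, best_weights = grid_search_weights(probs_1gram, probs_2gram, labels)
```

In this toy data each model alone misclassifies two of the four samples, while a suitable weighted average classifies all four correctly, which is the effect the grid search is meant to find. In practice the weights would be selected on held-out validation data rather than on the samples used to report the final accuracy.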
Creator
Yeboah, Paul Ntim
Amuquandoh, Stephen Kweku
Musah, Haruna Balle Baz
Source
International Journal of Interactive Mobile Technologies (iJIM); Vol. 15 No. 24 (2021); pp. 19-31
1865-7923
Publisher
International Association of Online Engineering (IAOE), Vienna, Austria
Date
2021-12-21
Rights
Copyright (c) 2021 Paul Ntim Yeboah, Stephen Kweku Amuquandoh, Haruna Balle Baz Musah
https://creativecommons.org/licenses/by/4.0
Relation
Format
application/pdf
Language
eng
Type
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
Identifier
Citation
Paul Ntim Yeboah, Stephen Kweku Amuquandoh and Haruna Balle Baz Musah, Malware Detection Using Ensemble N-gram Opcode Sequences, International Association of Online Engineering (IAOE), Vienna, Austria, 2021, accessed November 7, 2024, https://igi.indrastra.com/items/show/2114