Malware Detection Using Ensemble N-gram Opcode Sequences

Dublin Core

Title

Malware Detection Using Ensemble N-gram Opcode Sequences

Subject

Malware Detection
N-Gram
Opcode
Machine Learning
Ensemble
Grid Search

Description

Conventional approaches to tackling malware attacks have proven ineffective at detecting never-before-seen (zero-day) malware. Research has shown, however, that zero-day malicious files are mostly semantics-preserving variants of existing malware, generated via obfuscation methods. In this paper, we propose and evaluate a machine-learning-based malware detection model that uses an ensemble approach. We employ an ensemble strategy in which multiple feature sets, generated from opcode sequences of different n-gram sizes, are used to train models with a single classifier. The model predictions for the individual feature sets are weighted and averaged to reach a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble feature sets, we applied a grid search over a set of predefined weights in the range 0 to 1. On a balanced dataset of 2000 samples, an ensemble of n-gram opcode sequences with n = 1 and n = 2, weighted 0.3 and 0.7 respectively, yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. The ensemble of n-gram sizes 2 and 3 achieved the best precision, 99.7%, with a weight of 0.5 for both models.
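
The weighting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names (`opcode_ngrams`, `weighted_average`, `accuracy`, `grid_search_weights`) and the toy numbers are hypothetical, as are the 0.1 grid step and the 0.5 decision threshold. The sketch also assumes each n-gram model already outputs a probability that a sample is malicious, so classifier training (random forest in the paper) is left out. Weights are searched independently and normalized by their sum, which is one possible reading of "weighted and averaged".

```python
from collections import Counter
from itertools import product


def opcode_ngrams(opcodes, n):
    """Count contiguous n-grams in a sequence of opcode mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def weighted_average(probs_a, probs_b, w_a, w_b):
    """Combine two models' malicious-class probabilities, sample by sample."""
    total = w_a + w_b
    return [(w_a * pa + w_b * pb) / total for pa, pb in zip(probs_a, probs_b)]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(probs_a, probs_b, labels, step=0.1):
    """Try weight pairs on a grid over [0, 1]; return (best_accuracy, (w_a, w_b))."""
    steps = int(round(1 / step))
    grid = [round(i * step, 10) for i in range(steps + 1)]
    best_acc, best_weights = -1.0, None
    for w_a, w_b in product(grid, repeat=2):
        if w_a + w_b == 0:  # skip the pair where both weights are zero
            continue
        acc = accuracy(weighted_average(probs_a, probs_b, w_a, w_b), labels)
        if acc > best_acc:  # keep the first pair that reaches the best accuracy
            best_acc, best_weights = acc, (w_a, w_b)
    return best_acc, best_weights


# Toy validation data (made up): 1 = malicious, 0 = benign.
labels = [1, 0, 1, 0]
probs_1gram = [0.9, 0.6, 0.4, 0.2]  # hypothetical output of the 1-gram model
probs_2gram = [0.7, 0.3, 0.8, 0.1]  # hypothetical output of the 2-gram model

best_acc, best_weights = grid_search_weights(probs_1gram, probs_2gram, labels)
bigrams = opcode_ngrams(["push", "mov", "push", "mov"], 2)
```

In practice the weight search would run on a held-out validation split, so that the selected weights are not tuned on the same samples used to report accuracy.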

Creator

Yeboah, Paul Ntim
Amuquandoh, Stephen Kweku
Musah, Haruna Balle Baz

Source

International Journal of Interactive Mobile Technologies (iJIM); Vol. 15 No. 24 (2021); pp. 19-31
ISSN 1865-7923

Publisher

International Association of Online Engineering (IAOE), Vienna, Austria

Date

2021-12-21

Rights

Copyright (c) 2021 Paul Ntim Yeboah, Stephen Kweku Amuquandoh, Haruna Balle Baz Musah
https://creativecommons.org/licenses/by/4.0

Relation

Format

application/pdf

Language

eng

Type

info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article

Identifier

Citation

Paul Ntim Yeboah, Stephen Kweku Amuquandoh, and Haruna Balle Baz Musah, Malware Detection Using Ensemble N-gram Opcode Sequences, International Association of Online Engineering (IAOE), Vienna, Austria, 2021, accessed November 7, 2024, https://igi.indrastra.com/items/show/2114
