Enhancing DGA Detection with Machine Learning Algorithms
DOI:
https://doi.org/10.26636/jtit.2025.FITCE2024.2033
Keywords:
character-based DGA, cybersecurity, DGA detection, DNS, machine learning-based DGA detection, dynamic malware analysis, word-based DGA
Abstract
A domain generation algorithm (DGA) is a popular technique used by malware to reliably establish a connection to a command and control (C&C) server. Pseudo-random domain names generated by a DGA are used to bypass security measures and allow attackers to maintain control over malware-infected devices. In this work, we present a two-pronged approach to detecting character-based and word-based DGA domain names, creating classifiers specifically tailored to each type. For character-based DGA detection, we employed seven traditional machine learning methods: support vector machine, extremely randomized trees, logistic regression, Gaussian naive Bayes, nearest centroid, random forests, and k-nearest neighbors. We applied a featureful approach, using features extracted from the domain names themselves. Some of these features were drawn from the existing literature, while others were newly proposed by the authors. Feature selection techniques were used to retain only the best-performing features. For the more complex task of detecting word-based DGA domain names, we used convolutional neural network (CNN) and long short-term memory (LSTM) models, relying solely on word embeddings derived from the domain name components. Performance evaluation shows that the proposed method yields high-performing, specialized DGA classifiers, which can be combined to create a more general-purpose classifier.
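To make the featureful approach more concrete, the following minimal sketch computes a few lexical features directly from a domain name. The specific features (length, Shannon entropy, vowel ratio, digit ratio) and the function names `shannon_entropy` and `extract_features` are illustrative assumptions, not the feature set or code used in the paper.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(domain: str) -> dict:
    """Compute a few hypothetical lexical features for one domain name.

    Simplification: only the leftmost label is analyzed, so
    "abc.example.com" is reduced to "abc".
    """
    label = domain.lower().split(".")[0]
    length = len(label)
    letters = [c for c in label if c.isalpha()]
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        # Share of vowels among alphabetic characters.
        "vowel_ratio": (
            sum(c in VOWELS for c in letters) / len(letters) if letters else 0.0
        ),
        # Share of digits among all characters of the label.
        "digit_ratio": sum(c.isdigit() for c in label) / length if length else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xk2j9qzv7w.net"):
        print(name, extract_features(name))
```

Feature vectors of this kind could then be passed to any of the traditional classifiers listed above. Character-based DGA names tend to differ from human-chosen names on such statistics, which is the intuition behind classifying on them.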
License
Copyright (c) 2025 Hubert Biros, Mirosław Kantor

This work is licensed under a Creative Commons Attribution 4.0 International License.