Enhancing Biometric Security with Bimodal Deep Learning and Feature-level Fusion of Facial and Voice Data
DOI: https://doi.org/10.26636/jtit.2024.4.1754
Keywords: biometric recognition, deep learning, multimodal systems, SincNet, voice modality
Abstract
Recent research in biometric technologies underscores the benefits of multimodal systems, which use multiple traits to enhance security by making it harder to replicate samples from genuine users. Building on this, we present a bimodal deep learning network (BDLN, or BNet) that integrates facial and voice modalities. Voice features are extracted with the SincNet architecture, and facial image features are obtained from convolutional layers. The proposed network fuses these two feature vectors by either averaging or concatenation. A densely connected layer then processes the combined vector to produce a dual-modal vector that encapsulates the user's distinctive features. For identification, this dual-modal vector is passed through a further densely connected layer with a softmax activation function. The system achieved an identification accuracy of 99% and a low equal error rate (EER) of 0.13% for verification. These results, obtained on the VidTimit and BIOMEX-DB datasets, highlight the effectiveness of the proposed bimodal approach in improving biometric security.
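The abstract describes the fusion pipeline only at a high level. The following is a minimal, framework-free sketch of the data flow it outlines (feature-level fusion by averaging or concatenation, a dense layer producing the dual-modal vector, then a dense layer with softmax for identification), plus one common way to estimate the EER used for verification. It is not the authors' implementation: the vector sizes, the ReLU on the dual-modal layer, the random untrained weights, and every function name (`fuse`, `dense`, `identify`, `equal_error_rate`) are hypothetical. The SincNet and CNN extractors are replaced by random stand-in vectors, and the EER is approximated by a threshold sweep, which is only one of several estimation methods.

```python
import math
import random

# Hypothetical sizes for illustration only; not taken from the paper.
VOICE_DIM, FACE_DIM, EMBED_DIM, N_USERS = 8, 8, 16, 5


def fuse(voice_vec, face_vec, method="concat"):
    """Feature-level fusion of a voice embedding and a face embedding."""
    if method == "average":
        # Element-wise mean; assumes both vectors already have the same length.
        if len(voice_vec) != len(face_vec):
            raise ValueError("averaging requires equal-length feature vectors")
        return [(v + f) / 2.0 for v, f in zip(voice_vec, face_vec)]
    if method == "concat":
        # Concatenation keeps both vectors intact: length = len(voice) + len(face).
        return list(voice_vec) + list(face_vec)
    raise ValueError(f"unknown fusion method: {method!r}")


def dense(x, weights, bias, relu=False):
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j], optional ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(z):
    """Numerically stable softmax over class scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained Gaussian weights; a real system would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def identify(voice_vec, face_vec, params, method="concat"):
    """Fuse -> dense (dual-modal vector) -> dense + softmax (class probabilities)."""
    fused = fuse(voice_vec, face_vec, method)
    # ReLU on the dual-modal layer is an assumption; the abstract does not name it.
    dual_modal = dense(fused, *params["embed"], relu=True)
    probs = softmax(dense(dual_modal, *params["classify"]))
    return dual_modal, probs


def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep thresholds, return mean of FAR and FRR where they are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    rng = random.Random(0)
    voice = [rng.gauss(0.0, 1.0) for _ in range(VOICE_DIM)]  # stand-in for a SincNet embedding
    face = [rng.gauss(0.0, 1.0) for _ in range(FACE_DIM)]    # stand-in for a CNN face embedding
    for method, fused_dim in (("concat", VOICE_DIM + FACE_DIM), ("average", VOICE_DIM)):
        params = {"embed": random_layer(EMBED_DIM, fused_dim, rng),
                  "classify": random_layer(N_USERS, EMBED_DIM, rng)}
        dual_modal, probs = identify(voice, face, params, method)
        print(method, len(dual_modal), max(range(N_USERS), key=probs.__getitem__))
    print("EER:", equal_error_rate([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))
```

The design difference between the two fusion options is visible in the input size of the first dense layer: concatenation doubles it (here 16 versus 8), while averaging keeps it fixed but requires both modalities to share one feature length.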
License
Copyright (c) 2024 Khaled Merit, Mohammed Beladgham
This work is licensed under a Creative Commons Attribution 4.0 International License.