Hybrid Deep Learning and Data Mining Approach for Knowledge Discovery

Authors

  • Muhammad Shahid, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Muhammad Nadeem, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Maryam Israr, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Mubasher Hussain Malik, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan
  • Hamid Ghous, Department of Computer Science & Information Technology, University of South Punjab (USP), Multan, Punjab, Pakistan

Keywords

data mining; hybrid deep learning; pattern discovery; self-supervised learning; unstructured data; explainable artificial intelligence.

Abstract

The rapid growth of large-scale unstructured data poses substantial challenges for pattern discovery, particularly in balancing high-capacity representation learning against interpretability. Deep learning techniques extract rich latent representations from unstructured data, but they typically lack explicit mechanisms for turning those representations into the interpretable knowledge patterns that transparent, explainable analytical systems require. To address this limitation, this study proposes a Hybrid Deep Learning–Data Mining (H-DLDM) framework that systematically integrates self-supervised representation learning with explicit data mining methods. The framework was evaluated on three publicly available datasets covering textual, visual, and multimodal domains. The experiments show that H-DLDM produces semantically cohesive latent representations, supports hierarchical and relational pattern discovery, and achieves a balanced trade-off among pattern quality, interpretability, computational efficiency, and scalability. In comparative analysis, the framework consistently outperformed both deep-learning-only and conventional data mining baselines, yielding more stable, interpretable, and semantically meaningful patterns without sacrificing performance at scale. These results highlight the potential of hybrid analytical frameworks to transform deep latent representations into explicit, actionable knowledge structures, and the proposed approach contributes toward explainable, scalable, and knowledge-centric data mining for complex unstructured data.
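The abstract describes H-DLDM only at the level of its two stages: latent representations from a self-supervised encoder, followed by explicit data mining that yields hierarchical and relational patterns. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation, which the abstract does not specify. It assumes the embeddings are already computed (six hand-written vectors stand in for encoder output), uses average-linkage agglomerative clustering on cosine distance as an illustrative hierarchical step, and mines simple `cluster -> tag` rules scored by support and confidence as an illustrative relational step. The function names, toy data, tags, and thresholds are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stand-ins for embeddings from a self-supervised encoder
# (not data from the paper): three text-like and three image-like items.
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.9],
    [0.1, 0.9, 1.0],
    [0.2, 1.0, 0.8],
]
# Hypothetical metadata tags used to make the latent clusters interpretable.
tags = [
    {"text", "finance"},
    {"text", "finance"},
    {"text", "legal"},
    {"image", "medical"},
    {"image", "medical"},
    {"image"},
]


def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise cosine distance until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = sum(cosine_distance(vectors[p], vectors[q])
                    for p in ci for q in cj) / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps index i valid
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def cluster_tag_rules(labels, item_tags, min_support=0.2, min_confidence=0.6):
    """Mine explicit rules 'cluster c -> tag t'.
    support    = share of all items that are in c and carry t
    confidence = share of c's items that carry t"""
    n = len(labels)
    cluster_sizes = Counter(labels)
    pair_counts = Counter((c, t) for c, ts in zip(labels, item_tags) for t in ts)
    rules = []
    for (c, t), count in pair_counts.items():
        support = count / n
        confidence = count / cluster_sizes[c]
        if support >= min_support and confidence >= min_confidence:
            rules.append((c, t, round(support, 2), round(confidence, 2)))
    return sorted(rules)


labels = average_linkage(embeddings, n_clusters=2)
rules = cluster_tag_rules(labels, tags)
print(labels)
for c, t, s, conf in rules:
    print(f"cluster {c} -> {t} (support={s}, confidence={conf})")
```

On this toy data the two clusters separate the text-like items from the image-like items, and the `legal` tag is dropped because it appears on only one of six items, below the assumed `min_support`. The point of the second stage is that each latent cluster ends up described by explicit, thresholded rules rather than by an opaque centroid.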

Published

2026-05-23

How to Cite

Shahid, M., Nadeem, M., Israr, M., Malik, M. H., & Ghous, H. (2026). Hybrid deep learning and data mining approach for knowledge discovery. NextGen AI & Computing Journal, 1(1), 23–46. https://scientia-nexus.org/index.php/nac/article/view/13