Jalal, H.D. (2026) “Cross-Modal Knowledge Mining Leveraging Multimodal Large Language Models for Automated Video Scene Understanding and Event Detection”, NextGen AI & Computing Journal, 1(1), pp. 102–131. doi:10.5281/zenodo.20461727.