Jalal, H. D., Aslam, S., Sultan, M. H., Raee, G. M. U. D., Azam, M., & Malik, M. H. (2026). Cross-Modal Knowledge Mining Leveraging Multimodal Large Language Models for Automated Video Scene Understanding and Event Detection. NextGen AI & Computing Journal, 1(1), 102-131. https://doi.org/10.5281/zenodo.20461727