Jalal, Hafiza Dua, et al. “Cross-Modal Knowledge Mining Leveraging Multimodal Large Language Models for Automated Video Scene Understanding and Event Detection.” NextGen AI & Computing Journal, vol. 1, no. 1, May 2026, pp. 102-31, https://doi.org/10.5281/zenodo.20461727.