About Me
I’m a Master’s student in Computer Vision at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), advised by Zhiqiang Shen and Salman Khan. I received my Integrated Master’s degree in Computer Science from IIIT Bangalore in 2024. My research centers on multimodal models and agentic grounding, with an emphasis on enabling intelligent systems to perceive, reason, and act across diverse modalities in grounded environments.
📢 I’m passionate about vision-language models, spatio-temporal understanding, and visualization recommendation systems. Feel free to reach out if you’re interested in collaboration!
News
- [July 2025] - Our paper One Last Attention for your Vision-Language Model has been accepted at ICCV 2025. See y’all in Hawaii!
- [June 2025] - We release VideoMolmo: Spatio-Temporal Grounding meets Pointing - Check out our preprint and code.
- [May 2025] - Started a summer internship at Inception working with Prof. Boulbaba.
- [April 2024] - Our paper ScaleViz: Scaling Visualization Recommendation Models on Large Data has been accepted at PAKDD 2024 as an oral presentation.
Selected Publications
- Ghazi Shazan Ahmad, Heakl A, Gani H, Shaker A, Shen Z, Krishna R, Khan F.S., Khan S, VideoMolmo: Spatio-Temporal Grounding meets Pointing, Under Review.
- Ghazi Shazan Ahmad, Chen L, Yao T, Liu T, Shen Z, One Last Attention for your Vision-Language Model, ICCV-2025.
- Heakl A, Sohail A, Ranjan M, Hossam R, Ghazi Shazan Ahmad, El-Geish M, Maher O, Shen Z, Khan F.S., Khan S, KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding, ACL-2025.
- Ghazi Shazan Ahmad, Agarwal S, Mitra S, Rossi R, SCALE-VIZ: A Framework for Scaling Vis-Rec Models on Large Datasets, PAKDD-2024 (Oral).
Patents
- Ghazi Shazan Ahmad, Agarwal S, Mitra S, Rossi R, Doshi M, Porwal V, Paila S.M, Reinforcement Learning Based Framework for Scaling Visualization Recommendation Models on Large Data, US Patent App. 18/668,888 Filed.
Contact: ghazi.ahmad@mbzuai.ac.ae | GitHub | LinkedIn | Google Scholar