About Me

I’m a Master’s student in Computer Vision at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), advised by Zhiqiang Shen and Salman Khan. I received my Integrated Master’s degree in Computer Science from IIIT Bangalore in 2024. My research centers on multimodal models and agentic grounding, with an emphasis on enabling intelligent systems to perceive, reason, and act across diverse modalities in grounded environments.

📢 I’m passionate about vision-language models, spatio-temporal understanding, and visualization recommendation systems. Feel free to reach out if you’re interested in collaboration!

News

  • [July 2025] - Our paper One Last Attention for your Vision-Language Model has been accepted at ICCV 2025. See y’all in Hawaii!
  • [June 2025] - We release VideoMolmo: Spatio-Temporal Grounding meets Pointing - Check out our preprint and code.
  • [May 2025] - Started a summer internship at Inception, working with Prof. Boulbaba.
  • [April 2024] - Our paper ScaleViz: Scaling Visualization Recommendation Models on Large Data has been accepted at PAKDD 2024 as an oral.

Selected Publications

Patents


Contact: ghazi.ahmad@mbzuai.ac.ae | GitHub | LinkedIn | Google Scholar