🔬 Research
Humans exhibit an extraordinary capacity to integrate information from multiple sensory modalities,
such as visual, auditory, and tactile inputs, to navigate and interpret their environments with
remarkable efficiency. This multimodal integration leverages the complementary strengths of each
sensory channel, enabling a coherent and comprehensive understanding of complex surroundings.
Inspired by this cognitive capability, my long-term objective is to develop AI systems that
emulate human-like multimodal synthesis, thereby enhancing robustness and adaptability in both
generative and understanding tasks.
Specifically, I focus on multimodal generative models, ranging from text-to-image synthesis to
Multimodal Large Language Models (MLLMs) and unified models.
🔥 News
2025-02: ✨ Our paper Science-T2I was accepted to CVPR 2025!
2024-09: I will join MSRA for an internship with Dr. Bin Li!
2024-07: ✨ Our paper CHOPS was accepted to COLM 2024!
2024-02: I will start my internship at NYU Courant, advised by Prof. Saining Xie. Looking forward
to working with Saining in New York!
📚 Publications
(* for equal contribution)
Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form
Video Understanding
Jialuo Li,
Bin Li,
Jiahao Li,
Yan Lu
Under Review
Paper
/
Code
DIG improves long-form video understanding by distinguishing between "global" and "localized"
queries and applying a different, training-free frame-selection strategy to each.
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with
Reinforcement Learning
Jitesh Jain,
Jialuo Li,
Zixian Ma,
Jieyu Zhang,
Chris Dongjoo Kim,
Sangho Lee,
Rohun Tripathi,
Tanmay Gupta,
Christopher Clark,
Humphrey Shi
Under Review
Project Page
/
Paper
/
Dataset
/
Code
SAGE introduces a human-inspired agentic framework that replaces resource-heavy frame processing
with iterative reasoning over a diverse toolkit, enabling efficient long-video understanding
through a novel synthetic data pipeline and a specialized RL training strategy.
PAI-Bench: A Comprehensive Benchmark For Physical AI
Fengzhe Zhou*,
Jiannan Huang*,
Jialuo Li*,
Deva Ramanan,
Humphrey Shi
Under Review
Paper
/
Dataset
/
Code
PAI-Bench is a new benchmark designed to evaluate Physical AI capabilities across video generation
and understanding. The study finds that while current models produce high-quality visuals, they lack
the physical common sense and reasoning required to truly understand real-world dynamics.
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li,
Wenhao Chai,
Xingyu Fu,
Haiyang Xu,
Saining Xie
CVPR 2025
Project Page
/
Paper
/
Dataset
/
Code
/
Poster
We introduce SciScore, a reward model that improves the scientific accuracy of generative models.
Trained in two stages, it achieves human-level performance in evaluating the scientific realism of
images on the Science-T2I dataset.
CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs
Jingzhe Shi,
Jialuo Li,
Qinwei Ma,
Zaiwen Yang,
Huan Ma,
Lei Li
COLM 2024
Project Page
/
Paper
CHOPS combines small and large LLMs to build an efficient, safe LLM agent that accesses user data,
interacts with existing customer-profile systems, and delivers accurate responses.
🏆 Honors and Awards
2023: Outstanding Scholarship in Social Work from Tsinghua University.
2022: Mr. and Mrs. Wong Yi-Chung Award, Friends of Tsinghua University.
2022: Outstanding Scholarship in Social Work from Tsinghua University.
2021: Member of the Chinese team of the 21st Asian Physics Olympiad (APhO).
2020: Gold Medalist 🏅 in the 37th Chinese Physics Olympiad (CPhO), ranking tenth
nationwide.
This homepage is adapted from Jon Barron's homepage and
deployed on GitHub Pages. Last updated: Jan 21, 2026.
© 2026 Jialuo Li