Curriculum Vitae

Basic Info

Name

Chen Xihao

Position

Ph.D. candidate, Computer Science

Affiliation

National University of Singapore (ISEP & School of Computing)

Advisors

Professor Roger Zimmermann, Professor Yangyang Guo


Education

Aug 2023 - Present

Doctor of Philosophy, National University of Singapore

Integrative Sciences and Engineering Programme (ISEP) and School of Computing

Aug 2019 - Jun 2023

Bachelor of Computing (Computer Science), National University of Singapore

Honours (Highest Distinction), GPA 4.69/5.00

  • Minor in Statistics
  • Distinction in Artificial Intelligence Focus Area and Database Focus Area
  • Dean's List Honours Roll (AY2021/22 Sem 2)

Publications

Make Your LVLM KV Cache More Lightweight

Xihao Chen, Yangyang Guo, and Roger Zimmermann

Transactions on Machine Learning Research, May 2026

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.

Efficient Coreset Selection for Accelerated Vision Instruction-Tuning

Xihao Chen, Yangyang Guo, and Roger Zimmermann

Under review at ECCV, 2026

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across a range of cross-modal understanding tasks. However, their supervised fine-tuning (SFT) stage often requires extensive data, which poses substantial challenges under limited resource budgets. In this work, we focus specifically on visual instruction SFT, where models are trained on multimodal instruction–response pairs rather than task-specific adaptation datasets. To address this bottleneck, recent efforts in data efficiency have relied exclusively on coreset selection to produce a reduced dataset of informative samples. As we find in this work, all of these methods incur significant overhead in both time and additional storage during coreset selection. Therefore, we propose a novel, resource-light coreset selection method to alleviate this overhead. Our method adopts a two-stage design: First, an LLM estimates the linguistic difficulty of each sample without visual input to identify high-language-prior samples. Second, we introduce a biased sampling distribution that favors challenging samples while maintaining data diversity. We evaluate our method on three representative models: LLaVA-1.5-7B, Qwen2-VL-7B, and InternVL2-8B, trained on two general-purpose datasets for visual instruction SFT. Our method consistently outperforms existing state-of-the-art baselines at the same coreset size budgets. More importantly, our approach is significantly more efficient at coreset selection than these baselines. Together, these results demonstrate the effectiveness and lightweight nature of our approach for efficient LVLM SFT, especially in resource-limited settings.

Projects

Multi-Modal Entity Resolution

Undergraduate dissertation project, supervised by Professor Kian-Lee Tan.

NLP Fake News Detection with Graph Neural Networks

CS4248 Project. We augmented Long Short-Term Memory neural networks (LSTMs) with an attention mechanism and empirically showed its effectiveness. We then proposed a new model that feeds the output of the attention-based LSTM layers into Graph Convolutional Networks (GCNs).

Transfer Learning for Tourist Photograph Location Recognition

CS3244 Machine Learning Project. We performed transfer learning of Convolutional Neural Networks (CNNs) for location recognition of tourist photographs.


Experience

May 2026 - Present

Machine Learning Engineer Intern, ByteDance Pte. Ltd.

May 2022 - Dec 2022

Machine Learning Research Intern, PayPal Pte. Ltd.

Feb 2021 - Aug 2021

Software Engineer Intern, Cialfo Pte. Ltd.


Skills

Programming

Python, Java, C++

Tools

PyTorch, DeepSpeed, vLLM, Hadoop, Spark, HuggingFace, Git, LaTeX, Markdown


Awards

Jan 2025

School of Computing Teaching Fellowship Scheme (TFS) Award

~10 recipients per year.

Aug 2023

NUS President's Graduate Fellowship

Top 5% of PhD admissions.

Jun 2022

Dean's List Honours Roll

AY2021/22 Semester 2.


Languages

  • Chinese (native, bilingual)
  • English (native, bilingual)