Manuscripts
-
LIFEBench: Evaluating Length Instruction Following in Large Language Models
Wei Zhang*, Zhenhong Zhou*, Junfeng Fang*, Rongwu Xu*, Kun Wang*, Yuanhe Zhang, Rui Wang, Ge Zhang, Xinfeng Li, Li Sun, Lingjuan Lyu, Yang Liu, Sen Su
arXiv Preprint
[Paper][Project Page][Code][HF]
-
AI Awareness
Xiaojian Li, Haoyuan Shi, Rongwu Xu†, Wei Xu
arXiv Preprint
[Paper][Project Page]
-
Humanity's Last Exam
Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, ... Rongwu Xu ..., Summer Yue, Alexandr Wang, Dan Hendrycks
arXiv Preprint
[Paper][Project Page]
-
Rules Created by Symbolic Systems Cannot Constrain a Learning System
Shih-Wai Lin, Rongwu Xu, Xiaojian Li, Wei Xu
SSRN Preprint
[Paper]
-
DebateQA: Evaluating Question Answering on Debatable Knowledge
Rongwu Xu*, Xuan Qi*, Zehan Qi, Wei Xu, Zhijiang Guo
arXiv Preprint
[Paper][Code]
Conferences & Journals
2025
-
Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents
Rongwu Xu*, Xiaojian Li*, Shuo Chen*, Wei Xu
ACL 2025 Findings
[Paper][Project Page][Code][X Post][Slides][AI Safety China]
-
Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?
Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu
ACL 2025 Findings
-
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li
ICLR 2025 Oral
[Paper][Code]
2024
-
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models
Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia
NeurIPS 2024
[Paper][Project Page][Code]
-
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu*, Zehan Qi*, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
EMNLP 2024
[Paper][Code][机器之心][Talk (Chinese)][Slides][Poster][X Post]
-
Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu
EMNLP 2024
[Paper][Poster]
-
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu*, Yishuo Cai*, Zhenhong Zhou, Renjie Gu, Haiqin Wang, Yan Liu, Tianwei Zhang, Wei Xu, Han Qiu
EMNLP 2024
[Paper][Code][Poster][X Post]
-
LONG²RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall
Zehan Qi*, Rongwu Xu*, Zhijiang Guo, Cunxiang Wang, Hao Zhang, Wei Xu
EMNLP 2024 Findings
[Paper]
-
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
EMNLP 2024 Findings
[Paper][Code][Poster]
-
Sing it, Narrate it: Quality Musical Lyrics Translation
Zhuorui Ye, Jinhan Li, Rongwu Xu
EMNLP 2024 Findings
[Paper]
-
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu
ACL 2024 Oral
🏆 Outstanding Paper Award [Certificate]
[Paper][Project Page][Code][机器之心][Video][Poster][Slides]
-
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Rongwu Xu*, Zehan Qi*, Wei Xu
ACL 2024 Findings
[Paper][Code][Poster]
- Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings
Rongwu Xu
IJCNN 2024
[Paper]
- Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training
Rongwu Xu and Zhixuan Fang
IJCNN 2024
[Paper]
- LSync: A Universal Timeline-synchronizing Solution for Live Streaming
Fan Dang*, Yifan Xu*, Rongwu Xu, Xinlei Chen, Yunhao Liu
IEEE/ACM Trans. on Networking
[Paper]
2023
- MISO: Legacy-compatible Privacy-preserving Single Sign-on using Trusted Execution Environments
Rongwu Xu, Sen Yang, Fan Zhang, Zhixuan Fang
EuroS&P 2023
[Paper][Project Page][Slides]
2022
- LSync: A Universal Event-synchronizing Solution for Live Streaming
Yifan Xu, Fan Dang, Rongwu Xu, Xinlei Chen, Yunhao Liu
INFOCOM 2022
[Paper]
- LifeRec: A Mobile App for Lifelog Recording and Ubiquitous Recommendation
Jiayu Li, Hantian Zhang*, Zhiyu He*, Rongwu Xu*, Pingfei Wu*, Min Zhang, Yiqun Liu, Shaoping Ma
CHIIR 2022
[Paper][Code]
Reports
-
The Singapore Consensus on Global AI Safety Research Priorities: Building a Trustworthy, Reliable and Secure AI Ecosystem
Dawn Song, Lan Xue, Luke Ong, Max Tegmark, Stuart Russell, Tegan Maharaj, Ya-Qin Zhang, Yoshua Bengio, ... Rongwu Xu and others
SCAI 2025
[Paper][Project Page]
(* equal contribution, † corresponding author)