Manuscripts
-
Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents
Rongwu Xu*, Xiaojian Li*, Shuo Chen*, Wei Xu
arXiv Preprint
[Paper][Project Page][Code][X Post][ai safety China]
-
Humanity's Last Exam
Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, ... Rongwu Xu ..., Summer Yue, Alexandr Wang, Dan Hendrycks
arXiv Preprint
[Paper][Project Page]
-
Rules Created by Symbolic Systems Cannot Constrain a Learning System
Shih-Wai Lin, Rongwu Xu, Xiaojian Li, Wei Xu
SSRN Preprint
[Paper]
-
DebateQA: Evaluating Question Answering on Debatable Knowledge
Rongwu Xu*, Xuan Qi*, Zehan Qi, Wei Xu, Zhijiang Guo
arXiv Preprint
[Paper][Code]
2025
-
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li
ICLR 2025 Oral [CORE A*]
[Paper][Code]
2024
-
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu*, Yishuo Cai*, Zhenhong Zhou, Renjie Gu, Haiqin Wang, Yan Liu, Tianwei Zhang, Wei Xu, Han Qiu
EMNLP 2024 [CORE A*]
[Paper][Code][Poster][X Post]
-
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu*, Zehan Qi*, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
EMNLP 2024 [CORE A*]
[Paper][Code][机器之心][Talk (Chinese)][Slides][Poster][X Post]
-
$LONG^{2}RAG$: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall
Zehan Qi*, Rongwu Xu*, Zhijiang Guo, Cunxiang Wang, Hao Zhang, Wei Xu
EMNLP 2024 Findings [CORE A*]
[Paper]
-
Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu
EMNLP 2024 [CORE A*]
[Paper][Poster]
-
Sing it, Narrate it: Quality Musical Lyrics Translation
Zhuorui Ye, Jinhan Li, Rongwu Xu^
EMNLP 2024 Findings [CORE A*]
[Paper]
-
How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
EMNLP 2024 Findings [CORE A*]
[Paper][Code][Poster]
-
MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models
Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia
NeurIPS 2024 [CORE A*]
[Paper][Project Page][Code]
-
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
Rongwu Xu*, Zehan Qi*, Wei Xu
ACL 2024 Findings [CORE A*]
[Paper][Code][Poster]
-
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu
ACL 2024 Oral [CORE A*]
🏆 Outstanding Paper Award [Certificate]
[Paper][Project Page][Code][机器之心][Video][Poster]
- Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings
Rongwu Xu
IJCNN 2024 [CORE B]
[Paper]
- Tempo: Confidentiality Preservation in Cloud-Based Neural Network Training
Rongwu Xu and Zhixuan Fang
IJCNN 2024 [CORE B]
[Paper]
- LSync: A Universal Timeline-synchronizing Solution for Live Streaming
Fan Dang*, Yifan Xu*, Rongwu Xu, Xinlei Chen, Yunhao Liu
IEEE/ACM Trans. on Networking [JCR Q2]
[Paper]
2023
- MISO: Legacy-compatible Privacy-preserving Single Sign-on using Trusted Execution Environments
Rongwu Xu, Sen Yang, Fan Zhang, Zhixuan Fang
EuroS&P 2023 [CORE A]
[Paper][Project Page]
2022
- LSync: A Universal Event-synchronizing Solution for Live Streaming
Yifan Xu, Fan Dang, Rongwu Xu, Xinlei Chen, Yunhao Liu
INFOCOM 2022 [CORE A*]
[Paper]
- LifeRec: A Mobile App for Lifelog Recording and Ubiquitous Recommendation
Jiayu Li, Hantian Zhang*, Zhiyu He*, Rongwu Xu*, Pingfei Wu*, Min Zhang, Yiqun Liu, Shaoping Ma
CHIIR 2022
[Paper][Code]
* Equal Contribution, ^ Advising Role