Research
I am an artificial intelligence (AI) researcher with broadly interdisciplinary interests. I try to understand how AI's design and its interactions with humans can lead to unexpected behaviors and increased societal risks. I approach this work through the lens of behavioral experiments, machine learning (ML), interpretability tools, and psychology. I publish my findings in the Natural Language Processing (NLP), AI, and ML communities.
Currently, I am interested in the following research topics:
- AI Safety and Alignment: Identifying potential safety and ethical risks associated with AI R&D and developing strategies to align AI systems with human values, behaviors, and expectations.
- Machine Behavior: Investigating the similarities and differences between the behaviors of AI models and humans, and using psychology-inspired experiments to test and understand machines.
- AI and Psychology: Studying both the psychological impacts of AI systems on humans (Psychology of AI, a subset of Psychology of Technology) and the application of AI in psychological research (AI for Psychology, a subset of AI for Science).
My other general interests include model evaluation and real-world applications of such models.
News
- May 2025 Two papers accepted to ACL 2025! Thanks to my collaborators!
- May 2025 Check out our new review paper on AI awareness! [Paper][Project Page]
- Apr 2025 I am attending two AI safety & alignment conferences co-located with ICLR 2025 (Singapore): The Misalignment and Control Workshop (Apr 24th, where our new paper on catastrophic risks and deception of LLM agents will be presented [Paper][Project Page]) and The Singapore Conference on AI (SCAI) (Apr 26th).
- Mar 2025 Got accepted to UIUC CS, UW CSE, and JHU CS. Grateful for the opportunities!
- Jan 2025 I am looking for PhD opportunities starting in 2025. Don't hesitate to reach out if you think I would be a good candidate.
- Oct 2024 Six papers accepted to EMNLP 2024! Thanks to my collaborators!
- Sep 2024 I received the National Scholarship from the Ministry of Education of China!
- Aug 2024 My paper "The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation" received an Outstanding Paper Award at ACL 2024!
- Jul 2024 Check out our talk (in Chinese) on knowledge conflicts for (RAG) LLMs! [Paper][Resource][机器之心][Slides]
- May 2024 Two papers accepted to ACL 2024! Thanks to my collaborators!
- May 2024 Check out the safety vulnerabilities of LLMs that we discovered by tricking them into believing misinformation! [Paper][Resource][机器之心][Video]
- Apr 2024 I passed the PhD qualification exam (preliminary + oral) at IIIS, Tsinghua!
- Dec 2023 I received the Overall Excellence Scholarship at Tsinghua!
- Apr 2023 One paper accepted to EuroS&P 2023! Thanks to my collaborators!
- Dec 2022 Debut of my academic homepage.
- Aug 2022 Enrolled as a graduate student at IIIS, Tsinghua University.
Selected Publications
- Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents
Rongwu Xu*, Xiaojian Li*, Shuo Chen*, Wei Xu
ACL 2025 Findings
[Paper][Project Page][Code][X Post][Slides][AI Safety China]
- Knowledge Conflicts for LLMs: A Survey
Rongwu Xu*, Zehan Qi*, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
EMNLP 2024
[Paper][Code][机器之心][Talk (Chinese)][Slides][Poster][X Post]
- How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
EMNLP 2024 Findings
[Paper][Code][Poster]
- The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu
ACL 2024 Oral
🏆 Outstanding Paper Award [Certificate]
[Paper][Project Page][Code][机器之心][Video][Poster][Slides]
(* equal contribution, † corresponding author)
Awards
- Most Recognized Research Outcomes at Tsinghua University Nomination (清华大学最受师生关注的年度亮点成果提名, Top 2 at IIIS), 2024
- Tsinghua University Excellent Teaching Assistant (清华大学优秀助教, Top 2%), 2024
- National Scholarship (国家奖学金, Top 1%), 2024
- Tsinghua University Outstanding Student Cadre (清华大学优秀学生干部, Top 1.5%, 121 out of 9000+), 2024
- ACL 2024 Outstanding Paper Award (Top 0.79%, 35 out of 4407), 2024
- Tsinghua-Yangtze River Delta International R&D Community Talent Scholarship, 2023
- Tsinghua University Overall Excellence Scholarship (Top 10%), 2023
- Tsinghua University Overall Excellence Scholarship (Top 10%), 2022
- Tsinghua University Technological Innovation Excellence Scholarship, 2020
- Tsinghua-Panasonic Scholarship (Top 10%), 2019
- Outstanding Volunteers in Beijing, 2018
Talks
- Catastrophic Risks and Deception of LLM Agents [Slides]
  - Misalignment and Control Workshop (with Concordia AI, FAR.AI, etc.), Singapore, Apr 2025
- The Choice of Research (“做科研的选择”)
  - Speech as a student representative at the 2024 IIIS opening ceremony (在2024年院开学典礼上作为在校生代表的发言), IIIS, Tsinghua, Sep 2024
- Investigating LLMs' Beliefs and Behaviors Under Persuasive Misinformation [Slides]
  - Oral report, ACL conference, Bangkok, Thailand, Aug 2024
  - Promotional video, IIIS, Tsinghua, Apr 2024
- Knowledge Conflicts for (RAG) LLMs [Slides]
  - Online talk, NICE (with Soochow University), Jul 2024
- Privacy-preserving Authentication using TEE [Slides]
  - Oral report, EuroS&P conference, Delft, The Netherlands, May 2023
Professional Service
- Peer Reviewer, ACL Rolling Review, 2024-Present
Tracks: Ethics, Bias, and Fairness; Human-Centered NLP (2025-); NLP Applications; Resources and Evaluation
- Peer Reviewer, IEEE Access, 2025