Research
I am an artificial intelligence (AI) researcher with interdisciplinary interests. I study how
AI's design and its interactions with humans can lead to unexpected behaviors and increased
societal risks. I approach this work through the lens of
behavioral experiments, machine learning (ML), interpretability tools, and psychology. I publish my
findings in the Natural Language Processing (NLP), AI, and ML communities.
Currently, I am interested in the following research topics:
- AI Safety and Alignment: Identifying potential safety and ethics risks associated with AI R&D and
developing strategies to align AI systems with human values, behaviors, and expectations.
- Machine Behavior: Investigating the similarities and differences between AI model and human
behaviors, and using psychology-inspired experiments to test and understand machines.
- AI and Psychology: Studying both the psychological impacts of AI systems on humans
(Psychology of AI, a subset of Psychology of Technology) and the application of AI in psychological research
(AI for Psychology, a subset of AI for Science).
My other general interests include model evaluation and real-world applications of such models.
News
AI/NLP research can be challenging for newcomers. If you're interested in my work or have ideas to explore, I'd
be happy to guide you. We can work on submitting papers to top venues.
Feel free to send me an email if you are interested.
-
Jan 2025 I am looking for PhD opportunities starting 2025. Don't hesitate to reach out if you think I
can be a good candidate.
Mar 2025 Got accepted to UIUC CS, UW CSE and JHU CS. Grateful for the opportunities!
-
Oct 2024 Six papers accepted to EMNLP 2024! Thanks to my collaborators!
-
Sep 2024 I received the National Scholarship from the Ministry of Education of China!
-
Aug 2024 My paper "The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation
via Persuasive Conversation" received an Outstanding Paper Award at ACL 2024!
-
Jul 2024 Check out our talk (Chinese) on knowledge conflicts for (RAG) LLMs! [Paper][Resource][机器之心][Slides]
-
May 2024 Two papers accepted to ACL 2024! Thanks to my collaborators!
-
May 2024 Check out LLMs' safety vulnerabilities discovered by tricking them into believing
misinformation! [Paper][Resource][机器之心][Video]
-
Apr 2024 I passed the PhD qualification exam (preliminary+oral) at IIIS, Tsinghua!
-
Dec 2023 I received the Overall Excellence Scholarship at Tsinghua!
-
Apr 2023 One paper accepted to EuroS&P 2023! Thanks to my collaborators!
-
Dec 2022 Debut of my academic homepage.
-
Aug 2022 Enrolled as a graduate student at IIIS, Tsinghua University.
Selected Publications
-
"Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents
Rongwu Xu*, Xiaojian Li*, Shuo Chen*, Wei Xu
Working Paper
[Paper][Project Page][Code][X Post]
-
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu*, Zehan Qi*, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
EMNLP 2024 [CORE A*]
[Paper][Code][机器之心][Talk (Chinese)][Slides][Poster][X Post]
-
How Alignment and Jailbreak Work: Explain LLM Safety through
Intermediate Hidden States
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li
EMNLP 2024 Findings [CORE A*]
[Paper][Code][Poster]
-
The Earth is Flat because...: Investigating LLMs' Belief towards
Misinformation via Persuasive Conversation
Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu,
Han Qiu
ACL 2024 Oral [CORE A*]
🏆 Outstanding Paper Award [Certificate]
[Paper][Project Page][Code][机器之心][Video][Poster]
* Equal Contribution, ^ Advising Role
Awards
- Most Recognized Research Outcomes at Tsinghua University Nomination (清华大学最受师生关注的年度亮点成果提名, Top 2 at
IIIS), 2024
-
Tsinghua University Excellent Teaching Assistant (清华大学优秀助教, Top 2%), 2024
-
National Scholarship (国家奖学金, Top 1%), 2024
-
Tsinghua University Outstanding Student Cadre (清华大学优秀学生干部, Top 1.5%, 121 out of 9000+), 2024
-
ACL 2024 Outstanding Paper Award (Top 0.79%, 35 out of 4407), 2024
-
Tsinghua-Yangtze River Delta International R&D Community Talent Scholarship, 2023
-
Tsinghua University Overall Excellence Scholarship (Top 10%), 2023
-
Tsinghua University Overall Excellence Scholarship (Top 10%), 2022
-
Tsinghua University Technological Innovation Excellence Scholarship, 2020
-
Tsinghua-Panasonic Scholarship (Top 10%), 2019
-
Outstanding Volunteers in Beijing, 2018
Talks
Professional Service
- Peer Reviewer, ACL Rolling Review, 2024-Present
Tracks: Ethics, Bias, and Fairness; Human-Centered NLP (2025-); NLP Applications; Resources and
Evaluation