Rongwu Xu 许融武

0xrwxu@gmail.com or
xrw22@mails.tsinghua.edu.cn

GitHub | X (Twitter) | LinkedIn
Google Scholar | OpenReview

Research

I am an artificial intelligence (AI) researcher with interdisciplinary interests. I try to understand how AI's design and its interactions with humans can lead to unexpected behaviors and increased societal risks. I approach this work through the lens of behavioral experiments, machine learning (ML), interpretability tools, and psychology. I publish my findings in the Natural Language Processing (NLP), AI, and ML communities.

Currently, I am interested in the following research topics:

  1. AI Safety and Alignment: Identifying potential safety and ethics risks associated with AI R&D and developing strategies to align AI systems with human values, behaviors, and expectations.
  2. Machine Behavior: Investigating the similarities and differences between AI models and human behaviors, and utilizing psychology-inspired experiments to test and understand machines.
  3. AI and Psychology: Studying both the psychological impacts of AI systems on humans (Psychology of AI, a subset of Psychology of Technology) and the application of AI in psychological research (AI for Psychology, a subset of AI for Science).

My other general interests include model evaluation and real-world applications of such models.

News

  • May 2025 Two papers accepted to ACL 2025! Thanks to my collaborators!
  • May 2025 Check out our new review paper on AI awareness! [Paper][Project Page]
  • Apr 2025 I am attending two AI safety & alignment conferences co-located with ICLR 2025 (Singapore): The Misalignment and Control Workshop (Apr 24th, our new paper on catastrophic risks and deception of LLM agents will be presented [Paper][Project Page]) and The Singapore Conference on AI (SCAI) (Apr 26th).
  • Mar 2025 Got accepted to UIUC CS, UW CSE and JHU CS. Grateful for the opportunities!
  • Jan 2025 I am looking for PhD opportunities starting 2025. Don't hesitate to reach out if you think I can be a good candidate.
  • Oct 2024 Six papers accepted to EMNLP 2024! Thanks to my collaborators!
  • Sep 2024 I received the National Scholarship from the Ministry of Education of China!
  • Aug 2024 My paper "The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation" received an Outstanding Paper Award at ACL 2024!
  • Jul 2024 Check out our talk (Chinese) on knowledge conflicts for (RAG) LLMs! [Paper][Resource][机器之心][Slides]
  • May 2024 Two papers accepted to ACL 2024! Thanks to my collaborators!
  • May 2024 Check out LLMs' safety vulnerabilities discovered by tricking them into believing misinformation! [Paper][Resource][机器之心][Video]
  • Apr 2024 I passed the PhD qualification exam (preliminary+oral) at IIIS, Tsinghua!
  • Dec 2023 I received the overall excellence scholarship at Tsinghua!
  • Apr 2023 One paper accepted to EuroS&P 2023! Thanks to my collaborators!
  • Dec 2022 Debut of my academic homepage.
  • Aug 2022 Enrolled as a graduate student at IIIS, Tsinghua University.

Selected Publications

(* equal contribution, † corresponding author)

Awards

Talks

Professional Service