I am a tenured faculty member (equivalent to full professor) at CISPA Helmholtz Center for Information Security. I sometimes also chime in at iDRAMA Lab for the memes.

Research Areas

  • Trustworthy Machine Learning, with a focus on LLMs (Safety, Privacy, and Security)
  • Misinformation, Hate Speech, and Memes
  • Social Network Analysis

I’m always looking for motivated students and postdocs to join my group. If you are interested, please write me an email (zhang@cispa.de).

Awards

  • Best paper finalist at CSAW Europe 2023
  • Best paper award honorable mention at CCS 2022
  • Busy Beaver teaching award nomination for seminar “Privacy of Machine Learning” at Saarland University (Winter 2022)
  • Busy Beaver teaching award nomination for advanced lecture “Machine Learning Privacy” at Saarland University (Summer 2022)
  • Busy Beaver teaching award for seminar “Privacy of Machine Learning” at Saarland University (Winter 2021)
  • Distinguished reviewer award at TrustML Workshop 2020 (co-located with ICLR 2020)
  • Distinguished paper award at NDSS 2019
  • Best paper award at ARES 2014

What’s New

  • [5/2024] We released SecurityNet, a large-scale dataset containing more than 1000 models for evaluating attacks and defenses in the field of trustworthy machine learning!
  • [4/2024] I’ll join the PC of NDSS 2025!
  • [4/2024] One paper titled ““Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models” got accepted at CCS 2024!
  • [3/2024] One paper titled “Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming” got accepted at ICWSM 2024!
  • [3/2024] One paper titled “Composite Backdoor Attacks Against Large Language Models” got accepted at NAACL Findings 2024!
  • [2/2024] We established TrustAIRLab, a GitHub organization hosting many (upgraded versions of) the codebases published by my lab; please take a look!
  • [2/2024] One paper titled “Prompt Stealing Attacks Against Text-to-Image Generation Models” got accepted at USENIX Security 2024!
  • [12/2023] I’ll join the PC of CCS 2024!
  • [10/2023] Our paper “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” is a best paper finalist at CSAW Europe 2023!
  • [10/2023] I’ll be the publicity chair of CCS 2024! #haha
  • [10/2023] Mingjie Li, Zeyuan Chen, and Qingqing Dong joined the team!
  • [10/2023] Zheng Li has successfully passed his Ph.D. defense! Congratulations, Dr. Li!
  • [9/2023] Our research on Jailbreak Prompts got covered by New Scientist!
  • [9/2023] One paper titled “SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models” got accepted at USENIX Security 2024!
  • [9/2023] One paper titled “Quantifying Privacy Risks of Prompts in Visual Prompt Learning” got accepted at USENIX Security 2024!
  • [8/2023] Xinlei He has successfully passed his Ph.D. defense! Congratulations, Dr. He!
  • [7/2023] One paper titled “You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content” got accepted at Oakland 2024!
  • [7/2023] One paper titled “Test-Time Poisoning Attacks Against Test-Time Adaptation Models” got accepted at Oakland 2024!
  • [7/2023] I was invited to give keynote speeches at ISC 2023 and ACISP 2024!
  • [5/2023] One paper titled “DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models” got accepted at CCS 2023!
  • [5/2023] One paper titled “Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models” got accepted at CCS 2023!
  • [5/2023] One paper titled “NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models” got accepted at ACL 2023!
  • [4/2023] One paper titled “Generated Graph Detection” got accepted at ICML 2023!
  • [4/2023] One paper titled “Data Poisoning Attacks Against Multimodal Encoders” got accepted at ICML 2023!
  • [4/2023] One paper titled “Two-in-One: A Model Hijacking Attack Against Text Generation Models” got accepted at USENIX Security 2023!
  • [4/2023] We released a new technical report on the trustworthiness of ChatGPT, titled “In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT”!
  • [3/2023] We released MGTBench, a benchmark for current methods of detecting machine-generated text (e.g., text produced by ChatGPT).
  • [3/2023] One paper titled “FACE-AUDITOR: Data Auditing in Facial Recognition Systems” got accepted at USENIX Security 2023!
  • [3/2023] I joined the editorial board of ACM TOPS!
  • [3/2023] We released MLHospital, a Python package for evaluating the security and privacy risks of machine learning models. MLHospital is under continual development, and we welcome contributors!
  • [3/2023] I successfully passed my tenure-track evaluation and became a tenured faculty member at CISPA!