My publication list can also be found on DBLP and Google Scholar; however, those profiles may not always be up to date.
2025
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang; EMNLP 2025
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang; ICCV 2025
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang; CCS 2025
Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities
Yiting Qu, Michael Backes, Yang Zhang; USENIX Security 2025
SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark
Rui Wen, Yiyong Liu, Michael Backes, Yang Zhang; USENIX Security 2025
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, Yang Zhang; USENIX Security 2025
From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models
Yihan Ma, Xinyue Shen, Yiting Qu, Ning Yu, Michael Backes, Savvas Zannettou, Yang Zhang; USENIX Security 2025
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu, Ning Yu, Michael Backes, Yun Shen, Yang Zhang; USENIX Security 2025
Synthetic Artifact Auditing: Tracing LLM-Generated Synthetic Data Usage in Downstream Applications
Yixin Wu, Ziqing Yang, Yun Shen, Michael Backes, Yang Zhang; USENIX Security 2025
Data-Free Model-Related Attacks: Unleashing the Potential of Generative AI
Dayong Ye, Tianqing Zhu, Shang Wang, Bo Liu, Leo Yu Zhang, Wanlei Zhou, Yang Zhang; USENIX Security 2025
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning
Dayong Ye, Tianqing Zhu, Jiayang Li, Kun Gao, Bo Liu, Leo Yu Zhang, Wanlei Zhou, Yang Zhang; USENIX Security 2025
Enhanced Label-Only Membership Inference Attacks with Fewer Queries
Hao Li, Zheng Li, Siyuan Wu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang; USENIX Security 2025
Membership Inference Attacks Against Vision-Language Models
Yuke Hu, Zheng Li, Zhihao Liu, Yang Zhang, Zhan Qin, Kui Ren, Chun Chen; USENIX Security 2025
Generated Data with Fake Privacy: Hidden Dangers of Finetuning Large Language Models on Generated Data
Atilla Akkus, Masoud Poorghaffar Aghdam, Mingjie Li, Junjie Chu, Michael Backes, Yang Zhang, Sinem Sav; USENIX Security 2025
On the Generalization Ability of Machine-Generated Text Detectors
Yule Liu, Zhiyuan Zhong, Yifan Liao, Zhen Sun, Jingyi Zheng, Jiaheng Wei, Qingyuan Gong, Fenghua Tong, Yang Chen, Yang Zhang, Xinlei He; KDD 2025
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang; ACL 2025 (oral)
When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang; ACL 2025
Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
Zhen Sun, Zongmin Zhang, Xinyue Shen, Ziyi Zhang, Yule Liu, Michael Backes, Yang Zhang, Xinlei He; ACL 2025
White-box Membership Inference Attacks against Diffusion Models
Yan Pang, Tianhao Wang, Xuhui Kang, Mengdi Huai, Yang Zhang; PoPETS 2025
A Comprehensive Study of Privacy Risks in Curriculum Learning
Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li; PoPETS 2025
The Ripple Effect: On Unforeseen Complications of Backdoor Attacks
Rui Zhang, Yun Shen, Hongwei Li, Wenbo Jiang, Hanxiao Chen, Yuan Zhang, Guowen Xu, Yang Zhang; ICML 2025
Neeko: Model Hijacking Attacks Against Generative Adversarial Networks
Junjie Chu, Yugeng Liu, Xinlei He, Michael Backes, Ahmed Salem, Yang Zhang; ICME 2025
GPTracker: A Large-Scale Measurement of Misused GPTs
Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang; S&P 2025
On the Effectiveness of Prompt Stealing Attacks on In-The-Wild Prompts
Yicong Tan, Xinyue Shen, Yun Shen, Michael Backes, Yang Zhang; S&P 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang; ICLR 2025
Towards Understanding Unsafe Video Generation
Yan Pang, Aiping Xiong, Yang Zhang, Tianhao Wang; NDSS 2025
Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?
Rui Wen, Michael Backes, Yang Zhang; NDSS 2025
2024
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective
Yihan Ma, Xinyue Shen, Yixin Wu, Boyang Zhang, Michael Backes, Yang Zhang; EMNLP 2024
ModScan: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities
Yukun Jiang, Zheng Li, Xinyue Shen, Yugeng Liu, Michael Backes, Yang Zhang; EMNLP 2024
Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang; EMNLP 2024
Membership Inference Attacks Against In-Context Learning
Rui Wen, Zheng Li, Michael Backes, Yang Zhang; CCS 2024
Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution
Yixin Wu, Yun Shen, Michael Backes, Yang Zhang; CCS 2024
BadMerging: Backdoor Attacks Against Model Merging
Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, Yuan Tian; CCS 2024
ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models
Zeyang Sha, Yicong Tan, Mingjie Li, Michael Backes, Yang Zhang; CCS 2024
SeqMIA: Sequential-Metric Based Membership Inference Attack
Hao Li, Zheng Li, Siyuan Wu, Chengrui Hu, Yutong Ye, Min Zhang, Dengguo Feng, Yang Zhang; CCS 2024
MGTBench: Benchmarking Machine-Generated Text Detection
Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang; CCS 2024
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang; CCS 2024
Media Coverage: New Scientist, Deutschlandfunk Nova
Instruction Backdoor Attacks Against Customized LLMs
Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang; USENIX Security 2024
Prompt Stealing Attacks Against Text-to-Image Generation Models
Xinyue Shen, Yiting Qu, Michael Backes, Yang Zhang; USENIX Security 2024
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang; USENIX Security 2024
Quantifying Privacy Risks of Prompts in Visual Prompt Learning
Yixin Wu, Rui Wen, Michael Backes, Pascal Berrang, Mathias Humbert, Yun Shen, Yang Zhang; USENIX Security 2024
Link Stealing Attacks Against Inductive Graph Neural Networks
Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang; PoPETS 2024
Composite Backdoor Attacks Against Large Language Models
Hai Huang, Zhengyu Zhao, Michael Backes, Yun Shen, Yang Zhang; NAACL Findings 2024
Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming
Yukun Jiang, Xinyue Shen, Rui Wen, Zeyang Sha, Junjie Chu, Yugeng Liu, Michael Backes, Yang Zhang; ICWSM 2024
FAKEPCD: Fake Point Cloud Detection via Source Attribution
Yiting Qu, Zhikun Zhang, Yun Shen, Michael Backes, Yang Zhang; ASIACCS 2024
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
Xinlei He, Savvas Zannettou, Yun Shen, Yang Zhang; S&P 2024
Test-Time Poisoning Attacks Against Test-Time Adaptation Models
Tianshuo Cong, Xinlei He, Yun Shen, Yang Zhang; S&P 2024
Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models
Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang; WACV 2024
VERITRAIN: Validating MLaaS Training Efforts via Anomaly Detection
Xiaokuan Zhang, Yang Zhang, Yinqian Zhang; IEEE Transactions on Dependable and Secure Computing
2023
DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang; CCS 2023
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang; CCS 2023
Differentially Private Resource Allocation
Joann Qiongna Chen, Tianhao Wang, Zhikun Zhang, Yang Zhang, Somesh Jha, Zhou Li; ACSAC 2023
A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots
Boyang Zhang, Xinlei He, Yun Shen, Tianhao Wang, Yang Zhang; USENIX Security 2023
Two-in-One: A Model Hijacking Attack Against Text Generation Models
Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem; USENIX Security 2023
UnGANable: Defending Against GAN-based Face Manipulation
Zheng Li, Ning Yu, Ahmed Salem, Michael Backes, Mario Fritz, Yang Zhang; USENIX Security 2023
Media Coverage: Mimikama, it-sicherheit.de, SOLARIFY, elektroniknet.de, Digitale Schweiz, innovations report
FACE-AUDITOR: Data Auditing in Facial Recognition Systems
Min Chen, Zhikun Zhang, Michael Backes, Tianhao Wang, Yang Zhang; USENIX Security 2023
PrivTrace: Differentially Private Trajectory Synthesis by Adaptive Markov Model
Haiming Wang, Zhikun Zhang, Tianhao Wang, Shibo He, Michael Backes, Jiming Chen, Yang Zhang; USENIX Security 2023
Generated Graph Detection
Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang; ICML 2023
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang; ICML 2023
NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models
Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma; ACL 2023
Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang; CVPR 2023
On the Evolution of (Hateful) Memes by Means of Multimodal Contrastive Learning
Yiting Qu, Xinlei He, Shannon Pierson, Michael Backes, Yang Zhang, Savvas Zannettou; S&P 2023
Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?
Rui Wen, Zhengyu Zhao, Zhuoran Liu, Michael Backes, Tianhao Wang, Yang Zhang; ICLR 2023 (spotlight)
Backdoor Attacks Against Dataset Distillation
Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang; NDSS 2023
Pseudo Label-Guided Model Inversion Attack via Conditional Generative Adversarial Network
Xiaojian Yuan, Kejiang Chen, Jie Zhang, Weiming Zhang, Nenghai Yu, Yang Zhang; AAAI 2023
2022
Amplifying Membership Exposure via Data Poisoning
Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang; NeurIPS 2022
Why So Toxic?: Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang; CCS 2022
Media Coverage: Fast Company
Best Paper Award Honorable Mention
On the Privacy Risks of Cell-Based NAS Architectures
Hai Huang, Zhikun Zhang, Yun Shen, Michael Backes, Qi Li, Yang Zhang; CCS 2022
Membership Inference Attacks by Exploiting Loss Trajectory
Yiyong Liu, Zhengyu Zhao, Michael Backes, Yang Zhang; CCS 2022
Auditing Membership Leakages of Multi-Exit Networks
Zheng Li, Yiyong Liu, Xinlei He, Ning Yu, Michael Backes, Yang Zhang; CCS 2022
Graph Unlearning
Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang; CCS 2022
SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders
Tianshuo Cong, Xinlei He, Yang Zhang; CCS 2022
Finding MNEMON: Reviving Memories of Node Embeddings
Yun Shen, Yufei Han, Zhikun Zhang, Min Chen, Ting Yu, Michael Backes, Yang Zhang, Gianluca Stringhini; CCS 2022
Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning
Xinlei He, Hongbin Liu, Neil Zhenqiang Gong, Yang Zhang; ECCV 2022
Teacher Model Fingerprinting Attacks Against Transfer Learning
Yufei Chen, Chao Shen, Cong Wang, Yang Zhang; USENIX Security 2022
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models
Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang; USENIX Security 2022
Inference Attacks Against Graph Neural Networks
Zhikun Zhang, Min Chen, Michael Backes, Yun Shen, Yang Zhang; USENIX Security 2022
On Xing Tian and the Perseverance of Anti-China Sentiment Online
Xinyue Shen, Xinlei He, Michael Backes, Jeremy Blackburn, Savvas Zannettou, Yang Zhang; ICWSM 2022
Model Stealing Attacks Against Inductive Graph Neural Networks
Yun Shen, Xinlei He, Yufei Han, Yang Zhang; S&P 2022
Get a Model! Model Hijacking Attack Against Machine Learning Models
Ahmed Salem, Michael Backes, Yang Zhang; NDSS 2022
Property Inference Attacks Against GANs
Junhao Zhou, Yufei Chen, Chao Shen, Yang Zhang; NDSS 2022
Dynamic Backdoor Attacks Against Machine Learning Models
Ahmed Salem, Rui Wen, Michael Backes, Shiqing Ma, Yang Zhang; EuroS&P 2022
FairSR: Fairness-aware Sequential Recommendation through Multi-Task Learning with Preference Graph Embeddings
Cheng-Te Li, Cheng Hsu, Yang Zhang; ACM Transactions on Intelligent Systems and Technology
2021
Quantifying and Mitigating Privacy Risks of Contrastive Learning
Xinlei He, Yang Zhang; CCS 2021
When Machine Unlearning Jeopardizes Privacy
Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang; CCS 2021
Membership Inference Attacks Against Recommender Systems
Minxing Zhang, Zhaochun Ren, Zihan Wang, Pengjie Ren, Zhumin Chen, Pengfei Hu, Yang Zhang; CCS 2021
Membership Leakage in Label-Only Exposures
Zheng Li, Yang Zhang; CCS 2021
BadNL: Backdoor Attacks Against NLP Models with Semantic-preserving Improvements
Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, Yang Zhang; ACSAC 2021
Stealing Links from Graph Neural Networks
Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, Yang Zhang; USENIX Security 2021
PrivSyn: Differentially Private Data Synthesis
Zhikun Zhang, Tianhao Wang, Jean Honorio, Ninghui Li, Michael Backes, Shibo He, Jiming Chen, Yang Zhang; USENIX Security 2021
“Go eat a bat, Chang!”: On the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19
Fatemeh Tahmasbi, Leonard Schild, Chen Ling, Jeremy Blackburn, Gianluca Stringhini, Yang Zhang, Savvas Zannettou; WWW 2021
Media Coverage: The Washington Post