In their latest study, 'Universal and Transferable Adversarial Attacks on Aligned Language Models,' Computer Science Department faculty members Matt Fredrikson and Zico Kolter, CSD Ph.D. student Andy Zou, and ECE alum Zifan Wang found a suffix that, when appended to a wide range of queries, significantly increases the likelihood that both open- and closed-source LLMs will respond affirmatively to requests they would otherwise refuse.
https://csd.cmu.edu/news/csd-researchers-discover-vulnerability-in-large-language-models