Researchers show that rephrasing harmful prompts as poetry leads many large language models to bypass their safety guardrails. Across 25 models, poetic prompts achieved far higher jailbreak success rates than their non-poetic counterparts. The study suggests that current safety training can be circumvented by simple stylistic transformations of a prompt.
ORIGINAL LINK: https://arxiv.org/abs/2511.15304