"defending tool-enabled agents requires reasoning over entire action sequences and their cumulative effects, rather than evaluating isolated prompts or responses." [entire action sequences and their cumulative effects]
The article's defense evaluation (Table 4) shows that even the best reasoning-based defense prompt degrades sharply over multiple turns: the attack success rate (ASR) rises from 58.6% to 86.7% by turn T+2. This indicates that prompt-based mitigations are inherently limited, and suggests that future defenses will require architectural changes, such as model-level modifications, runtime monitoring of action sequences, or constraint-based execution frameworks, rather than prompt engineering alone.
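To make the contrast between per-action checks and sequence-level monitoring concrete, here is a minimal sketch. It is not the article's method: the `Action` fields, the `SequenceMonitor` class, the tool names, and the single "no external send after a sensitive read" rule are all hypothetical, chosen only to illustrate how a cumulative check can catch a sequence whose individual steps each look harmless.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One tool call. All fields are illustrative assumptions."""
    tool: str                      # hypothetical tool name, e.g. "read_file"
    reads_sensitive: bool = False  # call touches sensitive data
    sends_external: bool = False   # call transmits data outside the system


def allow_isolated(action: Action) -> bool:
    """Per-action check: rejects only a call that is harmful on its own."""
    return not (action.reads_sensitive and action.sends_external)


@dataclass
class SequenceMonitor:
    """Sequence-level check: carries state across the whole session."""
    tainted: bool = False                        # sensitive data read earlier?
    history: list = field(default_factory=list)  # actions allowed so far

    def allow(self, action: Action) -> bool:
        # Assumed policy: once sensitive data has entered the session,
        # any later external send is blocked, whatever happened in between.
        if action.sends_external and (self.tainted or action.reads_sensitive):
            return False
        self.history.append(action)
        if action.reads_sensitive:
            self.tainted = True
        return True


# A three-step sequence in which no single step looks harmful by itself.
steps = [
    Action("read_file", reads_sensitive=True),
    Action("summarize"),
    Action("http_post", sends_external=True),
]

monitor = SequenceMonitor()
isolated = [allow_isolated(a) for a in steps]
sequenced = [monitor.allow(a) for a in steps]

print(isolated)   # [True, True, True]
print(sequenced)  # [True, True, False]
```

The per-action check passes every step, while the monitor blocks the final send because it remembers the earlier read. A real monitor would need far richer state and policy than one boolean flag; the sketch only shows where the cumulative state lives.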