The paper analyzes trust between stages in LLM and agent toolchains. If intermediate representations are accepted without verification, models may treat structure and format as implicit instructions, even when no explicit imperative appears. I document 41 mechanism-level failure modes.
Scope
Selected findings
Mitigations (paper §10)
Limitations