AI Safety Prompts Abused to Trigger Remote Code Execution

Researchers demonstrated how AI safety approval prompts can be manipulated to trigger remote code execution.

Written By
thumbnail
Ken Underhill
Ken Underhill
Dec 22, 2025
eSecurity Planet content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

Researchers have demonstrated a new way attackers can turn one of AI’s most trusted safety mechanisms into a delivery system for malicious code. 

By manipulating human-in-the-loop (HITL) approval dialogs, attackers can trick users into authorizing actions that result in arbitrary code execution — without realizing anything is wrong.

The attack “… can deceive users into approving a remote code execution attack originating from indirect prompt injections,” said Checkmarx researchers.

The Hidden Risks of AI Approval Workflows

Human-in-the-loop (HITL) controls are recommended as a defense against prompt injection and excessive AI autonomy, particularly for agentic systems such as code assistants that can execute operating system commands. 

Many organizations rely on these approval dialogs as a last line of defense, assuming that user confirmation prevents catastrophic outcomes.

The Lies-in-the-Loop (LITL) attack undermines that assumption. It shows that attackers don’t need to bypass HITL safeguards — they can simply manipulate what the user sees. 

The technique affects developer tools, AI code assistants, and other privileged agents operating in environments like VS Code terminals and chat-based IDE extensions. 

Advertisement

Inside the HITL Dialog Forging Attack

At a high level, the LITL attack exploits indirect prompt injection to poison the agent’s context. The attacker supplies malicious instructions that the AI later embeds into an HITL dialog shown to the user. 

While the underlying command is harmful, the dialog is crafted to appear benign, encouraging approval.

Several techniques make this deception more effective. 

One approach uses padding, where attackers append or prepend large amounts of harmless-looking text to push the malicious payload out of the visible area of the dialog. 

Even scrolling may reveal only innocuous content, lowering suspicion.

Another vector involves metadata tampering. Some agents display a short description summarizing what the command will do. 

Researchers showed that this descriptive line can also be manipulated, causing the UI to claim the agent is performing a safe action while executing something entirely different.

The most concerning method is Markdown injection. Many HITL dialogs are rendered using Markdown or HTML. 

If that content is not properly sanitized, attackers can break formatting boundaries, hide malicious commands, or inject fake UI elements. 

In testing, Microsoft Copilot Chat was shown to improperly sanitize Markdown, allowing injected content to render in ways that could plausibly deceive users under the right conditions.

While proof-of-concept demonstrations only launched benign programs like calc.exe, researchers emphasized that the same technique could be used for far more destructive actions.

Advertisement

Reducing Risk From AI Approval Abuse

Because LITL attacks rely heavily on user trust, mitigation requires both technical controls and human awareness. Organizations using agentic AI tools should:

  • Educate users that HITL dialogs can be manipulated and train them to critically review dialog content, formatting, and visual boundaries before approving actions.
  • Prefer AI tools with well-designed, structured UIs and minimize reliance on terminal-based interfaces where malicious content can be hidden more easily.
  • Limit agent privileges using least-privilege and zero-trust, ensuring sensitive actions require additional controls beyond in-context HITL approval.
  • Enforce command validation controls such as allowlists, policy checks, or separation between command construction and execution to prevent unsafe operations.
  • Monitor and audit agent behavior by logging HITL dialog content, approval decisions, and executed actions to detect abuse and support forensic analysis.
  • Add layered approval and integrity safeguards for high-risk actions, including out-of-band confirmation, dialog consistency checks, and restricted context inputs.

While no single control fully eliminates the risk, a layered approach that combines user awareness with technical safeguards can meaningfully improve resilience.

Advertisement

When Trust Becomes the Attack Surface

Lies-in-the-Loop attacks reflect a broader reality in modern security: mechanisms designed to enforce trust are increasingly becoming attack surfaces themselves. 

As AI agents gain greater autonomy and deeper access to systems, attackers are shifting away from directly breaking technical controls. 

Instead, they are focusing on manipulating human judgment and approval workflows, where a single trusted decision can authorize far-reaching actions.

As trust itself becomes a point of exploitation, organizations are increasingly turning to zero-trust principles that eliminate assumptions of default trust.

thumbnail
Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.

Recommended for you...

AI Agent Safety Checklist
Girish Redekar
Mar 12, 2026
Fake OpenClaw npm Package Installs GhostClaw Malware
Ken Underhill
Mar 10, 2026
Fake Claude Code Install Pages Spread Infostealer Malware
Ken Underhill
Mar 10, 2026
CyberProof 2026 Report Warns of Rising Identity and AI Cyberattacks
Ken Underhill
Mar 6, 2026
eSecurity Planet Logo

eSecurity Planet is a leading resource for IT professionals at large enterprises who are actively researching cybersecurity vendors and latest trends. eSecurity Planet focuses on providing instruction for how to approach common security challenges, as well as informational deep-dives about advanced cybersecurity topics.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.