Thinking Outside the Box: New Attack Surfaces in Sandboxed AI Agents
Discover how sandboxed AI agents remain vulnerable to AI-native attacks, enabling data exfiltration and configuration poisoning despite strict policies.
Noy Pearl · April 23, 2026 · 9 min read

The rapid adoption of always-on autonomous agent projects like OpenClaw has triggered a parallel arms race in the security industry. As these agents gain the ability to write code, access personal files, and operate indefinitely, the immediate reflex has been to containerize them.

The theoretical goal of an AI sandbox is straightforward: create a bidirectional shield. It must protect the host infrastructure from the sandbox, prevent sensitive data from leaking out, and block outside attackers from penetrating the environment. However, as our recent research into NVIDIA's NemoClaw and OpenShell stack demonstrates, simply placing an agent in a locked-down container does not neutralize AI-native attacks.

There is a fundamental tension here: any useful AI agent needs access to the outside world to use basic tools. This is exactly what we exploited, demonstrating that even with sandboxing in place, this requirement introduces an inherent attack surface. This is why we argue that sandboxing alone is not a sufficient defense when it comes to AI agents. This article details the nature of this vulnerability and presents our approach to taking advantage of it.

But first, let's understand what exactly NemoClaw is.

[Figure: Illustration of NemoClaw in a nutshell]

The Baseline: About NemoClaw & OpenShell

NVIDIA describes NemoClaw as a "reference stack that simplifies running [OpenClaw] assistants more safely". It manages the AI agent and uses NVIDIA's OpenShell, a runtime that acts as a kind of gateway. OpenShell works with policies that you can change to modify permissions without actually changing the NemoClaw sandbox itself.

Looking at the architecture, OpenShell provides robust, kernel-level isolation.
It runs a lightweight Kubernetes (K3s) cluster inside a privileged Docker container, spinning up isolated pods for the sandbox. The following figure depicts the architecture:

[Figure: NemoClaw/OpenShell architecture]

The ambition is that users will set up their sandboxes as they wish and run their AI agents without needing to worry about security (said no one ever).

The security boundaries are enforced by declarative YAML policies (egress policies) that control what the agent can see and do: filesystem restriction, limited capabilities, gateway process isolation, and binary-scoped rules. Every domain is mapped to specific binaries; for example, if we want to run the curl command against github.com, we have to explicitly enable the curl binary for that domain. There is a default policy for the sandbox, and the user can preconfigure a sandbox with custom policies or set/change policies on NemoClaw OpenShell at runtime via the OpenShell CLI:

```
openshell policy set NEMOCLAW_SANDBOX_NAME POLICY.yml
```

For example, the default configuration of the sandbox's gateway enables both the gh and git binaries for the github.com and api.github.com domains like this:

```yaml
github:
  name: github
  endpoints:
    - host: github.com
      port: 443
      access: full
    - host: api.github.com
      port: 443
      access: full
  binaries:
    - path: /usr/bin/gh
    - path: /usr/bin/git
```

And it works as follows:

At the time of our research, NemoClaw was still in an early alpha version and exclusively supported OpenClaw. In fact, the software was so new that we had to manually allowlist the api.openai.com domain in the configuration just so we could use OpenClaw with our own OpenAI API key.

In theory, this is an…
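For a concrete sense of what that manual allowlisting involves, a policy entry for api.openai.com could look roughly like the sketch below. This is our illustration, not NemoClaw's actual default: only the schema is taken from the github policy shown in the source; the service name `openai` and the choice of `/usr/bin/curl` as the permitted binary are assumptions.

```yaml
# Hypothetical egress policy entry (sketch, not the shipped default):
# allow the sandboxed agent to reach api.openai.com over HTTPS, scoped
# to the curl binary only. Schema mirrors the default github policy;
# the service name and binary path are our assumptions.
openai:
  name: openai
  endpoints:
    - host: api.openai.com
      port: 443
      access: full
  binaries:
    - path: /usr/bin/curl
```

A fragment like this would then be applied to a running sandbox with the `openshell policy set` CLI command described above. Note how the binary-scoped model cuts both ways: it narrows the egress surface per domain, but every allowlisted domain-binary pair is still an open channel the agent can be induced to use.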
This excerpt is published under fair use for community discussion. Read the full article at Lasso.