Code Interpreter - Bud Stack Documentation

Overview

Code Interpreter gives an agent a real, isolated execution environment for Python, JavaScript, and bash. Each prompt version that enables the tool owns its own sandbox; sandboxes provision on first use, persist across turns within the same session, and can be configured for CPU, memory, network egress, and idle expiry. Sandboxes are based on Firecracker microVMs via E2B, so they boot in seconds and keep the host kernel out of reach of model-generated code.

When to Use It

The agent needs to run code — data analysis, transforms, ad-hoc calculations, file parsing.
The task benefits from persistent state across multiple model turns — e.g. loading a dataset once, then asking follow-up questions.
You want deterministic, sandboxed execution of model-generated code without giving it network or filesystem access to your platform.

If the task only needs a one-shot Python eval, a function-calling tool may be simpler. Code Interpreter shines when the model writes and runs multiple cells over a conversation.

Sandbox Lifecycle

A sandbox is provisioned lazily on the first tool call and reused for every subsequent call as long as it stays within its configured idle window. While the sandbox is warm, variables, installed packages, and uploaded files persist across turns for the lifetime of the sandbox.
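The lifecycle can be pictured with a small in-memory model. This is only a sketch: `Sandbox` and `SandboxPool` are hypothetical names, not platform APIs, and it assumes a sandbox that outlives its idle window is simply replaced by a fresh one (the platform may instead pause and resume it).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Sandbox:
    """Stand-in for a real sandbox; `state` mimics variables held in it."""
    state: Dict[str, object] = field(default_factory=dict)
    last_used: float = 0.0


class SandboxPool:
    """Toy model: one sandbox per session, created on first use."""

    def __init__(self, idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self._sandboxes: Dict[str, Sandbox] = {}

    def acquire(self, session_id: str) -> Sandbox:
        now = self.clock()
        sandbox = self._sandboxes.get(session_id)
        # Provision lazily, or replace a sandbox that sat idle too long
        # (assumption: expiry is modelled as replacement, not pause/resume).
        if sandbox is None or now - sandbox.last_used > self.idle_seconds:
            sandbox = Sandbox()
            self._sandboxes[session_id] = sandbox
        sandbox.last_used = now
        return sandbox
```

Two calls inside the idle window return the same object, so anything stored in `state` by the first call is visible to the second; a call after the window gets an empty sandbox.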

Configuration

The tool exposes resource, networking, and lifecycle controls on the prompt version.

Field	Type	Default	Description
`languages`	array	`["python", "javascript"]`	Always available — Python and JavaScript ship in every sandbox. Read-only.
`cpu`	integer	`2`	Sandbox vCPU count. One of `2` or `4`.
`ram_gb`	integer	`2`	Sandbox memory in GB. One of `2`, `4`, `8`, `16`.
`custom_template_id`	string	`null`	Bind to an SDK-built custom template. Mutually exclusive with `cpu` / `ram_gb` — the custom template’s own resources and Dockerfile extras apply.
`container_expiry_seconds`	integer	`1200`	Idle timeout before the sandbox is paused. Minimum 300. `null` selects Never expire (auto-pause + auto-resume).
`network_policy`	object	disabled	Egress policy for the sandbox — see below.
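The constraints in the table can be checked before a configuration is submitted. The following is a minimal sketch, assuming the configuration is held as a plain dict keyed by the field names above and that `null` maps to Python `None`; `validate_config` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List

ALLOWED_CPU = {2, 4}
ALLOWED_RAM_GB = {2, 4, 8, 16}
MIN_EXPIRY_SECONDS = 300


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors: List[str] = []
    has_template = config.get("custom_template_id") is not None

    if has_template and ("cpu" in config or "ram_gb" in config):
        errors.append("custom_template_id is mutually exclusive with cpu / ram_gb")
    if not has_template:
        # Defaults from the table: cpu=2, ram_gb=2.
        if config.get("cpu", 2) not in ALLOWED_CPU:
            errors.append("cpu must be 2 or 4")
        if config.get("ram_gb", 2) not in ALLOWED_RAM_GB:
            errors.append("ram_gb must be one of 2, 4, 8, 16")

    # None stands for "Never expire"; otherwise the documented minimum applies.
    expiry = config.get("container_expiry_seconds", 1200)
    if expiry is not None and expiry < MIN_EXPIRY_SECONDS:
        errors.append("container_expiry_seconds must be at least 300, or None")
    return errors
```

For example, `validate_config({"cpu": 4, "ram_gb": 16, "container_expiry_seconds": None})` returns an empty list, while `{"cpu": 3}` yields a single error about the vCPU count.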

Resource Tiers

Choose any combination of cpu ∈ {2, 4} and ram_gb ∈ {2, 4, 8, 16}. Eight built-in templates cover that grid, from a 2 vCPU / 2 GB sandbox for quick lookups up to a 4 vCPU / 16 GB sandbox for heavier in-memory work. Pick the smallest tier that fits: larger sandboxes provision the same way but consume more capacity from your cluster pool.
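As a worked illustration of the grid, the sketch below enumerates the eight combinations and picks the smallest one that meets a workload's minimums. The ordering used for "smallest" (fewest vCPUs first, then least memory) is an assumption made for the example, and `smallest_tier` is a hypothetical helper.

```python
from itertools import product
from typing import List, Tuple

CPU_OPTIONS = (2, 4)
RAM_GB_OPTIONS = (2, 4, 8, 16)

# Every (cpu, ram_gb) pair in the grid: 2 x 4 = 8 tiers.
TIERS: List[Tuple[int, int]] = list(product(CPU_OPTIONS, RAM_GB_OPTIONS))


def smallest_tier(min_cpu: int, min_ram_gb: int) -> Tuple[int, int]:
    """Pick the smallest tier meeting both minimums (vCPUs first, then RAM)."""
    candidates = [(c, r) for c, r in TIERS if c >= min_cpu and r >= min_ram_gb]
    if not candidates:
        raise ValueError("no built-in tier is large enough")
    return min(candidates)
```

A workload that needs about 3 GB of memory and one core lands on the 2 vCPU / 4 GB tier under this ordering.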

Custom Templates

When the built-in tiers do not include a library your agent needs, build a custom template through the BudAIFoundry SDK. A custom template inherits the platform’s hardened base image (Jupyter + uvicorn + the MCP shim) and appends your own Dockerfile instructions on top.

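```python
from budaifoundry import Client

# Placeholder credentials: supply your own API key and project ID.
client = Client(api_key="...", project_id="...")

# Each entry is a Dockerfile instruction appended to the platform base image.
client.templates.create(
    name="my-data-template",
    commands=[
        "RUN pip install pandas scikit-learn",
        "RUN apt-get update && apt-get install -y ffmpeg",
    ],
)
```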

Templates are project-scoped — visible only inside the project that built them — and built asynchronously. Once the template reaches ready state, bind it to a prompt version by setting custom_template_id to the template’s name. A few rules apply to the commands list:

FROM, CMD, ENTRYPOINT, COPY, and ADD are rejected. The base image and its systemd-managed Jupyter + MCP services must remain intact.
Standard RUN, ENV, WORKDIR, USER, etc. work as expected.
CPU and memory tuning lives on the template itself, not on the prompt version, when a custom template is bound.
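The instruction rules above can be checked locally before a build is submitted. This is a sketch only: `rejected_commands` is a hypothetical helper, and it assumes the platform matches on the leading instruction keyword regardless of case, as Dockerfile syntax does.

```python
from typing import List

REJECTED_INSTRUCTIONS = {"FROM", "CMD", "ENTRYPOINT", "COPY", "ADD"}


def rejected_commands(commands: List[str]) -> List[str]:
    """Return the commands whose leading instruction is on the rejected list."""
    bad = []
    for command in commands:
        parts = command.split(maxsplit=1)
        # Dockerfile instructions are case-insensitive, so normalise first.
        if parts and parts[0].upper() in REJECTED_INSTRUCTIONS:
            bad.append(command)
    return bad
```

Running it over a list that mixes `RUN pip install pandas` with `COPY . /app` returns only the `COPY` line, which can then be removed or rewritten as a `RUN` step.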

Builds run in a Dapr workflow that surfaces pending → building → ready (or failed, with an error message you can inspect through the SDK).
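Because builds are asynchronous, a caller typically polls until the template reaches a terminal state. The sketch below is deliberately SDK-agnostic: `get_status` is any callable you provide that returns the current status string, since the SDK method for reading build status is not shown on this page. The timeout and poll interval are illustrative defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_seconds: float = 600.0,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the build is ready; raise on failure or timeout."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("template build failed")
        if status not in ("pending", "building"):
            raise ValueError(f"unexpected build status: {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError("template build did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the function easy to test; in production the default `time.sleep` applies.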

Network Policy

By default the sandbox cannot reach the network at all. Open egress only when the workload genuinely needs it — most data-analysis tasks do not.

Type	Behaviour
`disabled`	Block all egress. Default.
`open`	Allow all egress.
`filtered`	Apply `allow_out` and `deny_out` lists.

In filtered mode you can list IPs, CIDR ranges, exact domains, or wildcard domains (*.example.com). The special sentinel value ALL_TRAFFIC matches everything in that direction: put it in deny_out for a deny-all baseline and list the exceptions in allow_out, or put it in allow_out for a permissive baseline that you then narrow with deny_out. Allow rules take precedence over deny rules within filtered mode, so a destination matched by both lists is allowed.
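To make the precedence rule concrete, here is a rough model of how a filtered policy might be evaluated for a single destination. It is a sketch under several assumptions: the `ALL_TRAFFIC` constant is a placeholder string (the real sentinel value is not given here), wildcard matching is approximated with shell-style globbing, and traffic matched by neither list is treated as allowed. `egress_allowed` is a hypothetical helper, not a platform API.

```python
import ipaddress
from fnmatch import fnmatchcase
from typing import Iterable

ALL_TRAFFIC = "ALL_TRAFFIC"  # placeholder for the platform's sentinel value


def _matches(rule: str, destination: str) -> bool:
    if rule == ALL_TRAFFIC:
        return True
    try:
        network = ipaddress.ip_network(rule, strict=False)
    except ValueError:
        # Not an IP or CIDR: treat the rule as an exact or wildcard domain.
        return fnmatchcase(destination.lower(), rule.lower())
    try:
        return ipaddress.ip_address(destination) in network
    except ValueError:
        return False  # an IP/CIDR rule never matches a domain name


def egress_allowed(destination: str,
                   allow_out: Iterable[str],
                   deny_out: Iterable[str]) -> bool:
    """Apply allow rules first, then deny rules, per the stated precedence."""
    if any(_matches(rule, destination) for rule in allow_out):
        return True
    if any(_matches(rule, destination) for rule in deny_out):
        return False
    return True  # assumption: unmatched traffic is allowed
```

With `deny_out=[ALL_TRAFFIC]` and `allow_out=["pypi.org", "*.pythonhosted.org"]`, only the package index hosts are reachable in this model; everything else falls through to the deny-all rule.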

Operational Considerations

Capacity — every prompt version that has the tool enabled may provision a sandbox; size your E2B (or self-hosted equivalent) capacity for the steady-state active sessions.
Idle policy — short container_expiry_seconds reclaims capacity faster but adds cold-start latency for the next call. Never expire keeps the kernel warm via auto-pause/auto-resume — recommended for power-user agents, costlier at idle.
Security boundary — model-generated code runs in a Firecracker microVM with no access to your platform’s filesystem, network policies enforced at the sandbox boundary, and no persistent storage beyond the sandbox’s own lifetime.
Data flow — files uploaded into the sandbox stay there until the sandbox is destroyed. Treat the sandbox as ephemeral; persist anything important by streaming it back through the tool’s results.
Audit — every code-interpreter call is recorded in the platform’s observability pipeline alongside the model invocation that produced it.

Next Steps

Native Tools Overview: How native tools differ from MCP connectors
Python SDK: Build custom templates and bind them programmatically
Web Fetch: Pair with Web Fetch for code that consumes external pages
