A practical setup guide for agent-browser, the open-source Rust CLI for AI agent browser automation from Vercel Labs. Replaces Playwright MCP with a native daemon that uses 93% fewer tokens for the same browser actions. Apache 2.0 licensed.
The problem nobody talks about
Most AI agent browser setups burn massive amounts of context on every action. The agent fires up Playwright through MCP, the MCP wraps a Node.js process, the Node process talks to Chrome via Playwright's protocol, and every "click this button" round-trip costs thousands of tokens just for the protocol overhead. Add the page snapshot for the agent to "see," and a single browse-and-click can chew through 5,000 tokens.
For agents that browse a lot (research, scraping, QA), the protocol tax adds up fast. You hit context limits before you finish the task. You pay for tokens that did no work.
agent-browser collapses the cycle. Native Rust binary, daemon architecture, accessibility-tree snapshots instead of full-page screenshots, semantic element refs instead of CSS selectors. Same browser actions, 93% fewer tokens.
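To put those figures in perspective, here is a back-of-the-envelope calculation. It uses only the two numbers quoted above (roughly 5,000 tokens per browse-and-click, and a 93% reduction). The 40-action session length is an assumption picked for illustration, not a benchmark.

```shell
#!/bin/sh
# Back-of-the-envelope token math using the figures quoted in this guide.
# ACTIONS is an illustrative assumption, not a measured workload.
BASELINE_PER_ACTION=5000   # tokens per browse-and-click (figure from the text)
REDUCTION_PCT=93           # claimed reduction (figure from the text)
ACTIONS=40                 # hypothetical session length

# Integer arithmetic: tokens left per action after the claimed reduction.
REDUCED_PER_ACTION=$(( BASELINE_PER_ACTION * (100 - REDUCTION_PCT) / 100 ))
BASELINE_TOTAL=$(( BASELINE_PER_ACTION * ACTIONS ))
REDUCED_TOTAL=$(( REDUCED_PER_ACTION * ACTIONS ))

echo "per action:  $BASELINE_PER_ACTION -> $REDUCED_PER_ACTION tokens"
echo "per session: $BASELINE_TOTAL -> $REDUCED_TOTAL tokens"
```

With these inputs the script reports 350 tokens per action and 14,000 per session, against 200,000 at the baseline. Change ACTIONS to match your own workload.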
Why this changes everything
Native Rust, no Node.js. No interpreter overhead, no package install dance, no version conflicts with the host project. One binary, statically compiled.
Daemon architecture. Starts once, persists between commands. The agent doesn't pay startup cost on every action.
Semantic element refs. Instead of brittle CSS selectors (.btn-primary[data-id="checkout-3"]), the daemon hands the agent stable refs (@e1, @e2) from the accessibility tree. The refs survive page redesigns and only break when the underlying element actually changes.
CDP, not Playwright. Direct Chrome DevTools Protocol. No translation layer. Faster, fewer dependencies, lower token overhead.
Step 1: install prerequisites
You need:
- Chrome (auto-downloaded by agent-browser, or use your existing install)
- Node.js (only if you're installing via npm; not required for cargo or brew installs)
That's it. No Playwright. No Puppeteer. No separate Node daemon.
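If you are not sure which install route applies to your machine, a small check like the one below can tell you. It only looks for the three package managers this guide covers (npm, brew, cargo). The have and pick_install_route helpers are hypothetical names for this sketch, not part of agent-browser.

```shell
#!/bin/sh
# Pick an install route based on which package managers are on PATH.
# have() and pick_install_route() are hypothetical helpers for this sketch.

have() { command -v "$1" >/dev/null 2>&1; }

pick_install_route() {
  # Check the routes in the order this guide presents them.
  for tool in npm brew cargo; do
    if have "$tool"; then
      echo "$tool"
      return 0
    fi
  done
  echo "none"
  return 1
}

route=$(pick_install_route) || true
echo "install route: $route"
```

A result of "none" means you need to install one of the three tools first.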
Step 2: install agent-browser
The fastest path is via npm. Two commands.
npm install -g agent-browser
agent-browser install
The first command installs the CLI. The second downloads Chrome for Testing if needed and sets up the daemon.
Homebrew (Mac):
brew install vercel-labs/tap/agent-browser
agent-browser install
Cargo (any Rust toolchain):
cargo install agent-browser
agent-browser install
Step 3: run your first browser session
Start a session:
agent-browser start
The daemon spins up, Chrome launches, you get an interactive prompt. Now you can navigate, click, snapshot, all from the CLI.
agent-browser navigate https://example.com
agent-browser snapshot
agent-browser click @e1
The snapshot command returns the accessibility tree with semantic element refs (@e1, @e2, @e3). The click @e1 command clicks whatever the first element ref points to. No CSS selectors needed.
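The same three-step sequence can be wrapped in a small script so an agent (or you) can replay it. This is a minimal sketch that assumes a session is already running. The subcommands are the ones shown above; the ab helper and the DRY_RUN switch are hypothetical conveniences, not agent-browser features. DRY_RUN defaults to printing the commands so you can inspect them before touching a real browser.

```shell
#!/bin/sh
# Scripted navigate -> snapshot -> click sequence.
# ab() and DRY_RUN are hypothetical helpers, not part of agent-browser.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

ab() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

first_session() {
  url="$1"
  ref="$2"
  # Stop at the first failing step instead of clicking on a page that never loaded.
  ab navigate "$url" && ab snapshot && ab click "$ref"
}

first_session "https://example.com" "@e1"
```

Run it with DRY_RUN=0 once the printed commands look right.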
Step 4: how agent-browser actually works
agent-browser runs a native Rust daemon that speaks Chrome DevTools Protocol directly. Three components.
The CLI. What you (or your agent) calls. Sends commands to the daemon over a local socket.
The daemon. A persistent Rust process. Holds the Chrome session open between commands. Manages multiple isolated sessions if you need them.
Chrome (or Lightpanda). The actual browser. agent-browser manages the lifecycle, but you can attach to your own Chrome profile if you want to reuse logged-in sessions.
The accessibility-tree snapshot is the magic. Instead of sending a full DOM dump (tens of thousands of tokens) to your agent, agent-browser sends a structured accessibility tree (a few hundred to a few thousand tokens) with semantic refs. The agent sees a clean list of interactable elements, not a wall of HTML.
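A rough way to see the size difference is to compare the two representations of the same form. The sketch below is an order-of-magnitude illustration only: both samples are hypothetical, and "4 characters per token" is a common rule of thumb rather than how any particular model tokenizes.

```shell
#!/bin/sh
# Rough size comparison: raw markup vs. an accessibility-tree listing.
# Both samples are hypothetical; chars/4 is only a rule-of-thumb estimate.

HTML='<form class="login-form" action="/session" method="post">
  <div class="field"><label for="email">Email</label>
    <input id="email" class="input input--lg" type="email" name="email"></div>
  <div class="field"><label for="pw">Password</label>
    <input id="pw" class="input input--lg" type="password" name="password"></div>
  <button class="btn btn-primary" type="submit">Sign in</button>
  <a class="link link--muted" href="/reset">Forgot password</a>
</form>'

TREE='@e1: button "Sign in"
@e2: link "Forgot password"
@e3: textbox "Email"
@e4: textbox "Password"'

# Estimate tokens as ceil(characters / 4).
estimate_tokens() {
  chars=$(printf '%s' "$1" | wc -c | tr -d ' ')
  echo $(( (chars + 3) / 4 ))
}

html_tokens=$(estimate_tokens "$HTML")
tree_tokens=$(estimate_tokens "$TREE")
echo "markup: ~$html_tokens tokens, accessibility tree: ~$tree_tokens tokens"
```

Real pages carry far more markup than this eight-line form (scripts, styles, wrappers), so the gap on a production site would be wider than what this toy sample shows.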
Step 5: semantic element refs explained
Every accessibility snapshot returns elements like:
@e1: button "Sign in"
@e2: link "Forgot password"
@e3: textbox "Email"
@e4: textbox "Password"
You (or your agent) refer to elements by their @e ref. The daemon resolves the ref to a real DOM element under the hood.
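An agent usually goes the other way first: it knows it wants the "Email" textbox and needs the ref. The sketch below looks up a ref by role and name in snapshot text. It assumes the line shape of the sample listing above; real snapshot output may be formatted differently. find_ref is a hypothetical helper, not an agent-browser command.

```shell
#!/bin/sh
# Look up a ref by role and accessible name in snapshot text.
# Assumes the '@eN: role "name"' line shape from the sample listing;
# find_ref is a hypothetical helper, not part of agent-browser.

SNAPSHOT='@e1: button "Sign in"
@e2: link "Forgot password"
@e3: textbox "Email"
@e4: textbox "Password"'

find_ref() {
  role="$1"
  name="$2"
  # Compare everything from the first ": " onward, then print the ref before it.
  printf '%s\n' "$SNAPSHOT" | awk -v want=": $role \"$name\"" '
    {
      i = index($0, ": ")
      if (i > 0 && substr($0, i) == want) { print substr($0, 1, i - 1); exit }
    }'
}

find_ref textbox "Email"     # prints @e3
find_ref button "Sign in"    # prints @e1
```

An empty result means no element matched, which is a cue to take a fresh snapshot rather than guess at a ref.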
Why this matters for AI agents:
- Tokens. Each ref costs only a few tokens. CSS selectors run 10 to 50 tokens.
- Stability. A button's accessibility role rarely changes. Its CSS class changes every redesign.
- Determinism. The daemon hands the agent a consistent ref space. Same page, same refs.