browser-automation, web-scraping | Status: active

Stagehand

Open-source browser automation framework pairing natural-language actions with deterministic Playwright execution

Stagehand is Browserbase's open-source browser automation framework. It lets you describe browser actions in plain English and falls back to deterministic Playwright execution for reliability. It's become a building block for agent builders who want AI-driven web interaction without giving up on testability.

Anyone who has tried to maintain a web scraper or browser automation script for more than six months knows the maintenance problem. The site updates, a class name changes, a modal appears where it didn't before, and your carefully written Playwright selectors break. You fix them, they break again. The production cost of brittle browser automation is enormous and mostly invisible until it is very visible at 2am.

Stagehand is Browserbase's approach to that problem. Instead of describing every browser action as a precise selector and step sequence, you describe what you want in plain English and let an LLM figure out the current page state and how to accomplish the goal. When precision matters more than flexibility, you drop back to deterministic Playwright. The two modes live in the same script and compose naturally. That hybrid approach is what makes Stagehand interesting as a building block for agents, not just as an automation convenience.

Quick verdict

If you're building an AI agent that needs to interact with the web, Stagehand is one of the most practical starting points available. The open-source MIT license, the Python and TypeScript SDKs, and the Playwright integration mean you're working with a tool built on solid foundations rather than a proprietary black box. The main costs are the LLM API calls for natural-language steps and Browserbase hosting if you need managed browsers. For agent builders and developers running data extraction pipelines on dynamic sites, Stagehand earns a place in the stack.

What Stagehand is

Browserbase released Stagehand in August 2024 as an open-source framework for LLM-driven browser automation. The repository lives at github.com/browserbase/stagehand. Browserbase is a San Francisco company (founded 2023) that sells managed browser infrastructure; Stagehand is the framework that makes that infrastructure easier to use with AI.

The core abstraction is three methods: act(), extract(), and observe(). You call act("click the login button") and Stagehand figures out where the login button is on the current page state and clicks it. You call extract("get the product title and price from this page") and Stagehand returns structured data. You call observe("what form fields are on this page") and Stagehand describes the current state, which helps with decision branching in agent loops.

Under the hood, act() calls your configured LLM with a description of the current page (DOM, screenshot, or both depending on configuration) and asks it to produce a Playwright action. The LLM returns something like page.click('[data-testid="login-btn"]'), which Stagehand then executes. If the LLM's selector is wrong, Stagehand retries with a different prompt strategy. For deterministic steps where you already know the exact selector, you write plain Playwright and call it directly in the same script.

That composability is the design win. A typical Stagehand script might navigate to a URL with page.goto() (pure Playwright), call stagehand.act("accept the cookie banner if present") (AI-driven, handles whatever the current cookie modal looks like), fill a known form field with page.fill('#email', email) (deterministic), then call stagehand.extract("get the confirmation number from the success message") (AI-driven extraction that works regardless of the exact HTML structure). Each step uses the right tool for the job.
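One way to keep that split visible is to write the script down as a list of tagged steps before you code it. The sketch below is illustrative only: the `Step` type, `checkoutPlan`, and `countAiSteps` are hypothetical names, not part of Stagehand, and nothing here drives a browser. It just makes explicit which steps would cost an LLM call.

```typescript
// Hypothetical model of a hybrid script: each step is either deterministic
// (plain Playwright) or AI-driven (a natural-language instruction).
type Step =
  | { kind: "playwright"; call: string }
  | { kind: "ai"; method: "act" | "extract"; instruction: string };

// The example flow from the paragraph above, expressed as data.
const checkoutPlan: Step[] = [
  { kind: "playwright", call: "page.goto(url)" },
  { kind: "ai", method: "act", instruction: "accept the cookie banner if present" },
  { kind: "playwright", call: "page.fill('#email', email)" },
  {
    kind: "ai",
    method: "extract",
    instruction: "get the confirmation number from the success message",
  },
];

// Only AI-driven steps need an LLM call, so counting them gives a rough
// lower bound on LLM calls per run (retries would add more).
function countAiSteps(plan: Step[]): number {
  return plan.filter((step) => step.kind === "ai").length;
}

console.log(countAiSteps(checkoutPlan)); // 2
```

Two of the four steps are deterministic, so half of this flow runs at plain Playwright speed and cost.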

The technical architecture

The act-extract-observe loop

act() is the most used method. You pass a plain-English instruction, and Stagehand:

1. Takes a snapshot of the current page (DOM tree, optionally screenshot)
2. Sends that snapshot plus your instruction to the configured LLM
3. Receives a Playwright action back
4. Executes the action
5. Retries with additional context if the action fails
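
The steps above can be sketched as a small retry loop. This is a rough model of the control flow, not Stagehand's actual implementation: `ProposedAction`, `ActDeps`, and `actWithRetry` are hypothetical names, the LLM and browser are replaced with stubs, and everything is kept synchronous for readability even though real browser and LLM calls are asynchronous.

```typescript
// Hypothetical shapes for illustration; Stagehand's real internals may differ.
interface ProposedAction {
  method: "click" | "fill";
  selector: string;
}

interface ActDeps {
  // Returns a text description of the current page (DOM, optionally a screenshot).
  snapshot: () => string;
  // Stands in for the LLM call; lastError carries context from a failed attempt.
  propose: (page: string, instruction: string, lastError?: string) => ProposedAction;
  // Runs the action in the browser; throws if it fails.
  execute: (action: ProposedAction) => void;
}

interface ActResult {
  success: boolean;
  attempts: number;
}

function actWithRetry(instruction: string, deps: ActDeps, maxAttempts: number = 3): ActResult {
  let lastError: string | undefined = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const page = deps.snapshot(); // step 1
    const action = deps.propose(page, instruction, lastError); // steps 2 and 3
    try {
      deps.execute(action); // step 4
      return { success: true, attempts: attempt };
    } catch (err) {
      // Step 5: keep the failure message so the next proposal has more context.
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  return { success: false, attempts: maxAttempts };
}

// Stubbed dependencies: the first proposed selector is wrong, the retry is right.
const stubDeps: ActDeps = {
  snapshot: () => "<button id=\"login\">Log in</button>",
  propose: (_page, _instruction, lastError) => ({
    method: "click",
    selector: lastError === undefined ? "#signin" : "#login",
  }),
  execute: (action) => {
    if (action.selector !== "#login") throw new Error("no element matches " + action.selector);
  },
};

const actResult = actWithRetry("click the login button", stubDeps);
console.log(actResult.success, actResult.attempts); // true 2
```

Each retry is another LLM call, so the attempt cap is also a cost cap.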

The retry behavior and the ability to include screenshots give Stagehand better success rates on visually complex pages than approaches that rely on DOM alone. Modern JavaScript-heavy apps sometimes have semantic HTML that doesn't match their visual structure; including a screenshot lets the LLM reason about what the user actually sees.

extract() returns structured data from the current page. You define the shape of the data you want (TypeScript interface or Pydantic model in Python), pass an instruction, and Stagehand populates the structure from the page content. This is more reliable than writing XPath for complex, variable HTML structures because the LLM understands the semantic content rather than the structural position.
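
Whatever shape you ask for, it is worth checking what comes back before passing it downstream, since the values originate from an LLM. The sketch below shows one dependency-free way to do that with a hand-written type guard. `ProductInfo` and `isProductInfo` are hypothetical names for illustration, and in practice a schema library such as Zod would usually do this job.

```typescript
// The shape we want extract() to populate. Field names are illustrative.
interface ProductInfo {
  title: string;
  price: number;
}

// Runtime check for data that came back from an LLM-driven extraction.
function isProductInfo(value: unknown): value is ProductInfo {
  if (typeof value !== "object" || value === null) return false;
  const record = value as { [key: string]: unknown };
  const title = record["title"];
  const price = record["price"];
  return typeof title === "string" && typeof price === "number" && isFinite(price);
}

// A well-formed result passes; a price returned as text does not.
console.log(isProductInfo({ title: "Mug", price: 12.5 })); // true
console.log(isProductInfo({ title: "Mug", price: "12.50" })); // false
```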

observe() is the introspection method. It's useful in agent loops where you need to understand the current page state before deciding what to do next. An agent that's filling out a multi-step form can call observe() after each step to confirm it's on the expected screen before proceeding.
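
A minimal sketch of that branching, assuming the observation comes back as a list of element descriptions. The `ObservedElement` shape, the `NextMove` labels, and `decideNextMove` are hypothetical, and the real observe() result may carry different fields.

```typescript
// Hypothetical shape for one observed element; real observe() results may differ.
interface ObservedElement {
  description: string;
}

type NextMove = "fill-shipping" | "fill-payment" | "stop-unexpected-screen";

// Decide the next step of a multi-step form from what is on the page now.
function decideNextMove(observed: ObservedElement[]): NextMove {
  const mentions = (phrase: string): boolean =>
    observed.some((el) => el.description.toLowerCase().indexOf(phrase) !== -1);

  if (mentions("card number")) return "fill-payment";
  if (mentions("shipping address")) return "fill-shipping";
  // Neither expected screen was recognized, so stop rather than act blindly.
  return "stop-unexpected-screen";
}

console.log(decideNextMove([{ description: "Card number input" }])); // fill-payment
console.log(decideNextMove([{ description: "Newsletter signup popup" }])); // stop-unexpected-screen
```

The explicit stop branch is the useful part: an agent that halts on an unrecognized screen is easier to debug than one that keeps clicking.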

Self-healing selectors

Stagehand asks the LLM for a selector against the current page state each time a step runs. If the page layout changes between runs, the next run's LLM call simply produces a selector for the new layout. The selector "heals" itself without manual intervention.

This is a genuine improvement over traditional test maintenance. A Playwright test suite with hundreds of hardcoded selectors is a maintenance burden. A Stagehand script where the selectors are regenerated dynamically is more resilient to UI updates. The trade-off is latency and cost per step; each act() call makes an LLM API call. For high-frequency automation on tight budgets, traditional Playwright is cheaper per operation. For lower-frequency workflows where maintenance cost dominates, Stagehand's economics look better.
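
To make that trade-off concrete, you can estimate the run volume at which LLM spend catches up with what you currently pay in selector maintenance. The function below is a back-of-the-envelope sketch. Every input is a placeholder you would replace with your own figures, and `TradeoffInputs` and `breakEvenRunsPerMonth` are hypothetical names.

```typescript
interface TradeoffInputs {
  maintenanceHoursPerMonth: number; // time spent fixing broken selectors
  hourlyRateUsd: number; // cost of that time
  aiStepsPerRun: number; // act()/extract() calls in one run
  costPerAiStepUsd: number; // average LLM cost of one such call
}

// Runs per month at which LLM spend equals the selector-maintenance cost.
// Below this volume the AI-driven script is cheaper; above it, hardcoded
// selectors are.
function breakEvenRunsPerMonth(inputs: TradeoffInputs): number {
  const maintenanceCostUsd = inputs.maintenanceHoursPerMonth * inputs.hourlyRateUsd;
  const llmCostPerRunUsd = inputs.aiStepsPerRun * inputs.costPerAiStepUsd;
  if (llmCostPerRunUsd <= 0) {
    throw new Error("LLM cost per run must be positive");
  }
  return maintenanceCostUsd / llmCostPerRunUsd;
}

// Placeholder numbers: 4 hours at $100/hour versus 5 AI steps at 1 cent each.
const exampleInputs: TradeoffInputs = {
  maintenanceHoursPerMonth: 4,
  hourlyRateUsd: 100,
  aiStepsPerRun: 5,
  costPerAiStepUsd: 0.01,
};

console.log(Math.round(breakEvenRunsPerMonth(exampleInputs))); // 8000
```

With these made-up inputs the crossover sits at about 8,000 runs a month. The model ignores latency and the cost of failed runs, so treat it as a first filter rather than a full comparison.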

Model configuration

Stagehand is model-agnostic. You configure it with an LLM client and API key. Claude Sonnet and Claude Haiku are popular choices: Sonnet for complex pages where reasoning quality matters, Haiku for simpler steps where cost efficiency matters more. GPT-4o is widely used. Gemini Flash is an option for cost-sensitive high-volume extraction.

The model choice affects two things: success rate on ambiguous instructions and cost per step. On complex modern web apps with dynamic content and heavy JavaScript, a stronger model (Claude Sonnet, GPT-4o) will succeed more reliably. On simple data extraction from relatively structured pages, a cheaper fast model performs adequately.
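
If you mix models in one pipeline, the routing rule can be as simple as the sketch below. The `StepProfile` fields and the two-tier split are assumptions made for illustration, and the tier labels only echo the models named above; they are not model identifiers you can pass to an API.

```typescript
type ModelTier = "strong" | "fast";

// Hypothetical description of a step, filled in by whoever writes the script.
interface StepProfile {
  heavyJavaScript: boolean; // dynamic content, client-side rendering
  ambiguousInstruction: boolean; // more than one plausible target on the page
}

// Human-readable examples per tier, taken from the models mentioned in the text.
const tierExamples: Record<ModelTier, string> = {
  strong: "Claude Sonnet or GPT-4o",
  fast: "Claude Haiku or Gemini Flash",
};

// Use the stronger model only when the step is likely to need it.
function pickModelTier(step: StepProfile): ModelTier {
  return step.heavyJavaScript || step.ambiguousInstruction ? "strong" : "fast";
}

const simpleExtraction: StepProfile = { heavyJavaScript: false, ambiguousInstruction: false };
console.log(tierExamples[pickModelTier(simpleExtraction)]); // Claude Haiku or Gemini Flash
```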

Browserbase integration

Stagehand ships with a first-class integration with Browserbase's managed browser service. Browserbase runs headless Chrome instances in the cloud, handles session management, provides live debugging views, and stores session recordings for troubleshooting.

The free tier gives you 10 hours of browser time per month, which is enough to prototype and run small workflows. Beyond that, pricing is usage-based and, as of early 2026, scales with the number of concurrent sessions and session duration; check their pricing page for current numbers.

You can also run Stagehand against a local Playwright browser instance without Browserbase. That's the right setup for development and for anyone who wants to avoid the hosting cost entirely. The Browserbase integration becomes valuable when you need multiple concurrent browser sessions, reliable infrastructure without managing your own headless browser setup, or session recordings for debugging failed runs.
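
That decision can be captured in a small helper that picks the environment from what a run needs. This is a sketch under assumptions: the `"BROWSERBASE"` value and the `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` variable names are assumed here rather than taken from this article, and `RunNeeds` and `chooseEnv` are hypothetical. Check the Stagehand docs for the exact configuration keys.

```typescript
// "LOCAL" appears in the example later in this article; "BROWSERBASE" is assumed.
type StagehandEnv = "LOCAL" | "BROWSERBASE";

interface RunNeeds {
  concurrentSessions: number;
  sessionRecordings: boolean;
}

// A plain record stands in for process.env so the sketch has no Node dependency.
interface EnvVars {
  [name: string]: string | undefined;
}

function chooseEnv(needs: RunNeeds, vars: EnvVars): StagehandEnv {
  const wantsManaged = needs.concurrentSessions > 1 || needs.sessionRecordings;
  const hasCredentials =
    Boolean(vars["BROWSERBASE_API_KEY"]) && Boolean(vars["BROWSERBASE_PROJECT_ID"]);
  // Without credentials there is nothing to connect to, so stay local.
  return wantsManaged && hasCredentials ? "BROWSERBASE" : "LOCAL";
}

// Development: one session, no recordings, no credentials configured.
console.log(chooseEnv({ concurrentSessions: 1, sessionRecordings: false }, {})); // LOCAL
```

Falling back to a local browser when credentials are missing is a choice made for this sketch; a production pipeline might prefer to fail loudly instead.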

Pricing

Stagehand is MIT-licensed and free. You install it with npm install @browserbase/stagehand (TypeScript) or pip install stagehand (Python). No license key, no registration required to run it locally.

Your two variable costs are:

LLM API calls: each act() and extract() call makes one or more API calls to your configured model. At Claude Sonnet prices (roughly $3 per million input tokens, $15 per million output tokens as of early 2026), a call that sends about 2,000 input tokens and gets back about 100 output tokens costs under a cent; larger page snapshots cost proportionally more. On high-volume pipelines processing thousands of pages, this adds up. Build a cost estimate based on your expected call volume before scaling.
Browserbase hosting: if you use managed browsers, the free tier covers 10 hours/month. Beyond that, it's usage-based. For most developers the free tier covers experimentation. For production pipelines, model the cost based on expected browser session hours.
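
A cost estimate for the LLM side takes only a few lines. The sketch below uses the Sonnet prices quoted above; the token counts per call, the calls per page, and the page count are placeholder assumptions, and `costPerCallUsd` and `pipelineCostUsd` are hypothetical helper names. Measure your own token usage before trusting the result.

```typescript
interface TokenPrices {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

interface CallProfile {
  inputTokens: number; // page snapshot plus instruction
  outputTokens: number; // the action or extracted data
}

// Cost of a single LLM call in US dollars.
function costPerCallUsd(prices: TokenPrices, call: CallProfile): number {
  const microDollars =
    call.inputTokens * prices.inputPerMillionUsd +
    call.outputTokens * prices.outputPerMillionUsd;
  return microDollars / 1000000;
}

// Total LLM cost for a pipeline, ignoring retries.
function pipelineCostUsd(
  prices: TokenPrices,
  call: CallProfile,
  callsPerPage: number,
  pages: number
): number {
  return costPerCallUsd(prices, call) * callsPerPage * pages;
}

// Prices from the text; the call profile is an assumed example.
const sonnetPrices: TokenPrices = { inputPerMillionUsd: 3, outputPerMillionUsd: 15 };
const assumedCall: CallProfile = { inputTokens: 2000, outputTokens: 100 };

console.log(costPerCallUsd(sonnetPrices, assumedCall).toFixed(4)); // 0.0075
console.log(pipelineCostUsd(sonnetPrices, assumedCall, 3, 10000).toFixed(2)); // 225.00
```

Under these assumptions a single call costs three quarters of a cent, and 10,000 pages at three calls each comes to about $225. Retries and larger snapshots push the number up.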

There's no "Stagehand Pro" or paid middleware tier. The money flows to LLM providers and optionally to Browserbase for hosting.

Stagehand vs the alternatives

Stagehand vs Browser Use

Browser Use is the Python-first open-source alternative. Both solve similar problems: giving an AI agent control over a browser. Browser Use is oriented toward giving the LLM full control of the browser as an agent tool; the LLM decides what to do next at every step. Stagehand's hybrid model gives you more explicit control over which steps are AI-driven and which are deterministic.

For Python-first teams building AI agents, Browser Use often wins on familiarity and ecosystem fit. For TypeScript teams or for projects where you want more explicit control over the automation flow, Stagehand is the more natural choice. Both are worth trying; the practical difference often comes down to whether you want the LLM in the driver's seat entirely or as a co-pilot alongside your explicit code.

Stagehand vs Skyvern

Skyvern is a commercial product focused on workflow automation. Where Stagehand is a framework you embed in your code, Skyvern is more of a service with a workflow builder interface. Skyvern is the better fit if you want a no-code or low-code tool for non-developers. Stagehand is the better fit if you're a developer building custom agents or pipelines.

Stagehand vs MultiOn

MultiOn is a cloud service that handles browser automation as an API. You make an API call describing what you want done; MultiOn's infrastructure runs the browser. Stagehand is a framework you control; you manage the execution environment. MultiOn abstracts away more; Stagehand gives you more control. For developers who want to own the full execution stack, Stagehand is the right level of abstraction.

Who should use Stagehand

Agent builders are the primary audience. If you're building an AI agent that needs to interact with web interfaces (form submission, data extraction from dynamic pages, navigation through multi-step flows), Stagehand provides a reliable abstraction without requiring you to write and maintain hundreds of Playwright selectors.

QA engineers writing integration tests against dynamic UIs are a natural fit. The self-healing selector approach reduces test suite maintenance significantly on applications that ship UI updates frequently.

Data engineers running extraction pipelines against sites with no public API will find extract() a more maintainable approach than XPath-based scraping. The structured output typing (TypeScript interfaces or Pydantic models) also makes downstream processing cleaner.

Stagehand is less appropriate for two groups: anyone running very high-volume, cost-sensitive scraping where a per-page LLM API cost is unacceptable, and anyone who needs the maximum possible speed and can afford to maintain traditional selectors. Traditional Playwright is cheaper and faster for well-understood, stable UIs. Stagehand pays off when the UI is variable, dynamic, or maintained by a third party you don't control.

Getting started

TypeScript install:

npm install @browserbase/stagehand playwright

Python install:

pip install stagehand playwright
python -m playwright install

A minimal TypeScript example:

import { Stagehand } from "@browserbase/stagehand";

// Run against a local browser; the API key is read from the environment.
const stagehand = new Stagehand({
  env: "LOCAL",
  modelName: "claude-sonnet-4-6",
  modelClientOptions: { apiKey: process.env.ANTHROPIC_API_KEY },
});

await stagehand.init();
const page = stagehand.page;

// Deterministic navigation with plain Playwright.
await page.goto("https://example.com/login");

// AI-driven steps: the LLM locates each element on the current page.
await stagehand.act("fill in the email field with user@example.com");
await stagehand.act("fill in the password field with mypassword");
await stagehand.act("click the sign in button");

// AI-driven extraction, independent of the header's exact markup.
const result = await stagehand.extract("get the user's display name from the header");
console.log(result);

// Shut down the browser session.
await stagehand.close();

Start with a low-stakes page: a public news site's top headlines, a product catalog, a public API's web interface. Get extract() working and returning the data structure you expect. Then try act() with a simple navigation step. Build complexity gradually.

For agent use cases, look at the example projects in the Stagehand repo. The framework composes well with LangChain, LlamaIndex, and direct Anthropic/OpenAI SDK calls. Most agent builders wire Stagehand as one tool among several in an agent loop, alongside search tools, code execution, and memory systems.

The bottom line

Stagehand is one of the better-designed open-source tools in the browser automation space. The hybrid AI-plus-Playwright model is the right architecture for production use: you get the flexibility of natural-language instructions where selectors are brittle, and you keep the reliability of deterministic code where it matters. The MIT license, active development from Browserbase, and support for multiple LLM providers make it a solid foundation rather than a dead-end dependency.

The costs are real: LLM API calls per step, and Browserbase hosting if you need managed browsers. Model both before committing to Stagehand in a high-volume pipeline. For lower-volume agent workflows and developer-facing automation, those costs are a reasonable trade for the maintenance reduction. If you're building something that interacts with the web, it deserves a place in your evaluation alongside Browser Use and Skyvern.

Key features

Natural-language browser actions with AI model backends
Deterministic Playwright fallback for reliable automation
TypeScript and Python SDKs
Compatible with any LLM provider via configuration
Browserbase managed browser hosting integration
Page observation and extraction methods for structured data
Self-healing selectors that adapt to UI changes

Pros and cons

Pros

+ Open-source and MIT-licensed with active development
+ Hybrid AI-plus-Playwright approach is more reliable than pure LLM automation
+ Works with any LLM via configurable providers
+ Python and TypeScript SDKs cover most teams
+ Backed by Browserbase's managed browser infrastructure
+ Self-healing selectors reduce maintenance on changing UIs

Cons

− Browserbase hosting costs add up on high-volume scraping
− Less mature than pure Playwright for non-AI automation tasks
− LLM-driven steps add latency compared to deterministic selectors
− Requires LLM API keys and budgeting for the associated costs
− Documentation depth trails the feature set in places

Who is Stagehand for?

Agent builders needing reliable web interaction in AI workflows
QA engineers writing tests that adapt to UI changes
Data extraction pipelines where brittle XPath selectors are a maintenance burden
Prototype automation against sites without public APIs

Alternatives to Stagehand

If Stagehand isn't quite the right fit, the closest alternatives are Browser Use, Skyvern, MultiOn, and Project Mariner.

Frequently Asked Questions

What is Stagehand?

Stagehand is an open-source browser automation framework from Browserbase. It lets you control a browser using natural-language instructions backed by an LLM, and falls back to deterministic Playwright actions for steps that need reliability. It's used by developers building AI agents that interact with the web.

How does Stagehand differ from pure Playwright?

Playwright requires you to write precise selectors and action sequences in code. Stagehand lets you describe what you want in plain English for steps where the exact selector is fragile or unknown, and uses Playwright for the deterministic parts where reliability matters more than flexibility. The two modes compose in a single script.

Is Stagehand free to use?

Stagehand itself is MIT-licensed and free. You run it against your own browser or via Browserbase's hosted browser service. Browserbase has a free tier with 10 hours of browser time per month, then usage-based pricing. You also pay for whatever LLM API calls Stagehand makes.

What LLMs does Stagehand support?

Stagehand is model-agnostic. You configure it with your preferred LLM provider and API key. Claude, GPT-4o, and Gemini are all used by Stagehand developers in production. The model quality affects how reliably the natural-language steps succeed on complex pages.

How does Stagehand compare to Browser Use?

Both are open-source browser automation frameworks driven by LLMs. Browser Use is Python-first and focuses on giving an LLM full control of a browser. Stagehand's hybrid approach (natural language where helpful, deterministic Playwright where reliable) is designed for production pipelines where you need both flexibility and testability. TypeScript-first teams tend toward Stagehand; Python-first teams often start with Browser Use.
