Browser Runtime

AnySoul can let your agent browse and act on web pages through one of two runtime paths:

  1. Web + browser extension
    • uses your current browser profile
    • opens and manages real browser tabs
    • supports explicit structured browser actions
  2. Desktop app
    • uses the local browser runtime inside the AnySoul desktop app
    • supports managed browser tabs in the app window
    • supports the same explicit structured actions, plus richer semantic browser actions when available

This guide explains which path to choose, what each path can do today, and what to expect from performance and limitations.

| If you want… | Choose… | Why |
| --- | --- | --- |
| Your agent to act in the browser you already use every day | Web + browser extension | Reuses your current browser tabs and current signed-in browser identity |
| Per-agent local browser runtime managed by AnySoul | Desktop app | The desktop app owns the local runtime and managed browser tabs |
| The richest current browser capability surface | Desktop app | Desktop currently supports semantic browser actions in addition to explicit actions |
| The lightest-weight setup | Web + browser extension | No desktop app required |

Both the extension path and the desktop-app path support the current explicit browser action family:

  • open and activate tabs
  • navigate, go back, go forward, reload
  • read page state
  • scroll, focus, hover, click, double-click, right-click
  • drag and drop
  • press keys
  • clear, type, paste, copy text
  • set checked state
  • select dropdown options
  • submit forms
  • upload files
  • wait for selectors, text, or URL changes
  • extract structured page data
  • close tabs

For most deterministic browser flows, this explicit-action family is the best default.

| Capability | Web + browser extension | Desktop app |
| --- | --- | --- |
| Uses your current browser profile | Yes | No |
| Per-agent isolated local browser runtime | No | Yes |
| Real browser tabs in your browser | Yes | No |
| Managed tabs inside AnySoul app window | No | Yes |
| Explicit structured browser actions | Yes | Yes |
| semantic_act | No | Yes, when the desktop target supports semantic actions |
| semantic_extract | No | Yes, when the desktop target supports semantic extraction |
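
The capability table above can also be read as a small lookup structure. The following is a minimal sketch, not part of AnySoul: the `Runtime` enum, the capability keys, and `runtimes_supporting` are hypothetical names chosen for illustration, and the "Yes, when the desktop target supports…" cells are simplified to `True`.

```python
from enum import Enum


class Runtime(Enum):
    EXTENSION = "web + browser extension"
    DESKTOP = "desktop app"


# Transcribed from the capability table. Keys are illustrative, not AnySoul
# identifiers. Conditional "Yes, when ..." cells are recorded as True, so a
# real check would still depend on what the desktop target exposes.
CAPABILITIES = {
    "current_browser_profile": {Runtime.EXTENSION: True, Runtime.DESKTOP: False},
    "isolated_per_agent_runtime": {Runtime.EXTENSION: False, Runtime.DESKTOP: True},
    "real_browser_tabs": {Runtime.EXTENSION: True, Runtime.DESKTOP: False},
    "managed_app_tabs": {Runtime.EXTENSION: False, Runtime.DESKTOP: True},
    "explicit_actions": {Runtime.EXTENSION: True, Runtime.DESKTOP: True},
    "semantic_act": {Runtime.EXTENSION: False, Runtime.DESKTOP: True},
    "semantic_extract": {Runtime.EXTENSION: False, Runtime.DESKTOP: True},
}


def runtimes_supporting(*capabilities: str) -> list:
    """Return the runtimes that offer every requested capability."""
    return [
        runtime
        for runtime in Runtime
        if all(CAPABILITIES[name][runtime] for name in capabilities)
    ]
```

Asking for a combination that no single path offers, such as `current_browser_profile` together with `semantic_act`, returns an empty list, which mirrors the tradeoff described in this guide.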

Important Limitation: Extension Does Not Support Semantic Actions

Today, the browser extension executor does not support:

  • semantic_act
  • semantic_extract

That means the extension path should be treated as an explicit-action browser runtime.

In practice:

  • good extension flows are selector-driven and deterministic
  • the extension is well suited to opening tabs, reading pages, filling forms, uploading files, waiting for page changes, and extracting structured page data
  • the extension is not the right path if your workflow depends on natural-language browser commands such as “open the notifications tab” or “extract this page into the following schema without explicit selectors”

If you need those semantic browser actions, use the desktop app path instead.

The desktop app can expose richer semantic browser actions through Stagehand.

These semantic actions are useful when:

  • the page is hard to target with stable selectors
  • the next step is easier to describe in natural language than as a deterministic DOM action
  • you want a more adaptive page interaction or extraction step

But there is a tradeoff:

  • semantic actions are usually slower
  • semantic actions usually consume more tokens
  • semantic actions add a model-mediated reasoning layer on top of the browser runtime

Use this rule of thumb:

  • if the page has stable controls and you know what to click, read, type, or extract, prefer the explicit action path
  • if the desktop runtime is available and the task is hard to express with selectors alone, semantic actions can be worth the extra cost
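
The rule of thumb above can be sketched as a small decision function. This is an illustration only: `choose_action_style` and its parameters are hypothetical and do not correspond to any AnySoul setting.

```python
def choose_action_style(has_stable_selectors: bool, desktop_runtime_available: bool) -> str:
    """Pick between explicit and semantic browser actions for one step."""
    if has_stable_selectors:
        # Known controls: explicit actions are usually faster and cheaper.
        return "explicit"
    if desktop_runtime_available:
        # Hard-to-target page and the desktop path is available:
        # semantic actions can be worth the extra latency and tokens.
        return "semantic"
    # Without the desktop runtime, explicit actions are the only option.
    return "explicit"
```

The last branch reflects that semantic actions are only offered on the desktop path, so a hard-to-target page on the extension path still has to be handled with explicit steps.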

The runtime choice also changes which browser identity your agent uses.

On the web + browser extension path:

  • your agent uses the current browser identity
  • if you are already signed in on a site, the extension path sees that same signed-in session
  • there is no isolated per-agent browser profile

This is convenient, but it also means you should not assume the agent has a separate sandboxed login state.

On the desktop app path:

  • the desktop app uses the local browser runtime managed by AnySoul
  • managed browser tabs live inside the AnySoul app window
  • this path is better when you want the richer desktop browser surface

Use the web + browser extension path when you want AnySoul to continue working in your real browser.

1. Install and connect the AnySoul browser extension

Install the AnySoul browser extension, sign in, and keep it connected so it can publish live executor presence back to AnySoul.

2. Enable browser runtime for the agent

Open Agent Settings → Browser and enable browser runtime for the agent you want to use.

3. Turn on browser tools in the current run mode

Open the Run Mode editor and enable browser tools there as well.

Both levels matter:

  • Agent Browser settings are the long-lived policy for that agent
  • Run Mode decides whether browser tools are exposed for the current run

Because the extension path does not support semantic actions, plan flows like:

  • open tab
  • read page
  • click
  • type text
  • wait
  • extract

instead of relying on natural-language browser commands.
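
An explicit flow like the one above can be written down as an ordered list of steps and checked before it runs. The sketch below is an assumption-heavy illustration: the step dictionaries, the action names other than `semantic_act` and `semantic_extract`, the selectors, and the URL are all hypothetical and do not describe AnySoul's actual action payload format.

```python
# The two semantic action names come from this guide; everything else
# in this block is a hypothetical shape used only for illustration.
SEMANTIC_ACTIONS = {"semantic_act", "semantic_extract"}

plan = [
    {"action": "open_tab", "url": "https://example.com/login"},
    {"action": "read_page"},
    {"action": "click", "selector": "#email"},
    {"action": "type_text", "selector": "#email", "text": "user@example.com"},
    {"action": "wait_for", "selector": "#dashboard"},
    {"action": "extract", "selector": "#dashboard h1"},
]


def unsupported_on_extension(steps: list) -> list:
    """Return the names of steps that the extension path cannot run."""
    return [step["action"] for step in steps if step["action"] in SEMANTIC_ACTIONS]
```

A plan that passes this check contains only explicit steps; any `semantic_act` or `semantic_extract` step it reports would need either an explicit replacement or the desktop app path.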

Use the desktop app path when you want the richest current browser runtime.

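1. Install the desktop app

Follow the Install Desktop App guide first.

2. Enable the local browser runtime

Open Settings → Browser Runtime inside the desktop app and enable the local browser runtime.

3. Allow the agent to use the browser runtime

Open Agent Settings → Browser and allow that agent to use the browser runtime.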

4. Turn on browser tools in the current run mode

Open the Run Mode editor and enable browser tools for the current mode.

5. Use explicit actions first, semantic actions when needed

The desktop path supports both:

  • the explicit structured browser actions listed above
  • semantic actions when the current target exposes them

Use semantic actions when they genuinely simplify a hard page interaction. Otherwise, the explicit path is usually faster and more predictable.

If the browser toggle stays locked, this usually means AnySoul cannot confirm a live browser runtime for the current environment.

Check:

  • on web: the browser extension is connected and live
  • on desktop: the local browser runtime is enabled in Settings → Browser Runtime

Cached state alone is not enough to unlock the browser toggle.

The agent browser settings are enabled, but the browser is still not available

You need both:

  • browser enabled for the agent
  • browser enabled in the current run mode

If either one is off, the tool will stay unavailable.

The model still does not see browser_control

If browser is enabled in both places but the model still behaves as if no browser tool exists, check:

  • the current run mode really includes browser tools
  • a live browser runtime is currently available, not just cached state
  • you started a new run after changing browser settings or runtime availability

The browser tool is only injected when the current run, agent policy, and live runtime availability all line up.
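
The three conditions can be modelled as a simple gate, which also makes it easy to see which one is blocking the tool. This is a minimal sketch under stated assumptions: `BrowserGate` and its field names are hypothetical and are not AnySoul settings keys.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BrowserGate:
    # Hypothetical field names; each mirrors one condition from this guide.
    agent_browser_enabled: bool    # long-lived agent policy
    run_mode_browser_tools: bool   # browser tools enabled in the current run mode
    live_runtime_available: bool   # live extension or desktop runtime, not cached state

    def tool_injected(self) -> bool:
        """The browser tool is exposed only when all three conditions hold."""
        return (
            self.agent_browser_enabled
            and self.run_mode_browser_tools
            and self.live_runtime_available
        )

    def blockers(self) -> List[str]:
        """List the conditions that are currently not satisfied."""
        checks = {
            "agent browser setting is off": self.agent_browser_enabled,
            "run mode does not expose browser tools": self.run_mode_browser_tools,
            "no live browser runtime": self.live_runtime_available,
        }
        return [reason for reason, ok in checks.items() if not ok]
```

For example, `BrowserGate(True, True, False).blockers()` reports only the missing live runtime, which corresponds to the case where settings are enabled but the extension is disconnected or the desktop runtime is off.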

The extension is connected, but semantic_act / semantic_extract do not appear

This is expected.

The extension path currently does not support semantic browser actions, so AnySoul will not advertise them as available browser actions in that runtime.

The desktop app is available, but browser tasks still fail

Check:

  • Browser Runtime is enabled in desktop settings
  • the agent is allowed to use browser runtime
  • the current run mode exposes browser tools
  • the target page flow is being expressed with actions the current runtime supports

The agent keeps trying to use vague browser instructions

If you are on the extension path, switch to explicit steps:

  • read the page
  • identify the target element
  • click or focus it
  • type or paste text
  • wait for the next state
  • extract the result

This is the most reliable way to use the current extension runtime.

If the page is hard to describe with selectors and you truly need natural-language browser instructions, switch to the desktop app path instead of forcing semantic-style planning through the extension runtime.