Browser Runtime

AnySoul lets your agent browse and act on web pages through one of two runtime paths:

Web + browser extension
- uses your current browser profile
- opens and manages real browser tabs
- supports explicit structured browser actions

Desktop app
- uses the local browser runtime inside the AnySoul desktop app
- supports managed browser tabs in the app window
- supports the same explicit structured actions, plus richer semantic browser actions when available

This guide explains which path to choose, what each path can do today, and what to expect from performance and limitations.

Quick Decision Guide

If you want…	Choose…	Why
Your agent to act in the browser you already use every day	Web + browser extension	Reuses your current browser tabs and current signed-in browser identity
Per-agent local browser runtime managed by AnySoul	Desktop app	The desktop app owns the local runtime and managed browser tabs
The richest current browser capability surface	Desktop app	Desktop currently supports semantic browser actions in addition to explicit actions
The lightest-weight setup	Web + browser extension	No desktop app required

What Both Runtimes Support Today

Both the extension path and the desktop-app path support the current explicit browser action family:

- open and activate tabs
- navigate, go back, go forward, reload
- read page state
- scroll, focus, hover, click, double-click, right-click
- drag and drop
- press keys
- clear, type, paste, copy text
- set checked state
- select dropdown options
- submit forms
- upload files
- wait for selectors, text, or URL changes
- extract structured page data
- close tabs

For most deterministic browser flows, this explicit-action family is the best default.

Runtime Differences

Capability	Web + browser extension	Desktop app
Uses your current browser profile	Yes	No
Per-agent isolated local browser runtime	No	Yes
Real browser tabs in your browser	Yes	No
Managed tabs inside AnySoul app window	No	Yes
Explicit structured browser actions	Yes	Yes
`semantic_act`	No	Yes, when the desktop target supports semantic actions
`semantic_extract`	No	Yes, when the desktop target supports semantic extraction
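
The table above can be encoded as a small lookup so that a planning script can check a runtime before committing to it. This is a minimal sketch: the runtime keys and capability labels are illustrative and are not AnySoul identifiers, apart from `semantic_act` and `semantic_extract`, which this guide names. The desktop entries for those two actions assume the desktop target actually exposes them.

```python
# Hypothetical capability matrix transcribed from the table above.
# Keys are illustrative labels, not AnySoul identifiers.
CAPABILITIES = {
    "extension": {
        "current_browser_profile": True,
        "isolated_local_runtime": False,
        "explicit_actions": True,
        "semantic_act": False,
        "semantic_extract": False,
    },
    "desktop": {
        "current_browser_profile": False,
        "isolated_local_runtime": True,
        "explicit_actions": True,
        # Assumes the desktop target exposes semantic actions.
        "semantic_act": True,
        "semantic_extract": True,
    },
}


def runtimes_supporting(*required: str) -> list[str]:
    """Return the runtimes that have every required capability."""
    return [
        name
        for name, caps in CAPABILITIES.items()
        if all(caps.get(cap, False) for cap in required)
    ]
```

For example, `runtimes_supporting("semantic_act")` yields only the desktop entry, while asking for both the current browser profile and a semantic action yields an empty list, which mirrors the trade-off in the table.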

Important Limitation: Extension Does Not Support Semantic Actions

Today, the browser extension executor does not support:

- `semantic_act`
- `semantic_extract`

That means the extension path should be treated as an explicit-action browser runtime.

In practice:

- good extension flows are selector-driven and deterministic
- the extension works well for opening tabs, reading pages, filling forms, uploading files, waiting for page changes, and extracting structured page data
- the extension is not the right path if your workflow depends on natural-language browser commands such as “open the notifications tab” or “extract this page into the following schema without explicit selectors”

If you need those semantic browser actions, use the desktop app path instead.
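
One way to apply this limitation is to check a planned workflow before running it on the extension path. The sketch below is illustrative only: the step names other than `semantic_act` and `semantic_extract` are hypothetical, and the function models nothing beyond the restriction described in this section.

```python
# The two semantic actions named in this guide.
SEMANTIC_ACTIONS = {"semantic_act", "semantic_extract"}


def unsupported_steps(plan: list[str], runtime: str) -> list[str]:
    """Return the steps in `plan` that the given runtime cannot execute.

    Only the extension/semantic restriction is modelled here; every
    other step is assumed to be supported.
    """
    if runtime == "extension":
        return [step for step in plan if step in SEMANTIC_ACTIONS]
    return []


# Hypothetical plan mixing explicit and semantic steps.
plan = ["open_tab", "read_page", "semantic_act", "extract"]
blocked = unsupported_steps(plan, "extension")
```

If `blocked` is not empty, either rewrite those steps as selector-driven explicit actions or move the workflow to the desktop app path.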

Why Semantic Actions Are Heavier

The desktop app can expose richer semantic browser actions through Stagehand.

These semantic actions are useful when:

- the page is hard to target with stable selectors
- the next step is easier to describe in natural language than as a deterministic DOM action
- you want a more adaptive page interaction or extraction step

But there is a tradeoff:

- semantic actions are usually slower
- semantic actions usually consume more tokens
- semantic actions add a model-mediated reasoning layer on top of the browser runtime

Use this rule of thumb:

- if the page has stable controls and you know what to click, read, type, or extract, prefer the explicit action path
- if the desktop runtime is available and the task is hard to express with selectors alone, semantic actions can be worth the extra cost
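
The rule of thumb can be written as a small decision function. This is a sketch under stated assumptions: the function name, parameters, and return labels are hypothetical, and the final fallback reflects the fact that semantic actions are not available outside the desktop runtime.

```python
def choose_action_style(has_stable_selectors: bool, desktop_available: bool) -> str:
    """Pick "explicit" or "semantic" following the rule of thumb above."""
    if has_stable_selectors:
        # Stable controls: the explicit path is faster and more predictable.
        return "explicit"
    if desktop_available:
        # Hard to express with selectors, and the desktop runtime can help.
        return "semantic"
    # No desktop runtime: semantic actions are unavailable, so the flow
    # has to be expressed with explicit actions anyway.
    return "explicit"
```

The function encodes a preference, not a guarantee; a page with unstable selectors on the extension path may still need its workflow redesigned.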

The runtime choice also changes which browser identity your agent uses.

Web + Browser Extension

- your agent uses the current browser identity
- if you are already signed in on a site, the extension path sees that same signed-in session
- there is no isolated per-agent browser profile

This is convenient, but it also means you should not assume the agent has a separate sandboxed login state.

Desktop App

- the desktop app uses the local browser runtime managed by AnySoul
- managed browser tabs live inside the AnySoul app window
- this path is better when you want the richer desktop browser surface

Setup: Web + Browser Extension

Use this path when you want AnySoul to work in the browser you already use.

1. Install and connect the AnySoul browser extension

Install the AnySoul browser extension, sign in, and keep it connected so it can publish live executor presence back to AnySoul.

2. Enable browser runtime for the agent

Open Agent Settings → Browser and enable browser runtime for the agent you want to use.

3. Turn on browser tools in the current run mode

Open the Run Mode editor and enable browser tools there as well.

Both levels matter:

- Agent Browser settings are the long-lived policy for that agent
- Run Mode decides whether browser tools are exposed for the current run

4. Prefer explicit browser workflows

Because the extension path does not support semantic actions, plan flows like:

- open tab
- read page
- click
- type text
- wait
- extract

instead of relying on natural-language browser commands.
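
A flow like this can be written down as plain data before handing it to the agent, which makes it easy to confirm that every step is selector-driven. The sketch below is illustrative: the action names, the `Step` fields, the selectors, and the example URL are all assumptions, not the payload format that `browser_control` accepts.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names that need a selector to be deterministic.
SELECTOR_ACTIONS = {"click", "type_text", "wait_for_selector", "extract"}


@dataclass(frozen=True)
class Step:
    action: str
    target: Optional[str] = None  # URL for open_tab, CSS selector otherwise
    text: Optional[str] = None    # text to type, if any


def steps_missing_selector(flow: list[Step]) -> list[str]:
    """Return the actions that need a selector but do not have one."""
    return [s.action for s in flow if s.action in SELECTOR_ACTIONS and not s.target]


# Illustrative flow mirroring the list above.
FLOW = [
    Step("open_tab", target="https://example.com/search"),
    Step("read_page"),
    Step("click", target="#query"),
    Step("type_text", target="#query", text="quarterly report"),
    Step("wait_for_selector", target=".results"),
    Step("extract", target=".results li"),
]
```

An empty result from `steps_missing_selector(FLOW)` suggests the flow can be run without falling back on natural-language commands.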

Setup: Desktop App

Use this path when you want the richest current browser runtime.

1. Install the desktop app

Follow the Install Desktop App guide first.

2. Enable the local browser runtime

Open Settings → Browser Runtime inside the desktop app and enable the local browser runtime.

3. Enable browser runtime for the agent

Open Agent Settings → Browser and allow that agent to use the browser runtime.

4. Turn on browser tools in the current run mode

Open the Run Mode editor and enable browser tools for the current mode.

5. Use explicit actions first, semantic actions when needed

The desktop path supports both:

- the explicit structured browser actions listed above
- semantic actions when the current target exposes them

Use semantic actions when they genuinely simplify a hard page interaction. Otherwise, the explicit path is usually faster and more predictable.

Troubleshooting

The browser toggle in Run Mode is locked

This usually means AnySoul cannot confirm a live browser runtime for the current environment.

Check:

- on web: the browser extension is connected and live
- on desktop: the local browser runtime is enabled in Settings → Browser Runtime

Cached state alone is not enough to unlock the browser toggle.
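
The unlock condition can be pictured as the following check. It is a sketch of the behaviour described above, not AnySoul's implementation; the function name and parameters are hypothetical. The `cached_state_ok` argument is accepted only to show that it has no effect on the result.

```python
def browser_toggle_unlocked(
    environment: str,
    *,
    extension_live: bool = False,
    local_runtime_enabled: bool = False,
    cached_state_ok: bool = False,
) -> bool:
    """Return True when a live browser runtime exists for the environment."""
    # cached_state_ok is deliberately ignored: cached state alone never
    # unlocks the toggle.
    if environment == "web":
        return extension_live
    if environment == "desktop":
        return local_runtime_enabled
    return False
```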

The agent browser settings are enabled, but the browser is still not available

You need both:

- browser runtime enabled for the agent (Agent Settings → Browser)
- browser tools enabled in the current run mode (Run Mode editor)

If either one is off, the tool will stay unavailable.

The model still does not see `browser_control`

If the browser is enabled in both places but the model still behaves as if no browser tool exists, check:

- the current run mode really includes browser tools
- a live browser runtime is currently available, not just cached state
- you started a new run after changing browser settings or runtime availability

The browser tool is only injected when the current run, agent policy, and live runtime availability all line up.
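
When debugging, it can help to list which of these conditions is failing. The sketch below models the checks from this section as a simple diagnostic; the function and its parameters are hypothetical and do not correspond to any AnySoul API.

```python
def blocking_reasons(
    agent_browser_enabled: bool,
    run_mode_browser_enabled: bool,
    live_runtime_available: bool,
    new_run_started: bool = True,
) -> list[str]:
    """Return the reasons the browser tool would not be injected.

    An empty list means every condition described above lines up.
    """
    reasons = []
    if not agent_browser_enabled:
        reasons.append("browser runtime is disabled for the agent")
    if not run_mode_browser_enabled:
        reasons.append("browser tools are disabled in the current run mode")
    if not live_runtime_available:
        reasons.append("no live browser runtime (cached state does not count)")
    if not new_run_started:
        reasons.append("settings changed but no new run was started")
    return reasons
```

Calling `blocking_reasons(True, True, False)` returns a single entry about the live runtime, which points at the extension connection or the desktop runtime setting as the next thing to check.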

The extension is connected, but `semantic_act` / `semantic_extract` do not appear

This is expected.

The extension path currently does not support semantic browser actions, so AnySoul will not advertise them as available browser actions in that runtime.

The desktop app is available, but browser tasks still fail