Browser Runtime
AnySoul can let your agent browse and act on web pages, but there are currently two different runtime paths:
- Web + browser extension
  - uses your current browser profile
  - opens and manages real browser tabs
  - supports explicit structured browser actions
- Desktop app
  - uses the local browser runtime inside the AnySoul desktop app
  - supports managed browser tabs in the app window
  - supports the same explicit structured actions, plus richer semantic browser actions when available
This guide explains which path to choose, what each path can do today, and what to expect from performance and limitations.
Quick Decision Guide
| If you want… | Choose… | Why |
|---|---|---|
| Your agent to act in the browser you already use every day | Web + browser extension | Reuses your current browser tabs and current signed-in browser identity |
| Per-agent local browser runtime managed by AnySoul | Desktop app | The desktop app owns the local runtime and managed browser tabs |
| The richest current browser capability surface | Desktop app | Desktop currently supports semantic browser actions in addition to explicit actions |
| The lightest-weight setup | Web + browser extension | No desktop app required |
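
The decision table above can be sketched as a small helper. This is only an illustration: the `Needs` fields, the runtime labels, and `choose_runtime` are hypothetical names made up for this example, not AnySoul settings or APIs.

```python
from dataclasses import dataclass

# Hypothetical labels for the two runtime paths described in this guide.
EXTENSION = "web + browser extension"
DESKTOP = "desktop app"


@dataclass
class Needs:
    """Illustrative requirements; the field names are assumptions."""
    reuse_current_browser: bool = False    # act in the browser you already use
    per_agent_local_runtime: bool = False  # local runtime managed by AnySoul
    semantic_actions: bool = False         # semantic_act / semantic_extract


def choose_runtime(needs: Needs) -> str:
    """Mirror the table: desktop-only needs pick desktop, otherwise the lighter path."""
    wants_desktop = needs.per_agent_local_runtime or needs.semantic_actions
    if wants_desktop and needs.reuse_current_browser:
        # Only the extension reuses your browser profile, and only desktop
        # offers semantic actions and the managed local runtime.
        raise ValueError("no single runtime path covers both needs today")
    return DESKTOP if wants_desktop else EXTENSION
```

With no special needs, the helper falls back to the extension path because it is the lightest-weight setup.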
What Both Runtimes Support Today
Both the extension path and the desktop-app path support the current explicit browser action family:
- open and activate tabs
- navigate, go back, go forward, reload
- read page state
- scroll, focus, hover, click, double-click, right-click
- drag and drop
- press keys
- clear, type, paste, copy text
- set checked state
- select dropdown options
- submit forms
- upload files
- wait for selectors, text, or URL changes
- extract structured page data
- close tabs
For most deterministic browser flows, this explicit-action family is the best default.
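
To make "explicit and deterministic" concrete, here is a minimal sketch of a selector-driven flow expressed as plain data. The action names, the `Step` class, the argument keys, the URL, and the selectors are all placeholders invented for this example; the identifiers AnySoul actually uses may differ.

```python
from dataclasses import dataclass, field
from typing import Any

# Illustrative labels for the action family listed above (assumed names).
EXPLICIT_ACTIONS = {
    "open_tab", "activate_tab", "navigate", "go_back", "go_forward", "reload",
    "read_page", "scroll", "focus", "hover", "click", "double_click",
    "right_click", "drag_and_drop", "press_key", "clear", "type", "paste",
    "copy_text", "set_checked", "select_option", "submit_form", "upload_file",
    "wait_for", "extract", "close_tab",
}


@dataclass
class Step:
    action: str
    args: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject anything outside the explicit family, such as semantic actions.
        if self.action not in EXPLICIT_ACTIONS:
            raise ValueError(f"not an explicit action: {self.action}")


# A deterministic search flow: every step names its target explicitly.
plan = [
    Step("open_tab", {"url": "https://example.com/search"}),
    Step("type", {"selector": "input[name=q]", "text": "browser runtime"}),
    Step("submit_form", {"selector": "form#search"}),
    Step("wait_for", {"selector": "ul.results li"}),
    Step("extract", {"selector": "ul.results li", "fields": ["title", "href"]}),
    Step("close_tab"),
]
```

Because each step carries its own selector or URL, the same plan could in principle run on either runtime path.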
Runtime Differences
| Capability | Web + browser extension | Desktop app |
|---|---|---|
| Uses your current browser profile | Yes | No |
| Per-agent isolated local browser runtime | No | Yes |
| Real browser tabs in your browser | Yes | No |
| Managed tabs inside AnySoul app window | No | Yes |
| Explicit structured browser actions | Yes | Yes |
| semantic_act | No | Yes, when the desktop target supports semantic actions |
| semantic_extract | No | Yes, when the desktop target supports semantic extraction |
Important Limitation: Extension Does Not Support Semantic Actions
Today, the browser extension executor does not support:
- semantic_act
- semantic_extract
That means the extension path should be treated as an explicit-action browser runtime.
In practice:
- good extension flows are selector-driven and deterministic
- extension is great for opening tabs, reading pages, filling forms, uploading files, waiting for page changes, and extracting structured page data
- extension is not the right path if your planned workflow depends on natural-language browser commands such as “open the notifications tab” or “extract this page into the following schema without explicit selectors”
If you need those semantic browser actions, use the desktop app path instead.
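
One way to catch this early is to check a planned flow before running it on the extension path. The sketch below assumes a plan is just a list of action names; `unsupported_on_extension` and the non-semantic action labels are hypothetical and exist only for this example.

```python
# The two semantic action names as they appear in this guide.
SEMANTIC_ACTIONS = {"semantic_act", "semantic_extract"}


def unsupported_on_extension(actions: list[str]) -> list[str]:
    """Return the planned actions that the extension path cannot run."""
    return [a for a in actions if a in SEMANTIC_ACTIONS]


# Placeholder plan mixing explicit steps with semantic ones.
planned = ["open_tab", "read_page", "semantic_act", "click", "semantic_extract"]
blocked = unsupported_on_extension(planned)
if blocked:
    print(f"Rewrite these steps or move the flow to the desktop app: {blocked}")
```

An empty result means the flow uses only explicit actions and is a reasonable fit for the extension.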
Why Semantic Actions Are Heavier
The desktop app can expose richer semantic browser actions through Stagehand.
These semantic actions are useful when:
- the page is hard to target with stable selectors
- the next step is easier to describe in natural language than as a deterministic DOM action
- you want a more adaptive page interaction or extraction step
But there is a tradeoff:
- semantic actions are usually slower
- semantic actions usually consume more tokens
- semantic actions add a model-mediated reasoning layer on top of the browser runtime
Use this rule of thumb:
- if the page has stable controls and you know what to click, read, type, or extract, prefer the explicit action path
- if the desktop runtime is available and the task is hard to express with selectors alone, semantic actions can be worth the extra cost
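
The rule of thumb above can be written down as a tiny decision function. This is a sketch of the guidance only; `pick_action_style` and its parameters are hypothetical and not part of AnySoul.

```python
def pick_action_style(has_stable_selectors: bool, desktop_available: bool) -> str:
    """Return "explicit" or "semantic" following the rule of thumb above."""
    if has_stable_selectors:
        # Usually faster, cheaper in tokens, and more predictable.
        return "explicit"
    if desktop_available:
        # Hard-to-target page: the extra cost can be worth it.
        return "semantic"
    # The extension path has no semantic actions, so explicit is the only option.
    return "explicit"
```

Note that the semantic branch is reachable only when the desktop runtime is available, which matches the limitation described earlier.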
Identity and Login Behavior
The runtime choice also changes which browser identity your agent uses.
Web + Browser Extension
- your agent uses the current browser identity
- if you are already signed in on a site, the extension path sees that same signed-in session
- there is no isolated per-agent browser profile
This is convenient, but it also means you should not assume the agent has a separate sandboxed login state.
Desktop App
- the desktop app uses the local browser runtime managed by AnySoul
- managed browser tabs live inside the AnySoul app window
- this path is better when you want the richer desktop browser surface
Setup: Web + Browser Extension
Use this path when you want AnySoul to continue in your real browser.
1. Install and connect the AnySoul browser extension
Install the AnySoul browser extension, sign in, and keep it connected so it can publish live executor presence back to AnySoul.
2. Enable browser runtime for the agent
Open Agent Settings → Browser and enable browser runtime for the agent you want to use.
3. Turn on browser tools in the current run mode
Open the Run Mode editor and enable browser tools there as well.
Both levels matter:
- Agent Browser settings are the long-lived policy for that agent
- Run Mode decides whether browser tools are exposed for the current run
4. Prefer explicit browser workflows
Because the extension path does not support semantic actions, plan flows like:
- open tab
- read page
- click
- type text
- wait
- extract
instead of relying on natural-language browser commands.
Setup: Desktop App
Use this path when you want the richest current browser runtime.
1. Install the desktop app
Follow the Install Desktop App guide first.
2. Enable the local browser runtime
Open Settings → Browser Runtime inside the desktop app and enable the local browser runtime.
3. Enable browser runtime for the agent
Open Agent Settings → Browser and allow that agent to use the browser runtime.
4. Turn on browser tools in the current run mode
Open the Run Mode editor and enable browser tools for the current mode.
5. Use explicit actions first, semantic actions when needed
The desktop path supports both:
- the explicit structured browser actions listed above
- semantic actions when the current target exposes them
Use semantic actions when they genuinely simplify a hard page interaction. Otherwise, the explicit path is usually faster and more predictable.
Troubleshooting
The browser toggle in Run Mode is locked
This usually means AnySoul cannot confirm a live browser runtime for the current environment.
Check:
- on web: the browser extension is connected and live
- on desktop: the local browser runtime is enabled in Settings → Browser Runtime
Cached state alone is not enough to unlock the browser toggle.
The agent browser settings are enabled, but browser still is not available
You need both:
- browser enabled for the agent
- browser enabled in the current run mode
If either one is off, the tool will stay unavailable.
The model still does not see browser_control
If browser is enabled in both places but the model still behaves as if no browser tool exists, check:
- the current run mode really includes browser tools
- a live browser runtime is currently available, not just cached state
- you started a new run after changing browser settings or runtime availability
The browser tool is only injected when the current run, agent policy, and live runtime availability all line up.
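
The three-way condition can be modeled as a simple checklist, which is handy when working out which piece is missing. `BrowserGate` and its fields are hypothetical names for this sketch and do not describe how AnySoul stores this state.

```python
from dataclasses import dataclass


@dataclass
class BrowserGate:
    """Hypothetical snapshot of the three conditions described above."""
    agent_browser_enabled: bool   # Agent Settings -> Browser
    run_mode_browser_tools: bool  # Run Mode editor
    runtime_live: bool            # live extension or desktop runtime, not cached state

    def missing(self) -> list[str]:
        """List the conditions that currently block the browser tool."""
        checks = {
            "agent browser setting": self.agent_browser_enabled,
            "run mode browser tools": self.run_mode_browser_tools,
            "live browser runtime": self.runtime_live,
        }
        return [name for name, ok in checks.items() if not ok]

    def tool_injected(self) -> bool:
        """The tool is exposed only when nothing is missing."""
        return not self.missing()
```

For example, `BrowserGate(True, True, False).missing()` points at the live runtime, which corresponds to the case where both settings are on but the extension or desktop runtime is not currently connected.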
The extension is connected, but semantic_act / semantic_extract do not appear
This is expected.
The extension path currently does not support semantic browser actions, so AnySoul will not advertise them as available browser actions in that runtime.
The desktop app is available, but browser tasks still fail
Check:
- Browser Runtime is enabled in desktop settings
- the agent is allowed to use browser runtime
- the current run mode exposes browser tools
- the target page flow is being expressed with actions the current runtime supports
The agent keeps trying to use vague browser instructions
If you are on the extension path, switch to explicit steps:
- read the page
- identify the target element
- click or focus it
- type or paste text
- wait for the next state
- extract the result
This is the most reliable way to use the current extension runtime.
If the page is hard to describe with selectors and you truly need natural-language browser instructions, switch to the desktop app path instead of forcing semantic-style planning through the extension runtime.
Related
- Install Desktop App: desktop setup
- Browser Extension: Context Bro guide (different product; not the AnySoul browser runtime executor)
- Event Stream: how agent-visible state accumulates over time
- AI Browser Agent use case: public outcome-first introduction
- One Browser Agent, Two Runtime Paths: launch-style announcement post