By Nadim Tuhin (@nadimtuhin)
I hate maintaining Puppeteer scripts. One CSS class change breaks everything. Selectors rot. Shadow DOM defeats you. And don't get me started on those sites that change their layout every week.
That's why I switched to AI-powered browser agents. Instead of brittle selectors, you describe goals in natural language. The agent figures out how to achieve them—and adapts when the UI changes.
TL;DR: Framework Comparison
| Framework | Best For | Language | Learning Curve |
|---|---|---|---|
| LangChain + LangGraph | Custom agents, full control | JS/TS, Python | Medium |
| Browser Use | Production scraping, stealth | Python, TS | Low |
| Stagehand | Hybrid code + AI | JS/TS | Low |
| Skyvern | Complex multi-site workflows | Python | Medium |
This guide focuses on LangChain + LangGraph—the approach that gives you the most control and customization.
What We're Building
A production-ready browser agent using:
- LangGraph - LangChain's recommended agent framework (replaces legacy LangChain agents)
- Puppeteer or Playwright - Browser automation engine
- LLM Provider - OpenAI, Gemini, Claude, or any compatible model
Why Browser Agents Now?
2025-2026 is the tipping point. According to LangChain's State of Agent Engineering report, 57% of organizations now have agents in production. Browser automation is one of the top use cases.
What changed:
- Vision models got cheap - GPT-4o-mini and Gemini Flash can "see" webpages for pennies
- LangGraph matured - Production-ready stateful agents with human-in-the-loop
- Self-healing is real - AI agents recover from broken selectors automatically
The old way isn't dead—Playwright and Puppeteer are still essential. But AI agents add a reasoning layer that makes automation resilient.
What is a Browser Agent?
A browser agent combines:
- LLM (Language Model) - Understands goals, reasons about next steps
- Browser Tools - Low-level actions (navigate, click, type, etc.)
- Orchestrator - Combines tools and LLM into autonomous workflow (LangGraph)
Traditional vs Agent-Based Automation
Traditional Approach:
```typescript
// Hardcoded steps - brittle
await page.goto('https://example.com')
await page.click('#search-input')
await page.type('#search-input', 'query')
await page.click('#search-button')
await page.waitForNavigation()
```
Agent-Based Approach:
```typescript
// Natural language goal - flexible
await agent.executeTask('Go to example.com, search for "query", and get results')
```
Why this matters:
| Aspect | Traditional | Agent-Based |
|---|---|---|
| UI changes | ❌ Breaks immediately | ✅ Self-heals |
| Error handling | Manual retry logic | AI-driven recovery |
| Complex tasks | Hundreds of lines | Single prompt |
| Debugging | Stack traces | Reasoning trace |
| Maintenance | Constant updates | Minimal |
Architecture
User Prompt
↓
LangGraph ReAct Agent
↓
Reasoning ←→ Browser Tools
↓
Puppeteer/Playwright Page
↓
Website (via CDP)
Core Components
BrowserAgent Class
```typescript
import type { BaseChatModel } from '@langchain/core/language_models/chat_models'
import type { StructuredToolInterface } from '@langchain/core/tools'
import { createReactAgent } from '@langchain/langgraph/prebuilt'
import type { Page } from 'puppeteer-core'

export class BrowserAgent {
  private page: Page
  private llm: BaseChatModel
  private tools: StructuredToolInterface[]
  private agent: ReturnType<typeof createReactAgent>

  constructor(config: { page: Page; browserUrl: string; model: BaseChatModel }) {
    this.page = config.page
    this.llm = config.model
    this.tools = createBrowserTools(this.page)
    this.agent = createReactAgent({ llm: this.llm, tools: this.tools })
  }

  async executeTask(task: string): Promise<ExecuteTaskResult> {
    const result = await this.agent.invoke({
      messages: [{ role: 'user', content: task }],
    })
    return result
  }
}
```
Browser Tools
Functions the agent can call:
- `navigate(url)` - Navigate to a webpage
- `click_element(selector)` - Click an element
- `type_text(selector, text)` - Type into an input
- `press_keys(keys)` - Press keyboard shortcuts
- `wait_for_element(selector)` - Wait for an element
- `get_page_info()` - Get current page info
- `find_elements_by_text(text)` - Find elements by text content
- `take_screenshot()` - Capture a screenshot
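The snippets in this guide also reference `ExecuteTaskResult` and `Step` types without defining them. Here is one possible shape, inferred from how the fields are used later; these interfaces are my own sketch, not an official API:

```typescript
// Hypothetical result types for BrowserAgent.executeTask(),
// inferred from the fields used throughout this guide.
interface Step {
  step: number // 1-based step index
  tool: string // tool name, e.g. "navigate"
  input: unknown // arguments passed to the tool
  output?: string // tool's return value, filled in on completion
  timestamp: string // ISO timestamp of when the tool started
}

interface TokenUsage {
  input_tokens: number
  output_tokens: number
  total_tokens: number
}

interface ExecuteTaskResult {
  success: boolean
  output?: string // agent's final answer
  error?: string // set when success is false
  steps?: Step[]
  duration?: number // milliseconds
  stepsCount?: number
  tokenUsage?: TokenUsage
}

// Example of a successful one-step run:
const sample: ExecuteTaskResult = {
  success: true,
  output: 'Navigated to https://example.com',
  steps: [
    {
      step: 1,
      tool: 'navigate',
      input: { url: 'https://example.com' },
      timestamp: new Date().toISOString(),
    },
  ],
  duration: 1200,
  stepsCount: 1,
}
```

Adjust the optional fields to match whatever your `executeTask` actually returns.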
Prerequisites
Before diving in, you'll need:
- Node.js 18+ (for ES modules and native fetch)
- TypeScript (recommended for type safety with LangChain)
- API key from OpenAI, Google AI, or Anthropic
- Basic understanding of Puppeteer or Playwright
Note: This guide uses LangGraph, which is LangChain's recommended framework for production agents as of 2025. The legacy `AgentExecutor` from LangChain still works but lacks LangGraph's stateful features and human-in-the-loop capabilities.
Step 1: Set Up Dependencies
Install required packages:
```shell
npm install @langchain/core @langchain/langgraph
npm install puppeteer-core   # or: npm install playwright
npm install zod
```
Install your preferred LLM provider:
```shell
npm install @langchain/openai        # OpenAI, Groq, Together, etc.
npm install @langchain/anthropic     # Claude
npm install @langchain/google-genai  # Gemini
```
For Playwright instead of Puppeteer:
```shell
npm install playwright
npx playwright install chromium
```
Step 2: Create Browser Tools
Define tools using LangChain's DynamicStructuredTool:
```typescript
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'
import type { Page } from 'puppeteer-core'

export const createBrowserTools = (page: Page) => {
  return [
    // Navigate Tool
    new DynamicStructuredTool({
      name: 'navigate',
      description: 'Navigate to a URL',
      schema: z.object({
        url: z.string().url().describe('Full URL to navigate to'),
      }),
      func: async ({ url }) => {
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })
        return `Navigated to ${url}`
      },
    }),

    // Click Tool
    new DynamicStructuredTool({
      name: 'click_element',
      description: 'Click element by CSS selector',
      schema: z.object({
        selector: z.string().describe('CSS selector to click'),
        waitForNavigation: z.boolean().optional().describe('Wait after clicking'),
      }),
      func: async ({ selector, waitForNavigation = false }) => {
        await page.waitForSelector(selector, { visible: true, timeout: 10000 })
        if (waitForNavigation) {
          await Promise.all([
            page.waitForNavigation({ waitUntil: 'networkidle2' }),
            page.click(selector),
          ])
        } else {
          await page.click(selector)
        }
        return `Clicked ${selector}`
      },
    }),

    // Type Tool
    new DynamicStructuredTool({
      name: 'type_text',
      description: 'Type text into input field',
      schema: z.object({
        selector: z.string().describe('CSS selector for input'),
        text: z.string().describe('Text to type'),
        clearFirst: z.boolean().optional().describe('Clear before typing'),
      }),
      func: async ({ selector, text, clearFirst = true }) => {
        await page.waitForSelector(selector, { visible: true, timeout: 10000 })
        if (clearFirst) {
          await page.click(selector, { clickCount: 3 })
          await page.keyboard.press('Backspace')
        }
        await page.type(selector, text)
        return `Typed "${text}" into ${selector}`
      },
    }),

    // Get Page Info Tool
    new DynamicStructuredTool({
      name: 'get_page_info',
      description: 'Get current page information',
      schema: z.object({}),
      func: async () => {
        const url = page.url()
        const title = await page.title()
        const bodyText = await page.evaluate(() => {
          return document.body.innerText.substring(0, 500)
        })
        return `URL: ${url}, Title: ${title}, Text: ${bodyText}...`
      },
    }),
  ]
}
```
Tool Best Practices
1. Clear Descriptions
```typescript
// Good
description: 'Navigate to a specific URL. Use this when you need to go to a new webpage.'

// Bad
description: 'Nav'
```
2. Validation in Schemas
```typescript
schema: z.object({
  url: z.string().url(), // Must be a valid URL
  timeout: z.number().min(1000).max(60000), // Must be 1-60s
})
```
3. Error Handling
```typescript
func: async ({ selector }) => {
  try {
    await page.click(selector)
    return `Success`
  } catch (error) {
    return `Error: ${error.message}` // Return the error to the agent
  }
}
```
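The try/catch pattern above can be factored into a small wrapper so every tool reports failures back to the agent instead of crashing the run. `withToolErrors` is my own helper name, not a LangChain API:

```typescript
// Wrap a tool implementation so exceptions become plain-text results
// the agent can read and react to, instead of aborting the whole task.
function withToolErrors<T>(
  fn: (args: T) => Promise<string>
): (args: T) => Promise<string> {
  return async (args: T) => {
    try {
      return await fn(args)
    } catch (error) {
      // Surface the failure to the LLM so it can retry or change strategy
      return `Error: ${error instanceof Error ? error.message : String(error)}`
    }
  }
}

// Usage inside a tool definition: func: withToolErrors(async ({ selector }) => { ... })
const flaky = withToolErrors(async ({ selector }: { selector: string }) => {
  if (selector === '#missing') throw new Error('selector not found')
  return `Clicked ${selector}`
})
```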
Step 3: Create LLM Instance
Use LangChain's model classes:
```typescript
import { ChatOpenAI } from '@langchain/openai'

// OpenAI
const llm = new ChatOpenAI({
  modelName: 'gpt-4o-mini',
  temperature: 0,
  openAIApiKey: process.env.OPENAI_API_KEY,
})
```
```typescript
import { ChatGoogleGenerativeAI } from '@langchain/google-genai'

// Gemini
const llm = new ChatGoogleGenerativeAI({
  model: 'gemini-2.0-flash',
  temperature: 0,
  apiKey: process.env.GEMINI_API_KEY,
})
```
```typescript
import { ChatAnthropic } from '@langchain/anthropic'

// Claude (Anthropic)
const llm = new ChatAnthropic({
  model: 'claude-3-5-sonnet-20241022',
  temperature: 0,
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
})
```
```typescript
import { ChatOpenAI } from '@langchain/openai'

// Any OpenAI-compatible API (Groq, Together, local Ollama, etc.)
const llm = new ChatOpenAI({
  modelName: 'llama-3.3-70b-versatile',
  temperature: 0,
  openAIApiKey: process.env.GROQ_API_KEY,
  configuration: {
    baseURL: 'https://api.groq.com/openai/v1',
  },
})
```
Temperature Settings
- 0.0-0.2 - Deterministic, good for scraping (exact selectors)
- 0.3-0.5 - Balanced, good for general automation
- 0.6-0.8 - Creative, good for exploration
- 0.9-1.0 - Very creative, good for brainstorming
Step 4: Create Browser Agent
Combine the LLM and tools:
```typescript
import type { BaseChatModel } from '@langchain/core/language_models/chat_models'
import type { StructuredToolInterface } from '@langchain/core/tools'
import { createReactAgent } from '@langchain/langgraph/prebuilt'
import type { Page } from 'puppeteer-core'
import { createBrowserTools } from './browserTools'

export class BrowserAgent {
  private page: Page
  private llm: BaseChatModel
  private tools: StructuredToolInterface[]
  private agent: ReturnType<typeof createReactAgent>
  private maxSteps: number

  constructor(config: { page: Page; browserUrl: string; model: BaseChatModel; maxSteps?: number }) {
    this.page = config.page
    this.llm = config.model
    this.maxSteps = config.maxSteps || 50
    this.tools = createBrowserTools(this.page)

    // Create ReAct agent (Reasoning + Acting)
    this.agent = createReactAgent({
      llm: this.llm,
      tools: this.tools,
    })

    console.log(`BrowserAgent initialized with model: ${config.model.getName()}`)
  }

  async executeTask(task: string, options: { verbose?: boolean } = {}): Promise<ExecuteTaskResult> {
    console.log(`Executing task: "${task}"`)
    try {
      const startTime = Date.now()
      const steps: Step[] = []

      // Execute agent
      const result = await this.agent.invoke(
        {
          messages: [{ role: 'user', content: task }],
        },
        {
          recursionLimit: this.maxSteps,
          // Track steps for debugging
          callbacks: [
            {
              handleToolStart: (tool, input, _runId) => {
                const step = steps.length + 1
                if (options.verbose) {
                  console.log(`Step ${step}: ${tool.name}(${JSON.stringify(input)})`)
                }
                steps.push({
                  step,
                  tool: tool.name,
                  input,
                  timestamp: new Date().toISOString(),
                })
              },
              handleToolEnd: (output) => {
                if (steps.length > 0) {
                  steps[steps.length - 1].output = String(output)
                }
                if (options.verbose) {
                  console.log(`  → ${output}`)
                }
              },
            },
          ],
        }
      )

      const duration = Date.now() - startTime

      // Extract token usage from the final message
      const lastMessage = result.messages[result.messages.length - 1]
      const tokenUsage = lastMessage?.usage_metadata

      return {
        success: true,
        output: String(lastMessage.content),
        steps,
        duration,
        stepsCount: steps.length,
        tokenUsage,
      }
    } catch (error) {
      return {
        success: false,
        error: error.message,
      }
    }
  }
}
```
Key Features
1. Step Tracking
```typescript
callbacks: [
  {
    handleToolStart: (tool, input) => {
      console.log(`Step ${step}: ${tool.name}`)
    },
    handleToolEnd: (output) => {
      console.log(`  → ${output}`)
    },
  },
]
```
2. Recursion Limit

Prevent infinite loops:
```typescript
{
  recursionLimit: this.maxSteps, // Stop after N steps
}
```
3. Token Usage Tracking
```typescript
const tokenUsage = result.messages[result.messages.length - 1]?.usage_metadata
// { input_tokens: 1234, output_tokens: 5678, total_tokens: 6912 }
```
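Token counts convert directly into dollars, so it's worth logging cost per task. A rough helper (the per-million rates are illustrative placeholders; check your provider's current pricing):

```typescript
// Estimate cost in USD from token counts. Rates are per 1M tokens;
// the numbers used below are placeholders, not real pricing.
interface Usage {
  input_tokens: number
  output_tokens: number
}

function estimateCostUSD(
  usage: Usage,
  inputPerMillion: number,
  outputPerMillion: number
): number {
  const inputCost = (usage.input_tokens / 1_000_000) * inputPerMillion
  const outputCost = (usage.output_tokens / 1_000_000) * outputPerMillion
  return inputCost + outputCost
}

// e.g. 50k input + 10k output tokens at $0.15 / $0.60 per 1M:
const cost = estimateCostUSD({ input_tokens: 50_000, output_tokens: 10_000 }, 0.15, 0.6)
// cost ≈ $0.0135
```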
Step 5: Execute Tasks
Simple Navigation
```typescript
const result = await agent.executeTask('Navigate to https://example.com')
console.log(result.output)
// "Navigated to https://example.com"
```
Multi-Step Task
```typescript
const result = await agent.executeTask(
  'Go to https://google.com, search for "Puppeteer", and click the first result'
)
console.log(`Steps: ${result.stepsCount}`)
console.log(`Duration: ${result.duration}ms`)
console.log(`Tokens: ${result.tokenUsage?.total_tokens}`)
```
Web Scraping
```typescript
const result = await agent.executeTask(
  'Go to https://news.ycombinator.com, extract the top 5 headlines, and return them'
)
console.log(result.output)
// "1. AI Breaks Records...
//  2. New Study Shows...
//  3. OpenAI Releases...
//  4. ..."
```
Form Submission
```typescript
const result = await agent.executeTask(
  'Go to https://example.com/signup, fill in the form with name "John" and email "[email protected]", and submit'
)
```
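The web scraping example above returns free text, so structured extraction usually needs a small parser on top of the agent's answer. A sketch that pulls out numbered lines (the format is whatever your prompt asks for, not guaranteed by the agent):

```typescript
// Pull "1. Headline" style lines out of the agent's free-text answer.
function parseNumberedList(output: string): string[] {
  return output
    .split('\n')
    .map((line) => line.trim().match(/^\d+\.\s+(.*)$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map((m) => m[1])
}

const headlines = parseNumberedList('1. AI Breaks Records\n2. New Study Shows\nnot a list item')
// headlines: ['AI Breaks Records', 'New Study Shows']
```

Asking the agent to return JSON and parsing that is even more robust, but plain numbered lists are easy for small models to produce reliably.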
Advanced Patterns
1. Task Templates
Create reusable task prompts:
```typescript
const tasks = {
  login: (email: string, password: string) =>
    `Go to https://example.com/login, enter email "${email}" and password "${password}", and click login`,

  scrapeListings: (url: string, count: number) =>
    `Go to ${url}, find the first ${count} product cards, extract title, price, and image URL`,

  postStatus: (content: string) =>
    `Navigate to Facebook, click "What's on your mind", type "${content}", and press Ctrl+Enter to post`,
}

// Use a template
await agent.executeTask(tasks.login('[email protected]', 'password123'))
```
2. Error Recovery
Handle agent failures gracefully:
```typescript
async function executeWithRetry(
  agent: BrowserAgent,
  page: Page,
  task: string,
  maxRetries = 3
): Promise<ExecuteTaskResult> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const result = await agent.executeTask(task)
    if (result.success) {
      return result
    }
    console.log(`Attempt ${attempt + 1} failed: ${result.error}`)

    // Screenshot the final attempt for debugging
    if (attempt === maxRetries - 1) {
      await page.screenshot({ path: 'error.png' })
    }

    await new Promise((resolve) => setTimeout(resolve, 2000))
  }
  return { success: false, error: 'Max retries exceeded' }
}
```
3. Context Injection
Provide additional context to the agent:
```typescript
const context = `
You are browsing Facebook Marketplace.
- You are logged in as a seller.
- Your goal is to post a vehicle listing.
- The form has fields: title, price, description, condition, location.
- Use the "Post" button to submit when all fields are filled.
`

const result = await agent.executeTask(context + '\n\n' + task)
```
4. Step Validation
Validate the agent's actions before they run:
```typescript
callbacks: [
  {
    handleToolStart: (tool, input) => {
      // Prevent dangerous actions
      if (tool.name === 'navigate' && input.url.includes('delete')) {
        throw new Error('Navigation to delete URL prevented')
      }
      // Log for audit
      auditLog.push({
        timestamp: new Date(),
        tool: tool.name,
        input,
      })
    },
  },
]
```
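The navigation check above can be generalized into a host allowlist. A sketch; `ALLOWED_HOSTS` is an assumption about your deployment, not part of any framework:

```typescript
// Only permit navigation to hosts on an explicit allowlist.
const ALLOWED_HOSTS = new Set(['example.com', 'www.example.com'])

function assertAllowedUrl(rawUrl: string): void {
  let host: string
  try {
    host = new URL(rawUrl).hostname
  } catch {
    throw new Error(`Blocked malformed URL: ${rawUrl}`)
  }
  if (!ALLOWED_HOSTS.has(host)) {
    throw new Error(`Blocked navigation to disallowed host: ${host}`)
  }
}
```

Call it from `handleToolStart` whenever `tool.name === 'navigate'`; throwing there aborts the step before the browser ever leaves your approved domains.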
Real-World Example: Facebook Status Posting
Complete example of posting to Facebook:
```typescript
import puppeteer from 'puppeteer-core'
import { ChatOpenAI } from '@langchain/openai'
import { BrowserAgent } from './browserAgent'

async function postFacebookStatus(content: string) {
  // 1. Launch browser (puppeteer-core needs an explicit Chrome binary path;
  // the full "puppeteer" package bundles one)
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: process.env.CHROME_PATH,
  })
  const page = await browser.newPage()

  // 2. Create agent
  const llm = new ChatOpenAI({
    modelName: 'gpt-4o-mini',
    temperature: 0,
    openAIApiKey: process.env.OPENAI_API_KEY,
  })

  const agent = new BrowserAgent({
    page,
    browserUrl: 'http://localhost:9222',
    model: llm,
    maxSteps: 30,
  })

  // 3. Define task
  const task = `You are an autonomous browser agent controlling a logged-in Facebook session.

OBJECTIVE: Publish the following content as a new post:
"${content}"

EXECUTION STRATEGY:
1. Navigate to https://www.facebook.com
2. Wait for the page to load using get_page_info()
3. Find the post composer (search for "What's on your mind")
4. Click the composer
5. Wait for the modal to open (up to 10 seconds)
6. Type the content into the contenteditable div
7. Wait 2-3 seconds
8. Click the "Post" button (use force_click)
9. Wait 3 seconds to verify the post was published
10. Declare success and STOP

CORE PRINCIPLES:
- Use find_elements_by_text() for discovery over CSS selectors
- Wait for elements before interacting (3-5 second timeouts)
- If an action fails, retry up to 3 times with 2-second delays
- Stay within facebook.com
- Should take approximately 15-20 steps total
- If you reach 25+ steps, STOP and report the issue`

  // 4. Execute
  const result = await agent.executeTask(task, { verbose: true })

  console.log(`Steps: ${result.stepsCount}`)
  console.log(`Duration: ${result.duration}ms`)
  console.log(`Tokens: ${result.tokenUsage?.total_tokens}`)

  // 5. Cleanup
  await browser.close()
  return result
}

// Usage
postFacebookStatus('Hello from automated posting!')
```
Error Handling
Common Agent Failures
1. Max Steps Exceeded
```typescript
if (result.stepsCount >= maxSteps) {
  console.warn('Agent reached max steps without completing task')
  // Screenshot for debugging
  await page.screenshot({ path: 'max-steps-error.png' })
}
```
2. Tool Execution Failures
```typescript
// Check if any step failed
const failedSteps = result.steps.filter(
  (s) => s.output?.toLowerCase().includes('error') || s.output?.toLowerCase().includes('failed')
)
if (failedSteps.length > 0) {
  console.warn('Agent encountered errors:', failedSteps)
}
```
3. Hallucination (Agent invents actions)
```typescript
// Verify the agent actually used tools
if (result.stepsCount === 0) {
  console.warn('Agent did not execute any tools')
}

// Verify the final state
const finalUrl = page.url()
if (!finalUrl.includes('expected-domain.com')) {
  console.warn('Agent did not reach expected destination')
}
```
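These checks can be bundled into one post-run validator. The function and its names are my own sketch; adapt the fields to your result type:

```typescript
// Minimal result shape needed for the sanity checks below.
interface AgentResult {
  stepsCount: number
  output: string
}

// Flag runs that look hallucinated: no tools were used, or the
// browser never reached the domain the task was about.
function findResultIssues(
  result: AgentResult,
  finalUrl: string,
  expectedDomain: string
): string[] {
  const issues: string[] = []
  if (result.stepsCount === 0) {
    issues.push('agent executed no tools')
  }
  if (!finalUrl.includes(expectedDomain)) {
    issues.push(`final URL ${finalUrl} is not on ${expectedDomain}`)
  }
  return issues
}
```

Run it after every task and alert (or retry) when the returned list is non-empty.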
Debugging Techniques
1. Enable Verbose Logging
```typescript
const result = await agent.executeTask(task, { verbose: true })
// Shows each step in real time
```
2. Screenshots on Error
```typescript
if (!result.success) {
  await page.screenshot({ path: `error-${Date.now()}.png` })
  const html = await page.content()
  fs.writeFileSync(`error-${Date.now()}.html`, html)
}
```
3. Step Analysis
```typescript
result.steps.forEach((step, index) => {
  console.log(`Step ${index + 1}:`)
  console.log(`  Tool: ${step.tool}`)
  console.log(`  Input: ${JSON.stringify(step.input)}`)
  console.log(`  Output: ${step.output}`)
  console.log(`  Timestamp: ${step.timestamp}`)
})
```
Performance Optimization
1. Reduce Token Usage
Optimize Prompt Length:
```typescript
// Bad - too verbose
const task = `Please navigate to the website located at https://example.com. Once you are there, find the search input field and type the text "query". Then click the search button to submit the search.`

// Good - concise
const task = `Navigate to https://example.com, search for "query"`
```
Keep Tool Descriptions Short:
```typescript
// Tool descriptions are sent with every call, so keep them short and focused
description: 'Navigate to URL' // Good
description: 'Navigate to a specific URL. Use this when you need to go to a new webpage or different section.' // Verbose
```
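A crude way to keep prompt budgets in check: English text runs roughly four characters per token. This is a heuristic only, not a real tokenizer; use your provider's tokenizer for exact counts:

```typescript
// Rough token estimate: ~4 characters per token for English prose.
// Good enough for budget warnings; not a substitute for a real tokenizer.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4)
}

function warnIfLong(taskPrompt: string, budget = 500): void {
  const estimate = roughTokenCount(taskPrompt)
  if (estimate > budget) {
    console.warn(`Task prompt ~${estimate} tokens, over budget of ${budget}`)
  }
}
```

Wire `warnIfLong` into your task-template helpers so verbose prompts get flagged before they rack up costs.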
2. Use Appropriate Models
| Task | Recommended Model | Cost (per 1M output tokens) | Why |
|---|---|---|---|
| Simple navigation | GPT-4o-mini | $0.60 | Fast, cheap, good enough |
| Complex reasoning | GPT-4o | $10.00 | Best logic for multi-step |
| High volume | Gemini 2.0 Flash | $0.30 | Extremely cost-effective |
| Vision-heavy | Claude 3.5 Sonnet | $15.00 | Best at "seeing" pages |
| Self-hosted | Llama 3.3 70B | Free (compute only) | Privacy, no rate limits |
Pro tip: Start with GPT-4o-mini. Only upgrade to GPT-4o or Claude when the agent fails on complex reasoning tasks.
3. Limit Steps
```typescript
// Start with a lower limit for testing
const agent = new BrowserAgent({
  page,
  model: llm,
  maxSteps: 15, // Increase after testing
})
```
```typescript
// Increase for complex tasks
const agent = new BrowserAgent({
  page,
  model: llm,
  maxSteps: 50,
})
```
Testing Strategies
1. Unit Test Tools
```typescript
describe('clickTool', () => {
  it('should click element successfully', async () => {
    const mockPage = {
      waitForSelector: jest.fn(),
      click: jest.fn().mockResolvedValue(undefined),
    } as unknown as Page

    const tool = createBrowserTools(mockPage)[1] // click_element tool
    const result = await tool.func({ selector: '#button' })

    expect(result).toBe('Clicked #button')
    expect(mockPage.click).toHaveBeenCalledWith('#button')
  })
})
```
2. Integration Test Agent
```typescript
describe('BrowserAgent', () => {
  it('should navigate and extract text', async () => {
    const browser = await puppeteer.launch({ headless: true })
    const page = await browser.newPage()
    const agent = new BrowserAgent({ page, browserUrl: 'ws://...', model: llm })

    const result = await agent.executeTask('Go to https://example.com and extract the main heading')

    expect(result.success).toBe(true)
    expect(result.output).toContain('Example Domain')

    await browser.close()
  })
})
```
3. End-to-End Tests
```typescript
describe('Facebook Posting', () => {
  it('should post status successfully', async () => {
    const result = await postFacebookStatus('Test post')

    expect(result.success).toBe(true)
    expect(result.stepsCount).toBeLessThan(30)
    expect(result.duration).toBeLessThan(60000) // 60 seconds
  })
})
```
Troubleshooting
Agent Stuck in Loop
Problem: Agent repeats same action
Solution: Add explicit stopping condition
```typescript
const task = `... Stop once you see the confirmation message and DO NOT continue.`
```
Can't Find Element
Problem: Agent keeps looking for non-existent element
Solution: Use text-based discovery
```typescript
// Instead of: click_element('.obfuscated-class')
// the agent uses: find_elements_by_text('Submit')
```
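This guide lists `find_elements_by_text` among the tools but never shows it. One way to sketch the matching logic, kept pure so it's easy to test; in the real tool this filter would run inside `page.evaluate()` over live DOM nodes:

```typescript
// Case-insensitive substring match over candidate elements.
// In the real browser tool, this filter runs inside page.evaluate().
interface Candidate {
  tag: string
  text: string
}

function matchByText(candidates: Candidate[], query: string): Candidate[] {
  const needle = query.trim().toLowerCase()
  return candidates.filter((c) => c.text.trim().toLowerCase().includes(needle))
}

const matches = matchByText(
  [
    { tag: 'button', text: ' Submit ' },
    { tag: 'a', text: 'Cancel' },
    { tag: 'span', text: 'submit your form' },
  ],
  'Submit'
)
// matches the button and the span, not the Cancel link
```

Returning the tag plus a snippet of text gives the agent enough context to pick which match to click.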
Too Slow
Problem: Agent takes 60+ seconds for simple task
Solution: Use faster model and lower step limit
```typescript
const agent = new BrowserAgent({
  page,
  model: new ChatOpenAI({ modelName: 'gpt-4o-mini' }),
  maxSteps: 20,
})
```
When to Use LangChain Browser Agents
Good Fit ✅
- Complex multi-step workflows - Login, navigate, fill forms, extract data
- Sites that change frequently - AI adapts to new layouts
- Custom integration needs - Full control over agent behavior
- Internal tools - Dashboard automation, report generation
- Prototyping - Quick proof-of-concept before production
Not Ideal ❌
- High-volume scraping - Consider Browser Use or Skyvern for stealth
- Simple, stable pages - Plain Puppeteer/Playwright is faster and cheaper
- Real-time requirements - Agent reasoning adds 2-5 seconds per step
- Sites with aggressive bot detection - Need specialized anti-detection
Alternative Frameworks
If LangChain + LangGraph isn't the right fit, consider these alternatives:
Browser Use
Browser Use is optimized for production scraping with built-in stealth features.
Key Features:
- Native anti-detection (bypasses CAPTCHAs)
- Custom 30B parameter model optimized for browsing
- ~$0.02 per task (53 tasks per dollar)
- Python and TypeScript support
Best for: High-volume scraping, sites with bot protection
Stagehand
Stagehand by Browserbase combines AI with precise code control.
Key Features:
- Three primitives: `act()`, `extract()`, `observe()`
- Self-healing selectors (caches AI decisions, auto-recovers)
- 44% faster than v2 with direct CDP access
- Works with Playwright or Puppeteer
Best for: Hybrid automation where you want AI flexibility + code precision
Skyvern
Skyvern uses agent swarms for complex workflows.
Key Features:
- Visual understanding of page layout
- Works on never-seen-before websites
- Resistant to layout changes
- Multi-agent coordination
Best for: Complex multi-site workflows, enterprise automation
Puppeteer vs Playwright for AI Agents
Both work well. Here's how to choose:
| Feature | Playwright | Puppeteer |
|---|---|---|
| Browser support | Chrome, Firefox, WebKit | Chrome-focused |
| Languages | JS, Python, .NET, Java | JS/Node only |
| Speed | ~4.5s avg (complex tasks) | ~4.8s avg |
| Auto-wait | Built-in | Manual |
| Community | 64K GitHub stars | 87K GitHub stars |
My recommendation: Use Playwright for new projects—better multi-browser support and auto-wait. Use Puppeteer if you're already invested in Chrome-only workflows.
My Personal Setup
After months of experimentation, here's what I actually use:
- LangGraph + Playwright — My daily driver for custom automation. Full control, great debugging.
- Stagehand — For quick scripts where I want self-healing without building a full agent.
- Browser Use — When I need stealth for scraping protected sites.
Cost breakdown (typical month):
- ~2,000 agent tasks
- GPT-4o-mini for reasoning: ~$8
- Playwright cloud (Browserbase): ~$15
- Total: ~$23/month
For high-volume work, I switch to Gemini Flash, which brings the total under $10/month.
Final Thoughts
LangChain + LangGraph is the right choice when you need:
- Full control over agent behavior and tools
- Custom integrations with your existing stack
- Debugging visibility into every reasoning step
- Flexibility to switch LLM providers
Key implementation points:
- Clear tool descriptions for LLM understanding
- Appropriate step limits (start with 15-20, increase as needed)
- Error recovery with screenshots
- Token usage monitoring for cost control
Start with simple navigation tasks, add complexity gradually, and always test before production.
The browser automation landscape is evolving fast. What worked in 2024 (pure Puppeteer scripts) is being replaced by AI-native approaches. Get comfortable with these patterns now—they'll be table stakes by 2027.
Resources
Alternative Frameworks:
- Browser Use - Production-ready scraping
- Stagehand - Hybrid AI + code automation
- Skyvern - Multi-agent browser automation
Further Reading:
- State of Agent Engineering 2026 - LangChain's industry report
- Building Web Agents with LangGraph