The Lazy Developer's Guide to Browser Automation at Scale
Author: Nadim Tuhin (@nadimtuhin)
Disclaimer: This is an educational deep-dive into browser automation architecture. The techniques shown can be used for legitimate purposes like testing, research, and managing your own accounts. Don't be evil—respect ToS and rate limits.
I hate maintaining Puppeteer scripts.
Every time a website updates its UI, my selectors break. Every time it adds a new modal, my automation fails. I was spending more time fixing scripts than using them.
So I built something different: browser agents that think.
Instead of hardcoding document.querySelector('.xyz123'), the AI figures out what to click based on what it sees. Website changed their button from "Submit" to "Post"? The agent adapts. New confirmation dialog? It handles it.
Here's the stack:
- Multilogin — Run 50+ browser profiles without getting fingerprinted
- Puppeteer — The muscle for clicking, typing, and navigating
- LangChain ReAct — The brain that decides what to click
Let's build it.
What We're Building
Instead of hardcoding CSS selectors, our agent uses natural language reasoning:
Agent: "Task: Fill out the contact form and submit"
Agent: "Looking for input fields... found Name, Email, Message"
Agent: "Typing into the Name field..."
Agent: "Typing into the Email field..."
Agent: "Looking for the submit button... found 'Send Message'"
Agent: "Clicking submit..."
Agent: "Checking for confirmation... found 'Thank you!' message"
Agent: "Success! Form submitted."
The magic? I didn't write a single selector. The AI figured it out.
Why This Matters
| Traditional Automation | AI-Powered Automation |
|---|---|
| #submit-btn-v3 breaks when they rename it | "Click the submit button" always works |
| Fails silently on UI changes | Adapts or explains why it's stuck |
| One script per website | Same agent, any website |
| Hours debugging selectors | Minutes tweaking prompts |
Architecture
┌─────────────────────────────────────────────────────────┐
│ Your Application │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Profile 1 │ │ Profile 2 │ │ Profile N │ │
│ │ (Account A)│ │ (Account B)│ │ (Account X)│ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ LangChain ReAct Agent │ │
│ │ "Think → Act → Observe → Repeat" │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Puppeteer (Browser Control) │ │
│ │ click() • type() • screenshot() • wait() │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Multilogin (Browser Isolation) │ │
│ │ Unique fingerprint • Cookies • Proxy • Session │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
Target Website
Why Multilogin? Each profile has its own browser fingerprint, cookies, and session, so you can run 50 profiles at once without the target site being able to tell they come from the same machine.
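The "Think → Act → Observe → Repeat" box in the diagram is easier to picture as a loop. Here's a toy sketch of that loop with a scripted policy standing in for the LLM. The names `Policy`, `runReactLoop`, and `demoTools` are mine for illustration; this is not how LangChain implements it internally, just the shape of the idea.

```typescript
// Toy ReAct loop: the policy (normally an LLM) sees the observations so far
// and either picks a tool call or finishes with an answer.
type Action =
  | { kind: 'tool'; tool: string; input: string }
  | { kind: 'finish'; answer: string }

type Tool = (input: string) => string
type Policy = (observations: string[]) => Action

function runReactLoop(
  policy: Policy,
  tools: Record<string, Tool>,
  maxSteps = 10
): { answer: string | null; observations: string[] } {
  const observations: string[] = []
  for (let step = 0; step < maxSteps; step++) {
    const action = policy(observations) // Think
    if (action.kind === 'finish') {
      return { answer: action.answer, observations }
    }
    const tool = tools[action.tool]
    // Act, then Observe. An unknown tool becomes an observation the policy can react to.
    const observation = tool
      ? tool(action.input)
      : `Error: unknown tool "${action.tool}"`
    observations.push(observation)
  }
  return { answer: null, observations } // step budget exhausted
}

// Scripted stand-in for the LLM: navigate, click, then stop.
const scriptedPolicy: Policy = (observations) => {
  if (observations.length === 0) {
    return { kind: 'tool', tool: 'navigate', input: 'https://example.com' }
  }
  if (observations.length === 1) {
    return { kind: 'tool', tool: 'click', input: 'Send Message' }
  }
  return { kind: 'finish', answer: 'Form submitted' }
}

const demoTools: Record<string, Tool> = {
  navigate: (url) => `Navigated to ${url}`,
  click: (label) => `Clicked "${label}"`,
}

const demoRun = runReactLoop(scriptedPolicy, demoTools)
console.log(demoRun)
```

The `maxSteps` budget matters more than it looks: a confused agent will happily loop forever, and every loop costs tokens.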
Prerequisites
Install required packages:
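npm install puppeteer-core axios md5 zod p-limit
npm install @langchain/core @langchain/langgraph @langchain/openai
# Optional, only if you use the Gemini model in Step 3
npm install @langchain/google-genai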
Note: Multilogin uses HTTP API authentication rather than a dedicated npm SDK. The examples below show how to integrate with their API directly.
Environment variables:
# Multilogin credentials
MULTILOGIN_EMAIL=you@example.com
MULTILOGIN_PASSWORD=your-password
# LLM provider (choose one)
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIz...
ZAI_API_KEY=...
# Optional: Specific model
AI_MODEL=gpt-4o-mini
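A missing variable is much nicer to discover at startup than halfway through a browser session. Below is a small sketch of a fail-fast loader for the variables listed above. `loadConfig` and `AgentConfig` are hypothetical names, and the provider priority (first key that is set wins) is my assumption about what "choose one" should mean.

```typescript
// Hypothetical helper: validate the environment once, before launching anything.
type Env = Record<string, string | undefined>

interface AgentConfig {
  multiloginEmail: string
  multiloginPassword: string
  provider: 'openai' | 'gemini' | 'zai'
  apiKey: string
  model: string
}

function loadConfig(env: Env): AgentConfig {
  const required = (name: string): string => {
    const value = env[name]
    if (!value) throw new Error(`Missing environment variable: ${name}`)
    return value
  }
  // "Choose one" provider: take the first key that is set, in this order (an assumption).
  const providers = [
    ['openai', 'OPENAI_API_KEY'],
    ['gemini', 'GEMINI_API_KEY'],
    ['zai', 'ZAI_API_KEY'],
  ] as const
  const found = providers.find(([, key]) => Boolean(env[key]))
  if (!found) {
    throw new Error('Set one of OPENAI_API_KEY, GEMINI_API_KEY, ZAI_API_KEY')
  }
  return {
    multiloginEmail: required('MULTILOGIN_EMAIL'),
    multiloginPassword: required('MULTILOGIN_PASSWORD'),
    provider: found[0],
    apiKey: required(found[1]),
    // Same default as the complete example; set AI_MODEL when using another provider.
    model: env.AI_MODEL || 'gpt-4o-mini',
  }
}

// In the app this would be called as: const config = loadConfig(process.env)
```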
Step 1: Set Up Multilogin Client
import axios from 'axios'
import md5 from 'md5'
// Multilogin API client helper
async function getMultiloginToken(email: string, password: string): Promise<string> {
const response = await axios.post('https://api.multilogin.com/user/signin', {
email,
password: md5(password),
})
return response.data.data.token
}
// Create authenticated client
const token = await getMultiloginToken(
process.env.MULTILOGIN_EMAIL!,
process.env.MULTILOGIN_PASSWORD!
)
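If you'd rather not pull in the `md5` package just for one hash, Node's built-in `crypto` module can produce the same hex digest. This is a sketch under that assumption; `md5Hex` and `buildSigninBody` are names I made up, and the body shape simply mirrors the request in Step 1.

```typescript
import { createHash } from 'node:crypto'

// Hex-encoded MD5 digest of a UTF-8 string.
function md5Hex(input: string): string {
  return createHash('md5').update(input, 'utf8').digest('hex')
}

// Request body in the shape Step 1 posts to /user/signin.
function buildSigninBody(
  email: string,
  password: string
): { email: string; password: string } {
  return { email, password: md5Hex(password) }
}

console.log(buildSigninBody('you@example.com', 'hello'))
```

Keeping the body builder separate from the HTTP call also makes it easy to unit test without hitting the API.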
Step 2: Launch Browser Profile
import puppeteer from 'puppeteer-core'
// Launch profile via Multilogin API
async function startProfile(
token: string,
folderId: string,
profileId: string,
headless = false
): Promise<{ port: number }> {
const url = `https://launcher.mlx.yt:45001/api/v2/profile/f/${folderId}/p/${profileId}/start?automation_type=puppeteer&headless_mode=${headless}`
const response = await axios.get(url, {
headers: { Authorization: `Bearer ${token}` },
})
return response.data.data
}
// Launch and connect
const { port } = await startProfile(token, 'your-folder-uuid', 'your-profile-uuid')
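// Connect Puppeteer to the Multilogin profile. The launcher returns a debugging
// port, so connect via browserURL and let Puppeteer discover the WebSocket endpoint.
const browser = await puppeteer.connect({
browserURL: `http://127.0.0.1:${port}`,
})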
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080 })
console.log('Browser started on port:', port)
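The start URL above is built by string interpolation, which is fine until an ID contains an unexpected character. Here's a small sketch that builds the same URL with encoding. `buildStartUrl` and `LAUNCHER_BASE` are hypothetical names; the base URL and query parameters are copied from the call in this step.

```typescript
// Base URL taken from the launcher call in Step 2.
const LAUNCHER_BASE = 'https://launcher.mlx.yt:45001/api/v2'

function buildStartUrl(folderId: string, profileId: string, headless = false): string {
  // Encode path segments so a stray slash or space cannot change the route.
  const path = `/profile/f/${encodeURIComponent(folderId)}/p/${encodeURIComponent(profileId)}/start`
  const query = new URLSearchParams({
    automation_type: 'puppeteer',
    headless_mode: String(headless),
  })
  return `${LAUNCHER_BASE}${path}?${query.toString()}`
}

console.log(buildStartUrl('folder-1', 'profile-1'))
```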
Step 3: Create LLM Instance
import { ChatOpenAI } from '@langchain/openai'
// Create OpenAI model
const llm = new ChatOpenAI({
modelName: 'gpt-4o-mini',
temperature: 0,
openAIApiKey: process.env.OPENAI_API_KEY,
})
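// Or use Gemini instead (requires @langchain/google-genai).
// `llm` can only be declared once per scope, so swap the block above for this one:
// import { ChatGoogleGenerativeAI } from '@langchain/google-genai'
// const llm = new ChatGoogleGenerativeAI({
//   model: 'gemini-2.0-flash',
//   temperature: 0,
//   apiKey: process.env.GEMINI_API_KEY,
// })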
Step 4: Create Browser Tools
LangChain agents need tools to interact with the browser:
import { DynamicStructuredTool } from '@langchain/core/tools'
import { z } from 'zod'
const tools = [
// Navigate to URL
new DynamicStructuredTool({
name: 'navigate',
description: 'Navigate to a URL',
schema: z.object({
url: z.string().url(),
}),
func: async ({ url }) => {
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })
return `Navigated to ${url}`
},
}),
// Click element
new DynamicStructuredTool({
name: 'click_element',
description: 'Click element by CSS selector',
schema: z.object({
selector: z.string(),
}),
func: async ({ selector }) => {
await page.waitForSelector(selector, { visible: true, timeout: 10000 })
await page.click(selector)
return `Clicked ${selector}`
},
}),
// Type text
new DynamicStructuredTool({
name: 'type_text',
description: 'Type text into an element',
schema: z.object({
selector: z.string(),
text: z.string(),
}),
func: async ({ selector, text }) => {
await page.waitForSelector(selector)
await page.click(selector)
await page.keyboard.type(text)
return `Typed: ${text}`
},
}),
// Press keys
new DynamicStructuredTool({
name: 'press_keys',
description: 'Press keyboard keys',
schema: z.object({
keys: z.string(),
}),
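func: async ({ keys }) => {
// keyboard.press() accepts a single key, so a combo such as "Control+Enter"
// is split: hold the modifiers, press the final key, then release them.
type Key = Parameters<typeof page.keyboard.press>[0]
const parts = keys.split('+').map((k) => k.trim()) as Key[]
const finalKey = parts.pop() as Key
for (const modifier of parts) await page.keyboard.down(modifier)
await page.keyboard.press(finalKey)
for (const modifier of [...parts].reverse()) await page.keyboard.up(modifier)
return `Pressed: ${keys}`
},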
}),
// Get page info
new DynamicStructuredTool({
name: 'get_page_info',
description: 'Get current page information',
schema: z.object({}),
func: async () => {
return `URL: ${page.url()}, Title: ${await page.title()}`
},
}),
]
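One gap worth flagging: the prompt in Step 6 tells the agent to call `find_elements_by_text`, but the tool list above doesn't define it. Here's a sketch of the matching logic such a tool could use. It works on plain element descriptions so it can run without a browser; in a real tool I'd assume the `ElementInfo` list is collected inside `page.evaluate()` and the function is wrapped in a `DynamicStructuredTool`. `ElementInfo` and `findElementsByText` are hypothetical names.

```typescript
// Plain description of a DOM element, as it could be collected in the page.
interface ElementInfo {
  selector: string // a selector the click_element tool can use afterwards
  text: string
  ariaLabel?: string
  placeholder?: string
}

function normalize(value: string): string {
  return value.replace(/\s+/g, ' ').trim().toLowerCase()
}

// Return elements whose text, aria-label, or placeholder contains the query.
// Exact matches come first, then shorter (more specific) labels.
function findElementsByText(elements: ElementInfo[], query: string): ElementInfo[] {
  const needle = normalize(query)
  if (!needle) return []
  const scored: { el: ElementInfo; exact: boolean; length: number }[] = []
  for (const el of elements) {
    const labels = [el.text, el.ariaLabel ?? '', el.placeholder ?? ''].map(normalize)
    const hits = labels.filter((label) => label.includes(needle))
    if (hits.length === 0) continue
    scored.push({
      el,
      exact: hits.some((label) => label === needle),
      length: Math.min(...hits.map((label) => label.length)),
    })
  }
  scored.sort((a, b) => Number(b.exact) - Number(a.exact) || a.length - b.length)
  return scored.map((s) => s.el)
}

const sampleElements: ElementInfo[] = [
  { selector: '#composer', text: '', placeholder: "What's on your mind, Alex?" },
  { selector: '#post-btn', text: 'Post', ariaLabel: 'Post' },
  { selector: '#footer', text: 'Posting guidelines and other links' },
]
console.log(findElementsByText(sampleElements, 'post').map((el) => el.selector))
```

Returning a selector alongside each match lets the agent hand the result straight to `click_element` or `type_text`.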
Step 5: Create LangChain Agent
import { createReactAgent } from '@langchain/langgraph/prebuilt'
// Create ReAct agent (Reasoning + Acting)
const agent = createReactAgent({
llm,
tools,
})
Step 6: Build Facebook Posting Prompt
The agent needs clear instructions:
function buildPostingTaskPrompt(content: string): string {
return `You are an autonomous browser agent controlling a logged-in Facebook session.
=== OBJECTIVE ===
Publish the following content as a new post on Facebook:
${content}
=== EXECUTION STRATEGY ===
PHASE 0: NAVIGATE TO FACEBOOK
- Use: navigate("https://www.facebook.com")
- Wait for page to load
- Verify you're on facebook.com using get_page_info()
PHASE 1: LOCATE POST COMPOSER
- Search for composer using find_elements_by_text("What's on your mind")
- Verify element is visible and clickable
PHASE 2: OPEN COMPOSER MODAL
- Click the composer trigger
- Wait for modal to load (up to 10 seconds)
- Confirm modal is open
PHASE 3: ENTER CONTENT
- Locate text input with contenteditable="true"
- Click inside editor to focus
- Type the content exactly
- Verify text appears
PHASE 4: SUBMIT POST
- Wait 2-3 seconds after typing
- Use: click_element("div[role='dialog'] div[aria-label='Post'][role='button']")
- Alternative: press_keys("Control+Enter")
PHASE 5: VERIFY SUCCESS
- Wait 3 seconds after clicking Post
- Check if modal closed
- Declare success and STOP
=== CORE PRINCIPLES ===
1. ALWAYS use find_elements_by_text() over CSS selectors
2. Wait for elements before interacting (3-5 second timeouts)
3. If an action fails, retry up to 3 times with 2-second delays
4. Stay within facebook.com
5. Think step-by-step and verify each action
Begin execution now.`
}
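A prompt can easily drift out of sync with the tools you actually registered. For example, the prompt above asks for `find_elements_by_text`, which isn't in the Step 4 list. The sketch below is a cheap lint for that: it scans a prompt for call-like names and reports the ones that aren't registered. `findUnknownTools` is a hypothetical helper, and the regex assumes tool names are lowercase snake_case followed directly by `(`.

```typescript
// Report tool names that the prompt calls but the agent does not have.
function findUnknownTools(prompt: string, registeredTools: string[]): string[] {
  const callPattern = /\b([a-z][a-z0-9_]*)\(/g
  const unknown: string[] = []
  let match: RegExpExecArray | null
  while ((match = callPattern.exec(prompt)) !== null) {
    const name = match[1]
    if (registeredTools.indexOf(name) === -1 && unknown.indexOf(name) === -1) {
      unknown.push(name)
    }
  }
  return unknown.sort()
}

// A few lines in the style of the posting prompt.
const promptExcerpt = [
  '- Use: navigate("https://www.facebook.com")',
  "- Verify you're on facebook.com using get_page_info()",
  '- Search for composer using find_elements_by_text("What\'s on your mind")',
  '- Alternative: press_keys("Control+Enter")',
].join('\n')

// Tool names registered in Step 4.
const step4Tools = ['navigate', 'click_element', 'type_text', 'press_keys', 'get_page_info']

console.log(findUnknownTools(promptExcerpt, step4Tools))
```

Running a check like this before `agent.invoke()` is cheaper than watching the agent burn steps on a tool that doesn't exist.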
Step 7: Execute the Task
async function publishFacebookStatus(content: string) {
try {
const startTime = Date.now()
// Invoke the agent
const result = await agent.invoke({
messages: [
{
role: 'user',
content: buildPostingTaskPrompt(content),
},
],
})
const duration = Date.now() - startTime
// Extract token usage (if available)
const lastMessage = result.messages[result.messages.length - 1]
const tokenUsage = lastMessage?.usage_metadata
console.log('Task completed successfully:', {
steps: result.messages.length,
duration: `${duration}ms`,
tokenUsage,
})
return {
success: true,
output: lastMessage?.content,
steps: result.messages.length,
duration,
tokenUsage,
}
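} catch (error) {
console.error('Task failed:', error)
return {
success: false,
// `error` is `unknown` under strict TypeScript, so narrow it before reading .message
error: error instanceof Error ? error.message : String(error),
}
}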
}
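One caveat on the token numbers: the last message's `usage_metadata` only covers the final LLM call, and a ReAct run makes one call per reasoning step. To estimate what a task really cost, sum usage across every message that carries it. This is a minimal sketch, assuming the `input_tokens` / `output_tokens` / `total_tokens` field names LangChain uses on AI messages; `sumTokenUsage` and `MessageLike` are hypothetical names.

```typescript
// Field names follow the usage_metadata shape on LangChain AI messages.
interface UsageMetadata {
  input_tokens: number
  output_tokens: number
  total_tokens: number
}

interface MessageLike {
  usage_metadata?: UsageMetadata
}

function sumTokenUsage(messages: MessageLike[]): UsageMetadata {
  const total: UsageMetadata = { input_tokens: 0, output_tokens: 0, total_tokens: 0 }
  for (const message of messages) {
    const usage = message.usage_metadata
    if (!usage) continue // human and tool messages carry no usage
    total.input_tokens += usage.input_tokens
    total.output_tokens += usage.output_tokens
    total.total_tokens += usage.total_tokens
  }
  return total
}

const sampleMessages: MessageLike[] = [
  {}, // user prompt
  { usage_metadata: { input_tokens: 900, output_tokens: 40, total_tokens: 940 } },
  {}, // tool result
  { usage_metadata: { input_tokens: 1000, output_tokens: 25, total_tokens: 1025 } },
]
console.log(sumTokenUsage(sampleMessages))
```

Inside `publishFacebookStatus` this would be called as `sumTokenUsage(result.messages)`.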
Step 8: Complete Example
Putting it all together:
import puppeteer, { Browser, Page } from 'puppeteer-core'
import axios from 'axios'
import md5 from 'md5'
import { ChatOpenAI } from '@langchain/openai'
import { DynamicStructuredTool } from '@langchain/core/tools'
import { createReactAgent } from '@langchain/langgraph/prebuilt'
import { z } from 'zod'
// Multilogin API helpers
async function getToken(email: string, password: string): Promise<string> {
const response = await axios.post('https://api.multilogin.com/user/signin', {
email,
password: md5(password),
})
return response.data.data.token
}
async function launchProfile(
token: string,
folderId: string,
profileId: string
): Promise<{ port: number }> {
const url = `https://launcher.mlx.yt:45001/api/v2/profile/f/${folderId}/p/${profileId}/start?automation_type=puppeteer&headless_mode=false`
const response = await axios.get(url, {
headers: { Authorization: `Bearer ${token}` },
})
return response.data.data
}
async function stopProfile(token: string, profileId: string): Promise<void> {
await axios.get(`https://launcher.mlx.yt:45001/api/v2/profile/stop/p/${profileId}`, {
headers: { Authorization: `Bearer ${token}` },
})
}
// Create browser tools for LangChain
function createBrowserTools(page: Page) {
return [
new DynamicStructuredTool({
name: 'navigate',
description: 'Navigate to a URL',
schema: z.object({ url: z.string().url() }),
func: async ({ url }) => {
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })
return `Navigated to ${url}`
},
}),
new DynamicStructuredTool({
name: 'click_element',
description: 'Click element by CSS selector',
schema: z.object({ selector: z.string() }),
func: async ({ selector }) => {
await page.waitForSelector(selector, { visible: true, timeout: 10000 })
await page.click(selector)
return `Clicked ${selector}`
},
}),
new DynamicStructuredTool({
name: 'type_text',
description: 'Type text into an element',
schema: z.object({ selector: z.string(), text: z.string() }),
func: async ({ selector, text }) => {
await page.waitForSelector(selector)
await page.click(selector)
await page.keyboard.type(text)
return `Typed: ${text}`
},
}),
]
}
async function postFacebookStatus({
folderId,
profileId,
content,
}: {
folderId: string
profileId: string
content: string
}) {
let browser: Browser | null = null
let token: string | null = null
try {
// Step 1: Authenticate with Multilogin
token = await getToken(process.env.MULTILOGIN_EMAIL!, process.env.MULTILOGIN_PASSWORD!)
// Step 2: Launch browser profile
const { port } = await launchProfile(token, folderId, profileId)
// The launcher returns a debugging port; browserURL lets Puppeteer discover the WebSocket endpoint
browser = await puppeteer.connect({
browserURL: `http://127.0.0.1:${port}`,
})
const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080 })
console.log('✓ Browser profile launched')
// Step 3: Create LLM
const llm = new ChatOpenAI({
modelName: process.env.AI_MODEL || 'gpt-4o-mini',
temperature: 0,
})
// Step 4: Create browser tools
const tools = createBrowserTools(page)
// Step 5: Create agent
const agent = createReactAgent({ llm, tools })
console.log('✓ AI agent initialized')
// Step 6: Execute posting task (buildPostingTaskPrompt is the function defined in Step 6)
const taskPrompt = buildPostingTaskPrompt(content)
const result = await agent.invoke({
messages: [{ role: 'user', content: taskPrompt }],
})
console.log('✓ Post published successfully')
return {
success: true,
steps: result.messages.length,
}
} catch (error: any) {
console.error('✗ Failed:', error.message)
throw error
} finally {
// Always cleanup
if (browser) await browser.close()
if (token) await stopProfile(token, profileId)
console.log('✓ Browser closed')
}
}
// Usage
postFacebookStatus({
folderId: 'your-folder-id',
profileId: 'your-profile-id',
content: 'Hello from automated posting!',
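}).catch(() => {
// The error was already logged inside postFacebookStatus; just signal failure
process.exitCode = 1
})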
Scaling to Multiple Accounts
import pLimit from 'p-limit'
const limit = pLimit(5) // Max 5 concurrent posts
const accounts = [
{ folderId: '...', profileId: '...' },
{ folderId: '...', profileId: '...' },
// ... more accounts
]
const tasks = accounts.map((account) =>
limit(() =>
postFacebookStatus({
...account,
content: 'Batch post content',
})
)
)
const results = await Promise.all(tasks)
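A caveat with `Promise.all`: `postFacebookStatus` rethrows on failure, so a single bad account rejects the whole batch and you lose the other results. One option is `Promise.allSettled` plus a small summary step. The sketch below shows only the summary part; `summarizeSettled` and `BatchSummary` are hypothetical names.

```typescript
interface BatchSummary<T> {
  succeeded: T[]
  failed: { index: number; reason: string }[]
}

// Split settled results into successes and failures, keeping the failing index
// so it can be mapped back to the account that caused it.
function summarizeSettled<T>(results: PromiseSettledResult<T>[]): BatchSummary<T> {
  const summary: BatchSummary<T> = { succeeded: [], failed: [] }
  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      summary.succeeded.push(result.value)
    } else {
      const reason =
        result.reason instanceof Error ? result.reason.message : String(result.reason)
      summary.failed.push({ index, reason })
    }
  })
  return summary
}

// Hand-built results standing in for: await Promise.allSettled(tasks)
const demoResults: PromiseSettledResult<{ success: boolean }>[] = [
  { status: 'fulfilled', value: { success: true } },
  { status: 'rejected', reason: new Error('Profile failed to start') },
  { status: 'fulfilled', value: { success: true } },
]
const demoSummary = summarizeSettled(demoResults)
console.log(`${demoSummary.succeeded.length} ok, ${demoSummary.failed.length} failed`)
```

The `index` in each failure lines up with the `accounts` array, which makes retrying only the failed accounts straightforward.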
The Power of Scale
Here's what this architecture enables:
// Run 50 tasks, each in its own isolated browser profile (`limit` caps how many run at once)
const accounts = await getAccountProfiles() // Your 50 profiles
const results = await Promise.all(
accounts.map((account) =>
limit(() =>
runAgentTask({
profile: account,
task: 'Check notifications and summarize any important messages',
})
)
)
)
console.log(`Processed ${results.length} accounts in parallel`)
Real numbers from my setup:
- 50 profiles running simultaneously
- ~$0.02 per task (GPT-4o-mini)
- 3-5 minutes per complex task
- Zero selector maintenance for 6+ months
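To make those numbers concrete, here's a back-of-the-envelope estimator. It's a rough sketch: `estimateBatch` is a hypothetical helper, cost is passed in cents to keep the arithmetic exact, and it assumes tasks run in lockstep "waves" of size `concurrency`, which real runs only approximate.

```typescript
interface BatchEstimate {
  totalCostUsd: number
  minMinutes: number
  maxMinutes: number
}

function estimateBatch(
  tasks: number,
  concurrency: number,
  costCentsPerTask: number,
  minutesPerTask: [number, number]
): BatchEstimate {
  // Number of sequential rounds needed when only `concurrency` tasks run at once.
  const waves = Math.ceil(tasks / concurrency)
  return {
    totalCostUsd: (tasks * costCentsPerTask) / 100,
    minMinutes: waves * minutesPerTask[0],
    maxMinutes: waves * minutesPerTask[1],
  }
}

// 50 profiles all at once, ~$0.02 and 3-5 minutes per task
const allAtOnce = estimateBatch(50, 50, 2, [3, 5])
// Same batch with the pLimit(5) cap from the scaling example
const cappedAtFive = estimateBatch(50, 5, 2, [3, 5])
console.log(allAtOnce, cappedAtFive)
```

With all 50 profiles running at once, the batch costs about $1.00 and finishes in roughly 3 to 5 minutes. With the `pLimit(5)` cap, the cost is the same but the wall-clock time stretches to roughly 30 to 50 minutes.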
When to Use This Approach
Perfect for:
- 🔄 Sites that A/B test constantly (your selectors will break weekly)
- 🌐 Managing multiple legitimate accounts (social media managers, agencies)
- 🧪 E2E testing across different user states
- 📊 Research and data collection at scale
- 🔍 Ad verification and competitor monitoring
Overkill for:
- Static sites with stable HTML
- One-off scripts you'll run once
- Sub-second latency requirements (AI adds 2-5s per decision)
What's Next?
This architecture opens up interesting possibilities:
- Multi-agent systems — Specialized agents for different parts of a workflow
- Vision-based navigation — Using screenshot analysis instead of DOM parsing
- Self-healing selectors — AI that automatically fixes broken automation
The browser automation landscape is evolving fast. AI agents aren't just a gimmick—they're becoming a practical solution for real-world automation challenges.