Nadim Tuhin
The Lazy Developer's Guide to Browser Automation at Scale

Disclaimer: This is an educational deep-dive into browser automation architecture. The techniques shown can be used for legitimate purposes like testing, research, and managing your own accounts. Don't be evil—respect ToS and rate limits.

I hate maintaining Puppeteer scripts.

Every time a website updates their UI, my selectors break. Every time they add a new modal, my automation fails. I was spending more time fixing scripts than actually using them.

So I built something different: browser agents that think.

Instead of hardcoding document.querySelector('.xyz123'), the AI figures out what to click based on what it sees. Website changed their button from "Submit" to "Post"? The agent adapts. New confirmation dialog? It handles it.

Here's the stack:

  • Multilogin: runs 50+ browser profiles, each with its own isolated fingerprint, cookies, and proxy
  • Puppeteer: the muscle for clicking, typing, and navigating
  • LangGraph's ReAct agent (from the LangChain ecosystem): the brain that decides what to click

Let's build it.

What We're Building

Instead of hardcoding CSS selectors, our agent uses natural language reasoning:

You: "Fill out the contact form and submit"
Agent: "Looking for input fields... found Name, Email, Message"
Agent: "Typing into the Name field..."
Agent: "Typing into the Email field..."
Agent: "Looking for the submit button... found 'Send Message'"
Agent: "Clicking submit..."
Agent: "Checking for confirmation... found 'Thank you!' message"
Agent: "Success! Form submitted."

The magic? I didn't write a single selector. The agent found the fields and buttons on its own and chose the selectors itself.

Why This Matters

| Traditional Automation | AI-Powered Automation |
| --- | --- |
| #submit-btn-v3 breaks when they rename it | "Click the submit button" still works after the rename |
| Fails silently on UI changes | Adapts, or explains why it's stuck |
| One script per website | Same agent, any website |
| Hours debugging selectors | Minutes tweaking prompts |

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Your Application                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │  Profile 1  │  │  Profile 2  │  │  Profile N  │     │
│  │  (Account A)│  │  (Account B)│  │  (Account X)│     │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘     │
│         │                │                │             │
│         ▼                ▼                ▼             │
│  ┌─────────────────────────────────────────────────┐   │
│  │           LangChain ReAct Agent                 │   │
│  │     "Think → Act → Observe → Repeat"            │   │
│  └─────────────────────────────────────────────────┘   │
│                          │                              │
│                          ▼                              │
│  ┌─────────────────────────────────────────────────┐   │
│  │              Puppeteer (Browser Control)         │   │
│  │     click() • type() • screenshot() • wait()    │   │
│  └─────────────────────────────────────────────────┘   │
│                          │                              │
│                          ▼                              │
│  ┌─────────────────────────────────────────────────┐   │
│  │         Multilogin (Browser Isolation)           │   │
│  │   Unique fingerprint • Cookies • Proxy • Session │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
                    Target Website

Why Multilogin? Each profile gets its own browser fingerprint, cookies, and session, so you can run 50 profiles simultaneously while making it much harder for target sites to tell they come from the same machine.

Prerequisites

Install required packages:

npm install puppeteer-core axios md5 zod p-limit
npm install @langchain/core @langchain/langgraph @langchain/openai
npm install @langchain/google-genai   # only if you use Gemini

Note: Multilogin uses HTTP API authentication rather than a dedicated npm SDK. The examples below show how to integrate with their API directly.

Environment variables:

# Multilogin credentials
MULTILOGIN_EMAIL=you@example.com
MULTILOGIN_PASSWORD=your-password

# LLM provider (choose one)
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIz...
ZAI_API_KEY=...

# Optional: Specific model
AI_MODEL=gpt-4o-mini

Step 1: Set Up Multilogin Client

import axios from 'axios'
import md5 from 'md5'

// Multilogin API client helper
async function getMultiloginToken(email: string, password: string): Promise<string> {
  const response = await axios.post('https://api.multilogin.com/user/signin', {
    email,
    password: md5(password),
  })
  return response.data.data.token
}

// Create authenticated client
const token = await getMultiloginToken(
  process.env.MULTILOGIN_EMAIL!,
  process.env.MULTILOGIN_PASSWORD!
)
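The md5 package is only used to hash the password before signin. If you'd rather avoid the extra dependency, Node's built-in crypto module should produce the same lowercase hex digest. A small sketch; the helper name md5Hex is my own, not part of any Multilogin API:

```typescript
import { createHash } from 'node:crypto'

// Lowercase hex MD5 digest, the same format the md5 package returns
function md5Hex(input: string): string {
  return createHash('md5').update(input, 'utf8').digest('hex')
}

// Drop-in for the signin payload: { email, password: md5Hex(password) }
console.log(md5Hex('abc')) // 900150983cd24fb0d6963f7d28e17f72
```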

Step 2: Launch Browser Profile

import puppeteer from 'puppeteer-core'

// Launch profile via Multilogin API
async function startProfile(
  token: string,
  folderId: string,
  profileId: string,
  headless = false
): Promise<{ port: number }> {
  const url = `https://launcher.mlx.yt:45001/api/v2/profile/f/${folderId}/p/${profileId}/start?automation_type=puppeteer&headless_mode=${headless}`

  const response = await axios.get(url, {
    headers: { Authorization: `Bearer ${token}` },
  })

  return response.data.data
}

// Launch and connect
const { port } = await startProfile(token, 'your-folder-uuid', 'your-profile-uuid')

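// Connect Puppeteer to the Multilogin profile. browserURL makes Puppeteer look up
// the full DevTools WebSocket URL (ws://127.0.0.1:PORT/devtools/browser/ID) itself
const browser = await puppeteer.connect({
  browserURL: `http://127.0.0.1:${port}`,
})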

const page = await browser.newPage()
await page.setViewport({ width: 1920, height: 1080 })

console.log('Browser started on port:', port)
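The start URL above is assembled by string interpolation. A slightly safer variant lets the URL class build the query string and escapes the IDs. This is a sketch: buildStartUrl is a hypothetical helper, and the endpoint path and parameters are copied from startProfile() rather than checked against Multilogin's docs.

```typescript
// Base path copied from startProfile() above
const LAUNCHER_BASE = 'https://launcher.mlx.yt:45001/api/v2'

// Hypothetical helper: same URL as startProfile(), built with URL/searchParams
function buildStartUrl(folderId: string, profileId: string, headless = false): string {
  const path = `/profile/f/${encodeURIComponent(folderId)}/p/${encodeURIComponent(profileId)}/start`
  const url = new URL(LAUNCHER_BASE + path)
  url.searchParams.set('automation_type', 'puppeteer')
  url.searchParams.set('headless_mode', String(headless))
  return url.toString()
}

console.log(buildStartUrl('your-folder-uuid', 'your-profile-uuid'))
```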

Step 3: Create LLM Instance

import { ChatOpenAI } from '@langchain/openai'
// Only needed for the Gemini option: npm install @langchain/google-genai
// import { ChatGoogleGenerativeAI } from '@langchain/google-genai'

// Option A: OpenAI
const llm = new ChatOpenAI({
  model: process.env.AI_MODEL || 'gpt-4o-mini',
  temperature: 0,
  apiKey: process.env.OPENAI_API_KEY,
})

// Option B: Gemini (use instead of Option A; two `const llm` declarations won't compile)
// const llm = new ChatGoogleGenerativeAI({
//   model: 'gemini-2.0-flash',
//   temperature: 0,
//   apiKey: process.env.GEMINI_API_KEY,
// })
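The environment section says to choose one provider. One way to make that choice explicit is a small resolver that reads the keys and falls back to the default models used in this article. resolveLlmConfig and DEFAULT_MODELS are hypothetical names, and Z.ai is left out because this walkthrough doesn't show its LangChain wiring:

```typescript
type Provider = 'openai' | 'gemini'

interface LlmConfig {
  provider: Provider
  model: string
}

// Defaults taken from the examples in this article
const DEFAULT_MODELS: Record<Provider, string> = {
  openai: 'gpt-4o-mini',
  gemini: 'gemini-2.0-flash',
}

// Hypothetical helper: pick a provider from whichever API key is set
function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const provider: Provider | null = env.OPENAI_API_KEY
    ? 'openai'
    : env.GEMINI_API_KEY
      ? 'gemini'
      : null
  if (!provider) {
    throw new Error('Set OPENAI_API_KEY or GEMINI_API_KEY')
  }
  // AI_MODEL overrides the provider's default model
  return { provider, model: env.AI_MODEL || DEFAULT_MODELS[provider] }
}
```

With resolveLlmConfig(process.env) in hand, construct ChatOpenAI or ChatGoogleGenerativeAI to match the returned provider. When both keys are set OpenAI wins; reorder the checks if you prefer Gemini.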

Step 4: Create Browser Tools

LangChain agents need tools to interact with the browser:

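import { DynamicStructuredTool } from '@langchain/core/tools'
import type { KeyInput } from 'puppeteer-core'
import { z } from 'zod'

const tools = [
  // Navigate to URL
  new DynamicStructuredTool({
    name: 'navigate',
    description: 'Navigate to a URL',
    schema: z.object({
      url: z.string().url(),
    }),
    func: async ({ url }) => {
      await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })
      return `Navigated to ${url}`
    },
  }),

  // Click element
  new DynamicStructuredTool({
    name: 'click_element',
    description: 'Click element by CSS selector',
    schema: z.object({
      selector: z.string(),
    }),
    func: async ({ selector }) => {
      await page.waitForSelector(selector, { visible: true, timeout: 10000 })
      await page.click(selector)
      return `Clicked ${selector}`
    },
  }),

  // Type text
  new DynamicStructuredTool({
    name: 'type_text',
    description: 'Type text into an element',
    schema: z.object({
      selector: z.string(),
      text: z.string(),
    }),
    func: async ({ selector, text }) => {
      await page.waitForSelector(selector)
      await page.click(selector)
      await page.keyboard.type(text)
      return `Typed: ${text}`
    },
  }),

  // Press a key or a chord like "Control+Enter" (keyboard.press() takes one key)
  new DynamicStructuredTool({
    name: 'press_keys',
    description: 'Press a key or key combination, e.g. "Enter" or "Control+Enter"',
    schema: z.object({
      keys: z.string(),
    }),
    func: async ({ keys }) => {
      const parts = keys.split('+') as KeyInput[]
      const key = parts.pop()!
      for (const mod of parts) await page.keyboard.down(mod) // hold modifiers
      await page.keyboard.press(key)
      for (const mod of parts.reverse()) await page.keyboard.up(mod)
      return `Pressed: ${keys}`
    },
  }),

  // Get page info
  new DynamicStructuredTool({
    name: 'get_page_info',
    description: 'Get current page information',
    schema: z.object({}),
    func: async () => {
      return `URL: ${page.url()}, Title: ${await page.title()}`
    },
  }),

  // Pause, for the prompt's "wait 2-3 seconds" steps
  new DynamicStructuredTool({
    name: 'wait',
    description: 'Wait for a number of seconds (max 30)',
    schema: z.object({ seconds: z.number().min(0).max(30) }),
    func: async ({ seconds }) => {
      await new Promise((resolve) => setTimeout(resolve, seconds * 1000))
      return `Waited ${seconds}s`
    },
  }),

  // find_elements_by_text and force_click are called by the Step 6 prompt.
  // These are one possible implementation, not a library API: matches get a
  // data-agent-id attribute so the agent has a selector for the other tools.
  new DynamicStructuredTool({
    name: 'find_elements_by_text',
    description: 'Find visible elements whose text or aria-label contains the text; returns a CSS selector for each',
    schema: z.object({
      text: z.string(),
    }),
    func: async ({ text }) => {
      const matches = await page.evaluate((needle: string) => {
        const lower = needle.toLowerCase()
        const labelOf = (el: Element) => (el.getAttribute('aria-label') || el.textContent || '').trim()
        const hit = (el: Element) => labelOf(el).toLowerCase().includes(lower)
        document.querySelectorAll('[data-agent-id]').forEach((el) => el.removeAttribute('data-agent-id'))
        const found: { selector: string; tag: string; label: string }[] = []
        for (const el of Array.from(document.querySelectorAll('body *'))) {
          // Keep only the innermost match, and skip elements that aren't rendered
          if (!hit(el) || Array.from(el.children).some(hit)) continue
          const rect = el.getBoundingClientRect()
          if (rect.width === 0 || rect.height === 0) continue
          const id = String(found.length)
          el.setAttribute('data-agent-id', id)
          found.push({ selector: `[data-agent-id="${id}"]`, tag: el.tagName.toLowerCase(), label: labelOf(el).slice(0, 80) })
          if (found.length >= 10) break
        }
        return found
      }, text)
      return matches.length ? JSON.stringify(matches) : `No visible element contains "${text}"`
    },
  }),

  new DynamicStructuredTool({
    name: 'force_click',
    description: 'Click an element by CSS selector via element.click(), even if an overlay blocks a real mouse click',
    schema: z.object({
      selector: z.string(),
    }),
    func: async ({ selector }) => {
      const handle = await page.$(selector)
      if (!handle) return `No element matches ${selector}`
      await handle.evaluate((el) => (el as HTMLElement).click())
      return `Force-clicked ${selector}`
    },
  }),
]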

Step 5: Create LangChain Agent

import { createReactAgent } from '@langchain/langgraph/prebuilt'

// Create ReAct agent (Reasoning + Acting)
const agent = createReactAgent({
  llm,
  tools,
})
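createReactAgent hides the Think → Act → Observe loop from the diagram. For intuition, here is a stripped-down version of that loop with the model and tools passed in as plain functions. It's a simplified sketch, not LangGraph's implementation; runReactLoop, Decision, and maxSteps are illustrative names, with maxSteps playing a role similar to LangGraph's recursion limit:

```typescript
// What the "model" decides at each step: call a tool, or finish
type Decision =
  | { type: 'tool'; name: string; input: string }
  | { type: 'final'; answer: string }

type Model = (history: string[]) => Promise<Decision>
type Tools = Record<string, (input: string) => Promise<string>>

async function runReactLoop(
  task: string,
  model: Model,
  tools: Tools,
  maxSteps = 10
): Promise<{ answer: string; history: string[] }> {
  const history = [`Task: ${task}`]
  for (let step = 0; step < maxSteps; step++) {
    const decision = await model(history) // Think
    if (decision.type === 'final') {
      return { answer: decision.answer, history }
    }
    // Act, then Observe: errors become observations so the model can recover
    const tool = tools[decision.name]
    let observation: string
    try {
      observation = tool ? await tool(decision.input) : `Unknown tool: ${decision.name}`
    } catch (err) {
      observation = `Error: ${err instanceof Error ? err.message : String(err)}`
    }
    history.push(`Action: ${decision.name}(${decision.input})`, `Observation: ${observation}`)
  }
  throw new Error(`Gave up after ${maxSteps} steps`)
}
```

Two design points carry over to the real agent: tool errors are fed back as observations so the model can try something else, and a hard step cap stops a confused agent from looping (and billing) forever.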

Step 6: Build Facebook Posting Prompt

The agent needs clear instructions:

function buildPostingTaskPrompt(content: string): string {
  return `You are an autonomous browser agent controlling a logged-in Facebook session.

=== OBJECTIVE ===
Publish the following content as a new post on Facebook:

${content}

=== EXECUTION STRATEGY ===

PHASE 0: NAVIGATE TO FACEBOOK
- Use: navigate("https://www.facebook.com")
- Wait for page to load
- Verify you're on facebook.com using get_page_info()

PHASE 1: LOCATE POST COMPOSER
- Search for composer using find_elements_by_text("What's on your mind")
- Verify element is visible and clickable

PHASE 2: OPEN COMPOSER MODAL
- Click the composer trigger
- Wait for modal to load (up to 10 seconds)
- Confirm modal is open

PHASE 3: ENTER CONTENT
- Locate text input with contenteditable="true"
- Click inside editor to focus
- Type the content exactly
- Verify text appears

PHASE 4: SUBMIT POST
- Wait 2-3 seconds after typing
- Use: force_click("div[role='dialog'] div[aria-label='Post'][role='button']")
- Alternative: press_keys("Control+Enter")

PHASE 5: VERIFY SUCCESS
- Wait 3 seconds after clicking Post
- Check if modal closed
- Declare success and STOP

=== CORE PRINCIPLES ===
1. ALWAYS use find_elements_by_text() over CSS selectors
2. Wait for elements before interacting (3-5 second timeouts)
3. If an action fails, retry up to 3 times with 2-second delays
4. Stay within facebook.com
5. Think step-by-step and verify each action

Begin execution now.`
}
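Principle 3 asks the model to retry failed actions, which costs a full LLM round-trip per attempt (2-5 seconds per decision, by the estimate later in this article). A cheaper option is to retry inside the tool itself. A minimal sketch with a hypothetical withRetry helper, where attempts counts total tries including the first:

```typescript
// Hypothetical helper: retry an async action in code instead of via the model
async function withRetry<T>(
  action: () => Promise<T>,
  attempts = 3,
  delayMs = 2000
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action()
    } catch (err) {
      lastError = err
      // Wait between tries, but not after the final failure
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs))
      }
    }
  }
  throw lastError
}
```

Inside a tool, wrap the flaky call, e.g. await withRetry(() => page.click(selector)). The model then only hears about failures that survived every attempt.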

Step 7: Execute the Task

async function publishFacebookStatus(content: string) {
  try {
    const startTime = Date.now()

    // Invoke the agent
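    // Invoke the agent. LangGraph stops a run after 25 steps by default, which a
    // multi-phase task with retries can hit, so raise the limit
    const result = await agent.invoke(
      {
        messages: [
          {
            role: 'user',
            content: buildPostingTaskPrompt(content),
          },
        ],
      },
      { recursionLimit: 50 }
    )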

    const duration = Date.now() - startTime

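    // Token usage of the final model call only (if the provider reports it);
    // each earlier step reports its own usage on its own message
    const lastMessage = result.messages[result.messages.length - 1]
    const tokenUsage =
      lastMessage && 'usage_metadata' in lastMessage ? lastMessage.usage_metadata : undefined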

    console.log('Task completed successfully:', {
      steps: result.messages.length,
      duration: `${duration}ms`,
      tokenUsage,
    })

    return {
      success: true,
      output: lastMessage?.content,
      steps: result.messages.length,
      duration,
      tokenUsage,
    }
  } catch (error) {
    console.error('Task failed:', error)
    return {
      success: false,
      error: error instanceof Error ? error.message : String(error),
    }
  }
}
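The usage_metadata on the last message only covers the final model call, while a ReAct run makes one call per step. To see the real cost of a task, sum usage across every message. A sketch, assuming the input_tokens / output_tokens / total_tokens fields LangChain puts on AI messages; sumTokenUsage and estimateCostUsd are hypothetical helpers, and prices are parameters because they change:

```typescript
// The subset of LangChain's usage_metadata this sketch relies on
interface UsageMetadata {
  input_tokens: number
  output_tokens: number
  total_tokens: number
}

interface MessageLike {
  usage_metadata?: UsageMetadata
}

// Sum usage over every model call in the run, not just the last one
function sumTokenUsage(messages: MessageLike[]): UsageMetadata {
  return messages.reduce<UsageMetadata>(
    (acc, m) => {
      const u = m.usage_metadata
      if (!u) return acc // user and tool messages carry no usage
      return {
        input_tokens: acc.input_tokens + u.input_tokens,
        output_tokens: acc.output_tokens + u.output_tokens,
        total_tokens: acc.total_tokens + u.total_tokens,
      }
    },
    { input_tokens: 0, output_tokens: 0, total_tokens: 0 }
  )
}

// Prices are USD per million tokens; pass your provider's current rates
function estimateCostUsd(usage: UsageMetadata, inputPerM: number, outputPerM: number): number {
  return (usage.input_tokens * inputPerM + usage.output_tokens * outputPerM) / 1_000_000
}
```

Pass it result.messages from the agent run (a cast may be needed, since only AIMessage declares usage_metadata), then plug in your provider's per-million-token rates.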

Step 8: Complete Example

Putting it all together:

import puppeteer, { Browser, Page } from 'puppeteer-core'
import axios from 'axios'
import md5 from 'md5'
import { ChatOpenAI } from '@langchain/openai'
import { DynamicStructuredTool } from '@langchain/core/tools'
import { createReactAgent } from '@langchain/langgraph/prebuilt'
import { z } from 'zod'

// Multilogin API helpers
async function getToken(email: string, password: string): Promise<string> {
  const response = await axios.post('https://api.multilogin.com/user/signin', {
    email,
    password: md5(password),
  })
  return response.data.data.token
}

async function launchProfile(
  token: string,
  folderId: string,
  profileId: string
): Promise<{ port: number }> {
  const url = `https://launcher.mlx.yt:45001/api/v2/profile/f/${folderId}/p/${profileId}/start?automation_type=puppeteer&headless_mode=false`
  const response = await axios.get(url, {
    headers: { Authorization: `Bearer ${token}` },
  })
  return response.data.data
}

async function stopProfile(token: string, profileId: string): Promise<void> {
  await axios.get(`https://launcher.mlx.yt:45001/api/v2/profile/stop/p/${profileId}`, {
    headers: { Authorization: `Bearer ${token}` },
  })
}

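// Create browser tools for LangChain. Trimmed for brevity: also add press_keys,
// get_page_info, and any other tools the Step 6 prompt calls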
function createBrowserTools(page: Page) {
  return [
    new DynamicStructuredTool({
      name: 'navigate',
      description: 'Navigate to a URL',
      schema: z.object({ url: z.string().url() }),
      func: async ({ url }) => {
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 })
        return `Navigated to ${url}`
      },
    }),
    new DynamicStructuredTool({
      name: 'click_element',
      description: 'Click element by CSS selector',
      schema: z.object({ selector: z.string() }),
      func: async ({ selector }) => {
        await page.waitForSelector(selector, { visible: true, timeout: 10000 })
        await page.click(selector)
        return `Clicked ${selector}`
      },
    }),
    new DynamicStructuredTool({
      name: 'type_text',
      description: 'Type text into an element',
      schema: z.object({ selector: z.string(), text: z.string() }),
      func: async ({ selector, text }) => {
        await page.waitForSelector(selector)
        await page.click(selector)
        await page.keyboard.type(text)
        return `Typed: ${text}`
      },
    }),
  ]
}

async function postFacebookStatus({
  folderId,
  profileId,
  content,
}: {
  folderId: string
  profileId: string
  content: string
}) {
  let browser: Browser | null = null
  let token: string | null = null

  try {
    // Step 1: Authenticate with Multilogin
    token = await getToken(process.env.MULTILOGIN_EMAIL!, process.env.MULTILOGIN_PASSWORD!)

    // Step 2: Launch browser profile
    const { port } = await launchProfile(token, folderId, profileId)
    // browserURL lets Puppeteer resolve the full DevTools WebSocket URL from the port
    browser = await puppeteer.connect({
      browserURL: `http://127.0.0.1:${port}`,
    })
    const page = await browser.newPage()
    await page.setViewport({ width: 1920, height: 1080 })

    console.log('✓ Browser profile launched')

    // Step 3: Create LLM
    const llm = new ChatOpenAI({
      modelName: process.env.AI_MODEL || 'gpt-4o-mini',
      temperature: 0,
    })

    // Step 4: Create browser tools
    const tools = createBrowserTools(page)

    // Step 5: Create agent
    const agent = createReactAgent({ llm, tools })

    console.log('✓ AI agent initialized')

    // Step 6: Execute posting task
    const taskPrompt = buildPostingTaskPrompt(content)
    const result = await agent.invoke({
      messages: [{ role: 'user', content: taskPrompt }],
    })

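    // The run finished without throwing; the final message says whether the
    // agent believes the post actually went through
    const finalMessage = result.messages[result.messages.length - 1]
    console.log('✓ Agent finished:', finalMessage?.content)

    return {
      success: true,
      output: finalMessage?.content,
      steps: result.messages.length,
    }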
  } catch (error: any) {
    console.error('✗ Failed:', error.message)
    throw error
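  } finally {
    // Always clean up; catch each step so a failed close() can't skip stopProfile()
    if (browser) await browser.close().catch((e) => console.warn('browser.close() failed:', e?.message))
    if (token) await stopProfile(token, profileId).catch((e) => console.warn('stopProfile() failed:', e?.message))
    console.log('✓ Cleanup finished')
  }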
}

// Usage
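postFacebookStatus({
  folderId: 'your-folder-id',
  profileId: 'your-profile-id',
  content: 'Hello from automated posting!',
})
  .then((result) => console.log(result))
  .catch(() => {
    process.exitCode = 1 // the error was already logged inside postFacebookStatus
  })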
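postFacebookStatus mixes the launch/cleanup plumbing with the posting task. If you plan to run other tasks through the same profiles, one option is to factor the try/finally into a generic helper. A sketch; withResource is a hypothetical name:

```typescript
// Hypothetical helper: acquire a resource, use it, and always release it
async function withResource<R, T>(
  acquire: () => Promise<R>,
  use: (resource: R) => Promise<T>,
  release: (resource: R) => Promise<void>
): Promise<T> {
  const resource = await acquire() // if this throws, there is nothing to release
  try {
    return await use(resource)
  } finally {
    // A failing release is logged instead of masking use()'s result or error
    await release(resource).catch((err) => console.warn('release failed:', err))
  }
}
```

For Multilogin, acquire would sign in, start the profile, and connect Puppeteer; release would close the browser and call stopProfile(). If acquire can fail halfway (profile started, connect failed), it has to clean up after itself, because withResource only releases what acquire returned.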

Scaling to Multiple Accounts

import pLimit from 'p-limit'

const limit = pLimit(5) // Max 5 concurrent posts

const accounts = [
  { folderId: '...', profileId: '...' },
  { folderId: '...', profileId: '...' },
  // ... more accounts
]

const tasks = accounts.map((account) =>
  limit(() =>
    postFacebookStatus({
      ...account,
      content: 'Batch post content',
    })
  )
)

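// allSettled rather than all: postFacebookStatus rethrows on failure, and one
// failed account shouldn't discard the results of the others
const results = await Promise.allSettled(tasks)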
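p-limit does the throttling here. Its core idea is small: count the running tasks, queue the waiting ones, and start the next task whenever one finishes. A minimal sketch of that idea, in case you'd rather not add the dependency; createLimiter is hypothetical and is not p-limit's actual source:

```typescript
// Run at most `concurrency` tasks at once; the rest wait in a FIFO queue
function createLimiter(concurrency: number) {
  let active = 0
  const queue: Array<() => void> = []

  const next = () => {
    if (active >= concurrency || queue.length === 0) return
    active++
    queue.shift()!()
  }

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        // Promise.resolve().then(task) also catches synchronous throws
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            active-- // free the slot whether the task succeeded or failed
            next()
          })
      })
      next()
    })
  }
}
```

Usage mirrors p-limit: const limit = createLimiter(5), then wrap each task as limit(() => postFacebookStatus(...)).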

The Power of Scale

Here's what this architecture enables:

// One task per profile, each in its own isolated browser profile.
// `limit` from the previous snippet caps how many run at once (5 there);
// getAccountProfiles() and runAgentTask() are placeholders for your own helpers.
const accounts = await getAccountProfiles() // Your 50 profiles

const results = await Promise.allSettled(
  accounts.map((account) =>
    limit(() =>
      runAgentTask({
        profile: account,
        task: 'Check notifications and summarize any important messages',
      })
    )
  )
)

console.log(`Processed ${results.length} accounts`)
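With 50 accounts you'll want to know which ones failed, not just how many. If you collect results with Promise.allSettled (so one failed account doesn't hide the others), results come back in input order and can be paired with the account list. A sketch with a hypothetical summarizeSettled helper:

```typescript
interface BatchSummary {
  succeeded: number
  failed: { id: string; reason: string }[]
}

// Hypothetical helper: pair each allSettled result with the account it came from
function summarizeSettled<T>(ids: string[], results: PromiseSettledResult<T>[]): BatchSummary {
  const summary: BatchSummary = { succeeded: 0, failed: [] }
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      summary.succeeded++
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason)
      summary.failed.push({ id: ids[i], reason })
    }
  })
  return summary
}
```

For example, summarizeSettled(accounts.map((a) => a.profileId), results) gives you the profile IDs worth retrying.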

Real numbers from my setup:

  • 50 profiles running simultaneously
  • ~$0.02 per task (GPT-4o-mini)
  • 3-5 minutes per complex task
  • Zero selector maintenance for 6+ months
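Those numbers interact with the concurrency cap. As a rough model, tasks run in waves the size of the cap, so wall time is about ceil(tasks / concurrency) times the per-task time, while cost scales with the task count alone. A sketch; estimateBatch is hypothetical, and the wave model assumes tasks take similar time:

```typescript
interface BatchEstimate {
  waves: number
  minMinutes: number
  maxMinutes: number
  costUsd: number
}

// Back-of-the-envelope estimate; every input is an assumption you supply
function estimateBatch(
  tasks: number,
  concurrency: number,
  minutesPerTask: [number, number],
  costPerTaskUsd: number
): BatchEstimate {
  const waves = Math.ceil(tasks / concurrency)
  return {
    waves,
    minMinutes: waves * minutesPerTask[0],
    maxMinutes: waves * minutesPerTask[1],
    costUsd: tasks * costPerTaskUsd, // cost doesn't depend on concurrency
  }
}
```

With the figures above, estimateBatch(50, 5, [3, 5], 0.02) gives 10 waves, roughly 30 to 50 minutes, and about $1.00 in total; with a cap of 50 it's a single 3-5 minute wave at the same cost. A limiter starts the next task as soon as a slot frees up, so treat the result as a rough range rather than a schedule.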

When to Use This Approach

Perfect for:

  • 🔄 Sites that A/B test constantly (your selectors will break weekly)
  • 🌐 Managing multiple legitimate accounts (social media managers, agencies)
  • 🧪 E2E testing across different user states
  • 📊 Research and data collection at scale
  • 🔍 Ad verification and competitor monitoring

Overkill for:

  • Static sites with stable HTML
  • One-off scripts you'll run once
  • Sub-second latency requirements (AI adds 2-5s per decision)

What's Next?

This architecture opens up interesting possibilities:

  • Multi-agent systems — Specialized agents for different parts of a workflow
  • Vision-based navigation — Using screenshot analysis instead of DOM parsing
  • Self-healing selectors — AI that automatically fixes broken automation

The browser automation landscape is evolving fast. AI agents aren't just a gimmick—they're becoming a practical solution for real-world automation challenges.


Resources