Nadim Tuhin

Facebook Request Throttle: A Community-Driven WordPress Plugin

Your WordPress site is suddenly slow. You check the access logs and find thousands of requests from facebookexternalhit—Facebook's crawler is hammering your server every time someone shares a link. Sound familiar?

This is the story of how a quick fix for a friend turned into an open-source plugin that now helps WordPress sites manage aggressive crawler traffic.

TL;DR: What This Plugin Does

Problem | Solution
Facebook crawler overwhelming your server | Configurable request throttling
No visibility into crawler behavior | Built-in logging system
Images getting blocked unexpectedly | Smart request filtering
Other aggressive bots | Multi-bot protection

The Problem: Facebook's Aggressive Crawler

When someone shares a URL on Facebook, Facebook's crawler (facebookexternalhit/1.1) fetches the page to generate a preview. Sounds harmless, until you realize:

  • Facebook re-crawls pages frequently to keep previews fresh
  • Multiple shares trigger multiple crawl requests
  • Viral content can mean hundreds of requests per minute
  • Shared hosting plans often can't handle the load

A friend's WordPress site was experiencing exactly this. Their shared hosting provider was threatening to suspend the account due to resource usage. The culprit? Facebook's crawler making requests faster than the server could handle.

The Solution: Request Throttling

The core concept is simple: track when Facebook's crawler last accessed your site, and if it's too soon, respond with a temporary error instead of rendering the full page.

Here's the essential logic:

function check_facebook_throttle() {
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';

    // Detect Facebook's crawler
    if (strpos($user_agent, 'facebookexternalhit') === false) {
        return; // Not Facebook, proceed normally
    }

    // Get the URL-specific cache key to allow multiple pages to be crawled
    $request_uri = $_SERVER['REQUEST_URI'] ?? '/';
    $cache_key = 'fb_throttle_' . md5($request_uri);

    // get_option() can return a string, so cast before doing arithmetic
    $throttle_seconds = (int) get_option('fb_throttle_seconds', 60);
    // get_transient() returns false when the key is missing or expired
    $last_access = (int) get_transient($cache_key);

    if ($last_access > 0 && (time() - $last_access) < $throttle_seconds) {
        // Too soon: tell Facebook to retry later.
        // 429 (Too Many Requests) is the standard rate-limiting status.
        status_header(429);
        header('Retry-After: ' . $throttle_seconds);
        exit;
    }

    // Record this access
    set_transient($cache_key, time(), $throttle_seconds * 2);
}

add_action('init', 'check_facebook_throttle', 1);

Key implementation notes:

  • Uses per-URL cache keys so crawling page A doesn't block page B
  • Uses HTTP 429 (Too Many Requests), the standard rate-limiting status, rather than 503
  • Stores access times with WordPress's Transient API, so no custom database tables are needed; transients live in the options table by default and move to the object cache when a persistent one is configured
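
The throttle decision itself can be separated from WordPress so it is easy to test. The following is a minimal sketch, not the plugin's actual code: the function name fb_throttle_decide and its return shape are hypothetical. It also reports only the seconds remaining in the window, which is one possible refinement for the Retry-After value.

```php
<?php
// Hypothetical helper (not part of the plugin): pure throttle decision.
// $lastAccess is the stored timestamp, or null if nothing was recorded.
function fb_throttle_decide(?int $lastAccess, int $now, int $windowSeconds): array
{
    if ($lastAccess === null) {
        return ['allow' => true, 'retry_after' => 0];
    }

    // Guard against clock skew producing a negative elapsed time.
    $elapsed = max(0, $now - $lastAccess);

    if ($elapsed >= $windowSeconds) {
        return ['allow' => true, 'retry_after' => 0];
    }

    // Still inside the window: report only the seconds that remain.
    return ['allow' => false, 'retry_after' => $windowSeconds - $elapsed];
}
```

In a WordPress context, a missing transient (which get_transient() reports as false) would be mapped to null before the call, and retry_after would feed the Retry-After header.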

Evolution Through Community Feedback

Issue #1: Images Getting Blocked

A user reached out via Facebook message (ironic, right?) reporting that their images weren't appearing in Facebook previews. The problem: the plugin was blocking all Facebook crawler requests, including those fetching og:image assets.

The fix required detecting image requests:

function is_image_request() {
    $request_uri = $_SERVER['REQUEST_URI'] ?? '';

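    // Parse the path without query strings.
    // parse_url() returns false for a malformed URI and null when no path exists.
    $path = parse_url($request_uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }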

    // Check for image extensions
    $image_extensions = ['jpg', 'jpeg', 'png', 'gif', 'webp', 'svg'];
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, $image_extensions, true);
}

Now image requests bypass the throttle, ensuring previews display correctly.
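
To see which requests would bypass the throttle, here is a sketch of the same check written to take the URI as an argument, so it can be exercised on sample inputs. The name fb_throttle_uri_is_image is hypothetical and this is an illustration, not the plugin's shipped code.

```php
<?php
// Hypothetical, testable variant of the image check: the URI is passed in
// instead of being read from $_SERVER.
function fb_throttle_uri_is_image(string $requestUri): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URI or no path component
    }

    // Only the final extension of the path counts; query strings are ignored.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, ['jpg', 'jpeg', 'png', 'gif', 'webp', 'svg'], true);
}
```

A path such as /wp-content/uploads/cover.JPG?ver=2 counts as an image, while /blog/post?img=photo.png does not, because only the path is inspected. Presumably the throttle function returns early when this check is true, before any transient lookup.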

Issue #2: "What's Actually Happening?"

The same user asked how to see what the plugin was doing. Fair question—throttling is invisible by default. This led to adding a logging system:

function log_throttle_event($action, $user_agent) {
    if (!get_option('fb_throttle_logging', false)) {
        return;
    }

    // Sanitize user agent to prevent log injection
    $safe_user_agent = preg_replace('/[^\x20-\x7E]/', '', $user_agent);
    $safe_user_agent = substr($safe_user_agent, 0, 100);

    // Get the client IP. X-Forwarded-For is supplied by the client unless a
    // trusted proxy/CDN overwrites it, so treat it as a hint, not as proof.
    $ip = $_SERVER['HTTP_X_FORWARDED_FOR'] ?? $_SERVER['REMOTE_ADDR'] ?? 'unknown';
    // Take only the first IP if multiple are present
    $ip = explode(',', $ip)[0];
    $ip = filter_var(trim($ip), FILTER_VALIDATE_IP) ?: 'invalid';

    $log_entry = sprintf(
        "[%s] %s | UA: %s | IP: %s\n",
        gmdate('Y-m-d H:i:s'),
        $action,
        $safe_user_agent,
        $ip
    );

    error_log($log_entry, 3, WP_CONTENT_DIR . '/fb-throttle.log');
}

Security note: User agents and IPs are sanitized before logging to prevent log injection attacks.

Pull Request: Multi-Bot Support

A contributor on GitHub pointed out that Facebook isn't the only aggressive crawler. Their PR expanded detection to include:

  • facebookexternalhit - Facebook link previews
  • Facebot - Facebook's general crawler
  • LinkedInBot - LinkedIn preview fetcher
  • Twitterbot - Twitter/X card generator
  • WhatsApp - WhatsApp link previews

Each bot can be enabled/disabled independently in the settings.
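
A rough sketch of how detection with per-bot toggles might look. The function name and the shape of the $enabled map are assumptions made for illustration; only the user-agent tokens come from the list above.

```php
<?php
// Hypothetical detection helper. $enabled maps a user-agent token to a
// boolean, e.g. ['facebookexternalhit' => true, 'LinkedInBot' => false].
// Returns the matching token, or null if no enabled bot matches.
function fb_throttle_detect_bot(string $userAgent, array $enabled): ?string
{
    foreach ($enabled as $token => $isEnabled) {
        // Case-insensitive substring match against the user-agent header.
        if ($isEnabled && stripos($userAgent, (string) $token) !== false) {
            return (string) $token;
        }
    }

    return null;
}
```

Returning the token rather than a boolean lets the caller pick bot-specific settings afterwards. As with any user-agent check, this identifies what a client claims to be, not what it is.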

Current Feature Set

After months of community-driven development:

Throttling

  • Configurable delay between requests (default: 60 seconds)
  • Per-URL throttling (crawling page A doesn't block page B)
  • Per-bot throttle settings
  • HTTP 429 response with Retry-After header (the standard rate-limiting response)
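
Per-URL and per-bot throttling can be combined by including both values in the cache key. The sketch below assumes hypothetical function names and a hypothetical $perBotDelays settings array; the plugin's real option layout may differ.

```php
<?php
// Hypothetical: one cache key per (bot, URL) pair, so one bot's crawl of a
// page does not throttle another bot or another page.
function fb_throttle_cache_key(string $botToken, string $requestUri): string
{
    return 'fb_throttle_' . md5(strtolower($botToken) . '|' . $requestUri);
}

// Hypothetical: look up a bot-specific delay in seconds, falling back to
// the default of 60 when none is configured.
function fb_throttle_delay_for(string $botToken, array $perBotDelays, int $default = 60): int
{
    $delay = $perBotDelays[$botToken] ?? $default;

    return max(0, (int) $delay); // settings may arrive as strings
}
```

Hashing keeps the key at a fixed 44 characters, comfortably inside WordPress's limit on transient names.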

Visibility

  • Optional logging to wp-content/fb-throttle.log
  • Log rotation to prevent disk bloat
  • Request details: timestamp, user agent, IP, action taken
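
Log rotation can be as simple as renaming the file once it passes a size limit. This is a minimal sketch under assumed names and defaults (fb_throttle_rotate_log, numbered .1/.2/.3 archives); it is not taken from the plugin's source.

```php
<?php
// Hypothetical size-based rotation. When $logFile reaches $maxBytes it is
// renamed to "$logFile.1"; older archives shift up and the oldest one is
// overwritten. $keep is the number of archives retained (at least 1).
function fb_throttle_rotate_log(string $logFile, int $maxBytes, int $keep = 3): bool
{
    clearstatcache(true, $logFile); // avoid a stale cached file size

    if (!is_file($logFile) || filesize($logFile) < $maxBytes) {
        return false; // nothing to rotate
    }

    // Shift archives from oldest to newest: .2 -> .3, then .1 -> .2.
    for ($i = $keep - 1; $i >= 1; $i--) {
        $from = $logFile . '.' . $i;
        if (is_file($from)) {
            rename($from, $logFile . '.' . ($i + 1));
        }
    }

    return rename($logFile, $logFile . '.1');
}
```

Calling this just before each write keeps the active log bounded; the next error_log() call with message type 3 recreates the file.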

Flexibility

  • Whitelist specific paths (e.g., /wp-admin/)
  • Image request bypass
  • WP-CLI commands for cache clearing
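
Path whitelisting can be sketched as a prefix match on the request path. The function name and the prefix-list format here are assumptions for illustration; the plugin's settings may store whitelisted paths differently.

```php
<?php
// Hypothetical whitelist check: true if the request path starts with any
// configured prefix, e.g. ['/wp-admin/', '/wp-json/'].
function fb_throttle_path_is_whitelisted(string $requestUri, array $prefixes): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URI or no path component
    }

    foreach ($prefixes as $prefix) {
        // Skip empty prefixes, which would otherwise match every path.
        if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }

    return false;
}
```

Matching on the parsed path rather than the raw URI means a query string cannot smuggle a whitelisted prefix into an otherwise throttled request.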

Installation

  1. Download from GitHub
  2. Upload to wp-content/plugins/
  3. Activate in WordPress admin
  4. Configure under Settings → FB Throttle

Or manually with WP-CLI (download first, then install):

# Download the plugin
curl -L -o fb-throttle.zip https://github.com/nadimtuhin/Facebook-Request-Throttle-WordPress-Plugin/archive/main.zip

# Install and activate
wp plugin install fb-throttle.zip --activate

When to Use This Plugin

Good fit:

  • Shared hosting with limited resources
  • Sites that frequently go viral on social media
  • WordPress installations seeing high crawler traffic in access logs

Not needed:

  • Sites on dedicated servers with resources to spare
  • Low-traffic blogs with occasional social shares
  • Sites already behind Cloudflare or a similar service with bot management

Lessons Learned

Building this plugin taught me a few things about open-source maintenance:

  1. Real users find real bugs. The image-blocking issue would never have surfaced in my testing—I don't share enough content on Facebook.

  2. Logging is not optional. Users need to see what's happening. "Trust me, it's working" isn't good enough.

  3. Start simple, expand based on need. The original plugin was 30 lines. Now it's ~300, but every addition came from a real use case.

Security Considerations

When implementing crawler throttling, be aware of:

  • User-agent spoofing: Any bot can claim to be facebookexternalhit. This plugin is for rate limiting, not security.
  • Log injection: Always sanitize data before writing to logs (the code above demonstrates this).
  • Race conditions: Under very high load, multiple requests may pass the throttle check simultaneously. For most sites, this is acceptable.

Get Involved

The plugin is MIT-licensed and welcomes contributions.

Final Thoughts

What started as a 30-minute fix for a friend has become a useful tool for the WordPress community. The plugin now handles edge cases I never anticipated, thanks entirely to users who took the time to report problems and developers who contributed fixes.

If you're seeing facebookexternalhit flooding your access logs, give it a try. And if you find a bug—please tell me. That's how this thing keeps getting better.

