
Oct 13, 2025

Guilherme Hortinha

How to ensure Cloudflare is not blocking AI bots (October 2025 quick demo)

The short answer: Cloudflare can block AI bots by default; here’s how to allow trusted crawlers. Between July and September 2025, Cloudflare introduced AI Crawl Control and a default stance that blocks many AI crawlers on new zones unless you opt in. To stay visible in answer engines, explicitly allow reputable bots in the dashboard, align robots.txt, and verify with live tests and logs.

Cloudflare dashboard concept showing allowing AI LLM bots and crawlers via AI Crawl Control for AEO visibility.
  • What changed: new controls, default blocking for many AI bots, and optional Pay Per Crawl to charge for access (Cloudflare blog, Cloudflare press release).

  • Which bots matter: OpenAI’s GPTBot, Perplexity-User, Google-Extended and standard Googlebot for search inclusion (OpenAI bots).

  • Where to switch it: Cloudflare Dashboard, AI Crawl Control, then per-bot toggles and settings (Cloudflare docs).

  • How to verify: use curl with bot user-agents, check WAF and Bot logs, and confirm genuine IP ranges where published (Google crawler basics).


Why this matters: AI search is siphoning intent and visibility

AI answers now sit between customers and your website. If reputable crawlers cannot access your pages, your brand may not be seen or cited when people ask tools like ChatGPT or Perplexity for recommendations. Google’s own guidance explains how AI features surface results and link to the web, which means eligibility depends on clear access and helpful content (Google Search Central on AI features).

Independent analyses show rising use of robots.txt to govern AI agents, with explicit directives for GPTBot and others becoming common practice for policy signalling and documentation (Paul Calvano on AI bots and robots.txt).

“Creators should be in the driver’s seat,” Cloudflare notes, as it gives publishers more control over who accesses content and on what terms (Cloudflare blog).

If you need a pragmatic lift, indexLab’s AI Visibility Audit pinpoints where bots are blocked, which entities need strengthening, and which pages are most likely to earn citations across answer engines, then implements fixes end to end.


What changed in Cloudflare: AI Crawl Control and default bot treatment

Default posture and what “blocked” means

Cloudflare’s AI Crawl Control lets you choose allow or block for each known AI crawler, with many domains now seeing a default block during setup unless you opt in to allow trusted bots. You also get a dashboard to manage responses, including a 402 for monetised access if required (Cloudflare docs).
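You can see what each posture looks like from the outside by probing a page with a crawler-style user agent and reading the status code. A minimal sketch using curl, with https://example.com/ standing in for one of your own URLs; note that Cloudflare verifies genuine crawlers by source IP, so a spoofed user agent sent from your own machine may be handled differently from the real bot:

```bash
# Probe a page as if we were GPTBot and print only the HTTP status code.
# 200 = allowed, 403 = blocked at the edge, 402 = Pay Per Crawl is requesting payment.
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://example.com/
```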

Trusted AI bots versus unwanted scrapers

Cloudflare distinguishes verified crawlers from stealthy scrapers, and it has publicly documented non-compliant agents that evade robots.txt and WAF rules. That behaviour is why edge enforcement matters for marketers balancing visibility with brand safety (Cloudflare report on stealth crawling).


Which AI bots should you allow for AEO and GEO?

Priority crawlers for brand visibility

  • OpenAI GPTBot: allow for retrieval that can cite and link to your pages in ChatGPT answers, while controlling training separately in robots.txt (OpenAI bots).

  • Perplexity-User: enables user-initiated fetching that links to sources, with published IP ranges for verification (Perplexity docs).

  • Google-Extended: controls use of content by Gemini and related AI products, independent from Googlebot’s indexing role for Search (Google AI features guidance).

  • Googlebot: keep allowed for core SEO discovery and indexing, with policies documented in robots.txt and Search Central (Google robots.txt guide).
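If you mirror these decisions in robots.txt, the directives are short. A minimal sketch, assuming you want retrieval allowed site-wide; check each vendor’s current documentation for the exact tokens, since agent roles change, and note that Perplexity documents Perplexity-User as user-initiated, so it may not honour robots.txt at all, which is another reason to keep edge controls aligned:

```
# robots.txt — allow the crawlers answer engines rely on.
# Note: a crawler obeys only its most specific matching group,
# so repeat any Disallow rules inside each named group.
User-agent: GPTBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /
```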


Handling robots.txt versus Cloudflare controls

Robots.txt declares your policy in public, while Cloudflare enforces it at the edge, including for bots that ignore robots.txt. Use both, and keep rules minimal, explicit, and aligned with your dashboard toggles (Cloudflare integration overview).
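At the edge, the same intent can be written as a WAF custom rule. A sketch of a rule expression in Cloudflare’s Rules language, assuming the goal is to exempt Cloudflare-verified known bots whose user agent matches your allow-list; Google-Extended is deliberately absent because it is a robots.txt token rather than a crawling user agent, and exact field availability depends on your plan:

```
(cf.client.bot and (
  http.user_agent contains "GPTBot" or
  http.user_agent contains "Perplexity-User" or
  http.user_agent contains "Googlebot"
))
```

Pair the expression with a Skip or Allow action so later blocking rules do not fire for these verified crawlers.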


Quick demo: ensuring Cloudflare is not blocking AI bots

A focused walkthrough you can complete today to restore visibility for trusted crawlers (Cloudflare getting started).

Step 1: Check the setting when adding a domain to Cloudflare

  • During onboarding, look for the AI crawlers prompt, then choose Allow trusted AI crawlers rather than block.

  • If the zone already exists, proceed to AI Crawl Control after setup to review current toggles (Cloudflare press release).

Cloudflare domain onboarding screen with option selected for allowing AI LLM bots and crawlers instead of blocking.


Step 2: Use AI Crawl Control to allow specific AI bots

  • In the dashboard, open AI Crawl Control, then the Crawlers tab.

  • Toggle Allow for GPTBot, Perplexity-User and Google-Extended, and keep unknown or non-compliant agents blocked.

  • If needed, configure a 402 response for paid access on specific pages.

Cloudflare AI Crawl Control view with per-bot toggles allowing GPTBot, Perplexity-User and Google-Extended.


Step 3: Verify with headers, logs and live tests

  • Run curl -A "GPTBot" against a non-sensitive URL and confirm a 200, not a 403.

  • Check Cloudflare logs and Bot Management for hits by specific user-agents.

  • Validate genuine traffic for Perplexity using the published IP list, and keep monitoring for evasion behaviours reported by Cloudflare and others (Perplexity IPs).
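A slightly fuller smoke test loops over the user agents you care about. A sketch, with https://example.com/ standing in for a page you want cited; because Cloudflare validates genuine crawlers by source IP, a spoofed user agent can draw a challenge even when the real bot is allowed, so treat anything other than a clean 200 as a prompt to check the dashboard and logs rather than proof of a block:

```bash
#!/usr/bin/env bash
# Smoke-test edge access for crawler-style user agents.
URL="https://example.com/"   # assumption: replace with a representative, non-sensitive page

for UA in "GPTBot" "Perplexity-User" "Googlebot"; do
  CODE=$(curl -s -o /dev/null -w "%{http_code}" -A "$UA" "$URL")
  echo "$UA -> HTTP $CODE"   # expect 200; 403 suggests an edge block, 402 means Pay Per Crawl
done
```

For Perplexity specifically, cross-check the source IPs in your logs against the ranges Perplexity publishes before treating the traffic as genuine.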


Recommended configuration: a balanced allow-list for marketers

AI bots to allow by default, with purpose and verification cues:

| Bot | Why it matters for AEO | How to verify | Notes |
| --- | --- | --- | --- |
| GPTBot | Enables ChatGPT to see and cite your pages | User-Agent GPTBot returning 200 in logs | Control training in robots.txt separately (OpenAI bots) |
| Perplexity-User | User-initiated retrieval with source links | Match documented IPs, 200 status | Monitor for compliance changes (Perplexity docs) |
| Google-Extended | Signals permission for Gemini and related AI | robots.txt rules visible | Does not affect Search ranking or inclusion (Google AI features) |
| Googlebot | Core crawling for organic SEO | Server logs and Search Console crawl stats | Keep allowed for continuity (robots.txt intro) |

Note: Publishers report heavy AI access with limited referrals, which is why Cloudflare added monetisation options such as Pay Per Crawl to restore balance (Reuters coverage).
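To check that balance on your own site, compare crawler hits against referrals arriving from AI surfaces. A minimal sketch over a combined-format access log; the log path, and the referrer domains used as examples, are assumptions to adapt to your stack:

```bash
# Count hits from AI crawler user agents (log path is an assumption).
LOG=/var/log/nginx/access.log
for UA in GPTBot Perplexity-User; do
  printf "%-16s %s\n" "$UA" "$(grep -c "$UA" "$LOG")"
done

# Rough proxy for downstream value: visits referred from AI answer surfaces.
grep -cE 'chatgpt\.com|perplexity\.ai' "$LOG"
```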


Implementation checklist for teams

  • Align legal, security and marketing on an AI access policy that supports visibility while protecting sensitive content, since some agents have been accused of evading controls (The Verge report).

  • Publish robots.txt rules for each agent you allow or block, and document decisions for stakeholders and auditors (Google robots.txt specification).

  • Prioritise access to high-intent pages: solutions, pricing, comparisons and FAQs that are likely to be cited in AI answers (Google Search Central on AI features).

  • Monitor logs weekly and compare bot hits to downstream engagement. If costs spike or compliance slips, throttle or block, then revisit monthly (Business Insider interview).

  • If you need a quick, low-risk rollout, indexLab can configure AI Crawl Control, ship robots.txt updates and validate access with reproducible tests, so your brand is seen, trusted and chosen in AI answers without adding workload to your team.


Conclusion

Default blocking protects content, but it can also hide your brand from answer engines that buyers now rely on. The practical move is to allow reputable AI bots, align robots.txt, verify with live tests, and monitor ROI as traffic patterns evolve. When you are ready to accelerate, indexLab’s AI Visibility Audit prioritises the exact pages and entities most likely to earn you citations across ChatGPT, Gemini and Perplexity, then implements the fixes with you, fast (indexlab.ai).


People Also Ask

Does allowing AI bots hurt SEO or security?

Allowing reputable bots does not harm your inclusion in Google Search when configured correctly, and you can still block training or misbehaving agents at the edge to protect cost and brand safety.

How do robots.txt rules interact with Cloudflare’s AI Crawl Control?

Robots.txt declares your policy while Cloudflare enforces it in real time, including for agents that ignore robots.txt. Keep both aligned and use per-bot toggles for exceptions.

Which pages should I allow AI bots to crawl first?

Start with commercial and evaluative intent pages, such as pricing, comparisons and FAQs, as these are most likely to be cited by answer engines during buying journeys.