V2EX = way to explore

CF's AI anti-crawling robots.txt seems to conflict with the Pages mechanism in some cases?

    LANCDN · Jan 28 · 1386 views
    This topic was created 93 days ago; the information mentioned may have changed since then.

    Trigger conditions

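    • The Pages project has a custom domain on the root (apex) domain (subdomains don't seem to have this problem)
    • The deployed Pages site has no 404.html, but does have a normal index.html
    • In the dashboard, AI Crawl Control => Robots.txt => Cloudflare managed is turned on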
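The trigger conditions can be read as a toy Python model, sketched under two assumptions: (1) when a Pages project has no top-level 404.html, Pages treats it as a single-page app and answers any unknown path, including /robots.txt, with the root index.html and status 200 (this is how Cloudflare's Pages docs describe SPA rendering); (2) the managed robots.txt feature prepends its block to whatever robots.txt body the origin returns with status 200. The function names and the abridged `MANAGED_BLOCK` are hypothetical, and the model does not explain why only apex domains seem to be affected.

```python
from __future__ import annotations

# Hypothetical, abridged stand-in for the block Cloudflare prepends.
MANAGED_BLOCK = (
    "# BEGIN Cloudflare Managed content\n"
    "User-agent: *\n"
    "Content-Signal: search=yes,ai-train=no\n"
    "Allow: /\n"
    "# END Cloudflare Managed Content\n"
)


def pages_serve(path: str, files: dict[str, str]) -> tuple[int, str]:
    """Simplified model of how Pages answers a request for `path`.

    `files` maps deployed paths (e.g. "/index.html") to their contents.
    Redirects, _routes.json, Functions and nested 404.html lookup are ignored.
    """
    if path in files:
        return 200, files[path]
    if "/404.html" in files:
        return 404, files["/404.html"]
    # No top-level 404.html: assume SPA mode, i.e. fall back to the
    # root index.html with status 200.
    return 200, files["/index.html"]


def edge_robots_txt(files: dict[str, str]) -> str:
    """Assumed managed-robots.txt behaviour: prepend the managed block to a
    200 response from the origin, otherwise serve the managed block alone."""
    status, body = pages_serve("/robots.txt", files)
    if status == 200:
        return MANAGED_BLOCK + "\n" + body
    return MANAGED_BLOCK


site = {"/index.html": '<!DOCTYPE html>\n<html lang="zh"></html>\n'}
print(edge_robots_txt(site).endswith("</html>\n"))  # True: HTML leaks in
site["/robots.txt"] = "User-agent: *\nAllow: /\n"
print("<html" in edge_robots_txt(site))  # False
```

Under this model, either deploying a real robots.txt or adding a 404.html would keep index.html out of the response; both are untested guesses derived from the model, not confirmed fixes.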

    Symptom

    • When I visit xxx.com/robots.txt manually, the contents of index.html show up below CF's robots.txt template, as if Pages' default fallback logic also ran. It looks roughly like this:
    # As a condition of accessing this website, you agree to abide by the following
    # content signals:
    
    ...
    
    # BEGIN Cloudflare Managed content
    
    User-agent: *
    Content-Signal: search=yes,ai-train=no
    Allow: /
    
    ...
    
    # END Cloudflare Managed Content
    
    <!DOCTYPE html>
    <html lang="zh">
    	...
    </html>
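To check whether a given domain is affected, a small Python sketch can look for HTML after the `# END Cloudflare Managed Content` marker. The marker text is copied from the sample above and matched case-insensitively, since the BEGIN and END lines differ in case; `html_leaked` and `fetch_robots` are hypothetical helper names, and the User-Agent string is an arbitrary assumption.

```python
import re
import urllib.request

# Closing marker of the managed block, as it appears in the sample response.
# Matched case-insensitively because the BEGIN and END lines differ in case.
END_MARKER = re.compile(
    r"^# END Cloudflare Managed Content[ \t]*$", re.IGNORECASE | re.MULTILINE
)
# Rough signal that an HTML document was appended to the response.
HTML_START = re.compile(r"<!DOCTYPE html|<html[\s>]", re.IGNORECASE)


def html_leaked(body: str) -> bool:
    """Return True if HTML appears after the managed block (or anywhere, if no block)."""
    match = END_MARKER.search(body)
    tail = body[match.end():] if match else body
    return HTML_START.search(tail) is not None


def fetch_robots(domain: str, timeout: float = 10.0) -> str:
    """Fetch https://<domain>/robots.txt and decode it as UTF-8 text."""
    req = urllib.request.Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": "robots-check/0.1"},  # arbitrary, hypothetical UA
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `html_leaked(fetch_robots("xxx.com"))` should return True if the live response looks like the sample above.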
    