Cloudflare's AI-crawler robots.txt seems to conflict with the Pages mechanism in some cases?


This topic was created 93 days ago; the information mentioned may have changed since then.

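Trigger conditions

The Pages project has a custom domain on the root (apex) domain (subdomains don't seem to have this problem)
The deployed Pages site has no 404.html, but does have a normal index.html
In the dashboard, AI Crawl Control => Robots.txt => Cloudflare managed is turned on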
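One possible explanation, offered only as a guess: Cloudflare's documentation says that when a site already serves a robots.txt, the managed content is prepended to it. Separately, when a Pages project has no top-level 404.html, Pages treats it as a single-page app and answers unmatched paths with index.html and status 200. If both hold, a request for /robots.txt would hit that SPA fallback, and the HTML would end up under the managed block. The sketch below models this guess. `pagesServe` and `managedRobots` are hypothetical names, the asset lookup is heavily simplified, and the merge rule is an assumption, not Cloudflare's actual code. It also does not explain why only the root domain seems to be affected.

```typescript
// Hypothetical model of the suspected interaction. NOT Cloudflare's real code.
type OriginResponse = { status: number; body: string };

// Simplified Pages asset lookup (assumption: exact path match only).
// With no top-level /404.html, unmatched paths fall back to /index.html with 200,
// which is how Pages handles single-page apps.
function pagesServe(files: Map<string, string>, path: string): OriginResponse {
  const exact = files.get(path);
  if (exact !== undefined) return { status: 200, body: exact };

  const notFound = files.get("/404.html");
  if (notFound !== undefined) return { status: 404, body: notFound };

  const index = files.get("/index.html");
  if (index !== undefined) return { status: 200, body: index };

  return { status: 404, body: "" };
}

// Assumed merge rule: the managed block is prepended to any successful origin
// response for /robots.txt; otherwise only the managed block is returned.
function managedRobots(managed: string, origin: OriginResponse): string {
  return origin.status === 200 ? `${managed}\n${origin.body}` : managed;
}
```

Under this model, a site with only index.html gets its HTML appended to robots.txt, while adding a 404.html (or a real robots.txt) would stop the fallback from reaching /robots.txt.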

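Symptom

When I visit xxx.com/robots.txt manually, the contents of index.html appear below Cloudflare's robots.txt template, as if Pages' default fallback logic had run as well. It looks roughly like this: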

# As a condition of accessing this website, you agree to abide by the following
# content signals:

...

# BEGIN Cloudflare Managed content

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

...

# END Cloudflare Managed Content

<!DOCTYPE html>
<html lang="zh">
	...
</html>
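To check whether a given site is affected, a small helper can look for an HTML document after the managed block. This is a sketch; `hasTrailingHtml` is a hypothetical name, and it assumes the end marker text shown above. The match is case-insensitive because the BEGIN and END markers above differ in capitalization. If no marker is found, the whole body is checked.

```typescript
// Marker copied from the sample output above, lowercased for comparison.
const END_MARKER = "# end cloudflare managed content";

// Returns true if something that looks like an HTML document follows the
// Cloudflare-managed block (or appears anywhere, if the marker is missing).
function hasTrailingHtml(robotsBody: string): boolean {
  const lower = robotsBody.toLowerCase();
  const idx = lower.lastIndexOf(END_MARKER);
  const tail = idx === -1 ? lower : lower.slice(idx + END_MARKER.length);
  return /<!doctype html|<html[\s>]/.test(tail);
}
```

Feed it the text returned by a request to the site's /robots.txt. A `true` result means the HTML fallback is leaking into the file.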
