robots.txt code for blocking common scrapers

高金铜 posted on 2016-11-9 00:40:39

Reposted from elsewhere; I don't really understand it myself.
# No thanks Google
# (note: "*" matches every crawler that honors robots.txt, not only Google)

User-agent: *
Disallow: /

# No thanks HTTrack etc

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: Web Downloader
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Offline Explorer Pro
Disallow: /

User-agent: HTTrack Website Copier
Disallow: /

User-agent: Offline Commander
Disallow: /

User-agent: Leech
Disallow: /

User-agent: WebSnake
Disallow: /

User-agent: BlackWidow
Disallow: /

User-agent: HTTP Weazel
Disallow: /

michael2016 posted on 2016-11-9 12:56:32

There's no way to stop them like this.
I block spiders in the Nginx conf file instead.
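
For anyone wondering what blocking in the Nginx conf might look like, here is a minimal sketch, not the poster's actual config. It matches the User-Agent header against some of the names from the robots.txt list above and returns 403. The variable name $blocked_scraper and the server_name example.com are placeholders made up for illustration. The User-Agent header is trivially spoofed, so this only stops tools that announce themselves honestly.

```nginx
# Goes in the http {} context.
# $blocked_scraper is a hypothetical variable name chosen for this sketch.
map $http_user_agent $blocked_scraper {
    default 0;
    # "~*" means case-insensitive regex match against the User-Agent header
    "~*(HTTrack|Teleport|EmailCollector|EmailSiphon|WebBandit|WebZIP|WebReaper|WebStripper|WebCopier|WebSnake|BlackWidow|Leech)" 1;
}

server {
    listen 80;
    server_name example.com;  # placeholder, replace with the real host

    # Reject requests whose User-Agent matched the list above
    if ($blocked_scraper) {
        return 403;
    }
}
```

Unlike robots.txt, which a client is free to ignore, this is enforced on the server side for every request that carries a matching User-Agent.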

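高金铜 posted on 2016-11-9 13:10:40

michael2016 posted on 2016-11-9 12:56
There's no way to stop them like this.
I block spiders in the Nginx conf file instead.

Right, it can't really be stopped. This only works against complete beginners who rely on scraping tools and don't understand anything else.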


AdvertCN - 广告中国's Archiver
