How to block AI Crawler Bots using robots.txt file (www.cyberciti.biz)
Cynicus Rex@lemmy.ml to Privacy@lemmy.ml · English · 3 months ago
Onno (VK6FLAB)@lemmy.radio · 3 months ago
This does not block anything at all.
It’s a 1994 “standard” that relies on voluntary compliance, and the user-agent is just a string set by the operator of the tool used to access your site.
https://en.m.wikipedia.org/wiki/Robots.txt
https://en.m.wikipedia.org/wiki/User-Agent_header
In other words, the bot operator can ignore your robots.txt file, and because they can set their user-agent to whatever they like, checking your webserver logs will not tell you whether they are ignoring you.
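To make that concrete, here is a rough Python sketch using only the standard library. The robots.txt content, the example.com URL and the `allowed` helper are made up for illustration. It only shows what a polite client would conclude from the file and how freely the user-agent can be chosen; nothing is sent over the network.

```python
from urllib import robotparser, request

# Hypothetical robots.txt that "blocks" one AI crawler by name.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

URL = "https://example.com/article"  # placeholder URL, not a real target


def allowed(user_agent: str, url: str) -> bool:
    """Return what a well-behaved client would conclude from ROBOTS_TXT."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant bot that identifies itself honestly is told to stay away.
gptbot_allowed = allowed("GPTBot", URL)

# The same check under a different self-reported name passes,
# because the rule only matches on the name the client claims.
browser_allowed = allowed("Mozilla/5.0", URL)

# Nothing forces a client to run the check at all: a request can be
# built with any User-Agent string the operator likes.
req = request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
spoofed_ua = req.get_header("User-agent")  # urllib stores the key capitalized

print(gptbot_allowed, browser_allowed, spoofed_ua)
```

The check in `allowed` runs entirely on the client side, so it only happens if the bot's author chooses to call it. Actual blocking has to be enforced on the server.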