UngodlyAudrey🏳️⚧️ to Technology@beehaw.org • 1 year ago
Sites scramble to block ChatGPT web crawler after instructions emerge (arstechnica.com)
32 comments • cross-posted to: tech@kbin.social

@ashtrix@lemmy.ca • 21 • 1 year ago
Yeah, it’s already too late. Why didn’t they provide this before they already scraped websites?

P03 Locke • English • 13 • 1 year ago
You think Google thought about robots.txt before they developed their search engine? Nah, it’s all public Internet, and they scraped away. A non-zero percentage of web sites will bother to follow these instructions, but it might as well be zero.

Scrubbles • English • 8 • 1 year ago
Yeah, I always assumed robots.txt only told them to hide it from search results, but Google still scrapes everything they can from you. It’s just the illusion that they skipped over you.

The Doctor • English • 1 • 1 year ago
If you look in the server logs, you can see what their spiders are grabbing.

The Doctor • English • 1 • 1 year ago
Very early on, at least, their spiders respected robots.txt. I know there are folks that have all of the Big G in their robots.txt files on principle; you might want to ask them whether it works.

@acastcandream@beehaw.org • 10 • 1 year ago
I’m guessing this question is rhetorical lol

@snowbell@beehaw.org • 4 • 1 year ago
There would have been no reason for people to care before they scraped all the websites.