• @ashtrix@lemmy.ca
    21 · 11 months ago

    Yeah, it’s already too late. Why didn’t they provide this before they had already scraped websites?

    • P03 Locke
      English
      13 · 11 months ago

      You think Google thought about robots.txt before they developed their search engine? Nah, it’s all public Internet, and they scraped away.

      A non-zero percentage of web sites will bother to follow these instructions, but it might as well be zero.

      • Scrubbles
        English
        8 · 11 months ago

        Yeah, I always assumed robots.txt only told them to hide your site from search results, while Google still scrapes everything it can from you. It just gives the illusion that they skipped over you.

        • The Doctor
          English
          1 · 11 months ago

          If you look in the server logs, you can see what their spiders are grabbing.
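
For anyone who wants to try the log check, here is a minimal sketch. It assumes the server writes the common "combined" access log format (the Apache and nginx default) and that the crawler identifies itself in the user-agent field. The function name `crawler_hits` and the sample log lines are made up for illustration. User agents can be spoofed, so this shows what claims to be Googlebot, not verified Google traffic.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_substring="Googlebot"):
    """Count requested paths for log lines whose user agent contains agent_substring."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Case-insensitive substring match on the self-reported user agent.
        if match and agent_substring.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample lines, invented for this example.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:02 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for path, count in crawler_hits(SAMPLE_LOG).most_common():
        print(count, path)
```

To run it against a real log, pass an open file instead of `SAMPLE_LOG`, e.g. `crawler_hits(open("access.log"))`, where the path is whatever your server uses. If a path you disallowed in robots.txt shows up in the counts, the crawler fetched it anyway.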

      • The Doctor
        English
        1 · 11 months ago

        Very early on, at least, their spiders respected robots.txt.

        I know there are folks who block all of the Big G's crawlers in their robots.txt files on principle. Might want to ask them whether it works.
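
As a rough illustration of what such a robots.txt might look like, the sketch below blocks two Google user-agent tokens and checks the rules with Python's standard `urllib.robotparser`. The tokens chosen (`Googlebot`, `Google-Extended`), the helper name `allowed`, and the example.com URL are assumptions for the example, not a complete list of Google's crawlers. The check only shows what a crawler that honors robots.txt would do; it cannot tell you whether a given crawler actually complies.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block two Google tokens, allow everyone else.
# The token list is illustrative, not exhaustive.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended", "SomeOtherBot"):
        print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

Comparing these rules against what actually appears in the server logs is the only way to see whether the crawler respected them.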