• P03 Locke
    11 months ago

    You think Google thought about robots.txt before they developed their search engine? Nah, it’s all public Internet, and they scraped away.

    A non-zero percentage of web sites will bother to follow these instructions, but it might as well be zero.

    • Scrubbles
      11 months ago

      Yeah, I always assumed robots.txt only told them to hide it from search results, but Google still scrapes everything they can from you. It just gives you the illusion that they skipped over you.

      • The Doctor
        11 months ago

        If you look in the server logs, you can see what their spiders are grabbing.
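
For anyone who wants to try the log check above, here is a rough sketch, assuming an access log in the common "combined" format used by Apache and nginx. The function name `crawler_hits` and the sample lines are made up for illustration, and the user agent string is self-reported by the client, so a match only shows what claims to be Googlebot.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_substring="Googlebot"):
    """Count requested paths on lines whose user agent contains agent_substring."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines (documentation IP range), not real traffic.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /private/notes.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2023:13:56:02 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.44 - - [10/Oct/2023:13:57:10 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"',
    '192.0.2.10 - - [10/Oct/2023:13:58:41 +0000] "GET /private/notes.html HTTP/1.1" '
    '304 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for path, count in crawler_hits(SAMPLE_LOG).most_common():
        print(count, path)
```

To run it against a real log, pass an open file instead of `SAMPLE_LOG`, e.g. `crawler_hits(open("access.log"))`, then compare the paths against what your robots.txt disallows.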

    • The Doctor
      11 months ago

      Very early on, at least, their spiders respected robots.txt.

      I know there are folks who block all of the Big G's crawlers in their robots.txt files on principle; might want to ask them if it works or not.
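
As a rough illustration of what such a block looks like, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URL are assumptions for the example, and only the `Googlebot` token is listed for brevity. It shows what a crawler that honors the file would conclude; it cannot tell you whether a given crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page.html"
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked")
```

Pairing this with a look at the server logs is one way to answer the "does it work" question: if a path is blocked here but still shows up in the logs under that crawler's user agent, the file is being ignored.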