• stoy@lemmy.zip
    link
    fedilink
    arrow-up
    35
    ·
    8 months ago

    Google never backed up the internet. Sure, they did cache pages, but that isn’t even close to backing up the internet.

  • alyaza [they/she]@beehaw.org
    8 months ago

    Google “Search Liaison” Danny Sullivan confirmed the feature removal in an X post, saying the feature “was meant for helping people access pages when way back, you often couldn’t depend on a page loading. These days, things have greatly improved. So, it was decided to retire it.”

    okay but… has it? this seems like an unfounded premise, intuitively speaking

    • Semi-Hemi-Demigod@kbin.social
      8 months ago

      “What excuse could we use for this cost-cutting measure?”

      “Uh, we could just say that people don’t need it anymore.”

      “Johnson, get that man a promotion!”

    • Otter@lemmy.ca
      8 months ago

      Yeah, I’ve been using it more and more recently, although part of that is sites like Twitter or Reddit randomly hiding content.

    • ciferecaNinjo@fedia.io
      8 months ago

      Bingo. When I read that part of the article, I felt insulted. People see the web getting increasingly enshittified and less accessible. The increased need for cached pages has justified the existence of 12ft.io.

      ~40% of my web access is now dependent on archive.org and 12ft.io.

      So yes, Google is obviously bullshitting. Clearly there is a real reason for nixing cached pages and Google is concealing that reason.

    • Smoke@beehaw.org
      8 months ago

      There are ways to rate limit, like increasing the response delay for an IP address as its requests per hour climb, which makes rapid, massed requests slower and easier to handle. Taking them all down at once is an extreme move.
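      To make the rate-limiting idea above concrete, here is a minimal sketch of a per-IP throttle that slows an address down instead of blocking it. It is not how Google or any particular site does it; the class name `PerIpThrottle`, the one-hour sliding window, the free allowance, the per-request step, and the delay cap are all hypothetical choices for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerIpThrottle:
    """Hypothetical sliding-window throttle: the more requests an IP made in
    the last window, the longer its next request is delayed."""

    def __init__(self, window_seconds: float = 3600.0, free_requests: int = 100,
                 step_seconds: float = 0.05, max_delay: float = 10.0) -> None:
        self.window_seconds = window_seconds  # length of the sliding window (one hour)
        self.free_requests = free_requests    # requests per window served with no delay
        self.step_seconds = step_seconds      # extra delay per request over the allowance
        self.max_delay = max_delay            # cap: a busy IP is slowed, never cut off
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def delay_for(self, ip: str, now: Optional[float] = None) -> float:
        """Record one request from `ip`; return seconds to wait before serving it."""
        now = time.monotonic() if now is None else now
        stamps = self._history[ip]
        # Forget requests that have slid out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        excess = len(stamps) - self.free_requests
        if excess <= 0:
            return 0.0
        return min(self.max_delay, excess * self.step_seconds)


if __name__ == "__main__":
    throttle = PerIpThrottle(free_requests=2, step_seconds=0.5)
    for t in (0, 1, 2, 3):
        print(throttle.delay_for("203.0.113.7", now=t))  # prints 0.0, 0.0, 0.5, 1.0 (one per line)
```

      A real server would then sleep for the returned delay or answer with HTTP 429 and a Retry-After header, and would also need to evict idle IPs so the history dictionary does not grow without bound.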

    • smeg@feddit.uk
      8 months ago

      “Enshittification” isn’t just company did bad thing, you know

      • TheRtRevKaiser@beehaw.org
        8 months ago

        It isn’t, but I think this probably fits. Enshittification is when a company provides useful, good services to gain users, then once those users are locked in they start degrading those services or removing features to cut costs, right? That seems like a pretty close analogy to what’s going on here, I’d think.

        • Powerpoint@lemmy.ca
          8 months ago

          I doubt there’s even a cost cut here. They’re most likely still doing the work, just not making it available.

        • smeg@feddit.uk
          8 months ago

          I think that’s still just “what businesses do in general”; enshittification is specifically:

          1. Offer a great service as a middleman so users want to use your platform and customers want to sell through it (i.e. get the market share)
          2. Once the users are used to using it and are sort of locked in, crank up the costs so your customers get their returns
          3. Once they are locked in, crank up the costs for them so you profit

          From the original post that defined it:

          First, they are good to their users; then they abuse their users to make things better for their business customers; finally, they abuse those business customers to claw back all the value for themselves. Then, they die.

  • bedrooms@kbin.social
    8 months ago

    Maybe they don’t want to give rival AI devs data access? It’s not typical for Google to give up data.

    • ciferecaNinjo@fedia.io
      8 months ago

      As far as we know, Google is not giving up any data. The crawler still must store a copy of the text for the index. The only certainty we have is that Google is no longer sharing it.

  • ciferecaNinjo@fedia.io
    8 months ago

    From the article:

    “was meant for helping people access pages when way back, you often couldn’t depend on a page loading. These days, things have greatly improved. So, it was decided to retire it.” (emphasis added)

    Bullshit! The web gets increasingly enshittified and content is less accessible every day.

    For now, you can still build your own cache links even without the button, just by going to “https://webcache.googleusercontent.com/search?q=cache:” plus a website URL, or by typing “cache:” plus a URL into Google Search.

    You can also use 12ft.io.
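    Both workarounds mentioned here are plain string concatenation. A minimal sketch, assuming the cache prefix quoted from the article and the "prepend 12ft.io/" rule from 12ft.io's FAQ; the function names are hypothetical, and since Google is retiring cached pages the first two may simply stop working.

```python
# Prefix quoted in the article; Google is retiring this endpoint, so links may not resolve.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(url: str) -> str:
    """Cache link as the article describes it: fixed prefix plus the page URL."""
    return CACHE_PREFIX + url


def cache_operator_query(url: str) -> str:
    """Text to type into Google Search to use the cache: operator."""
    return "cache:" + url


def twelve_ft_url(url: str) -> str:
    """12ft.io link: its FAQ says to prepend "12ft.io/" to the page URL."""
    return "12ft.io/" + url


if __name__ == "__main__":
    page = "https://example.com/some-article"
    print(google_cache_url(page))
    print(cache_operator_query(page))
    print(twelve_ft_url(page))
```

    The sketch leaves the target URL unencoded, mirroring the article's "plus a website URL". A target containing `#` or `&` would not arrive intact (the fragment is never sent to the server, and `&` starts a new query parameter), so such URLs may need percent-encoding first.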

    Cached links were great if the website was down or quickly changed, but they also gave some insight over the years about how the “Google Bot” web crawler views the web. … A lot of Google Bot details are shrouded in secrecy to hide from SEO spammers, but you could learn a lot by investigating what cached pages look like.

    Okay, so there’s a more plausible theory about the real reason for this move. Google may be trying to increase the secrecy of how its crawler functions.

    The pages aren’t necessarily rendered like how you would expect.

    More importantly, they don’t render the way authors expect. And that’s a fucking good thing! It’s how caching helps give us some escape from enshittification. From the 12ft.io FAQ:

    “Prepend 12ft.io/ to the URL webpage, and we’ll try our best to remove the popups, ads, and other visual distractions.”

    It also circumvents #paywalls. No doubt there must be legal pressure on Google from angry website owners who want to force their content to come with garbage.

    The death of cached sites will mean the Internet Archive has a larger burden of archiving and tracking changes on the world’s webpages.

    The possibly good news is that Google’s role shrinks a bit. Any Google shrinkage is a good outcome overall. But there is a concerning relationship between archive.org and Cloudflare. I depend heavily on archive.org largely because Cloudflare has broken ~25% of the web. The day #InternetArchive becomes Cloudflared itself, we’re fucked.

    We need several non-profits to archive the web in parallel redundancy with archive.org.

  • AutoTL;DR@lemmings.world
    8 months ago

    🤖 I’m a bot that provides automatic summaries for articles:

    Google Search’s “cached” links have long been an alternative way to load a website that was down or had changed, but now the company is killing them off.

    The feature has been appearing and disappearing for some people since December, and currently, we don’t see any cache links in Google Search.

    Cached links used to live under the drop-down menu next to every search result on Google’s page.

    As the Google web crawler scoured the Internet for new and updated webpages, it would also save a copy of whatever it was seeing.

    That quickly led to Google having a backup of basically the entire Internet, using what was probably an uncountable number of petabytes of data.

    In 2020, Google switched to mobile-by-default, so for instance, if you visit that cached Ars link from earlier, you get the mobile site.


    Saved 68% of original text.

  • ciferecaNinjo@fedia.io
    8 months ago

    Here’s the heart of the not-so-obvious problem:

    Websites treat the Google crawler like a first-class citizen. Paywalls give Google unpaid, junk-free access. Then Google search results direct people to a website that treats humans differently (worse). So Google users are led to sites they cannot access. The heart of the problem is access inequality. Google effectively serves to refer people to sites that are not publicly accessible.

    I do not want to see search results I cannot access. Google cache was the equalizer that neutralized that problem. Now that problem is back in our face.
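    The access inequality described here comes down to a site branching on who is asking. A minimal, hypothetical sketch of such paywall logic: the `render_article` function, the plain `Googlebot` substring check, and the 200-character teaser are illustrative assumptions, not any real site's code.

```python
CRAWLER_TOKEN = "Googlebot"  # hypothetical: substring a site looks for in the User-Agent
PAYWALL_NOTICE = "… [Subscribe to keep reading]"


def render_article(user_agent: str, is_subscriber: bool, full_text: str,
                   teaser_chars: int = 200) -> str:
    """Hypothetical paywall: the crawler and subscribers get the full text,
    everyone else gets a truncated teaser."""
    if CRAWLER_TOKEN in user_agent or is_subscriber:
        return full_text
    return full_text[:teaser_chars] + PAYWALL_NOTICE


if __name__ == "__main__":
    article = "Lorem ipsum " * 50
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    human = "Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0"
    print(len(render_article(bot, False, article)))  # 600: the crawler sees everything
    print(render_article(human, False, article).endswith(PAYWALL_NOTICE))  # True
```

    The cached copy was a snapshot of what the crawler saw, which is why it worked as an equalizer. Because a User-Agent header is self-reported, sites that care usually verify Googlebot further (Google documents a reverse-DNS check for this) rather than trusting the string alone.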