CEO Steve Huffman says tech giants should not be able to trawl Reddit’s huge store of data for free. But that information came from users, not the company

That “corpus of data” is the content posted by millions of Reddit users over the decades. It is a fascinating and valuable record of what they were thinking and obsessing about. Not the tiniest fraction of it was created by Huffman, his fellow executives or shareholders. It can only be seen as belonging to them because of whatever skewed “consent” agreement its credulous users felt obliged to click on before they could use the service.

Ouch

  • Margot Robbie@lemmy.world
    2 years ago

    Funniest thing to do is honestly replace your old comments with ChatGPT refusals. If you put “As an AI language model” everywhere, it’ll really mess with the ML algorithms to make your data useless.

  • impulse@lemmy.world
    2 years ago

    The more I think about it, the more I come to the conclusion that what really made me delete my account early (I initially wanted to wait until the 30th to see how things play out) was the ridiculous number of people defending this bullshit and promoting the official Reddit app as the superior option.

    Some went as far as calling 3rd party devs leeches and scammers.

    I can only tolerate so much stupidity and ignorance before I bail.

    • LittleKerr@lemmy.world
      2 years ago

      Wait, you mean there are people (actual, real, and not paid by who-knows-who) who believe that the official Reddit app is superior?? I know a few who think it’s not thaaat bad, but ‘superior’? Lmao

      • Sckharshantallas@lemmy.world
        2 years ago

        I see this kind of behavior happen a lot online, and asked ChatGPT about it:

        Yes, there is a term that describes this phenomenon. It’s called “oppositional belief perseverance” or “belief polarization.” This term refers to the tendency of individuals to cling to their initial beliefs even when presented with evidence that contradicts those beliefs. In the context you described, someone may initially take the opposite side of a discussion due to an opposition bias, but over time, they may start to internalize and genuinely believe the opposing viewpoint, thereby demonstrating belief polarization.

  • Nix@lemmy.world
    2 years ago

    It is rather interesting to note that this corpus of data may not be as valuable if it can only ever be used from within several legal grey areas (perhaps even red areas in some jurisdictions).

    Currently, a growing number of artists, writers, singers and other people (even corporations such as studios and large rights holders) are asserting their right not to have their creations and derived works used in, or slurped into, AI models without their express consent.

    Corporations making use of those AI models may find themselves in expensive legal limbo now and the foreseeable future.

    Consider that no redditor imagined, let alone consented to, having their post and comment history comprehensively abused (as in “improper treatment or usage; application to a wrong or bad purpose; an unjust, corrupt or wrongful practice or custom”).

    We may enter a period where lawlessness pervades AI models, as in any gold rush (the current crypto craze, for example). Eventually, the legal framework will catch up and will probably make any dubious corpus of data untouchable.

    How long this takes is anyone’s guess. I surmise a few high-profile lawsuits would suffice.

    • JuxtaposedJaguar@lemmy.ml
      2 years ago

      I agree that this is a grey area, but it could really go either way. Anyway, giant corporations have been abusing individuals who can’t afford lawsuits for decades. Even with precedent on your side, that probably wouldn’t change.

      • Archer@lemmy.world
        2 years ago

        Yeah, if you think the current right-wing supreme court will find any big case in favor of an individual vs corporations, that’s wishful thinking

  • Zerlyna@lemmy.world
    2 years ago

    I said it about Facebook and would say the same for Reddit: I would happily pay a little each month to have my data not sold or used inappropriately, and to be ad-free.

    • yacht_boy@lemmy.world
      2 years ago

      I am trying out the Kagi search engine for that very reason, along with their Orion browser. I have not yet signed up to pay them $5/month but am leaning towards it.

      But when I mentioned it here, someone immediately said they couldn’t see spending $60/year on something they are used to getting for “free.”

      • WillfulBedder@lemmy.world
        2 years ago

        I’ve been using Kagi for the past two months. Honestly, I’m pretty happy with it and I can’t think of any major misses in terms of search accuracy. I think it’s very difficult to get out of the mindset that search should be free, but I’m trying to put my money where my mouth is, as it were, and support adtech-free services.

  • wotsit_sandwich@lemmy.world
    2 years ago

    I am enjoying being able to observe this story from the beginning, before the media started writing about it. It’s been an interesting few weeks.

  • constantokra@lemmy.one
    2 years ago

    Wide open for AI scraping and nothing at all are not the only two options. They could easily limit API calls to what would be reasonable for single users or mods, and have each user generate their own key. Apps could let users input their key. Most users wouldn’t bother and would switch to Reddit’s app anyway, so it would get them 95% of what they claim to want without being a dick about it.
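The per-user key idea above could look roughly like the sketch below. It assumes a token-bucket rate limiter with one bucket per user-generated API key. The class names, the `capacity` and `rate` values, and the injectable `clock` are all hypothetical illustrations, not Reddit's actual API policy or implementation.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Permits bursts of up to `capacity` calls, refilling at `rate` calls per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # a new key starts with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerKeyLimiter:
    """Hypothetical API gate: each user-generated key gets its own bucket,
    sized for one person browsing or moderating rather than bulk scraping."""

    def __init__(self, capacity: float = 60, rate: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[api_key] = bucket
        return bucket.allow(now)
```

With this design, a third-party app would pass along whichever key the user pasted in, and a bulk scraper would need many keys, each tied to a separate user account, to exceed single-user volumes.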