• x00z@lemmy.world
    link
    fedilink
    English
    arrow-up
    143
    ·
    2 months ago

    Hi, I’m new here. Because of the bullshit with Reddit. Greetings fellow Lemmy people.

    • tal@lemmy.today
      link
      fedilink
      English
      arrow-up
      58
      ·
      2 months ago

      Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.

      kagis

      Yeah.

      https://gs.statcounter.com/search-engine-market-share

      According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.

      • eronth@lemmy.world
        link
        fedilink
        English
        arrow-up
        28
        ·
        2 months ago

        It’s also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.

          • whatwhatwhatwhat@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            23 days ago

            Seconding this. I work in IT, and the number of tech-illiterate people using DuckDuckGo as their default search engine is astounding. It’s got to be about 10% of our users (none of whom are in tech roles).

      • scarabic@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        ·
        2 months ago

        Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.

        Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.

    • kratoz29@lemm.ee
      link
      fedilink
      English
      arrow-up
      2
      ·
      2 months ago

      One can only hope, but until people learn that they can use another browser and another search engine, it’s not likely (I am talking about the Google side ofc; Reddit might be affected by this in the long run).

  • z3rOR0ne@lemmy.ml
    link
    fedilink
    English
    arrow-up
    76
    arrow-down
    1
    ·
    2 months ago

    I’ve posted this elsewhere, but it bears repeating:

    Just use ddg bangs if you use Duckduckgo and you can search reddit directly.

    !reddit search term
    

    or:

    !r search term
    

    It still picks up latest posts related to reddit, it just searches reddit directly instead of searching Bing’s results. It’s that simple.
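
For anyone who wants to script this, here is a minimal sketch of how a bang query maps onto a search URL. It assumes DuckDuckGo's usual `https://duckduckgo.com/?q=` query endpoint; the helper name `ddg_bang_url` is made up for illustration.

```python
from urllib.parse import urlencode


def ddg_bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo URL whose query starts with a bang such as !r."""
    query = f"!{bang} {terms}"
    # urlencode percent-encodes the "!" and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


print(ddg_bang_url("r", "search term"))
print(ddg_bang_url("reddit", "search term"))
```

Opening the resulting URL in a browser should behave the same as typing the bang into the search box.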

    You can even use a redirect extension like Libredirect in conjunction with this Duckduckgo feature to redirect your search to a privacy respecting frontend like redlib.

      • lennivelkant@discuss.tchncs.de
        link
        fedilink
        English
        arrow-up
        6
        arrow-down
        1
        ·
        2 months ago

        I used to sneer at the kids in my class that used it. Must have been fairly shortly after it launched, something like fourteen to fifteen years ago. I’m still grappling with a certain inertia when it comes to switching away from something I have relied on for so long, but I’m coming around to the idea of giving DDG a try at least (irrational as it is, I’ve been reluctant to even try - I suspect out of fear of liking it and having to change).

        Past Me would be exasperated that Present Me is even toying with the idea. But then, Past Me had a lot of stupid takes anyway.

        • unconfirmedsourcesDOTgov@lemmy.sdf.org
          link
          fedilink
          English
          arrow-up
          6
          ·
          2 months ago

          I went through the same process that you’re describing. In the end, I gave it a shot and, anecdotally, I feel like I find the things I’m looking for faster than I was with Google and with no shoddy ai summaries.

          • noli@lemmy.zip
            link
            fedilink
            English
            arrow-up
            6
            ·
            2 months ago

            I like to say that DDG gives you what you searched for while google gives you what it thinks you wanted.

        • KillingTimeItself@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          2
          ·
          2 months ago

          ever wonder how to deal with it? Just switch to something and deal with the consequences of switching, don’t bother thinking about it. There are things worth thinking about, and then there are things worth having experience with, most of the time, having experience is more worthwhile.

          • Kyouki@lemmy.world
            link
            fedilink
            English
            arrow-up
            2
            ·
            2 months ago

            I like this one; I tend to do this as well. You might discover something new that is more geared or useful to you, or else have an experience that tells you what doesn’t fit you.

            I’ve gotten really good with ddg searches, to the point where I find much more than I did on Google, bypassing the big payers that Google keeps on top even when they’re not relevant to my search. I stuck around with ddg, and now that I’ve grown into other areas of IT like Linux, I’ve noticed there are a lot of great bangs that can get me to the information I want.

            The same goes for ddg as for Linux: developing new workflows keeps it fresh and makes computing fun again.

            • KillingTimeItself@lemmy.dbzer0.com
              link
              fedilink
              English
              arrow-up
              2
              ·
              2 months ago

              yup, it also applies in other areas of life, hobbies, projects, work, whatever, you can apply it basically anywhere and get something interesting out of it.

    • squidspinachfootball@lemm.ee
      link
      fedilink
      English
      arrow-up
      8
      ·
      2 months ago

      I think !reddit just sends you directly to reddit and uses reddit’s search engine, which has been infamously bad. Has that changed? It doesn’t seem to be quite the same as appending “reddit” to queries to search for reddit posts, but using better search engines.

      • z3rOR0ne@lemmy.ml
        link
        fedilink
        English
        arrow-up
        4
        ·
        edit-2
        2 months ago

        Honestly, reddit’s search engine is okay, but yeah, it doesn’t get as exact as standard search engines because I think it prioritizes keywords from the post title over comments and also prioritizes the most recent posts over subject relevance. That said, the old reddit posts are still going to be accessible via standard non-Google search engines.

        I’ll admit this is somewhat of a bandaid fix, as should reddit keep this deal with google going, eventually this workaround will prove less effective than it currently is.

        This workaround just gets you the newest posts related to your query, and otherwise, for older posts, the search term reddit in search engines is still superior. So I don’t know, it’s the best solution I can think of for now.

  • Dr. Moose@lemmy.world
    link
    fedilink
    English
    arrow-up
    59
    arrow-down
    3
    ·
    edit-2
    2 months ago

    Reddit responded: “Only google pays us”. The content is not yours. You built this off a naive user base that just wanted to share; now these fuckers are taking it as their entitlement. As an early reddit user - fuck that place, I’m still angry.

      • Dr. Moose@lemmy.world
        link
        fedilink
        English
        arrow-up
        21
        arrow-down
        4
        ·
        2 months ago

        No, I don’t think so. Just because you put a clause in ToS doesn’t make it legally binding and most precedent is in favor of the original copyright owner.

      • Jeffool @lemmy.world
        link
        fedilink
        English
        arrow-up
        13
        ·
        2 months ago

        If someone posts a copyright violation on YouTube, YouTube can go free under the safe harbor provisions of the DMCA. (In the US.) YouTube just points a finger at the user and says “it’s their fault”, because the user owns (or claims to own) the content. YouTube is just hosting it.

        I don’t know of any reason to think it’s not the same for written works. User posts them, Reddit hosts them, user still owns them. Like YouTube, the user gives the host a lot of license for that content, so that they can technically copy and transmit it. But ultimately the user owns it. I assume by the time Reddit made the AI deal they had probably put wording in the TOS to include “selling a copy of the data” to whoever they want.

        Now, determining if the TOS holds up in court is of course trickier. And did they even make us click our permission away again after they added it, or did it just change something we already clicked? I don’t recall.

        • Tja@programming.dev
          link
          fedilink
          English
          arrow-up
          5
          ·
          2 months ago

          Usually any hosting platform has some kind of wording to the tune of “you give us permanent and unrestricted right to use your content however we want”. Copyright is still yours, but you can’t use it against the platform. Applies to social networks, YouTube, Flickr, anything I can think of.

  • didnt1able@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    38
    arrow-down
    1
    ·
    2 months ago

    I wish we had a government that functioned. This shit is 100% antitrust. How is it that this shit is allowed to fly?

    • MeatsOfRage@lemmy.world
      link
      fedilink
      English
      arrow-up
      19
      ·
      edit-2
      2 months ago

      Around here we love the idea of Reddit being totally devoid of life but the fact is it’s still one of the most active public facing sites on the web. The attrition to sites like Lemmy is pretty negligible to the overall Reddit activity and bot AI activity only really affects the largest subreddits which have always been a bit spammy and clickbaity. The medium and small subreddits are still full of active people. Don’t get me wrong, Lemmy is my daily driver for this content but I won’t pretend everyone fled Reddit for this.

      Additionally, exclusivity with Google isn’t necessarily just to keep the search results but to prevent their biggest AI competition, ChatGPT and its ties to Microsoft, from getting access to what is the Internet’s largest database of public facing conversation.

    • tal@lemmy.today
      link
      fedilink
      English
      arrow-up
      14
      arrow-down
      1
      ·
      edit-2
      2 months ago

      I wonder what kind of contract they went with.

      https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/

      SAN FRANCISCO, Feb 21 (Reuters) - Social media platform Reddit has struck a deal with Google (GOOGL.O) to make its content available for training the search engine giant’s artificial intelligence models, three people familiar with the matter said.

      The contract with Alphabet-owned Google is worth about $60 million per year, according to one of the sources.

      For perspective:

      https://www.cbsnews.com/news/google-reddit-60-million-deal-ai-training/

      In documents filed with the Securities and Exchange Commission, Reddit said it reported net income of $18.5 million — its first profit in two years — in the October-December quarter on revenue of $249.8 million.

      So if you annualize that, Reddit’s seeing revenue of about $1 billion/year, and net income of about $74 million/year.

      Given that Reddit granting exclusive indexing to Google happened at about the same time, I would assume that that AI-training deal included the exclusivity indexing agreement, but maybe it’s separate.

      My gut feeling is that the exclusivity thing is probably worth more than $60 million/year, that Google’s probably getting a pretty good deal. Like, Google did not buy Reddit, and Google’s done some pretty big acquisitions, like YouTube, and that’d have been another way for Google to get exclusive access. So I’d think that this deal is probably better for Google than buying Reddit. Reddit’s market capitalization is $10 billion, so Google is maybe paying 0.6% of the value of Reddit per year to have exclusive training rights to their content and to be the only search engine indexing them; aside from Reddit users themselves running into content in subreddits, I’d guess that those two forms are probably the main ways in which one might leverage the content there.
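
The back-of-the-envelope numbers above can be reproduced with a few lines. This is only a sketch: it uses the figures quoted in this comment, and it assumes that multiplying a single quarter by four is a fair way to annualize, which is a rough approximation at best.

```python
# Figures as quoted above (whole US dollars).
quarterly_revenue = 249_800_000      # October-December quarter
quarterly_net_income = 18_500_000    # same quarter
google_deal_per_year = 60_000_000    # reported value of the Google contract
market_cap = 10_000_000_000          # market capitalization used in the comment

# Naive annualization: assume every quarter looks like this one.
annual_revenue = quarterly_revenue * 4
annual_net_income = quarterly_net_income * 4

# The deal's yearly payment relative to Reddit's value and revenue.
deal_vs_market_cap = google_deal_per_year / market_cap
deal_vs_revenue = google_deal_per_year / annual_revenue

print(f"annualized revenue:    ${annual_revenue / 1e9:.2f} billion")
print(f"annualized net income: ${annual_net_income / 1e6:.0f} million")
print(f"deal / market cap:     {deal_vs_market_cap:.1%}")
print(f"deal / revenue:        {deal_vs_revenue:.1%}")
```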

      Plus, my impression is that the idea that a number of companies have – which may or may not be valid – is that this is the beginning of the move away from search engines. Like, the idea is that down the line, the typical person doesn’t use a search engine to find a webpage somewhere that’s a primary source to find material. Instead, they just query an AI. That compiles all the data that it can see and spits out an answer. Saves some human searcher time and reduces complexity, and maybe can solve some problems if AIs can ultimately do a better job of filtering out erroneous information than humans. We definitely aren’t there yet in 2024, but if that’s where things are going, I think that it might make a lot of strategic sense for Google. If Google can lock up major sources of training data, keep Microsoft out, then it’s gonna put Microsoft in a difficult spot if Microsoft is gunning for the same thing.

        • tal@lemmy.today
          link
          fedilink
          English
          arrow-up
          4
          ·
          edit-2
          2 months ago

          If we do end up at a point without search engines, where AI does the search and summarizes an answer, what do you think their level of ability to tie back to source material will be?

          I haven’t used the text-based search queries myself; I’ve used LLM software, but not for this, so I don’t know what the current situation is like. My understanding is that the current approach doesn’t really permit it. And there are two issues with that:

          • There isn’t a direct link between one source and what’s being generated; the model isn’t really structured so as to retain this.

          • Many different sources probably contribute to the answer.

          All information contributes a little bit to the probability of the next word that the thing is spitting out. It’s not that the software rapidly looks through all pages out there and then finds a given single reputable source that it could then cite, the way a human might. That is, you aren’t searching an enormous database when the query comes in, but repeatedly making use of a prediction that the next word in the correct response is a given word, and that probability is derived from many different sources. Maybe tens of thousands of people have made posts on a given subject; the response isn’t just a quote from one, and the generated text may appear in none of them.
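
A toy sketch of that idea, under heavy assumptions: this is a word-pair counter with greedy decoding, nothing like a real LLM, and the three "posts" are invented. It also keeps a provenance table purely so the example can show who contributed; a neural model's weights retain nothing like it.

```python
from collections import Counter, defaultdict

# Three invented posts standing in for the training data.
SOURCES = {
    "post_a": "search engines crawl public pages",
    "post_b": "bots crawl public forums",
    "post_c": "public forums cost money",
}


def train(sources):
    """Count which word follows which, pooled across every source."""
    counts = defaultdict(Counter)
    provenance = defaultdict(set)  # kept only for this illustration
    for name, text in sources.items():
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
            provenance[(prev, nxt)].add(name)
    return counts, provenance


def generate(counts, provenance, start, max_words=10):
    """Repeatedly emit the most frequent next word (greedy decoding)."""
    words = [start]
    contributors = set()
    while len(words) < max_words and words[-1] in counts:
        prev = words[-1]
        nxt = counts[prev].most_common(1)[0][0]
        contributors |= provenance[(prev, nxt)]
        words.append(nxt)
    return " ".join(words), contributors


counts, provenance = train(SOURCES)
generated, contributors = generate(counts, provenance, "search")

print(generated)
print(sorted(contributors))
print(any(generated in text for text in SOURCES.values()))
```

Every post nudges the word counts, yet the generated sentence is not a quote from any of them, so there is no single post to point at as "the source".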

          To maybe put that in terms of how a human might think, and to place you in the generative AI’s shoes, suppose I say to you “draw a house”. You draw a house with two windows, a flowerbed out front, whatever. I say “which house is that?” You can’t tell me, because you’re not trying to remember and present one house – you’re presenting me with a synthetic aggregate of many different houses; probably all houses have mentally contributed a bit to it. Maybe you could think of a given house that you’ve seen in the past that looks a fair bit like that house, but that’s not quite what I’m asking you to tell me. The answer is really “it doesn’t reflect a single house in the real world”, which isn’t really what you want to hear.

          It might be possible to basically run a traditional search for a generated response to find an example of that text, if it amounts to a quote (which it may not!)

          And if Google produces some kind of “reliability score” for a given piece of material and weights the material in the training set by that (which I will guess that if they don’t now, they will), they could maybe use the reliability score to try to rank various sources when doing that backwards search for relevant sources.

          But there’s no guarantee that that will succeed, because they’re ultimately synthesizing the response, not just quoting it, and because it can come from many sources. There may potentially be no one source that says what Google is handing back.

          It’s possible that there will be other methods than the present ones used for generating responses in the future, and those could have very different characteristics. Like, I would not be surprised, if this takes off, if the resulting system ten years down the road is considerably more complex than what is presently being done, even if to a user, the changes under the hood aren’t really directly visible.

          There’s been some discussion about developing systems that do permit for this, and I believe that if you want to read up on it, the term used is “attributability”, but I have not been reading research on it.

    • GreatAlbatross@feddit.uk
      link
      fedilink
      English
      arrow-up
      3
      ·
      2 months ago

      At least on some smaller subs, there seems to be a suspicious amount of brand new accounts asking one question to get human answers.
      It would not surprise me if reddit, or some other service, are seeding to get more LLM-able content. Of course, this might backfire if people start giving stupid answers to eff up the data.

      • ✺roguetrick✺@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        1
        ·
        2 months ago

        If I’m not mistaken, Reddit has actual staff centered around asking questions to get engagement in small communities. Not so much for LLM reasons but to actually grow those communities (and thus edge out competition).

    • gedaliyah@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      18
      ·
      2 months ago

      “We always obey the robots.txt”

      • A bunch of corporations that have no accountability and plenty of incentive to just ignore it and have all been caught training AI on off-limits data.
    • Kairos@lemmy.today
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      2 months ago

      They’re likely blocking user agents too, which I think also doesn’t have legal enforcement (as in DuckDuckGo can just use “Google” unless they said otherwise).
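
To make the user-agent point concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt below is hypothetical (one crawler allowed, everyone else turned away) and is not a copy of Reddit's actual file; the URL and the `may_crawl` helper are also just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, assumed for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str = "https://example.com/r/some_thread") -> bool:
    """Return what the rules say for a self-reported user agent."""
    return parser.can_fetch(user_agent, url)


# The rules only ever see the name a crawler chooses to report.
print("DuckDuckBot:", may_crawl("DuckDuckBot"))
print("Googlebot:  ", may_crawl("Googlebot"))
```

Nothing in the file itself verifies who is asking: the check runs on the crawler's side, keyed on a name the crawler supplies, so compliance is voluntary unless the server enforces something separately.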

  • Burn_The_Right@lemmy.world
    link
    fedilink
    English
    arrow-up
    22
    arrow-down
    1
    ·
    2 months ago

    Google just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.

    I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.

    • ChronosTriggerWarning@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      ·
      2 months ago

      I saw Reddit results in a search last night using DDG. It just said something like “It’s here on Reddit, but we’re not allowed to show you.” I wasn’t planning on using Reddit (never again), but that just irritated me.

  • Azzu@lemm.ee
    link
    fedilink
    English
    arrow-up
    19
    ·
    2 months ago

    I wish Lemmy were more searchable. The search function actually works decently well, but it’s not on the same level as actual search engines; it doesn’t seem to look for related/similar terms, and the relevancy ranking doesn’t seem right either.

    • gedaliyah@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      14
      ·
      2 months ago

      I do occasionally find Lemmy in web search results. The platform is not that big (or old), but as long as it sticks around then eventually searchability will improve.

  • recapitated@lemmy.world
    link
    fedilink
    English
    arrow-up
    15
    ·
    2 months ago

    I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.

    I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)

    • cordlesslamp@lemmy.today
      link
      fedilink
      English
      arrow-up
      4
      ·
      edit-2
      2 months ago

      Can you use something like the DDOS filter to prevent AI automated scrapings (too many requests per second)?

      I’m not a tech person so probably don’t even know what I’m talking about.

      • generaldenmark@programming.dev
        link
        fedilink
        English
        arrow-up
        5
        ·
        edit-2
        2 months ago

        I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and it was not that many unique IPs, but it did allow their crawlers to live unhindered…

        They didn’t do IP banning for the same reasoning, but they did notice one of their competitors did not alter their IP when scraping them. If they had malicious intent, they could have changed data around for that IP only, e.g. increasing the prices, or decreasing the prices, so the competitor had bad data…

        I’d imagine companies like OpenAI have many times the IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well… which would be unfortunate.
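
A rough sketch of why per-IP throttling struggles against rotation. This is an assumed, simplified sliding-window limiter rather than how any particular DDoS filter works; the class name, the limits, and the IP addresses (taken from documentation ranges) are all made up.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests: int, window_ms: int):
        self.max_requests = max_requests
        self.window_ms = window_ms
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip: str, now_ms: int) -> bool:
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now_ms - hits[0] >= self.window_ms:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now_ms)
        return True


def count_blocked(requests, max_requests=5, window_ms=1000) -> int:
    """Run (ip, timestamp) pairs through a fresh limiter and count rejections."""
    limiter = PerIpRateLimiter(max_requests, window_ms)
    return sum(not limiter.allow(ip, now_ms) for ip, now_ms in requests)


# 20 requests within one second, first from one IP, then spread over four.
single_ip = [("203.0.113.7", i * 50) for i in range(20)]
rotating = [(f"198.51.100.{i % 4}", i * 50) for i in range(20)]

print("blocked, single IP:   ", count_blocked(single_ip))
print("blocked, rotating IPs:", count_blocked(rotating))
```

The same traffic that trips the limit from one address stays under it once it is spread across a handful, and tightening the limit enough to catch that starts rejecting ordinary users too.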

      • GenosseFlosse@feddit.org
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        2 months ago

        Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.

        • JovialMicrobial@lemm.ee
          link
          fedilink
          English
          arrow-up
          1
          ·
          2 months ago

          So you’re saying reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?

          So the actual number of people using reddit vs bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and there’s no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.

          • GenosseFlosse@feddit.org
            link
            fedilink
            English
            arrow-up
            3
            ·
            2 months ago

            Always has been. Technically the server sees no difference in what a browser does vs what a bot does: Downloading files and submitting requests.

  • MehBlah@lemmy.world
    link
    fedilink
    English
    arrow-up
    14
    ·
    2 months ago

    Bing it is then. I hate Microsoft with the intensity of a thousand suns but bing is now my jam as long as this lasts.

        • CileTheSane@lemmy.ca
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          8
          ·
          2 months ago

          Yes, duckduckgo uses other search engines to provide its results. Your point?

          I don’t care where duckduckgo gets the links from, I care how relevant the top links are and that they aren’t being crowded out by ads.

          • Emmie@lemm.ee
            link
            fedilink
            English
            arrow-up
            4
            ·
            edit-2
            2 months ago

            No need to be defensive, ddg uses bing which means it is part of the big five under the hood. That always will have certain ramifications in the long run.

            I also use it, but I am looking for decentralised alternatives in the meantime, not because ddg is bad but because sooner or later it will get worse.

            Also why are you so aggressive anyway, it’s super weird and doesn’t fit Lemmy

        • CileTheSane@lemmy.ca
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          5
          ·
          2 months ago

          At best this is as intelligent as saying Google Maps is YouTube by another name because they’re both on Google servers. Even that would be smarter to say actually, because Google Maps and YouTube are owned by the same company.

          • MehBlah@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            4
            ·
            edit-2
            2 months ago

            When bing goes down so does duckduckgo, but somehow your apples to oranges argument seems comparable to you.

            • CileTheSane@lemmy.ca
              link
              fedilink
              English
              arrow-up
              4
              arrow-down
              5
              ·
              2 months ago

              They share hosting servers, that doesn’t make them the same service. When the power goes out do you think you and your neighbors live in the same house?

              • MehBlah@lemmy.world
                link
                fedilink
                English
                arrow-up
                6
                arrow-down
                1
                ·
                2 months ago

                Just keep sucking down the hype. They don’t share the same hosting for the frontend but they both use the same backend. The backend is of course owned by microsoft. duckduckgo uses bing’s backend and somehow you have convinced yourself, beyond all evidence to the contrary, that it isn’t bing with a different wrapper.

                • CileTheSane@lemmy.ca
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  arrow-down
                  9
                  ·
                  2 months ago

                  When you can’t play Stardew Valley (because Steam is down) you also can’t play Elden Ring. They must use the same backend and Elden Ring is just Stardew Valley by another name.

                  You’re going to need a better source than “they go down at the same time”.

    • buttfarts@lemy.lol
      link
      fedilink
      English
      arrow-up
      11
      ·
      2 months ago

      I’ve started a Kagi subscription for my new search engine. Basically $6 USD per month but because it’s a user-pay model they have a really good privacy policy and don’t sell/analyze your data.

      It’s currently better than Google (which I still use for searching reviews in Maps)