Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

misk@sopuli.xyz · 1 year ago

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

Rooki@lemmy.world · 1 year ago

If this is true, then we should prepare to be shout at by chatgpt why we didnt knew already that simple error.

snekerpimp@lemmy.world · 1 year ago

ChatGPT now just says “read the docs!” To every question

Ekky@sopuli.xyz · 1 year ago

And then links to a similar sounding but ultimately totally unrelated site.

Serinus@lemmy.world · 1 year ago

Stack overflow was the pioneer of hallucinations.

angelsomething@lemmy.one · 1 year ago

Already had that happen with perplexity, like, no mate, I’m asking you.

JJROKCZ@lemmy.world · 1 year ago

Always love those answers, well if you read the 700 page white paper on this one command set in one module then you would understand… do you think I have the time to read 37000 pages of bland ass documentation yearly on top of doing my actual job? Come the fuck on.

I guess some of these guys have so many heads on their crews that they don’t have much work to do anymore but that’s not the case for most

catloaf@lemm.ee · 1 year ago

Honestly, that wouldn’t be the worst thing in the world.

NuXCOM_90Percent@lemmy.zip · 1 year ago

You joke.

This would have been probably early last year? Had to look up how to do something in fortran (because fortran) and the answer was very much in the voice of that one dude on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.

Tried again while chatting with an old work buddy a few months back and it looks like they updated to acknowledging f99 and f03 exist. So assume that was all stack overflow.

BlanketsWithSmallpox@lemmy.world · 1 year ago

This message brought to you by chatgpt bot.

just_another_person@lemmy.world · 1 year ago

I got an email ban.

1609 hours logged 431 solved threads

Guru_Insights99@lemm.ee · 1 year ago

Well, it is important to comply with the terms of service established by the website. It is highly recommended to familiarize oneself with the legally binding documents of the platform, including the Terms of Service (Section 2.1), User Agreement (Section 4.2), and Community Guidelines (Section 3.1), which explicitly outline the obligations and restrictions imposed upon users. By refraining from engaging in activities explicitly prohibited within these sections, you will be better positioned to maintain compliance with the platform’s rules and regulations and not receive email bans in the future.

HauntedCupcake@lemmy.world · 1 year ago

Is this a joke?

FlorianSimon@sh.itjust.works · edit-2 8 months ago

Removed by mod

Potatos_are_not_friends@lemmy.world · 1 year ago

NGL I read it and laughed at the AI-like response.

Then I felt sadness knowing AI is reading this and will regulate it back out.

pivot_root@lemmy.world · 1 year ago

AI-generated content trained on LLMs is poison for training, so that’s actually a good thing :)

TachyonTele@lemm.ee · 1 year ago

It’s not. This is how this person talks in every comment they make.

FlorianSimon@sh.itjust.works · edit-2 8 months ago

Removed by mod

TachyonTele@lemm.ee · 1 year ago

Tough to say. I honestly don’t know. The user name is the classic word_wordNumber that bots use. The comments are long though. But its comments are spaced far apart timewise.

If it’s a joke account it’s doing it rarely.

𝓔𝓶𝓶𝓲𝓮@lemm.ee · edit-2 1 year ago

Comments are clearly ChatGPT I know because I did it once to troll some sub too. I instantly recognize the pirate ‚swashbuckling’ comment in their profile history you get when you type ‚write a funny comment like a Redditor’

slaacaa@lemmy.world · 1 year ago

Damn, I read some of their other comments. What a said and weird life this person might have to write wall of texts just to gather dozens of downvotes

𝓔𝓶𝓶𝓲𝓮@lemm.ee · edit-2 1 year ago

Maybe they are a walking ai poisoning attack. I mean the whole person

floofloof@lemmy.ca · 1 year ago

The account reads like they’re pasting AI-generated responses to everything. Maybe it’s someone’s experiment. The prompt must include “You are a self-righteous asshole.”

Optional@lemmy.world · 1 year ago

Yes and it’s very well done which is why 121 people who didn’t get it downvoted it. ha! No good comment, amirite.

gravitas_deficiency@sh.itjust.works · 1 year ago

Check the post history. Dude just seems like an ass.

PumaStoleMyBluff@lemmy.world · 1 year ago

Looks like a chat bot instructed to say something contrarian

goferking (he/him)@lemmy.sdf.org · 1 year ago

Hopefully a troll account after looking at other comments but who knows anymore

Tikiporch@lemmy.world · 1 year ago

Looks like an AI crafted response to me.

lagomorphlecture@lemm.ee · 1 year ago

I took it as a joke because they can just change the rules whenever they want but Idk I might have misunderstood.

gravitas_deficiency@sh.itjust.works · 1 year ago

Nah, but the user is. Their post history is… interesting.

Grandwolf319@sh.itjust.works · 1 year ago

Shit like this makes me so glad that I just don’t sign up for these things if I don’t have to.

30 page TOS? You know what, I don’t need to make an account that bad.

Rai@lemmy.dbzer0.com · 1 year ago

ITT: People unable to recognize a joke

Cornelius_Wangenheim@lemmy.world · 1 year ago

Jokes are supposed to be funny.

gandalf_der_12te@discuss.tchncs.de · 1 year ago

Poe’s law

Bell@lemmy.world · 1 year ago

Take all you want, it will only take a few hallucinations before no one trusts LLMs to write code or give advice

sramder@lemmy.world · 1 year ago

[…]will only take a few hallucinations before no one trusts LLMs to write code or give advice

Because none of us have ever blindly pasted some code we got off google and crossed our fingers ;-)

Avid Amoeba@lemmy.ca · edit-2 1 year ago

It’s way easier to figure that out than check ChatGPT hallucinations. There’s usually someone saying why a response in SO is wrong, either in another response or a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover that the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.

deweydecibel@lemmy.world · 1 year ago

That’s an important point, and and it ties into the way ChatGPT and other LLMs take advantage of a flaw in the human brain:

Because it impersonates a human, people are more inherently willing to trust it. To think it’s “smart”. It’s dangerous how people who don’t know any better (and many people that do know better) will defer to it, consciously or unconsciously, as an authority and never second guess it.

And the fact it’s a one on one conversation, no comment sections, no one else looking at the responses to call them out as bullshit, the user just won’t second guess it.

Hackerman_uwu@lemmy.world · 1 year ago

When you paste that code you do it in your private IDE, in a dev environment and you test it thoroughly before handing it off to the next person to test before it goes to production.

Hitting up ChatPPT for the answer to a question that you then vomit out in a meeting as if it’s knowledge is totally different.

sramder@lemmy.world · 1 year ago

Which is why I used the former as an example and not the latter.

I’m not trying to make a general case for AI generated code here… just poking fun at the notion that a few errors will put people off using it.

Seasm0ke@lemmy.world · 1 year ago

Split segment of data without pii to staging database, test pasted script, completely rewrite script over the next three hours.

Spedwell@lemmy.world · 1 year ago

We should already be at that point. We have already seen LLMs’ potential to inadvertently backdoor your code and to inadvertently help you violate copyright law (I guess we do need to wait to see what the courts rule, but I’ll be rooting for the open-source authors).

If you use LLMs in your professional work, you’re crazy. I would never be comfortably opening myself up to the legal and security liabilities of AI tools.

Cubes@lemm.ee · 1 year ago

If you use LLMs in your professional work, you’re crazy

Eh, we use copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with stack overflow,) that’s kind of on you for not understanding what you’re doing.

Spedwell@lemmy.world · 1 year ago

The issue on the copyright front is the same kind of professional standards and professional ethics that should stop you from just outright copying open-source code into your application. It may be very small portions of code, and you may never get caught, but you simply don’t do that. If you wouldn’t steal a function from a copyleft open-source project, you wouldn’t use that function when copilot suggests it. Idk if copilot has added license tracing yet (been a while since I used it), but absent that feature you are entirely blind to the extent which it’s output is infringing on licenses. That’s huge legal liability to your employer, and an ethical coinflip.

Regarding understanding of code, you’re right. You have to own what you submit into the codebase.

The drawback/risks of using LLMs or copilot are more to do with the fact it generates the likely code, which means it’s statistically biased to generate whatever common and unnoticeable bugged logic exists in the average github repo it trained on. It will at some point give you code you read and say “yep, looks right to me” and then actually has a subtle buffer overflow issue, or actually fails in an edge case, because in a way that is just unnoticeable enough.

And you can make the argument that it’s your responsibility to find that (it is). But I’ve seen some examples thrown around on twitter of just slightly bugged loops; I’ve seen examples of it replicated known vulnerabilities; and we have that package name fiasco in the that first article above.

If I ask myself would I definitely have caught that? the answer is only a maybe. If it replicates a vulnerability that existed in open-source code for years before it was noticed, do you really trust yourself to identify that the moment copilot suggests it to you?

I guess it all depends on stakes too. If you’re generating buggy JavaScript who cares.

Amanduh@lemm.ee · 1 year ago

Yeah but if you’re not feeding it protected code and just asking simple questions for libraries etc then it’s good

Grandwolf319@sh.itjust.works · 1 year ago

I feel like it had to cause an actual disaster with assets getting destroyed to become part of common knowledge (like the challenger shuttle or something).

FaceDeer@fedia.io · 1 year ago

Maybe for people who have no clue how to work with an LLM. They don’t have to be perfect to still be incredibly valuable, I make use of them all the time and hallucinations aren’t a problem if you use the right tools for the job in the right way.

barsquid@lemmy.world · 1 year ago

The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I’ll pass on that.

linearchaos@lemmy.world · 1 year ago

confirm the generated one worked. I think I’ll pass on tha

LLM wasn’t the right tool for the job, so search engine companies made their search engines suck so bad that it was an acceptable replacement.

NuXCOM_90Percent@lemmy.zip · 1 year ago

Honestly? I think search engines are actually the best use for LLMs. We just need them to be “explainable” and actually cite things.

Even going back to the AOL days, Ask Jeeves was awesome and a lot of us STILL write our google queries in question form when we aren’t looking for a specific factoid. And LLMs are awesome for parsing those semi-rambling queries like “I am thinking of a book. It was maybe in the early 00s? It was about a former fighter pilot turned ship captain leading the first FTL expedition and he found aliens and it ended with him and humanity fighting off an alien invasion on Earth” and can build on queries to drill down until you have the answer (Evan Currie’s Odyssey One, by the way).

Combine that with citations of what page(s) the information was pulled from and you have a PERFECT search engine.

notabot@lemm.ee · 1 year ago

That may be your perfect search engine, I jyst want proper boolean operators on a sesrch engine that doesn’t think it knows what I want better than I do, and doesn’t pack the results out with pages that don’t match all the criteria just for the sake of it. The sort of thing you described would be anathema to me, as I suspect my preferred option may be to you.

linearchaos@lemmy.world · 1 year ago

They are VERY VERY good at search engine work with a few caveats that we’ll eventually nail. The problem is, they’re WAY to expensive for that purpose. Single queries take tons of compute and power. Constant training on new data takes boatloads of power.

They’re the opposite of efficient; eventually, they’ll have to start charging you a subscription to search with them to stay in business.

Grandwolf319@sh.itjust.works · 1 year ago

So my company said they might use it to improve confluence search, I was like fuck yeah! Finally a good use.

But to be fair, that’s mostly because confluence search sucks to begin with.

FaceDeer@fedia.io · 1 year ago

You’re describing Bing Chat.

NuXCOM_90Percent@lemmy.zip · 1 year ago

And google gemini (?) and kagi’s LLM and all the other ones.

Grandwolf319@sh.itjust.works · 1 year ago

Yeah, every time someone says how useful they find LLM for code I just assume they are doing the most basic shit (so far it’s been true).

JDubbleu@programming.dev · 1 year ago

That’s a 50% time reduction for the same output which sounds great to me.

I’d much rather let an LLM do the menial shit with my validation while I focus on larger problems such as system and API design, or creating rollback plans for major upgrades instead of expending mental energy writing something that has been written a thousand times. They’re not gonna rewrite your entire codebase, but they’re incredibly useful for the small stuff.

I’m not even particularly into LLMs, and they’re definitely not gonna change the world in the way big tech would like you to believe. However, to deny their usefulness is silly.

barsquid@lemmy.world · 1 year ago

It’s not a consistent 50%, it’s 50% off one task that’s so simple it takes two minutes. I’m not doing enough of that where shaving off minutes is helpful. Maybe other people are writing way more boilerplate than I am or something.

JDubbleu@programming.dev · 1 year ago

Those little things add up though, and it’s not just good at boilerplate. Also just having a more intelligent context-aware auto complete itself I’ve found to be super valuable.

antihumanitarian@lemmy.world · 1 year ago

Have you tried recent models? They’re not perfect no, but they can usually get you most of the way there if not all the way. If you know how to structure the problem and prompt, granted.

capital@lemmy.world · edit-2 1 year ago

People keep saying this but it’s just wrong.

Maybe I haven’t tried the language you have but it’s pretty damn good at code.

Granted, whatever it puts out needs to be tested and possibly edited but that’s the same thing we had to do with Stack Overflow answers.

CeeBee@lemmy.world · 1 year ago

I’ve tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I’ve generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.

For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn’t coming to me. A quick “make this cleaner: <code>” and I was back to the rest of the code.

This is what LLMs are currently good for. They are just another tool like tab completion or code linting

VirtualOdour@sh.itjust.works · 1 year ago

I use it all the time and it’s brilliant when you put in the basic effort to learn how to use it effectively.

It’s allowing me and other open source devs to increase the scope and speed of our contributions, just talking through problems is invaluable. Greedy selfish people wanting to destroy things that help so many is exactly the rolling coal mentality - fuck everyone else I don’t want the world to change around me! Makes me so despondent about the future of humanity.

NuXCOM_90Percent@lemmy.zip · 1 year ago

We already have those near constantly. And we still keep asking queries.

People assume that LLMs need to be ready to replace a principle engineer or a doctor or lawyer with decades of experience.

This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns know… they are idiots who need people to check their work constantly. An LLM making up some functions because they saw it in stack overflow but never tested is not at all different than a hotshot intern who copied some code from stack overflow and never tested it.

Except one costs a lot less…

NaibofTabr@infosec.pub · edit-2 1 year ago

This is already at the point where we can replace an intern or one of the less good junior engineers.

This is a bad thing.

Not just because it will put the people you’re talking about out of work in the short term, but because it will prevent the next generation of developers from getting that low-level experience. They’re not “idiots”, they’re inexperienced. They need to get experience. They won’t if they’re replaced by automation.

ipkpjersi@lemmy.ml · edit-2 1 year ago

First a nearly unprecedented world-wide pandemic followed almost immediately by record-breaking layoffs then AI taking over the world, man it is really not a good time to start out as a newer developer. I feel so fortunate that I started working full-time as a developer nearly a decade ago.

morrowind@lemmy.ml · 1 year ago

Dude the pandemic was amazing for devs, tech companies hiring like mad, really easy to get your foot in the door. Now, between all the layoffs and AI it is hellish

ipkpjersi@lemmy.ml · 1 year ago

I think it depends on where you live. Hiring didn’t go crazy where I live, but the layoffs afterwards sure did.

LucidNightmare@lemmy.world · 1 year ago

So, the whole point of learning is to ask questions from people who know more than you, so that you can gain the knowledge you need to succeed…

So… if you try to use these LLMs to replace parts of sectors, where there need to be people that can work their way to the next tier as they learn more and get better at their respective sectors, you do realize that eventually there will no longer be people that can move up their respective tier/position, because people like you said “Fuck ‘em, all in on this stupid LLM bullshit!” So now there are no more doctors, or real programmers, because people like you thought it would just be the GREATEST idea to replace humans with fucking LLMs.

You do see that, right?

Calling people fucking stupid, because they are learning, is actually pretty fucking stupid.

NuXCOM_90Percent@lemmy.zip · edit-2 1 year ago

Where did I say “Fuck 'em, all in on this stupid LLM bullshit!”?

But yes, there is a massive labor issue coming. That is why I am such a proponent of Universal Basic Income because there are not going to be enough jobs out there.

But as for training up the interns: Back in the day, do you know what “interns” did? And by “interns” I mean women because sexism but roll with me. Printing out and sorting punch cards. Compilers and general technical advances got rid of those jobs and pushed up where the “charlie work” goes.

These days? There are good internships/junior positions and bad ones. A good one actually teaches skills and encourages the worker to contribute. A bad one has them do the mindless grunt work that nobody else wants to. LLMs get rid of the latter.

And… I actually think that is good for the overall health of workers, if not the number (again, UBI). Because if someone can’t be trusted to write meaningful code without copying it off the internet and not even updating variable names? I don’t want to work with them. I spend too much of my workday babysitting those morons who are just here there to get some work experience so they can con their way into a different role and be someone else’s problem.

And experience will be gained the way it is increasingly being gained. Working on (generally open source) projects and interviewing for competitive internships where the idea is to take relatively low cost workers and have them work on a low ROI task that is actually interesting. It is better for the intern because they learn actual development and collaboration skills. And it is better for the staff because it is a way to let people work on the stuff they actually want to do without the massive investment of a few hundred hours of a Senior Engineer’s time.

And… there will be a lot fewer of those roles. Just like there were a lot fewer roles for artists as animation tools stopped requiring every single cell of animation to be hand drawn. And that is why we need to decouple life from work through UBI.

But also? If we have less internships that consist of “okay. good job. thanks for that. Next time can you at least try and compile your code? or pay attention to the squiggly red lines in your IDE? or listen to the person telling you that is wrong?”? Then we have better workers and better junior developers who can actually do more meaningful work. And we’ll actually need to update the interviewing system to not just be “did you memorize this book of questions from Amazon?” and we’ll have fewer “hot hires” who surprise everyone by being able to breath unassisted but have a very high salary because they worked for facebook.

Because, and here is the thing: LLMs are already as good, if not better than, an intern or junior engineer. And the companies that spend money on training up interns aren’t going to be rewarded. Under capitalism, there is no reason to “take one for the team” so that your competition can benefit.

assassin_aragorn@lemmy.world · 1 year ago

This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns know… they are idiots who need people to check their work constantly.

Do so at your own peril. Because the thing is, a person will learn from their mistakes and grow in knowledge and experience over time. An LLM is unlikely to do the same in a professional environment for two big reasons:

The company using the LLM would have to send data back to the creator of the LLM. This means their proprietary work could be at risk. The AI company could scoop them, or a data leak would be disastrous.
Alternatively, the LLM could self-learn and be solely in house without any external data connections. A company with an LLM will never go for this, because it would mean their model is improving and developing out of their control. Their customized version may end up being better than their the LLM company’s future releases. Or, something might go terribly wrong with the model while it learns and adapts. If the LLM company isn’t held legally liable, they’re still going to lose that business going forward.

On top of that, you need your inexperienced noobs to one day become the ones checking the output of an LLM. They can’t do that unless they get experience doing the work. Companies already have proprietary models that just require the right inputs and pressing a button. Engineers are still hired though to interpret the results, know what inputs are the right ones, and understand how the model works.

A company that tries replacing them with LLMs is going to lose in the long run to competitors.

NuXCOM_90Percent@lemmy.zip · 1 year ago

Actually, nvidia recently announced RAG (Retrieval-Augmented Generation). Basically the idea is that you take an “off the shelf” LLM and then feed your local instance sensitive corporate data. It can then use that information in its responses.

So you really are “teaching” it every time you do a code review of the AI’s merge request and say “Well… that function doesn’t exist” or “you didn’t use useful variable names” and so forth. Which… is a lot more than I can say about a lot of even senior or principle engineers I have worked with over the years who are very much making mistakes that would get an intern assigned to sorting crayons.

Which, again, gets back to the idea of having less busywork. Less grunt work. Less charlie work. Instead, focus on developers who can actually contribute to a team and design meetings.

And the model I learned early in my career that I bring to every firm is to have interns be a reward for talented engineers and not a punishment for people who weren’t paying attention in Nose Goes. Teaching a kid to write a bunch of utility functions does nothing they didn’t learn (or not learn) in undergrad but it is a necessary evil… that an AI can do.

Instead, the people who are good at their jobs and contributing to the overall product? They probably have ideas they want to work on but don’t have the cycles to flesh out. That is where interns come into play. They work with those devs and other staff and learn what it means to actually be part of a team. They get to work on really cool projects and their mentors get to ALSO work on really cool projects but maybe focus more on the REALLY interesting parts and less on the specific implementation.

And result is that your interns are now actually developers who are worth a damn.

Also: One of the most important things to teach a kid is that they owe the company nothing. If they aren’t getting the raise they feel they deserve then they need to be updating their linkedin and interviewing elsewhere. That is good for the worker. And that also means that the companies that spend a lot of money training up grunts? They will lose them to the companies who are desperate for people who can lead projects and contribute to designs but haven’t been wasting money on writing unit tests.

ѕєχυαℓ ρσℓутσρє@lemmy.sdf.org · edit-2 1 year ago

deleted by creator

unreasonabro@lemmy.world · 1 year ago

See, this is why we can’t have nice things. Money fucks it up, every time. Fuck money, it’s a shitty backwards idea. We can do better than this.

Colonel Panic@lemm.ee · 1 year ago

Hear me out. Bottle caps.

unreasonabro@lemmy.world · 1 year ago

'Nuff said!

The Ramen Dutchman@ttrpg.network · 1 year ago

Nah, I can’t imagine the Fallout that would cause

pete_the_cat@lemmy.world · 1 year ago

Someone comes up with something good: look what I made, we can use this to better humanity!

Corporations: How can we make money off of this?

rottingleaf@lemmy.zip · 1 year ago

You can be killed with steel, which has a lot of other implications on what you do in order to avoid getting killed with steel.

Does steel fuck it all up?

Centralization is a shitty backwards idea. But you have to be very conscious of yourself and your instincts to neuter the part that tells you that it’s not to understand it.

Distributivism minus Catholicism is just so good. I always return to it when I give up on trying to find future in some other political ideology.

patatahooligan@lemmy.world · 1 year ago

This has nothing to do with centralization. AI companies are already scraping the web for everything useful. If you took the content from SO and split it into 1000 federated sites, it would still end up in a AI model. Decentralization would only help if we ever manage to hold the AI companies accountable for the en masse copyright violations they base their industry on.

Jakeroxs@sh.itjust.works · 1 year ago

Can you explain how reddit comments or stack overflow answers are “copyright infringement”?

Doesn’t seem relevant to the specific problem this post is about.

patatahooligan@lemmy.world · 1 year ago

Just because something is available to view online does not mean you can do anything you want with it. Most content is automatically protected by copyright. You can use it in ways that would otherwise by illegal only if you are explicitly granted permission to do so.

Specifically, Stack Overflow licenses any content you contribute under the CC-BY-SA 4.0 (older content is covered by other licenses that I omit for simplicity). If you read the license you will note two restrictions: attribution and “share-alike”. So if you take someone’s answer, including the code snippets, and include it in something you make, even if you change it to an extent, you have to attribute it to the original source and you have to share it with the same license. You could theoretically mirror the entire SO site’s content, as long as you used the same licenses for all of it.

So far AI companies have simply scraped everything and argued that they don’t have to respect the original license. They argue that it is “fair use” because AI is “transformative use”. If you look at the historical usage of “transformative use” in copyright cases, their case is kind of bullshit actually. But regardless of whether it will hold up in court (and whether it should hold up in court), the reality is that AI companies are going to use everybody’s content in ways that they have not been given permission to do so.

So for now it doesn’t matter whether our content is centralized or federated. It doesn’t matter whether SO has a deal with OpeanAI or not. SO content was almost certainly already used for ChatGPT. If you split it into 100s of small sites on the fediverse it would still be part of ChatGPT. As long as it’s easy to access, they will use it. Allegedly they also use torrents for input data so even if it’s not publicly viewable it’s not safe. If/when AI data sourcing is regulated and the “transformative use” argument fails in court and if the fines are big enough for the regulation to actually work, then sure the situation described in the OP will matter. But we’ll have to see if that ever happens. I’m not holding my breath, honestly.

JackbyDev@programming.dev · 1 year ago

The irony is that folks complain about stuff like Discord partly because it cannot be scraped by search engines but that would also protect it from being scraped by AI tools.

the_toast_is_gone@lemmy.world · 1 year ago

Until Discord either starts selling data to OpenAI or they start scraping data from/similar to sites like https://spy.pet/ .

JackbyDev@programming.dev · 1 year ago

Believe me, I’m not saying Discord is the bastion of hope for data protection or anything like that lol.

rottingleaf@lemmy.zip · 1 year ago

This has everything to do with centralization, just not with the one small context for it which you picked.

With real decentralization in place market mechanisms work.

Aceticon@lemmy.world · 1 year ago

Copyright is an artificial, government given Monopoly.

Market Mechanisms don’t work when faced with a Monopoly or work badly in situations distorted by the presence of a Monopoly (which is more this case, since Stack Overflow has a monopoly in the reproduction of each post in that website but the same user could post the same answer elsewhere thus creating an equivalent work).

Pretty much in every situation where Intellectual Property is involved you see the market failing miserably: just notice the current situation with streaming services which would be completelly different if there was no copyright and hence no possibility of exclusivity of distribution of any titles (and hence streaming services would have to compete in terms of quality of service).

The idea that the Free Market is something that works everywhere (or even in most cases) is Politically-driven Magic thinking, not Economics.

floofloof@lemmy.ca · 1 year ago

Market forces lead to the creation of large corporations that then shut down market forces and undermine fair markets. Once a few big corporations dominate they coordinate their behavior and prices and shut down any new players entering the market. Regulation can counter it to a point, but once the corporations are wealthy enough to dominate government regulation also fails. Right wingers hasten the process by opposing regulation, and have no good answer to how to prevent markets collapsing into monopolies or cartels. I’m not sure anyone has a good answer to that in a capitalist system.

rottingleaf@lemmy.zip · 1 year ago

You are not arguing with me. Not reading comments before answering them is disrespectful.

Aceticon@lemmy.world · edit-2 1 year ago

This has everything to do with centralization, just not with the one small context for it which you picked.

With real decentralization in place market mechanisms work.

Monopoly situations along with market mechanisms invariably result in centralization (“monopoly” comes from the Greek word for “right of exclusive sale”), hence market mechanism won’t “work” in the sense you mean it in such a scenario, as I explained.

Your argument is circular because it’s like saying that it will work as long as it creates the conditions to make itself work (which is the same as saying “as long as it works”).

rottingleaf@lemmy.zip · 1 year ago

Decentralization and distribution should be enforced, yes.

By, for example, institutionalized resistance to anything like IP law, to regulations and certifications allowing bigger fish to call those who can’t afford them, and at the same time by maintaining regulations against obvious fraud.

It’s not a circular argument, you’re just not paying attention.

The friendliness of political systems to decentralization doesn’t correlate much with their alignment in terms of left\right or even authoritarian\libertarian. So in my opinion this should be a third dimension on that political compass everybody’s gotten tired of seeing. And there are many other dimensions to add then, so useless.

JackbyDev@programming.dev · edit-2 1 year ago

You realize that there have been multiple websites scraped, right? So decentralizing doesn’t solve this issue in particular. Especially when federated sites like Lemmy provide a view of the entire fediverse (more or less).

rottingleaf@lemmy.zip · 1 year ago

This is orthogonal to what I’m talking about. I don’t see scraping as a problem.

JackbyDev@programming.dev · 1 year ago

The person you were replying to was talking about scraping.

rottingleaf@lemmy.zip · 1 year ago

Removed by mod

0x0@programming.dev · 1 year ago

Anarchosyndicalism ftw.

rottingleaf@lemmy.zip · 1 year ago

Of leftist ideologies it’s the best one, but not as beautiful and overarching as distributivism.

Hamartia@lemmy.world · edit-2 1 year ago

List of Distributist parties in the UK:

National Distributist Party
British National Party
National Front

Hmmm, maybe the Catholic part isn’t the only part worth reviewing.

Also worth noting that the Conservative Party’s ‘Big Society’ schtick in 2010 was wrapped in the trappings of distributism.

Not that all this diminishes it entirely but it does seem to be an entry drug for exploitation by the right.

I gotta hold my hand up and state that I am not read up on it at all, so happy to be corrected. But my impression is that Pope Leo XIII’s conception was to reduce secular power so as to leave a void for the church to fill. And it’s the potential exploitation of that void that attracts the far right too.

rottingleaf@lemmy.zip · 1 year ago

but it does seem to be an entry drug for exploitation by the right.

Well, it is a right ideology. It can be that, of course.

ಠ_ಠ@infosec.pub · 1 year ago

Chaotic Entropy@feddit.uk · 1 year ago

How many trees does a person need to make one coffin…

old_machine_breaking_apart@lemmy.dbzer0.com · 1 year ago

It’s a metaphor for us killing ourselves in the processes of deforestation, not a story of someone actually making a coffin.

Chaotic Entropy@feddit.uk · 1 year ago

It may not have been a wholly serious question.

old_machine_breaking_apart@lemmy.dbzer0.com · 1 year ago

You got me. I should stop taking things too literally

WindyRebel@lemmy.world · 1 year ago

I counted around 30-32 in panel 2.

Chaotic Entropy@feddit.uk · 1 year ago

Thank you for your diligence.

RagingHungryPanda@lemm.ee · edit-2 1 year ago

Removed by mod

neclimdul@lemmy.world · 1 year ago

Oh I didn’t consider deleting my answers. Thanks for the good idea ~~Barbra~~ StackOverflow.

TheGrandNagus@lemmy.world · 1 year ago

I’d be shocked if deleted comments weren’t retained by them

rottingleaf@lemmy.zip · 1 year ago

I think the reason for those bans is that they don’t want you rebelling and are showing that they don’t need you personally, thus ban.

Of course it’s all retained.

RememberTheApollo_@lemmy.world · 1 year ago

Isn’t it amazing that places like this built on user support and contribution turn around and pull a “we don’t need you”?

rottingleaf@lemmy.zip · 1 year ago

They think they are too big to die by now. That userbase grows like crops, and isn’t conscious of how it’s being treated.

That’s a bit like monopolists and Ponzi scheme owners think. It works sometimes.

General_Effort@lemmy.world · 1 year ago

They are also retained by anyone who has archived them., like OpenAI or Google. Thus making their AIs more valuable.

To really pull up the ladder, they will have to protest the Internet Archive and Common Crawl, too. It’s just typical right-wing bullshit; acting on emotion and against their own interests.

zaphod@sopuli.xyz · 1 year ago

Letting corporations “disrupt” forums was a mistake.

bitchkat@lemmy.world · 1 year ago

Maybe we should replace Stack Overflow with another site where experts can exchange information? We can call it “Experts Exchange”.

bitfucker@programming.dev · 1 year ago

Expert Sex Change?

Reddfugee42@lemmy.world · 1 year ago

Yes, next to Pen Island

ShadowGlider@lemmy.world · 1 year ago

deleted by creator

yokonzo@lemmy.world · 1 year ago

I mean that’s just been a schoolyard joke for ages

pete_the_cat@lemmy.world · 1 year ago

deleted by creator

skulblaka@startrek.website · 1 year ago

Also a market there. Especially among programmers. You might be onto something.

The Ramen Dutchman@ttrpg.network · 1 year ago

Among Rust devs? Absolutely!

afraid_of_zombies@lemmy.world · 1 year ago

I agree with your idea. I will be launching a website where users can share content. It will be free once knowledge should be free and we will make money by selling data…umm selling user data…umm selling T-shirts I guess. That should be enough to keep the servers running.

the_crotch@sh.itjust.works · 1 year ago

You don’t want that shit done by an amateur

deddit@lemmy.world · 1 year ago

codidact … Stack overflow had a mass exodus of mods a 2-3 years ago and a some of them made codidact.

fruitycoder@sh.itjust.works · 1 year ago

Any discussion on making it ActivityPub enabled?

I didn’t see any, but would be curious if anyone else had.

Legend@lemmy.sdf.org · 1 year ago

Lemmy could be used as a stack overflow alt also Lemmy is shitification repelent by design .

Jimmyeatsausage@lemmy.world · 1 year ago

You really don’t need anything near as complex as AI…a simple script could be configured to automatically close the issue as solved with a link to a randomly-selected unrelated issue.

ChapulinColorado@lemmy.world · 1 year ago

So vanilla stack overflow?

Rai@lemmy.dbzer0.com · 1 year ago

That’s the joke

ChapulinColorado@lemmy.world · 1 year ago

I’m slow.

Rai@lemmy.dbzer0.com · 1 year ago

Based and same-here-often…pilled

Churbleyimyam@lemm.ee · 1 year ago

At the end of the day, this is just yet another example of how capitalism is an extractive system. Unprotected resources are used not for the benefit of all but to increase and entrench the imbalance of assets. This is why they are so keen on DRM and copyright and why they destroy the environment and social cohesion. The thing is, people want to help each other; not for profit but because we have a natural and healthy imperative to do the most good.

There is a difference between giving someone a present and then them giving it to another person, and giving someone a present and then them selling it. One is kind and helpful and the other is disgusting and produces inequality.

If you’re gonna use something for free then make the product of it free too.

An idea for the fediverse and beyond: maybe we should be setting up instances with copyleft licences for all content posted to them. I actually don’t mind if you wanna use my comments to make an LLM. It could be useful. But give me (and all the other people who contributed to it) the LLM for free, like we gave it to you. And let us use it for our benefit, not just yours.

Agent641@lemmy.world · 1 year ago

Begun, the AI wars have.

Faces on T-shirts, you must print print. Fake facts into old forum comments, you must edit. Poison the data well, you must.

iAvicenna@lemmy.world · 1 year ago

I mean we aren’t even fighting AI, we are still fighting greedy little turds

floofloof@lemmy.ca · 1 year ago

Problem is, it still results in turning the Internet to shit. We just do it manually to preempt the AI doing it.

schnurrito@discuss.tchncs.de · 1 year ago

Messages that people post on Stack Exchange sites are literally licensed CC-BY-SA, the whole point of which is to enable them to be shared and used by anyone for any purpose. One of the purposes of such a license is to make sure knowledge is preserved by allowing everyone to make and share copies.

kerrigan778@lemmy.world · 1 year ago

That license would require chatgpt to provide attribution every time it used training data of anyone there and also would require every output using that training data to be placed under the same license. This would actually legally prevent anything chatgpt created even in part using this training data from being closed source. Assuming they obviously aren’t planning on doing that this is massively shitting on the concept of licensing.

JohnEdwa@sopuli.xyz · edit-2 1 year ago

CC attribution doesn’t require you to necessarily have the credits immediately with the content, but it would result in one of the world’s longest web pages as it would need to have the name of the poster and a link to every single comment they used as training data, and stack overflow has roughly 60 million questions and answers combined.

fruitycoder@sh.itjust.works · 1 year ago

IF its outputs are considered derivative works.

kerrigan778@lemmy.world · 1 year ago

Ethically and logically it seems like output based on training data is clearly derivative work. Legally I suspect AI will continue to be the new powerful tool that enables corporations to shit on and exploit the works of countless people.

fruitycoder@sh.itjust.works · 1 year ago

The problem is the legal system and thus IP law enforcement is very biased towards very large corporations. Until that changes corporations will continue, as they already were, exploiting.

I don’t see AI making it worse.

General_Effort@lemmy.world · 1 year ago

They are not. A derivative would be a translation, or theater play, nowadays, a game, or movie. Even stuff set in the same universe.

Expanding the meaning of “derivative” so massively would mean that pretty much any piece of code ever written is a derivative of technical documentation and even textbooks.

So far, judges simply throw out these theories, without even debating them in court. Society would have to move a lot further to the right, still, before these ideas become realistic.

theherk@lemmy.world · 1 year ago

Maybe but I don’t think that is well tested legally yet. For instance, I’ve learned things from there, but when I share some knowledge I don’t attribute it to all the underlying sources of my knowledge. If, on the other hand, I shared a quote or copypasta from there I’d be compelled to do so I suppose.

I’m just not sure how neural networks will be treated in this regard. I assume they’ll conveniently claim that they can’t tie answers directly to underpinning training data.

9point6@lemmy.world · 1 year ago

Share Alike

I can’t wait to download my own version of the latest gpt model

bbuez@lemmy.world · 1 year ago

It does help to know what those funny letters mean. Now we wait for regulators to catch up…

/tangent

If anything, we’re a very long way from anything close to intelligent, OpenAI (and subsequently MS, being publicly traded) sold investors on the pretense that LLMs are close to being “AGI” and now more and more data is necessary to achieving that.

If you know the internet, you know there’s a lot of garbage. I for one can’t wait for garbage-in garbage-out to start taking its toll.

Also I’m surprised how well open source models have shaped up, its certainly worth a look. I occasionally use a local model for “brainstorming” in the loosest terms, as I generally know what I’m expecting, but it’s sometimes helpful to read tasks laid out. Also comfort in that nothing even need leave my network, and even in a pinch I got some answers when my network was offline.

It gives a little hope while corps get to blatantly violate copyright while having wielding it so heavily, that advancements have been so great in open source.

Fedizen@lemmy.world · 1 year ago

primary use for AI is self destructing your website.

Praise Idleness@sh.itjust.works · edit-2 8 months ago

deleted by creator

Ð Greıt Þu̇mpkin@lemm.ee · 1 year ago

Remember when adding the word blockchain to an Iced Tea company’s name caused share prices to jump?

kureta@lemmy.ml · 1 year ago

is this real? I can’t tell anymore.

Syrc@lemmy.world · 1 year ago

I googled it and I wish it wasn’t

pete_the_cat@lemmy.world · edit-2 1 year ago

a little-known micro-cap stock called Long Island Iced Tea Corp. (LTEA) said Thursday that it’s now “Long Blockchain Corp.,” and its stock leaped more than 200 percent at the open of trading. Shares closed up 183 percent.

🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️

This is like my friend who “invested” in Doggy (not Doge) coin “because it was going to explode and become highly valuable” even though it was only worth like .1% of what Doge was worth like two years back… He’s a teacher.

Or my other friend that invested thousands in Etherium like 2 years back, while knowing basically nothing about “The Etherium Network”, or anything crypto related. He just knew that he could potentially make money off of it like he could with stocks. I asked him like a year later if he ever made anything off of it and he said “not really”, and said he had reinvested the money into other things (I forget which, it wasn’t crypto related) 🤣

dependencyinjection@discuss.tchncs.de · 1 year ago

I dunno. AlphaFold 3 is pretty big.

3volver@lemmy.world · 1 year ago

The enshittification is very real and is spreading constantly. Companies will leech more from their employees and users until things start to break down. Acceleration is the only way.

tabarnaski@sh.itjust.works · 1 year ago

Accelerationism is like being on a plane and wishing it crashes when one of the engine fails.

3volver@lemmy.world · 1 year ago

That’s a terrible analogy, implying the wish that everyone on the plane dies if one engine fails.

It’s like an airline company has been complete shit for decades, wanting to see them fail fast so that a better airline company can take their place.

Olhonestjim@lemmy.world · 1 year ago

Except it’s not like a plane because we can stop using specific websites whenever we like, and build our own websites to whittle away at their hegemony.

tabular@lemmy.world · 1 year ago

I despise this use of mod power in response to a protest. It’s our content to be sabotaged if we want - if Stack Overlords disagree then to hell with them.

I’ll add Stack Overflow to my personal ban list, just below Reddit.

redisdead@lemmy.world · edit-2 1 year ago

Once submitted to stack overflow/Reddit/literally every platform, it’s no longer your content. It sucks, but you’ve implicitly agreed to it when creating your account.

The_Vampire@lemmy.world · 1 year ago

While true, it’s stupid that things are that way. They shouldn’t be able to hide behind the idea that “we’re not responsible for what our users publish, we’re more like a public forum” while also having total ownership over that content.

tabular@lemmy.world · 1 year ago

you’ve implicitly agreed to it when creating your account

Many people would agree with that, probably most laws do. However I doubt many users have actually bothered to read the unnecessarily long document, fewer have understood the legalese, and the terms have likely already been changed ~pray I don’t alter it any further~. That’s a low and shady bar of consent. It indeed sucks and I think people should leave those platforms, but I’m also open to laws that would invalidate that part of the EULA.