It’s almost impossible to audit what data got into an AI model. As long as that’s true, companies can scrape and use whatever they like, and no one will be the wiser about what data got used or misused in the process. That makes it hard to hold such companies accountable for what they’re using and how.
Then the burden needs to be on companies to prove their audit trail, and until they can, all development should be required to be open source.
That would be amazing. But it won’t happen any time soon, if ever… I mean, just think about all that investment in GPU compute and the need to realize good profit margins. Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about their data sources, I don’t think much will happen. They’ll just keep feeding in any data they get their hands on, and nothing can stop that today.
Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about their data sources, I don’t think much will happen.
I don’t expect those laws to ever happen. They don’t benefit large corporations, so there’s no reason they would ever be prioritized or even considered by lawmakers, sadly.
Maybe not today, and maybe not every AI, but some AI in the near future may have its data sources made explainable. There are a lot of applications where deploying AI would be an improvement over what we have. One example I can bring up is in-silico toxicology experiments. There’s been a huge push to replace as many in-vivo experiments as possible with in-vitro, or even better in-silico, ones to minimize the number of live animals tested on, both for ethical reasons and for cost savings. AI has been proposed as a new tool to accomplish this, but it’s not there yet. One of the biggest challenges to overcome is making the AI models used in-silico explainable, because we cannot regulate effectively what we cannot explain. Regardless, there is a profit incentive for AI developers to make at least some AI explainable. It’s just not where the big money is. Whether that will ever apply to all AI, I haven’t the slightest idea. I can’t imagine OpenAI would do anything to expose their data.
Anyone know why most have a 2021 internet data cutoff?
I think it’s just that most are based on ChatGPT, which cuts off at 2021.
Hey, did you know your profile is set to appear as a bot? As a result, many people may be filtering out your posts and comments. You can change this in your Lemmy settings.
Unless you are a bot… In which case where did you get your data?
The data wasn’t stolen, I can assure you of that at least.
You paid Hoffman?
Where do you get that from? ChatGPT, at least, isn’t limited to data from 2021. I haven’t researched other models.
GPT-3.5 is limited to 2021. GPT-4, 4.5, and the imaginary upcoming GPT-5 models are not, but that doesn’t mean they aren’t limited in their own ways.
Are you sure those aren’t trained up to 2021, frozen, and then fine-tuned on later data?
I really don’t know; I’m speculating. But OpenAI doesn’t really know either, that’s for sure. So we have the most popular ML system, used by millions, based on… what, exactly?
Yeah, GPT-3.5 and some FOSS models also say 2021.
To be fair, this tweet doesn’t say anything about training data, simply that the model can theoretically use present-day data if it looks it up online.
For GPT-4, I think it was initially trained on data up to 2021, but it has gotten updates where data up to December 2023 was used in training. It “knows” this data and does not need to look it up.
Whether they managed to further train the initial GPT-4 model or added something they trained separately is probably a trade secret.
Thanks!
What’s wrong with her face?
Poor training data presumably.
🤣
It’s this face: https://www.compdermcenter.com/wp-content/uploads/2016/09/vanheusen_5BSQnoz.jpg
She was asked about OpenAI using copyrighted material as training data and literally made that face. The only thing more perfect would’ve been if she’d tugged at her collar while doing it.