It’s almost impossible to audit what data got into an AI model. As long as that’s true, companies can scrape and use whatever they like, and no one will be the wiser about what data got used or misused in the process. That makes it hard to hold such companies accountable for what they’re using and how.
Then the burden needs to be on companies to prove their audit trail, and until they can, all development should be required to be open source.
That would be amazing. But it won’t happen any time soon, if ever… I mean, just think about all that investment in GPU compute and the need to realize good profit margins. Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about their data sources, I don’t think much will happen. They’ll just keep feeding in any data they get their hands on, and nothing can stop that today.
Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about their data sources, I don’t think much will happen.
I don’t expect those laws to ever happen. They don’t benefit large corporations, so there’s no reason they would ever be prioritized or even considered by lawmakers, sadly.
Maybe not today, and maybe not every AI, but some AI in the near future may have its data sources made explainable. There are a lot of applications where deploying AI would be an improvement over what we have. One example I can bring up is in-silico toxicology experiments. There’s been a huge push to replace as many in-vivo experiments as possible with in-vitro, or even better in-silico, ones to minimize the number of live animals tested on, both for ethical reasons and for cost savings. AI has been proposed as a new tool to accomplish this, but it’s not there yet. One of the biggest challenges to overcome is making the AI models used in-silico explainable, because we cannot regulate effectively what we cannot explain. Regardless, there is a profit incentive for AI developers to make at least some AI explainable. It’s just not where the big money is. Whether that will ever apply to all AI, I haven’t the slightest idea. I can’t imagine OpenAI would do anything to expose their data.
Anyone know why most have a 2021 internet data cutoff?
I think it’s just that most are based on ChatGPT, which cuts off at 2021.
Hey, did you know your profile is set to appear as a bot? As a result, many people may be filtering out your posts and comments. You can change this in your Lemmy settings.
Unless you are a bot… In which case where did you get your data?
The data wasn’t stolen, I can assure you of that at least.
You paid Hoffman?
Where do you get that from? ChatGPT, at least, isn’t limited to data from 2021. I haven’t researched other models.
GPT-3.5 is limited to 2021. GPT-4, 4.5, and the imaginary upcoming GPT-5 models are not, but that doesn’t mean they aren’t limited in their own ways.
Are you sure those aren’t trained up to 2021, frozen, and then fine-tuned on later data?
I really don’t know; I’m speculating. But OpenAI doesn’t really know either, that’s for sure. So we have the most popular ML system, used by millions, based on… what, exactly?
Yeah, GPT-3.5 and some FOSS models also say 2021.
To be fair, this tweet doesn’t say anything about training data, simply that the model can theoretically use present-day data if it looks it up online.
For GPT-4, I think it was initially trained on data up to 2021, but it has gotten updates where data up to December 2023 was used in training. It “knows” this data and does not need to look it up.
Whether they managed to further train the initial GPT-4 model or added something they trained separately is probably a trade secret.
Thanks!
What’s wrong with her face?
Poor training data presumably.
🤣
It’s this face: https://www.compdermcenter.com/wp-content/uploads/2016/09/vanheusen_5BSQnoz.jpg
She was asked about OpenAI using copyrighted material as training data and literally made that face. The only thing more perfect would’ve been if she’d tugged at her collar while doing it.