Apparently, stealing other people’s work to create a product for money is now “fair use” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
Exactly! You can glean so much from a single work, not just about the work itself but about who created it, what ideas they were trying to express, and what that tells us about the world they live in and how they see that world.
This doesn’t even touch the fact that I’m learning to draw not by looking at other drawings but by looking at exactly what I’m trying to draw. I know at a base level, a drawing is a series of shapes made by hand, whether it’s through a digital medium or traditional pen/pencil and paper. But the skill isn’t being able to replicate other drawings, it’s being able to convert something I can see into a drawing. If I’m drawing someone sitting in a wheelchair, then I’ll get the pose of them sitting in the wheelchair, but I can add details I want to emphasise or remove details I don’t want. There’s so much that goes into creative work and I’m tired of arguing with people who have no idea what it takes to produce creative works.
It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than “well anyone can draw, children do it all the time”. They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don’t get it, and to an extent, that’s fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.
Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.
Either you’re vastly overestimating the degree of understanding and insight AIs possess, or you’re vastly underestimating your own capabilities. :)
Alternatively, you might be vastly overestimating human “understanding and insight”, or how much of it is really needed to create stuff.
Average humans, sure, don’t have a lot of understanding and insight, and little is needed to be able to draw a doodle on some paper. But trained artists have a lot of it, because part of the process is learning to interpret artworks and work out why the artist used a particular composition or colour or object. To create really great art, you do actually need a lot of understanding and insight, because everything in your work will have been put there deliberately, not just to fill up space.
An AI doesn’t know why it’s put an apple on the table rather than an orange, it just does it because human artists have done it - it doesn’t know what apples mean on a semiotic level to the human artist or the humans that look at the painting. But humans do understand what apples represent - they may not pick up on it consciously, but somewhere in the backs of their minds, they’ll see an apple in a painting and it’ll make the painting mean something different than if the fruit had been an orange.
Interestingly, LLMs seem to show emergent semiotic organization. Analyses of the neural network’s activation space suggest that related concepts get trained into similar activation patterns, which seems to be what allows LLMs to handle relationships zero-shot when run at a “temperature” (randomness level) in the right range.
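For anyone unsure what “temperature” does in practice, here is a minimal sketch of temperature-scaled sampling probabilities. The three candidate words and their scores are made up for illustration, and `softmax_with_temperature` is a hypothetical helper, not any particular model’s API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for the next word after "a bowl with an ..."
candidates = ["apple", "orange", "aubergine"]
scores = [2.0, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(candidates, probs))
    print(f"T={temperature}: {summary}")
```

Low temperature puts nearly all the probability on the top-scoring word; higher temperature spreads it out, so less likely words get picked more often.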
Pairing an LLM with a stable diffusion model allows the resulting AI to… well, judge for yourself: https://llm-grounded-diffusion.github.io/
I’m unconvinced that the fact they’re getting better at following instructions, like putting objects where the prompter specifies, or changing the colour, or putting the right number of them, etc., means the model actually understands what the objects mean beyond their appearance. It doesn’t understand the cultural meanings attached to each object, and thus is unable to truly make a decision about why it should place an apple rather than an orange, or how the message within the picture changes when it’s a red sports car rather than a beige people-carrier.
Well, that’s part of the LLM step, so let’s put it to the test:
Image prompt:
Create an ad for a car. The target audience are young adults, with high income, and thrill seeking. Come up with a brand name fitting such a car, and include it in the image. For the car, come up with a shape, and color, best fitting of the target audience. Come up with an image layout, art style, and camera angle, best fitting of the target audience. Include between zero and two additional items that will make the overall picture more attractive to the target audience.
ChatGPT:
[('an advertisement for a car', [45, 58, 422, 396]), ('a high-performance car', [123, 193, 266, 128]), ('an edgy and dynamic brand name logo', [188, 10, 136, 30]), ('a sleek, red sports car', [168, 254, 176, 102])]
Background prompt: An adrenaline-pumping car advertisement targeting young adults with high income and a thrill-seeking spirit. The layout includes a bold brand name logo, a sleek red sports car, and a dynamic composition to captivate the audience.
Negative prompt: additional items
How did it know to pick a “sleek red sports car”? Or the rest of the elements?
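If you want to see what layout ChatGPT actually proposed, here is a rough sketch that parses the list from its response. It assumes each box is `[x, y, width, height]` measured from the top-left corner of a 512×512 canvas, which is my reading of the LLM-grounded diffusion format rather than something stated in the response. `parse_layout` and `CANVAS` are names I made up for this.

```python
import ast

# The layout list from the ChatGPT response, typed with straight quotes.
layout_text = (
    "[('an advertisement for a car', [45, 58, 422, 396]), "
    "('a high-performance car', [123, 193, 266, 128]), "
    "('an edgy and dynamic brand name logo', [188, 10, 136, 30]), "
    "('a sleek, red sports car', [168, 254, 176, 102])]"
)

CANVAS = 512  # assumed canvas size in pixels

def parse_layout(text):
    """Parse the list into dicts with box edges and share of the canvas."""
    boxes = []
    for caption, (x, y, width, height) in ast.literal_eval(text):
        boxes.append({
            "caption": caption,
            "left": x,
            "top": y,
            "right": x + width,    # assumes [x, y, width, height]
            "bottom": y + height,
            "canvas_share": width * height / (CANVAS * CANVAS),
        })
    return boxes

for box in parse_layout(layout_text):
    print(f"{box['caption']}: x {box['left']}-{box['right']}, "
          f"y {box['top']}-{box['bottom']}, "
          f"{box['canvas_share']:.0%} of canvas")
```

Under those assumptions the logo sits in a thin strip along the top and the red sports car in the lower middle, inside the larger “advertisement” box.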