• Dizzy Devil Ducky@lemm.ee

    Calling over-glorified chatbots and LLMs like GPT or Claude “AGI” would be like calling a preschooler’s finger painting a masterwork of art, at least from my understanding of them. Though I can’t say I’m anywhere near an expert, so definitely take what I say with a major grain of salt.

    What these AI chatbots and LLMs can do is sometimes impressive, but that’s about all I can say for them. Intelligence is definitely not their strong suit when, half of the time, you’ll ask for a summary of a well-known and well-loved TV show only for them to just make up anything that sounds right.

    • ConsciousCode@beehaw.org

      LLMs are not chatbots; they’re models. ChatGPT/Claude/Bard are chatbots which use LLMs as part of their implementation. I would argue in favor of the article because, while they aren’t particularly intelligent, they are general-purpose and exhibit some level of intelligence, and thus qualify as “general intelligence”. Compare this to the opposite, an expert system like a chess computer: you can’t even begin to ask a chess computer to explain what a SQL statement does; the question doesn’t even make sense. But LLMs are capable of being applied to virtually any task which can be transcribed. Even if they aren’t particularly good at it, compared to GPT-2, which read more like a Markov chain, they at least attempt to complete the task, and are often correct.

      • jarfil@beehaw.org

        “LLMs are capable of being applied to virtually any task which can be transcribed”

        Where “transcribed” means using any set of tokens, be they extracted from human-written languages, emojis, pieces of images, audio elements, spatial positions, or any other thing in existence that can be divided up and represented by tokens.

        PS: actually… why “in existence”? Why not throw some “customizable tokens” into an LLM, for it to come up with whatever meaning it fancies for them?

        • ConsciousCode@beehaw.org

          There are a lot of papers which propose adding new tokens to elicit some behavior or another, though I haven’t seen them catch on for some reason. A new token would mean adding a new trainable static vector which would initially be something nonsensical, and you would want to retrain it on a comparably sized corpus. This is a bit speculative, but I think the introduction of a token totally orthogonal to the original domain (something like, e.g., smell, which has no textual analog) would require compressing some of the existing dimensions to make room for that subspace; otherwise it would have a form of synesthesia, relating that token to the neighboring original subspaces. If it were just a new token still within the original domain, though, you could get a good enough initial approximation from a linear combination of existing token embeddings: e.g. a monkey-with-a-hat emoji comes out, you add the token vectors for the monkey emoji + the hat emoji, then finetune it.
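
          Roughly, in Hugging Face transformers terms, something like this untested sketch (“gpt2” and the <monkey_with_hat> token are just placeholders, and initializing from the mean of the part embeddings is one way to do that linear combination, not a recipe from any particular paper):

          ```python
          # Sketch: add one new token and initialize its embedding as a linear
          # combination (here, the mean) of existing token embeddings, then finetune.
          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_name = "gpt2"  # placeholder; any causal LM works the same way
          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model = AutoModelForCausalLM.from_pretrained(model_name)

          # The "monkey with a hat" case: approximate the composite concept
          # from the embeddings of its parts.
          part_ids = tokenizer("monkey hat", add_special_tokens=False).input_ids

          tokenizer.add_tokens(["<monkey_with_hat>"])    # hypothetical new token
          model.resize_token_embeddings(len(tokenizer))  # appends one new embedding row

          with torch.no_grad():
              emb = model.get_input_embeddings().weight
              emb[-1] = emb[part_ids].mean(dim=0)        # initial approximation

          # ...then finetune on data that actually uses <monkey_with_hat>.
          ```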

          The most extreme option: you could increase the embedding dimensionality so the original subspaces are unaffected and the new tokens can take up the new dimensions. This is extreme because it means resizing every matrix in the model, which even for smaller models would be many thousands of parameters, and performance would tank until it got a lot more retraining.
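
          For a rough sense of scale, a back-of-the-envelope sketch assuming standard transformer shapes (four attention projections plus a 4x MLP per layer, ignoring biases and layer norms), with GPT-2-small-ish numbers purely as an example:

          ```python
          # Sketch: count the extra weights introduced if the embedding
          # dimensionality d grows by `extra`, since every d-sized matrix in the
          # model has to be resized along with the token embeddings.
          def new_params(d, vocab, layers, extra):
              d_new = d + extra
              emb = vocab * extra                      # each token row gains `extra` dims
              attn = lambda dim: 4 * dim * dim         # Q, K, V, O projections
              mlp = lambda dim: 2 * dim * (4 * dim)    # up- and down-projection
              per_layer = (attn(d_new) + mlp(d_new)) - (attn(d) + mlp(d))
              return emb + layers * per_layer

          # GPT-2-small-ish: d=768, 50257 tokens, 12 layers, 8 new dimensions
          print(new_params(768, 50257, 12, 8))  # -> 2,180,744 newly introduced weights
          ```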
