• jsdz@lemmy.ml
      arrow-up
      25
      ·
      11 months ago

      It is if you redefine AGI to mean the thing that’s already here, although to get away with that it helps if at the same time you overestimate what LLMs are capable of doing.

      • The Doctor@beehaw.org
        English
        arrow-up
        5
        ·
        11 months ago

        On the whole, I think I prefer “we figured out how to do this, so it’s not actually AI” over crappy thinkpieces like this.

  • The Doctor@beehaw.org
    English
    arrow-up
    34
    ·
    11 months ago

    Oh, for fuck’s sake… no. It isn’t. And I find myself pondering whether or not the article’s authors are themselves sapient.

    • khalic@beehaw.org
      arrow-up
      10
      ·
      edit-2
      11 months ago

      I kind of regret learning ML sometimes. Being one of the 10 people per km² who understand how it works is so annoying. It’s just a fancy mirror ffs, stop making weird faces at it you baboons!

      • SenorBolsa@beehaw.org
        arrow-up
        2
        ·
        edit-2
        11 months ago

        The best part is it’s not even that complicated of a thing conceptually. Like you don’t need to study it to kind of understand the idea and some of its limitations.

      • jarfil@beehaw.org
        arrow-up
        1
        ·
        11 months ago

        Do you really understand how it works? What would you call a neural network with mirror neurons primed to react to certain stimuli patterns as the network gets trained… a mirror, or a baboon?

          • jarfil@beehaw.org
            arrow-up
            1
            ·
            edit-2
            11 months ago

            What do you call a neuron “that reacts both when a particular action is performed and when it is only observed”? Current LLMs are made out exclusively of mirror neurons, since their output (what they perform) is the same action as their input (what they observe).

            • EthicalAI@beehaw.org
              arrow-up
              1
              ·
              11 months ago

              I can’t even parse what you mean when you say their input is the same as their output, that would imply they don’t transform their input, which would defeat their purpose. This is nonsense.

  • Dizzy Devil Ducky@lemm.ee
    English
    arrow-up
    11
    ·
    edit-2
    11 months ago

    Calling the overglorified chatbots and LLMs like GPT or Claude AGI would be like me calling a preschool finger painting a master-class work of art, from my understanding of them. Though I can’t say I’m anywhere near an expert, so definitely take what I say with a major grain of salt.

    What these AI chatbots and LLMs can do is sometimes impressive, but that’s all I can say about them. Intelligence is definitely not their strong suit when half of the time you’ll ask for a summary of a well-known and well-loved TV show only for it to just make up anything that sounds right.

    • ConsciousCode@beehaw.org
      arrow-up
      5
      ·
      11 months ago

      LLMs are not chatbots, they’re models. ChatGPT/Claude/Bard are chatbots which use LLMs as part of their implementation. I would argue in favor of the article because, while they aren’t particularly intelligent, they are general-purpose and exhibit some level of intelligence and thus qualify as “general intelligence”. Compare this to the opposite, an expert system like a chess computer. You can’t even begin to ask a chess computer to explain what a SQL statement does; the question doesn’t even make sense. But LLMs are capable of being applied to virtually any task which can be transcribed. Even if they aren’t particularly good, they at least attempt to complete the task and are often correct, unlike GPT-2, which read more like a Markov chain.

      • jarfil@beehaw.org
        arrow-up
        3
        ·
        edit-2
        11 months ago

        “LLMs are capable of being applied to virtually any task which can be transcribed”

        Where “transcribed” means using any set of tokens, be it extracted from human written languages, emojis, pieces of images, audio elements, spatial positions, or any other thing in existence that can be divided and represented by tokens.

        PS: actually… why “in existence”? Why not throw in some “customizable tokens” into an LLM, for it to come up with whatever meaning it fancies for them?

        • ConsciousCode@beehaw.org
          arrow-up
          1
          ·
          11 months ago

          There are a lot of papers which propose adding new tokens to elicit some behavior or another, though I haven’t seen them catch on for some reason. A new token would mean adding a new trainable static vector which would initially be something nonsensical, and you would want to retrain it on a comparably sized corpus. This is a bit speculative, but I think the introduction of a token totally orthogonal to the original (something like, e.g., smell, which has no textual analog) would require compressing some of the dimensions to make room for that subspace; otherwise it would have a form of synesthesia, relating that token to the original neighboring subspaces. If it was just a new token still within the original domain, though, you could get a good enough initial approximation from a linear combination of existing token embeddings. E.g., if a monkey-with-a-hat emoji comes out, you initialize its token from the monkey emoji + hat emoji embeddings, then finetune it.

          As the most extreme option, you could increase the embedding dimensionality so the original subspaces are unaffected and the new tokens can take up those new dimensions. This is extreme because it means resizing every matrix in the model, which even for smaller models would be many thousands of parameters, and the performance would tank until it got a lot more retraining.
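A minimal sketch of the "linear combination of existing token embeddings" idea, in plain Python. The embedding table, its values, the token names, and the helper `init_from_existing` are all made up for illustration; a real model would hold these vectors in a trainable matrix and finetune the new row afterwards.

```python
from typing import Dict, List, Optional

# Toy embedding table: token -> vector. The numbers are invented for illustration.
embeddings: Dict[str, List[float]] = {
    "monkey": [0.9, 0.1, 0.0, 0.3],
    "hat": [0.0, 0.8, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.5],
}


def init_from_existing(
    table: Dict[str, List[float]],
    parts: List[str],
    weights: Optional[List[float]] = None,
) -> List[float]:
    """Return a linear combination of the embeddings of `parts`.

    With no weights given, this is the plain average of the vectors.
    """
    if weights is None:
        weights = [1.0 / len(parts)] * len(parts)
    dim = len(table[parts[0]])
    new_vec = [0.0] * dim
    for tok, w in zip(parts, weights):
        for i, x in enumerate(table[tok]):
            new_vec[i] += w * x
    return new_vec


# Hypothetical new token, initialized from two existing ones.
embeddings["monkey_with_hat"] = init_from_existing(embeddings, ["monkey", "hat"])
```

The averaged vector is only a starting point that sits between its components; the finetuning step mentioned above is what would move it to something more useful.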
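To make the bookkeeping of the "extreme option" concrete, here is a small sketch that widens one square weight matrix by zero-padding. Zero-padding is an assumption chosen for this illustration, not something the comment specifies, and the helper names are hypothetical. It shows that a d x d matrix grows to (d + k) x (d + k) and that, for a bare linear map, the original coordinates can be left untouched. Real models also have normalization layers and nonlinearities, for which padding is not neutral.

```python
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix `w` by vector `x`."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def widen_matrix(w: Matrix, extra: int) -> Matrix:
    """Zero-pad a square d x d matrix to (d + extra) x (d + extra)."""
    d = len(w)
    padded = [row + [0.0] * extra for row in w]
    padded += [[0.0] * (d + extra) for _ in range(extra)]
    return padded


def widen_vector(x: List[float], extra: int) -> List[float]:
    """Zero-pad a vector with `extra` new dimensions."""
    return x + [0.0] * extra


def param_count(w: Matrix) -> int:
    """Number of entries in the matrix."""
    return sum(len(row) for row in w)


w = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, -1.0]

old_out = matvec(w, x)
wide_w = widen_matrix(w, 2)
wide_out = matvec(wide_w, widen_vector(x, 2))
```

Here the 2 x 2 matrix (4 entries) becomes 4 x 4 (16 entries), and the same growth applies to every matrix that touches the embedding dimension, which is why the change is so invasive.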

          (deleted original because I got token embeddings and the embedding dimensions mixed up, essentially assuming a new token would use the “extreme option”).

  • ConsciousCode@beehaw.org
    arrow-up
    7
    ·
    11 months ago

    Actually a really interesting article which makes me rethink my position somewhat. I guess I’ve unintentionally been promoting LLMs as AGI since GPT-3.5 - the problem is just with our definitions and how loose they are. People hear “AGI” and assume it would look and act like an AI in a movie, but if we break down the phrase, what is general intelligence if not applicability to most domains?

    This very moment I’m working on a library for creating “semantic functions”, which lets you easily use an LLM almost like a semantic processor. You say await infer(f"List the names in this text: {text}") and it just does it. What most of the hype has ignored with LLMs is that they are not chatbots. They are causal autoregressive models of the joint probabilities of how language evolves over time, which is to say they can be used to build chatbots, but that’s the first and least interesting application.
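A rough sketch of what such a "semantic function" wrapper might look like. This is not the library mentioned above: `make_infer`, `Backend`, and `fake_backend` are hypothetical names, and the backend is a crude capitalization heuristic standing in for a real model call so that the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable

# A backend is any async callable that maps a prompt to a completion.
Backend = Callable[[str], Awaitable[str]]


async def fake_backend(prompt: str) -> str:
    """Stand-in for a model call: returns the capitalized words after the colon."""
    _, _, text = prompt.partition(": ")
    names = [w.strip(".,") for w in text.split() if w[:1].isupper()]
    return ", ".join(names)


def make_infer(backend: Backend) -> Callable[[str], Awaitable[str]]:
    """Bind a backend and return an `infer` coroutine function."""

    async def infer(prompt: str) -> str:
        completion = await backend(prompt)
        return completion.strip()

    return infer


infer = make_infer(fake_backend)


async def main() -> str:
    text = "alice met Bob and Carol."
    return await infer(f"List the names in this text: {text}")


result = asyncio.run(main())
```

Swapping `fake_backend` for a coroutine that calls an actual model would leave the calling code unchanged, which is the appeal of treating the model as a function rather than a chat partner.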

    So yeah, I guess it’s been AGI this whole time and I just didn’t realize it because they aren’t people, and I had assumed AGI implied personhood (which it doesn’t).

    • sandriver@beehaw.org
      arrow-up
      2
      ·
      11 months ago

      I’m not sure how the tech is progressing, but ChatGPT was completely dysfunctional as an expert system, if the AI field still cares about those. You can adapt the Chinese Room problem to whether a model actually has applicability outside of a particular domain (say, anything requiring guessing words on probabilities, or stabilising a robot).

      Another problem is that probabilistic reasoning requires data. Just because a particular problem solving approach is very good at guessing words based on a huge amount of data from a generalist corpus, doesn’t mean it’s good at guessing in areas where data is poor. Could you comment on whether LLMs have good applicability as expert systems in, say, medicine? Especially obscure diseases, or heterogeneous neurological conditions (or both like in bipolar disorders and schizophrenia-related disorders)?

      • ConsciousCode@beehaw.org
        arrow-up
        2
        ·
        11 months ago

        LLMs are not expert systems, unless you characterize them as expert systems in language, which is fair enough. My point is that they’re applicable to a wide variety of tasks, which makes them general intelligences, as opposed to an expert system, which by definition can only do a handful of tasks.

        If you wanted to use an LLM as an expert system (I guess in the sense of an “expert” in that task, rather than a system which literally can’t do anything else), I would say they currently struggle with that. Bare foundation models don’t seem to have the sort of self-awareness or metacognitive capabilities that would be required to restrain them to their given task, and arguably never will because they necessarily can only “think” on one “level”, which is the predicted text. To get that sort of ability you need cognitive architectures, of which chatbot implementations like ChatGPT are a very simple version. If you want to learn more about what I mean, the most promising idea I’ve seen is the ACE framework. Frameworks like this can allow the system to automatically look up an obscure disease based on the embedded distance to a particular query, so even if you give it a disease which only appears in the literature after its training cut-off date, it knows this disease exists (and is a likely candidate) by virtue of it appearing in its prompt. Something like “You are an expert in diseases yadda yadda. The symptoms of the patient are x y z. This reminds you of these diseases: X (symptoms 1), Y (symptoms 2), etc. What is your diagnosis?” Then you could feed the answer to this question into a critique prompt, and repeat until it reports no issues with the diagnosis. You can even make it “learn” by using LoRA, or by keeping notes it writes to itself.

        As for poorer data distributions, the magic of large language models (before which we just had “language models”) is that we’ve found that the larger we make them, and the more (high quality) data we feed them, the more intelligent and general they become. For instance, training them on multiple languages other than English somehow allows them to make more robust generalizations even just within English. There are a few papers I can recall which talk about a “phase transition” which happens during training where beforehand, the model seems to be literally memorizing its corpus, and afterwards (to anthropomorphize a bit) it suddenly “gets” it and that memorization is compressed into generalized understanding. This is why LLMs are applicable to more than just what they’ve been taught - you can, e.g., give them rules to follow within the conversation which they’ve never seen before, and they are able to maintain that higher-order abstraction because of that rich generalization. This is also a major reason open source models, particularly quantizations and distillations, are so successful; the models they’re based on did the hard work of extracting higher-order semantic/geometric relations, and now making the model smaller has minimal impact on performance.
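The lookup-then-critique loop described above can be sketched roughly as follows. This is not the ACE framework, just a toy under stated assumptions: the two-dimensional "embeddings", the disease names, and the `stub_llm` / `stub_critic` callables are all invented stand-ins for a real embedding index and real model calls.

```python
import math
from typing import Callable, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k entries whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)
    return ranked[:k]


def build_prompt(symptoms: str, candidates: List[str]) -> str:
    """Put the retrieved candidates into the prompt so the model 'knows' them."""
    return (
        f"You are an expert in diseases. The symptoms of the patient are {symptoms}. "
        f"This reminds you of these diseases: {', '.join(candidates)}. "
        "What is your diagnosis?"
    )


def diagnose(
    llm: Callable[[str], str],
    critic: Callable[[str], str],
    prompt: str,
    max_rounds: int = 3,
) -> str:
    """Ask, critique, and revise until the critic reports no issues."""
    answer = llm(prompt)
    for _ in range(max_rounds):
        issues = critic(answer)
        if not issues:
            break
        answer = llm(f"{prompt}\nPrevious answer: {answer}\nIssues: {issues}\nRevise.")
    return answer


# Toy index and stubs, all hypothetical.
index = {"disease_X": [1.0, 0.0], "disease_Y": [0.7, 0.7], "disease_Z": [0.0, 1.0]}
candidates = retrieve([0.9, 0.1], index)


def stub_llm(prompt: str) -> str:
    return "disease_X" if "Revise." in prompt else "unsure"


def stub_critic(answer: str) -> str:
    return "no diagnosis given" if answer == "unsure" else ""


final = diagnose(stub_llm, stub_critic, build_prompt("x y z", candidates))
```

The `max_rounds` cap is a design choice added here so that a critic which never approves cannot loop forever.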
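For readers unfamiliar with quantization, here is a minimal sketch of one simple scheme, symmetric per-tensor rounding to 8-bit integers. The scheme, the weight values, and the helper names are assumptions for illustration. It only shows that each stored weight moves by at most half a quantization step; it says nothing by itself about how a full model's output quality holds up.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.0, 0.49]  # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now needs one byte plus a shared scale instead of a 16- or 32-bit float, which is where the size reduction comes from.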

  • baggachipz@kbin.social
    arrow-up
    4
    ·
    11 months ago

    Comparing current LLMs to the ENIAC is thought-provoking; I understand the eagerness to extrapolate in that direction. That being said, I don’t think it will be linear or even logarithmic in progress. The current state of computing and technological advancement has become:

    1. Initial introduction or release
    2. Major hype and influx of greed money. <- we are here
    3. Failure to live up to the hype, resulting in the tech becoming a punchline and gobs of money lost
    4. Renaissance of the tech as its true potential is eventually realized, which doesn’t match the original hype but ends up very useful
    5. Iteration and improvement with no clear “done” or “achieved” milestone; it just becomes part of society

  • CanadaPlus@lemmy.sdf.org
    arrow-up
    4
    ·
    11 months ago

    I wonder how many people here actually looked at the article. They’re arguing that ability to do things not specifically trained on is a more natural benchmark of the transition from traditional algorithm to intelligence than human-level performance. Honestly, it’s an interesting point; aliens would not be using human-level performance as a benchmark so it must be subjective to us.

    • Kaldo@beehaw.org
      arrow-up
      2
      ·
      11 months ago

      I guess the point I have an issue with here is ‘ability to do things not specifically trained on’. LLMs are still doing just that, and often incorrectly - they basically just try to guess the next words based on a huge dataset they trained on. You can’t actually teach it anything new, or to put it better, it can’t actually derive conclusions by itself and improve in such a way - it is not actually intelligent, it’s just freakishly good at guessing.
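As a deliberately oversimplified picture of "guessing the next word from a dataset", here is a bigram counter in Python. An LLM is a neural network conditioned on a long context of tokens, not a lookup table of word pairs, so this is only the crudest relative of the idea; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional

corpus = "the cat sat on the mat . the cat ate the fish ."


def train_bigrams(text: str) -> DefaultDict[str, Counter]:
    """Count which word follows which in the training text."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def guess_next(counts: DefaultDict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(corpus)
after_the = guess_next(model, "the")
after_dog = guess_next(model, "dog")
```

A table like this has nothing to say about a word it never saw, which is the limitation the comment is pointing at; how far large models get past that limitation is exactly what the thread is arguing about.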

      • upstream@beehaw.org
        arrow-up
        2
        ·
        11 months ago

        Heck, sometimes someone comes to me and asks if some system can solve something they just thought of. Sometimes, albeit very rarely, it just works perfectly, no code changes required.

        Not going to argue that my code is artificial intelligence, but huge AI models obviously have higher odds of getting something random correct, just because it correlates.

      • CanadaPlus@lemmy.sdf.org
        arrow-up
        1
        ·
        11 months ago

        “You can’t actually teach it anything new, or to put it better it can’t actually derive conclusions by itself and improve in such way”

        That is true, at least after training. They don’t have any long-term memory. Short term you can teach them simple games, though.

        Of course, this always goes into Chinese room territory. Is simply replicating intelligent behavior not enough to be equivalent to it? I like to remind people we’re just a chemical reaction ourselves, according to all our science.