• MonkeMischief@lemmy.today
      5 months ago

      Expertly explained. Thank you! It’s pretty rad what you can get out of a quantized model on home hardware, but I still can’t understand why people are trying to use it for anything resembling productivity.

      It sounds like the typical tech industry:

      “Look how amazing this is!” (Full power)

      “Uh…uh oh, that’s unsustainable. Let’s quietly drop it.” (Way reduced power)

      “People are saying it’s not as good, so we can offer them LLM Plus for better accuracy!” (3/4 power with subscription)

    • mcv@lemmy.zip
      5 months ago

      But if that’s how you’re going to run it, why not also train it in that mode?

      • Xylight@lemdro.id
        5 months ago

        That is a thing, and it’s called quantization aware training. Some open weight models like Gemma do it.

        The problem is that you need to re-train the whole model that way, and if you also want a full-quality version, you have to train a lot more on top of that.

        It’s still less precise, so the quality will still be below full precision, but it does shrink the gap.
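The core trick quantization-aware training relies on can be sketched with "fake quantization": during the training forward pass, weights are rounded to the low-precision grid and mapped back to float, so the loss already reflects the rounding error that inference will see. Here's a minimal NumPy sketch; the function name, bit widths, and sample values are illustrative, not taken from any real model's training code:

```python
import numpy as np

def fake_quantize(w, bits=8):
    """Simulate low-precision storage: round weights to a uniform
    symmetric grid, then map back to float (quantize -> dequantize)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(w)) / qmax    # one scale per tensor (illustrative)
    if scale == 0:
        return w
    return np.round(w / scale) * scale

# In QAT the forward pass uses these fake-quantized weights, while the
# backward pass updates the full-precision "shadow" copy (typically via
# a straight-through estimator, since round() has zero gradient).
w = np.array([0.31, -0.74, 0.05, 0.99])
w_q8 = fake_quantize(w, bits=8)
w_q3 = fake_quantize(w, bits=3)

# Fewer bits -> coarser grid -> larger rounding error, which is the
# "less precise" effect the comment above describes.
err8 = float(np.max(np.abs(w - w_q8)))
err3 = float(np.max(np.abs(w - w_q3)))
```

Because the loss is computed through the rounded weights, the optimizer steers the model toward weights that survive rounding well, which is why a QAT model at a given bit width usually beats the same model quantized after the fact.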