I prefer Waterfox; OpenAI can keep its Chat chippy tea browser.

  • brucethemoose@lemmy.world · 4 hours ago

    Not anymore.

    I can run GLM 4.6 on a Ryzen desktop with a single RTX 3090 at 7 tokens/s, and it blows lesser API models away. For more utilitarian cases I can run 14-49B models (or GLM Air), and they do just fine.
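
    Concretely, a setup like that usually means llama.cpp (or a fork of it) with every layer offloaded to the GPU except the MoE expert tensors, which stay in system RAM for the CPU to chew on. A rough sketch; the quant filename, context size, and override regex here are illustrative, not a specific recipe:

    ```sh
    # Rough sketch of a llama.cpp launch for a big MoE model on a 24 GB GPU.
    # Filename, quant level, and tensor regex are illustrative.
    #   -ngl 99             offload all layers to the RTX 3090
    #   --override-tensor   ...but keep the MoE expert tensors in system RAM
    llama-server \
      -m GLM-4.6-IQ2_M.gguf \
      -c 16384 \
      -ngl 99 \
      --override-tensor "\.ffn_.*_exps\.=CPU" \
      --threads 16
    ```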

    And when needed, I can reach for free or dirt-cheap APIs from the same local tooling.
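
    Most of those endpoints speak the OpenAI-compatible chat completions protocol, so a remote call looks the same as a local one from the tooling's side. A hedged example; the base URL and model name are placeholders rather than a specific provider:

    ```sh
    # Placeholder endpoint and model; any OpenAI-compatible provider works the same way.
    curl https://api.example-provider.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $API_KEY" \
      -d '{
            "model": "glm-4.6",
            "messages": [{"role": "user", "content": "Summarize this log file."}]
          }'
    ```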

    But again, it’s all ‘special-interest tinkerer’ tier. You can’t do that with `ollama run`; you have to mess with exotic libraries, tweaked setups, and RAG chains to squeeze out that kind of performance. But all of that getting simplified is inevitable.