I’m interested in hosting something like this, and I’d like to hear about people’s experiences with it.

My main reasons for hosting this are privacy and being able to integrate my own PKM data (mainly markdown files).

Feel free to recommend me videos, articles, other Lemmy communities, etc.

  • CubitOom@infosec.pub
    10 months ago

    Check out ollama.

    There are a lot of models you can pull from the official library.

    With ollama you can also run external GGUF models from places like Hugging Face by writing a Modelfile, which can be as simple as:

    echo "FROM ~/Documents/ollama/models/$model_filepath" >| ~/Documents/ollama/modelfiles/$model_name.modelfile
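A slightly more defensive version of that one-liner, as a minimal sketch: the function name `make_modelfile` and its argument order are made up for illustration. It expects an absolute path instead of relying on `~`, because a `~` inside double quotes is not expanded by the shell and ends up in the Modelfile literally.

```shell
# Hypothetical helper (not part of ollama): write a one-line Modelfile that
# points ollama at a local GGUF file, then print the Modelfile's path.
make_modelfile() {
  local gguf_path="$1"      # absolute path to the downloaded .gguf file
  local modelfile_dir="$2"  # directory where Modelfiles are kept
  local model_name="$3"     # name the model will be registered under
  local modelfile="$modelfile_dir/$model_name.modelfile"

  mkdir -p "$modelfile_dir" || return 1
  # ">|" overwrites the file even if the shell's noclobber option is set
  printf 'FROM %s\n' "$gguf_path" >| "$modelfile" || return 1
  printf '%s\n' "$modelfile"
}
```

Assuming the printed path was captured in `$f`, the model would then be registered with `ollama create mymodel -f "$f"` and started with `ollama run mymodel` (`mymodel` is a placeholder name).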
    
    • SoleInvictus@lemmy.world
      10 months ago

      It’s good for me because I’m piss poor at programming. In my defense, I’m not a programmer or even programmer-adjacent. I do see how it wouldn’t be useful to a pro. It has also occasionally given me garbage advice that an expert would have spotted right away, while I had to figure out on my own that it was ‘hallucinating’ again. There’s nothing better for learning than troubleshooting, though!

      • bogo@sh.itjust.works
        10 months ago

        I can absolutely see it being useful for a pro. It’s already a better version of IDE templates. If you have to write boilerplate code, it can already handle that. It’s a huge time saver for the things you’d otherwise have to look up and piece together yourself.

        Example: today I wanted a quick way to serve my current working directory over HTTP so I could do some quick web work. I asked ChatGPT to write me a bash function I could stick in my profile to do this, and told it to pick a random unused port. That would have taken me much longer had I gone and looked up how to do it all myself. The only hint I gave it was to use the Python built-in module for serving HTTP.
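The generated function wasn't posted, so this is a minimal sketch of what such a bash function could look like, not the one ChatGPT produced. The names `free_port` and `serve_here` are hypothetical, binding to 127.0.0.1 (localhost only) is an assumption, and the port is an unused one picked by the OS rather than a truly random one. There is also a small race: another process could grab the port between `free_port` releasing it and the server starting.

```shell
# Print a TCP port that is currently free on localhost. Binding to port 0
# makes the OS pick an unused ephemeral port, which is then released.
free_port() {
  python3 -c 'import socket
s = socket.socket()
s.bind(("127.0.0.1", 0))
print(s.getsockname()[1])
s.close()'
}

# Serve the current working directory over HTTP on a free port using
# Python's built-in http.server module. Stop it with Ctrl-C.
serve_here() {
  local port
  port="$(free_port)" || return 1
  echo "Serving $PWD at http://127.0.0.1:$port/"
  python3 -m http.server "$port" --bind 127.0.0.1 --directory "$PWD"
}
```

Placed in `~/.bashrc`, running `serve_here` in any directory prints the URL and serves that directory until interrupted. The `--directory` option needs Python 3.7 or later.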

    • scarilog@lemmy.world
      10 months ago

      There’s a project called Tabby that you can host as a server on a machine with a GPU, and it has a VSCode extension that connects to the server.

      The default model is StarCoder, in its small 1B-parameter version. The downside is that it’s not super smart (though still an improvement over the built-in tools), but because it’s such a small model, I’m getting sub-second processing times.

    • amzd@kbin.social
      10 months ago

      Make sure you are running a model that fits in your VRAM; for me it runs faster than any online LLM I’ve tried.
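To check whether a model fits in VRAM before downloading it, a common back-of-the-envelope estimate is that the weights alone take roughly parameter count × bits per weight ÷ 8 bytes. The sketch below applies that; the helper names and the 20% headroom for context (KV cache) and runtime overhead are assumptions rather than figures from the thread, and real usage varies with context length and backend.

```shell
# Rough size of the model weights in GB (1 GB = 1e9 bytes):
#   params (billions) * bits per weight / 8
# $1 = parameters in billions, $2 = bits per weight (16 for fp16, ~4 for 4-bit quants)
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Succeed (exit 0) if the weights plus an assumed 20% overhead fit in the given VRAM.
# $1 = parameters in billions, $2 = bits per weight, $3 = VRAM in GB
fits_in_vram() {
  awk -v p="$1" -v b="$2" -v v="$3" 'BEGIN { exit !(p * b / 8 * 1.2 <= v) }'
}
```

For example, `estimate_weights_gb 13 4` prints 6.5, so by this estimate a 4-bit 13B model fits on an 8 GB card, while the same model at fp16 (26 GB of weights) would not fit even in 24 GB.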

  • Buffalobuffalo@lemmy.dbzer0.com
    10 months ago

    The dbzero Lemmy instance has a relationship with the AI Horde shared LLM group. My primary use is chat roleplay, but they have streamlined guides to hosting your own models for personal or Horde use. One of the primary interfaces is SillyTavern, but they integrate numerous models.

    • TCB13@lemmy.world
      10 months ago

      “Uncensored” models are bullshit; they’re anything but uncensored. Just ask one for a Windows XP Pro key and you’ll see how uncensored they really are.

  • Haggunenons@lemmy.world
    10 months ago

    Mixtral is an amazing one that isn’t super slow and doesn’t require incredible hardware to run at a decent speed.

    In general this guy has really good videos/tutorials for the latest tools.

  • amzd@kbin.social
    10 months ago

    ollama + codellama works perfectly. I use it from Neovim with a plugin called gen.nvim, I think.

  • hottari@lemmy.ml
    10 months ago

    Last time I checked, Serge was the simplest of all the available options to host and use, though you need a beefy computer to get fast and/or good responses.

  • SuperiorOne@lemmy.ml
    10 months ago

    I’m actively using ollama with Docker to run the llama2:13b model. It generally works fine, but it’s heavy on resources, as expected.

  • db0@lemmy.dbzer0.com
    10 months ago

    If you want to be able to use your models securely from anywhere, then koboldcpp on the AI Horde is your best option. It’s super easy to set up.

  • Imacat@lemmy.dbzer0.com
    10 months ago

    There’s a local llama subreddit with a lot of good information, and 4chan’s /g/ board will usually have a good thread with a ton of helpful links in the first post. I don’t think there’s anything on Lemmy yet. You can run some good models on a decent home PC, but training and fine-tuning will likely require renting some cloud GPUs.
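The gap between running and training can be put in rough numbers. Full fine-tuning with mixed-precision Adam is commonly estimated at about 16 bytes per parameter (fp16 weights and gradients, fp32 master weights, and two fp32 Adam moment buffers), before counting activations. The helper name is hypothetical, and the figure covers full fine-tuning only; parameter-efficient methods such as LoRA need far less.

```shell
# Approximate GB of model state for full fine-tuning with mixed-precision Adam:
#   2 (fp16 weights) + 2 (fp16 grads) + 4 (fp32 master weights) + 8 (Adam m and v) = 16 bytes/param
# Activations and framework overhead are NOT included.
# $1 = parameters in billions
finetune_state_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * (2 + 2 + 4 + 8) }'
}
```

`finetune_state_gb 7` prints 112.0, i.e. roughly 112 GB for a 7B model, compared with about 3.5 GB (7 × 4 ÷ 8) for that model's 4-bit weights at inference time.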

  • beta_tester@lemmy.ml
    10 months ago

    Not with much success yet, but I’ve been using Hugging Face for a couple of days. You may want to have a look at it.