• panda_abyss@lemmy.ca
    23 hours ago

    I’ve been thinking of having a small model, like a long-context Qwen 4B, run a quick code review to check for these issues and then correct the main model.

    It feels like a secondary model that only exists to validate that a task was actually completed could work.
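A minimal sketch of that generate-then-validate loop, for illustration only. The names `generate_with_validation`, `main_model`, `validator`, and `max_rounds` are hypothetical, and both models are passed in as plain callables so nothing here assumes a particular inference library:

```python
from typing import Callable, Optional


def generate_with_validation(
    task: str,
    main_model: Callable[[str], str],
    validator: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Ask the main model for output, let a small validator object, retry with feedback.

    `validator(task, output)` returns None when the task looks complete,
    or a short complaint string when it does not (an assumed convention).
    """
    output = main_model(task)
    for _ in range(max_rounds):
        complaint = validator(task, output)
        if complaint is None:
            break  # validator is satisfied, stop early
        # Feed the validator's complaint back to the main model.
        prompt = f"{task}\n\nReviewer feedback: {complaint}\nPlease fix this."
        output = main_model(prompt)
    return output
```

The validator only sees the task and the output, not the main model's reasoning, so it judges completion from a different starting point. Note that the output of the final retry is returned without another validation pass.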

    • FishFace@lemmy.world
      21 hours ago

      Yeah, it can work, because it’ll trigger the recall of different types of input data. But it’s not magic: if the model you’re using has a 25% chance of hallucinating, you probably still end up with an 8.5% chance of getting bullshit after doing this.
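For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes the checker's misses are independent of the main model's errors, which the comment does not claim, and it ignores false alarms from the checker. The function names are hypothetical, and the comment does not say how the 8.5% figure was derived:

```python
def residual_error(p_main: float, p_checker_miss: float) -> float:
    """Chance that a bad output slips through, assuming independent failures."""
    return p_main * p_checker_miss


def implied_miss_rate(p_main: float, p_residual: float) -> float:
    """Checker miss rate that would produce a given residual error rate."""
    return p_residual / p_main


# If the checker misses bad output as often as the main model produces it:
print(residual_error(0.25, 0.25))      # 0.0625, i.e. 6.25%

# Miss rate implied by a 25% error rate and an 8.5% residual:
print(round(implied_miss_rate(0.25, 0.085), 2))  # 0.34
```

Under those assumptions an 8.5% residual corresponds to the checker missing about a third of the bad outputs, which is worse than the independent 6.25% case and fits the point that a second model's errors tend to overlap with the first's.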