I’ve been thinking of having a small model, like a long-context Qwen 4B, run a quick code review to check for these issues and then correct the main model.
It feels like a secondary model that exists only to validate that a task was actually completed could work.
Yeah, it can work, because it’ll trigger recall of different types of input data. But it’s not magic: if the model you’re using hallucinates 25% of the time and the reviewer only catches around two-thirds of those bad outputs, you probably still end up with an 8.5% chance of getting bullshit after doing this.
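As a rough sketch of what that generate → review → correct loop could look like: the Python below assumes both models sit behind an OpenAI-compatible endpoint (the kind vLLM or Ollama expose). The model names, base_url, and the PASS/fail protocol are all placeholders, not any particular setup.

```python
# Sketch of a generate -> review -> correct loop. Assumes an
# OpenAI-compatible server; model names and base_url are hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

MAIN_MODEL = "main-coder"    # placeholder: the big model doing the work
REVIEW_MODEL = "qwen3-4b"    # placeholder: small long-context reviewer

def chat(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_with_review(task: str, max_rounds: int = 2) -> str:
    draft = chat(MAIN_MODEL, task)
    for _ in range(max_rounds):
        # The reviewer only validates; it never writes the fix itself.
        verdict = chat(
            REVIEW_MODEL,
            "Review the code against the task. Reply PASS if the task "
            "is actually completed, otherwise list the concrete "
            f"problems.\n\nTask:\n{task}\n\nCode:\n{draft}",
        )
        if verdict.strip().startswith("PASS"):
            break
        # Feed the reviewer's findings back to the main model to correct.
        draft = chat(
            MAIN_MODEL,
            f"Task:\n{task}\n\nYour previous attempt:\n{draft}\n\n"
            f"A reviewer found these problems:\n{verdict}\n\nFix them.",
        )
    return draft
```

Note the reviewer never patches the code itself; it only produces a verdict, which keeps the small model in the role it's good at (spotting that something is off) rather than asking it to generate the fix.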