• MonkeMischief@lemmy.today
      5 months ago

      Expertly explained. Thank you! It’s pretty rad what you can get out of a quantized model on home hardware, but I still can’t understand why people are trying to use it for anything resembling productivity.

      It sounds like the typical tech industry:

      “Look how amazing this is!” (Full power)

      “Uh…uh oh, that’s unsustainable. Let’s quietly drop it.” (Way reduced power)

      “People are saying it’s not as good, so we can offer them LLM Plus for better accuracy!” (3/4 power with subscription)

    • mcv@lemmy.zip
      5 months ago

      But if that’s how you’re going to run it, why not also train it in that mode?

      • Xylight@lemdro.id
        5 months ago

        That is a thing, and it’s called quantization aware training. Some open weight models like Gemma do it.

        The problem is that you need to re-train the whole model that way, and if you also want a full-quality version, you have to train a lot more on top of that.

        It’s still less precise, so the quality will still be below full precision, but it does shrink the gap.
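The core trick quantization-aware training relies on can be sketched with "fake quantization": during the training forward pass, weights are rounded to the low-precision grid and mapped back to float, so the loss already reflects the rounding error that inference will see. Here's a minimal NumPy sketch; the function name, bit widths, and sample values are illustrative, not taken from any real model's training code:

```python
import numpy as np

def fake_quantize(w, bits=8):
    """Simulate low-precision storage: round weights to a uniform
    symmetric grid, then map back to float (quantize -> dequantize)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(w)) / qmax    # one scale per tensor (illustrative)
    if scale == 0:
        return w
    return np.round(w / scale) * scale

# In QAT the forward pass uses these fake-quantized weights, while the
# backward pass updates the full-precision "shadow" copy (typically via
# a straight-through estimator, since round() has zero gradient).
w = np.array([0.31, -0.74, 0.05, 0.99])
w_q8 = fake_quantize(w, bits=8)
w_q3 = fake_quantize(w, bits=3)

# Fewer bits -> coarser grid -> larger rounding error, which is the
# "less precise" effect the comment above describes.
err8 = float(np.max(np.abs(w - w_q8)))
err3 = float(np.max(np.abs(w - w_q3)))
```

Because the loss is computed through the rounded weights, the optimizer steers the model toward weights that survive rounding well, which is why a QAT model at a given bit width usually beats the same model quantized after the fact.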