In summary, resolving the "cannot merge adapters to a quantized model" error requires carefully checking and adjusting your model quantization method and adapter compatibility. If the problem persists, asking the community for help may be a good option.
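One common workaround, sketched below under the assumption of a standard PEFT + bitsandbytes setup, is to reload the base model *unquantized* (fp16/fp32), attach the adapter, and merge there; the model name and paths are placeholders, not from this thread:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Hypothetical paths -- replace with your own base model and adapter.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"
ADAPTER_PATH = "./lora-adapter"

# Reload the base model WITHOUT quantization: merging writes the LoRA
# deltas back into the base weights, which is not supported on
# bitsandbytes-quantized Linear layers.
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the trained adapter, then fold it into the base weights.
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
merged = model.merge_and_unload()

# The merged model is a plain transformers model and can be saved
# (and re-quantized afterwards if desired).
merged.save_pretrained("./merged-model")
```

The merge happens in full/half precision, so peak CPU/GPU memory during this step is that of the unquantized model.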
Ideally I was more interested in running in 4-bit than 8-bit (i.e., aiming for the largest reduction in memory footprint), so I didn't try to solve this issue with the backward pass in 8-bit. But with batch size 1 on an A100-80GB GPU, I get OOM above MSL (max sequence length) = 3000 tokens for both the unquantized LoRA…
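For reference, a minimal sketch of the 4-bit loading described above, assuming a QLoRA-style bitsandbytes config (the model name is a placeholder). Since activation memory grows with sequence length and is usually what triggers OOM at long MSL, the sketch also enables gradient checkpointing, a standard memory-for-compute trade that can raise the usable sequence length:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical checkpoint -- substitute the model you are fine-tuning.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"

# QLoRA-style 4-bit config: NF4 quantization with bf16 compute,
# the usual choice for the largest memory reduction.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# Gradient checkpointing recomputes activations in the backward pass
# instead of storing them, cutting activation memory at long MSL.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # KV cache must be off when checkpointing
```

Whether this is enough to get past MSL = 3000 at batch size 1 depends on the model size and LoRA config; it is an assumption-laden sketch, not a claim about the exact setup in this thread.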