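Environment: Windows, GPU with 6 GB of VRAM. Starting on the CPU works normally. After switching to CUDA to launch web_demo, asking a question raises an error.
Model loading configuration: model = AutoModel.from_pretrained("model", trust_remote_code=True).half().cuda()
Error message: Traceback (most recent call last): File "C:\Python39\lib\site-packages\gradio\routes.py", line 394, ...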
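The traceback above is cut off, so the actual cause is not confirmed. One thing worth checking is whether the FP16 weights loaded by `.half().cuda()` can fit in 6 GB at all. The sketch below is a back-of-the-envelope estimate only: the parameter count of 6e9 is an assumption taken from the "6B" in the model name, `weight_memory_gib` is a hypothetical helper, and the figures cover weights alone, not activations or the CUDA context.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 6e9   # assumption: roughly 6 billion parameters ("6B")
VRAM_GIB = 6.0   # the GPU size given in the report

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    need = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if need < VRAM_GIB else "does not fit"
    # Weights only: real usage is higher once inference starts.
    print(f"{label}: ~{need:.1f} GiB of weights, {verdict} in {VRAM_GIB:.0f} GiB")
```

Under these assumptions the FP16 weights alone need well over 6 GiB, so a lower-precision load is probably required on this card. A precision whose weights only just fit may still run out of memory during generation.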
self.code = code
self._function_names = function_names
self._cmodule = LazyKernelCModule(self.code)
for name in self._function_names:
    setattr(self, name, KernelFunction(self._cmodule, name))
quantization_code = "$QlpoOTFBWSZTWU9yuJUAQHN///f/n/8/n///n//bt4dTidcVx8X3V9FV...
Current Behavior
Running cli_demo.py runs out of GPU memory on the first conversation turn. Details:
Traceback (most recent call last): File "C:\gits\ChatGLM-6B\cli_demo.py", line 44, in <module> main() File "C:\gits\ChatGLM-6B\cli_demo.py", line 34, in main for response, history in model.stream_chat(tok...