Cache shape torch.Size([163840, 64])
[2024-08-02 08:40:50,110] [WARN] /usr/local/api/chat_router.py(115):__init__: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
[2024-08-02 08:40:50,111] [WARN] /usr/...
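For context, a minimal sketch of the divisibility check that typically sits behind a warning like this in tensor-parallel quantized layers: each rank holds input_size / tp_size rows of the weight, and that shard must remain a whole number of quantization groups. The names (shard_is_aligned, group_size) are illustrative, not the actual chat_router internals.

```python
def shard_is_aligned(input_size: int, tp_size: int, group_size: int = 128) -> bool:
    """Return True if each rank's shard of a quantized weight stays aligned."""
    if input_size % tp_size != 0:
        return False  # input dimension cannot even be split evenly across ranks
    per_rank = input_size // tp_size
    # Quantized weights are packed in groups; a shard that is not a whole
    # number of groups breaks the packing and triggers the warning above.
    return per_rank % group_size == 0

# 163840 rows split 8 ways -> 20480 per rank, a multiple of 128: aligned.
print(shard_is_aligned(163840, 8))     # True
# Split 1024 ways -> 160 per rank, not a multiple of 128: misaligned.
print(shard_is_aligned(163840, 1024))  # False
```

This is why the message points at "too large tensor parallel size": growing tp_size shrinks the per-rank shard until it no longer lands on a group boundary.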
The legacy SIMD prefixes (66H, F2H, F3H) can appear in both the legacy format and in the EVEX prefix format: in EVEX, these legacy SIMD prefixes are encoded in the compressed SIMD-prefix encoding field, and at decode time they are expanded back into the legacy SIMD prefix before being provided to the decoder's PLA (so the PLA can execute both the legacy-encoded and the EVEX-encoded forms of an instruction).
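A minimal sketch of that compression scheme, assuming the 2-bit VEX/EVEX pp field values documented in the Intel SDM (00 = no prefix, 01 = 66H, 10 = F3H, 11 = F2H); the helper name expand_pp is illustrative, not a real decoder API.

```python
# The 2-bit pp field of a VEX/EVEX prefix stands in for a legacy SIMD
# prefix byte; the decoder expands it before the opcode reaches the PLA.
PP_TO_LEGACY_PREFIX = {
    0b00: None,   # no SIMD prefix
    0b01: 0x66,
    0b10: 0xF3,
    0b11: 0xF2,
}

def expand_pp(pp: int) -> bytes:
    """Expand a 2-bit EVEX/VEX pp field into the legacy SIMD prefix bytes."""
    prefix = PP_TO_LEGACY_PREFIX[pp & 0b11]
    return b"" if prefix is None else bytes([prefix])

# Example: an EVEX encoding with pp=01 is decoded as if a 66H legacy
# SIMD prefix preceded the opcode, so one PLA serves both formats.
assert expand_pp(0b01) == b"\x66"
assert expand_pp(0b00) == b""
```

The design point is that the PLA only ever sees one canonical, legacy-style representation, so adding the EVEX format did not require duplicating its decode entries.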