But since V3 and R1 activate only 37B parameters per token (at INT4, 37B weights is about 18.5GB), is it possible for MoE inference to load only the weights of the currently activated experts into GPU memory, and keep the weights of the non-activated (unused) experts in CPU memory (e.g. 32GB)?
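To make the idea concrete, here is a toy sketch of the policy such a scheme implies: all expert weights live in host (CPU) memory, and a fixed number of "GPU slots" hold the experts the router actually selects, evicted LRU-style when full. All names here are hypothetical illustrations, not part of DeepSeek's or any inference engine's actual runtime; a real implementation would move tensors with `.to("cuda")`/`.to("cpu")` and overlap transfers with compute, and the per-token churn of the routed experts (hence PCIe traffic) is exactly the open question.

```python
from collections import OrderedDict

class ExpertCache:
    """Toy model of on-demand expert offloading for MoE inference.

    Every expert's weights stay in a host ("CPU") pool; at most
    `gpu_slots` experts are resident in the simulated GPU pool at
    once. This models only the caching policy, not real transfers.
    """

    def __init__(self, num_experts, gpu_slots):
        # Host copy of every expert's weights (placeholder strings here).
        self.cpu_pool = {e: f"weights-{e}" for e in range(num_experts)}
        self.gpu_pool = OrderedDict()  # expert id -> weights, in LRU order
        self.gpu_slots = gpu_slots
        self.transfers = 0  # CPU->GPU copies: a proxy for PCIe traffic

    def fetch(self, expert_id):
        """Return the expert's weights, loading them to 'GPU' on a miss."""
        if expert_id in self.gpu_pool:
            self.gpu_pool.move_to_end(expert_id)  # mark most recently used
            return self.gpu_pool[expert_id]
        if len(self.gpu_pool) >= self.gpu_slots:
            self.gpu_pool.popitem(last=False)  # evict least recently used
        self.gpu_pool[expert_id] = self.cpu_pool[expert_id]
        self.transfers += 1
        return self.gpu_pool[expert_id]

# Per token, the router picks a few experts and we fetch only those:
cache = ExpertCache(num_experts=256, gpu_slots=8)
for expert_id in (3, 7, 3):   # hypothetical router output for one layer
    cache.fetch(expert_id)
```

If consecutive tokens tend to reuse the same experts, `transfers` stays low and the scheme pays off; if the routing is close to uniform, every token re-pages experts over PCIe and the transfer time dominates.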