The FuseMoE kernel should not depend on the input prompt size while decoding, because decoding has no dependency on the input length. However, output tokens/sec during decoding changes significantly in the Mixtral model when we change the input prompt length. It degrades by around 50% when the input length increases ...
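To check that the slowdown really comes from decoding and not from prefill, it may help to compute decode throughput from per-token arrival times, excluding the time to first token. The following is only a minimal sketch: the function name `decode_tokens_per_sec` and its inputs are hypothetical and not part of any serving framework, and it assumes you can record a timestamp for each streamed output token.

```python
def decode_tokens_per_sec(token_times, request_start):
    """Split a request into prefill latency and decode throughput.

    token_times: timestamps (seconds) at which each output token arrived.
    request_start: timestamp (seconds) at which the request was sent.
    Both are assumed to come from the same monotonic clock.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens")
    # Time to first token covers prefill, so it grows with prompt length.
    ttft = token_times[0] - request_start
    # Decode time spans from the first to the last output token.
    decode_time = token_times[-1] - token_times[0]
    if decode_time <= 0:
        raise ValueError("token timestamps must be increasing")
    decode_tps = (len(token_times) - 1) / decode_time
    return ttft, decode_tps
```

Comparing `decode_tps` for a short and a long prompt at the same output length shows whether the roughly 50% drop is in the decode phase itself.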
If encryption is disabled, incoming HTTP/1.1 connections can be upgraded to HTTP/2 through HTTP Upgrade. On the other hand, backend connections are not encrypted by default. To encrypt backend connections, use the tls keyword in the --backend option....
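As an illustration of where the tls keyword goes, the sketch below builds a --backend argument string. It assumes the `HOST,PORT;PATTERN;PARAM` form of the option with an empty pattern; the helper name `nghttpx_backend_arg` and the address used are hypothetical, so check the exact syntax against the nghttpx manual for your version.

```python
def nghttpx_backend_arg(host, port, tls=False):
    """Build a --backend argument for nghttpx (hypothetical helper).

    Assumes the HOST,PORT;PATTERN;PARAM form with an empty PATTERN,
    so the tls parameter follows two semicolons.
    """
    value = f"{host},{port}"
    if tls:
        # Empty pattern, then the tls keyword to encrypt this backend.
        value += ";;tls"
    return f"--backend={value}"


if __name__ == "__main__":
    # Example address only; quote the argument in a shell because of ';'.
    print(nghttpx_backend_arg("127.0.0.1", 8443, tls=True))
```

When passing the resulting argument on a command line, quote it so that the shell does not treat the semicolons as command separators.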