Then I explained the concept of GQA and asked it for the parts enabling GQA: The key difference between Implementation A and B that enables Grouped Query Attention is having separate n_kv_heads and n_heads arguments. In Implementation B, n_kv_heads allows having fewer key/value projections ...