- from deepspeed.utils import set_z3_leaf_modules  # type: ignore
+ if getattr(model.config, "model_type", None) == "mixtral":
      from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

- set_z3_leaf_modules(model, [MixtralSparse...
Uses (1) semi-structured 2:4 sparsity (SparseGPT), where, for every four contiguous weights in a tensor, two are set to zero, and (2) channel-wise quantization to compress weights to 8 bits plus dynamic per-token quantization to compress activations to 8 bits. Useful for better inference than...
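The two compression steps above can be sketched in a few lines of NumPy. This is an illustrative, magnitude-based sketch only: SparseGPT itself chooses which two weights to drop per group with a Hessian-based reconstruction criterion, not raw magnitude, and the function names here (`prune_2_of_4`, `quantize_per_token`) are hypothetical, not part of any library API.

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """2:4 semi-structured sparsity: in every contiguous group of
    four weights, zero the two smallest in magnitude (a stand-in for
    SparseGPT's Hessian-based selection)."""
    flat = weights.reshape(-1, 4).copy()
    # indices of the two smallest-magnitude entries in each group of 4
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

def quantize_per_token(acts: np.ndarray):
    """Dynamic per-token int8 quantization: one scale per row (token),
    computed at runtime from that token's max absolute value."""
    scales = np.maximum(np.abs(acts).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(acts / scales), -127, 127).astype(np.int8)
    return q, scales  # dequantize with q * scales

w = np.array([[0.9, -0.1, 0.05, -0.7],
              [0.2, 0.8, -0.3, 0.01]])
print(prune_2_of_4(w))  # two zeros in every group of four
```

Weight quantization would analogously use one scale per output channel (column) instead of per row, which is what "channel-wise" refers to above.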
...sparse_feature_cross_op_kernel",
"//tensorflow/contrib/nearest_neighbor:nearest_neighbor_ops_kernels",
"//tensorflow/contrib/rnn:all_kernels",
"//tensorflow/contrib/seq2seq:beam_search_ops_kernels",
"//tensorflow/contrib/tensor_forest:model_ops_kernels",
"//tensorflow/contrib/tensor_forest:...
Fast Llama 2 on CPUs With Sparse Fine-Tuning and DeepSparse (https://neuralmagic.com/blog/fast-llama-2-on-cpus-with-sparse-fine-tuning-and-deepsparse/)
LLM Distillation Playbook (by Predibase) - Practical best practices for distilling large language models (https://github.com/predibase/llm...