Building from Source

```shell
git clone https://github.com/openppl-public/ppl.llm.serving.git
./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"
```

NCCL is required if...
```shell
git clone https://github.com/openppl-public/ppl.llm.kernel.cuda.git
./build.sh -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"
```
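The `80;86;87` tokens are CUDA compute capabilities (8.0 covers A100, 8.6 Ampere GeForce/RTX A-series, 8.7 Jetson Orin). As a rough sketch of how to derive the token for the local GPU — the `cap_to_arch` helper is hypothetical and not part of the build scripts:

```shell
# Hypothetical helper: turn a compute capability like "8.6" into the
# "86"-style token expected by PPLNN_CUDA_ARCHITECTURES.
cap_to_arch() {
    echo "$1" | tr -d '.'
}

# `nvidia-smi --query-gpu=compute_cap` exists on recent drivers; fall back
# to a default when no GPU/driver is present.
cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null || echo "8.6")
arch=$(cap_to_arch "$cap")
echo "-DPPLNN_CUDA_ARCHITECTURES=\"'$arch'\""
```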
Changes to src/ppl/kernel/llm/cuda/flash_attn2/fmha.cu (39 additions, 7 deletions), around line 55:

```cpp
ppl::common::RetCode flash_attn2_fmha(
    ...
    const int64_t mask_stride_s, // can be broadcasted to batches and heads
    ...
```
```cpp
#include "ppl/kernel/llm/cuda/common/matrix_layout.h"

namespace ppl { namespace kernel { namespace llm { namespace cuda { namespace pmx { namespace f8f8 {

ppl::common::RetCode cast_fp16(
    cudaStream_t stream,
    const void* input, // fp16, [batch, quant_dim]
    const int64_t batch,
    ...
```
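The `f8f8::cast_fp16` signature above is truncated, so the kernel's exact logic is not shown here. As background, a common fp8 (e4m3) casting scheme scales each row by its absolute maximum so values fit in e4m3's finite range (max 448). The pure-Python sketch below illustrates that general per-row scaling idea only; it is an assumption about the technique, not the actual kernel:

```python
# Illustrative per-row dynamic scaling into an fp8-like range.
# NOT the actual f8f8::cast_fp16 logic (which is truncated in the source);
# the rounding to fp8's limited mantissa is also omitted here.

FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3

def quantize_row(row):
    """Scale a row into [-448, 448]; return (scaled_row, scale) so that
    scaled * scale recovers the original (up to fp8 rounding, omitted)."""
    amax = max(abs(x) for x in row) or 1.0
    scale = amax / FP8_E4M3_MAX
    return [x / scale for x in row], scale

def dequantize_row(qrow, scale):
    return [q * scale for q in qrow]

row = [0.5, -2.0, 3.25, 0.0]
qrow, scale = quantize_row(row)
back = dequantize_row(qrow, scale)
```

With the rounding step omitted the round trip is exact; a real fp8 cast would additionally snap each scaled value to the nearest e4m3 code.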
The code supports calculating LongPPL on customized LLMs and datasets. Please run:

```shell
pip install longppl
```

or

```shell
git clone https://github.com/PKU-ML/LongPPL.git
cd LongPPL
pip install -e .
```

and use the following code to calculate LongPPL:

```python
from longppl import compute_longppl

output = compute_...
```
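LongPPL builds on standard perplexity, which is the exponential of the mean negative log-likelihood over tokens (LongPPL then focuses this measurement on key tokens in long contexts). The snippet below sketches only the standard PPL base quantity, not the `compute_longppl` internals:

```python
import math

def perplexity(token_logprobs):
    """Standard perplexity: exp of the mean negative log-likelihood.

    token_logprobs: natural-log probabilities the model assigned to each
    target token.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4:
logprobs = [math.log(0.25)] * 10
print(perplexity(logprobs))  # -> 4.0
```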
```python
...lower():
    # ipex-llm gptq
    from ipex_llm.transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained(
        args.model_path,
        load_in_4bit=True,
        torch_dtype=torch.float,
        use_cache=args.use_cache,
        trust_remote_code=True,
    )
else:
    # ipex-llm
    from ipex_llm.transformers import ...
```
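`load_in_4bit=True` stores weights in a compressed 4-bit format with a floating-point scale. The pure-Python sketch below illustrates the general idea of symmetric 4-bit quantization; it is an illustration of the technique, not ipex-llm's actual storage format:

```python
# Illustrative symmetric 4-bit weight quantization -- the kind of
# compression that load_in_4bit enables. NOT ipex-llm's exact scheme.

def quant4(weights):
    """Map floats to signed integers in [-7, 7] plus one fp scale."""
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequant4(q, scale):
    return [v * scale for v in q]

w = [0.1, -0.7, 0.35, 0.0]
q, s = quant4(w)
w_hat = dequant4(q, s)
# Each q value fits in 4 bits; w_hat approximates w within half a scale step.
```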
LLM Model Zoo

- LLaMA 1/2/3
- ChatGLM 2/3
- Baichuan 1/2 7B
- InternLM 1
- InternLM 2
- Mixtral
- Qwen 1/1.5
- Falcon
- Bigcode

Hello, world!

Installing prerequisites on Debian or Ubuntu:

```shell
apt-get install build-essential cmake git python3 python3-dev ...
```
```shell
git clone https://github.com/openppl-public/ppl.llm.serving.git
```

Exporting Models

Refer to ppl.pmx for details.

Running client: send a request through gRPC to query the model:

```shell
./ppl-build/client_sample 127.0.0.1:23333
```

See tools/client_sample.cc for more details. ...