This adds an option to compute perplexity over the prompt input, similar to https://huggingface.co/docs/transformers/perplexity. It does so by splitting the prompt into non-overlapping chunks of the context window size, then running the forward pass over each chunk and computing the softmax probability of the...
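As a minimal sketch of the chunking scheme (hypothetical sizes throughout; `n_ctx` stands in for the model's context window, and the real code runs the model over each window instead of printing it):

```cpp
#include <cstdio>

int main() {
    // Hypothetical sizes for illustration: a 10-token prompt and a
    // 4-token context window.
    const size_t n_prompt = 10;
    const size_t n_ctx    = 4;

    // Non-overlapping chunks: each window is evaluated with a single
    // forward pass, and any tail shorter than n_ctx is dropped.
    for (size_t start = 0; start + n_ctx <= n_prompt; start += n_ctx) {
        // in the real code: run the forward pass over
        // tokens[start .. start + n_ctx) here
        printf("chunk [%zu, %zu)\n", start, start + n_ctx);
    }
    return 0;
}
```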
// from llama_eval above. Now, based on https://huggingface.co/docs/transformers/perplexity,
// calculate the perplexity over the last half of the window (so the model always has
// some context to predict the token).
//
// We rely on the fact that attention in the forward pass only ...
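A minimal, self-contained sketch of that scoring scheme, assuming toy logits in place of the model's output (the real code takes them from the forward pass): softmax each row, take the log probability of the observed next token, and average the negative log likelihood over the last half of the window before exponentiating:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Log of the softmax probability assigned to `target` by one row of logits,
// computed stably by subtracting the max logit before exponentiating.
double log_softmax_at(const std::vector<float> & logits, int target) {
    float max_logit = logits[0];
    for (float l : logits) max_logit = std::max(max_logit, l);
    double sum_exp = 0.0;
    for (float l : logits) sum_exp += std::exp(l - max_logit);
    return (logits[target] - max_logit) - std::log(sum_exp);
}

int main() {
    // Toy stand-in for one evaluated window: logits[i] predicts tokens[i + 1].
    std::vector<std::vector<float>> logits = {
        {2.0f, 0.5f, -1.0f}, {0.1f, 1.7f, 0.3f},
        {0.0f, 0.2f,  2.5f}, {1.2f, 0.4f, 0.1f},
    };
    std::vector<int> tokens = {0, 1, 2, 0, 1};

    // Score only the last half of the window, so every scored token has
    // at least half a window of preceding context.
    const size_t first = tokens.size() / 2;
    double nll   = 0.0;
    size_t count = 0;
    for (size_t i = first; i < tokens.size(); ++i) {
        nll -= log_softmax_at(logits[i - 1], tokens[i]);
        ++count;
    }
    printf("perplexity over last half: %f\n", std::exp(nll / count));
    return 0;
}
```

Restricting the average to the last half trades some scored tokens for the guarantee that every scored prediction has a reasonable amount of context.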
docker run --privileged --gpus '"all"' --shm-size 10g --rm -it --name axolotl \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --mount type=bind,src="${PWD}",target=/workspace/axolotl \
    -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
    winglian/axolotl:main-latest ...