Train LLMs with DeepSpeed in pipeline mode This repo provides a codebase built on DeepSpeed's pipeline-parallel mode, with which you can pretrain or finetune LLMs faster and more memory-efficiently than with ZeRO mode. Currently, ...
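As a rough illustration of what pipeline mode involves, here is a minimal sketch of DeepSpeed pipeline parallelism. The layer class, stage count, and `ds_config.json` path are placeholders for illustration, not this repo's actual API.

```python
# Minimal DeepSpeed pipeline-parallel training sketch (illustrative only).
# DecoderLayer, num_stages, and ds_config.json are placeholders.
import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule, LayerSpec

class DecoderLayer(nn.Module):
    def __init__(self, hidden=1024):
        super().__init__()
        self.ff = nn.Linear(hidden, hidden)
    def forward(self, x):
        return self.ff(x)

# Slice the layer list into pipeline stages; DeepSpeed schedules the
# micro-batches across stages.
layers = [LayerSpec(DecoderLayer) for _ in range(24)]
model = PipelineModule(layers=layers, num_stages=4)

engine, _, _, _ = deepspeed.initialize(model=model, config="ds_config.json")

# train_batch() pulls micro-batches from an iterator and runs the
# forward/backward/step schedule over all pipeline stages:
# loss = engine.train_batch(data_iter=train_iter)
```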
Add the environment tag to the es_manager section in config/base.yaml Evaluation RAGEN provides an easy way to evaluate a model: python -m ragen.llm_agent.agent_proxy --config-name <eval_config> You only need to set the model and the environment to evaluate in config/<eval_config>.yaml Feedback...
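For illustration, a hedged sketch of setting those two fields programmatically with OmegaConf; the key names (es_manager.env_tag, model_path) and the values are assumptions, not RAGEN's documented config schema.

```python
# Illustrative only: key names and values are assumed, not RAGEN's schema.
from omegaconf import OmegaConf

cfg = OmegaConf.load("config/base.yaml")
cfg.es_manager = cfg.get("es_manager", OmegaConf.create({}))
cfg.es_manager.env_tag = "sokoban"              # placeholder environment tag
cfg.model_path = "Qwen/Qwen2.5-0.5B-Instruct"   # placeholder model to evaluate
OmegaConf.save(cfg, "config/my_eval.yaml")
```

You would then point the eval command above at the saved config name (here the hypothetical my_eval).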
Reinforcement Learning (RL) with rule-based rewards has shown promise in enhancing the reasoning capabilities of large language models (LLMs). However, existing approaches have primarily focused on static, single-turn tasks such as math reasoning and coding. Extending these methods to agent scenarios introduces...
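To make the "rule-based reward" idea concrete, here is a generic single-turn example, an illustration of the general technique rather than the reward used in this work: a verifier that compares the model's final boxed answer against the ground truth and returns a binary reward.

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Generic rule-based reward for a single-turn math task (illustrative):
    1.0 if the answer inside \\boxed{...} matches the ground truth, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Example: reward is 1.0 because the extracted answer "42" matches.
print(rule_based_reward(r"The answer is \boxed{42}.", "42"))
```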
It exists for demoing the ability to use ggml for finetuning LLMs, so don't expect it to be performant or efficient (at least for now). Contributor teleprint-me commented Jul 26, 2024: I don't even have the words for all the feels I'm feeling right now over this. ...
Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) - Organize llm selftrain and update README (#163) · sotopia-lab/sotopia-pi@e6584dd
| Framework | Device | Settings | Throughput (tokens/sec) |
| --- | --- | --- | --- |
| Llama.cpp | Mac M2, 16GB RAM | batch_size=1; 4-bit inference | 71.8 |
| vLLM | A40 GPU | batch_size=100, n=10 | 7094.5 |

Pretrain: Please refer to PRETRAIN.md for instructions on how to pretrain TinyLlama. Finetune: We include a simple full-parameter finetuning & inference script in sft. Our V0.1 chat model is finetuned using this script. The FT dataset we use is...
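As a rough sketch of what full-parameter finetuning looks like with Hugging Face transformers; the checkpoint, dataset, and hyperparameters below are placeholders, not the settings of the sft script mentioned above.

```python
# Illustrative full-parameter finetuning sketch; not the repo's sft script.
# Model name, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a small instruction-tuning dataset (placeholder dataset name).
raw = load_dataset("tatsu-lab/alpaca", split="train[:1%]")
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
train_ds = raw.map(tokenize, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-5, bf16=True),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```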
scale experiments so that developers can run and train models on single-GPU machines. Contributing (Ranked by Urgency): Bug Fixes: poor memory scheduling (the vLLM server shuts down when switching between episode generation and policy training). Refactors: some files exceed 1000 lines, especially in episode...
This codebase is built on top of verl, and we rely heavily on its core functionality. We thank the authors of verl for providing us with an extremely easy-to-work-with codebase! Contemporary work such as MM-UPT has tried a similar idea for training multi-modal LLMs. ...