| Model | #Total Params | #Active Params | Context Length | Download |
| :-- | :-- | :-- | :-- | :-- |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Base) |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct) |
...
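To pull one of these checkpoints locally, one option is `huggingface_hub`'s `snapshot_download`; a minimal sketch (the target directory is an assumption, pick any path you like):

```python
from huggingface_hub import snapshot_download

# Download the instruct variant into a local directory (path is illustrative).
model_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    local_dir="./DeepSeek-Coder-V2-Lite-Instruct",  # assumed path
)
print(model_dir)  # pass this path to from_pretrained() later
```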
- [DeepSeek-Coder-V2](https://github.com/deepseek-ai/DeepSeek-Coder-V2)
  - [ ] DeepSeek-Coder-V2-Lite-Instruct FastApi deployment and invocation (a minimal sketch follows this list)
  - [ ] DeepSeek-Coder-V2-Lite-Instruct langchain integration
  - [ ] DeepSeek-Coder-V2-Lite-Instruct WebDemo deployment
  - [ ] DeepSeek-Coder-V2-Lite-Instruct vLLM ...
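As a preview of the FastApi item above, here is a hedged serving sketch with transformers and FastAPI. The endpoint path, request schema, and generation settings are illustrative assumptions, not the tutorial's actual code:

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # or a local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

app = FastAPI()

class GenerateRequest(BaseModel):  # hypothetical request schema
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")  # assumed endpoint name
def generate(req: GenerateRequest):
    # Build a chat-formatted prompt via the tokenizer's chat template.
    messages = [{"role": "user", "content": req.prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=req.max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
    return {"response": text}
```

Run it with, e.g., `uvicorn app:app --host 0.0.0.0 --port 6006` and POST a JSON body like `{"prompt": "Write quicksort in Python"}` to `/generate`.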
# DeepSeek-Coder-V2-Lite-Instruct LoRA Fine-Tuning

In this section we briefly describe how to LoRA fine-tune the DeepSeek-Coder-V2-Lite-Instruct model with frameworks such as transformers and peft. LoRA is a parameter-efficient fine-tuning method; for a closer look at its principles, see the blog post [知乎|深入浅出Lora](https://zhuanlan.zhihu.com/p/650197598). This tutorial provides a [notebook](./04-DeepSeek-Coder-V2-Lite-...) in the same directory...
> Considering that some readers may run into problems setting up the environment, we have prepared an environment image for `DeepSeek-Coder-V2-Lite-Instruct` on the `AutoDL` platform. Click the link below and create an `AutoDL` instance directly.
> ***https://www.codewithgpu.com/i/datawhalechina/self-llm/deepseek-coder***
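The core step of LoRA fine-tuning is wrapping the base model with low-rank adapters via peft. A minimal sketch, assuming illustrative hyperparameters and module names (the notebook's exact settings may differ):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # low-rank dimension (assumed value)
    lora_alpha=32,    # scaling factor (assumed value)
    lora_dropout=0.1,
    # Assumed module names; verify against model.named_modules() for your checkpoint.
    target_modules=["q_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

After wrapping, training proceeds as usual (e.g. with `transformers.Trainer`); only the small adapter matrices receive gradients, which is what makes LoRA memory-efficient.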
```python
model_type: str = "deepseek_v2"
vocab_size: int = 102400
hidden_size: int = 4096
intermediate_size: int = 11008
moe_intermediate_size: int = 1407
num_hidden_layers: int = 30
num_attention_heads: int = 32
num_key_value_heads: int = 32
n_shared_experts: Optional[int] = None
n...
```
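These fields can be inspected at runtime through transformers' `AutoConfig`; a small sketch (DeepSeek-V2 ships a custom config class, so `trust_remote_code=True` is required):

```python
from transformers import AutoConfig

# Load the model's config without downloading the weights themselves.
cfg = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True
)
print(cfg.model_type, cfg.num_hidden_layers, cfg.num_attention_heads)
```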
Tutorial: https://github.com/datawhalechina/self-llm/tree/master/DeepSeek-Coder-V2
DeepSeek-Coder-V2-Lite-Instruct also fails at `determine_num_available_blocks`, but with an NCCL error:

```
(RayWorkerWrapper pid=23558, ip=10.0.128.18) ERROR 07-28 13:53:40 worker_base.py:382] RuntimeError: NCCL Error 3: internal error - please report this issue to the NCCL developers ...
```
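A common first diagnostic step, offered here as a hedged sketch rather than a confirmed fix, is to enable NCCL's own debug logging in every worker's environment before launching vLLM, so the underlying cause of `NCCL Error 3` is printed:

```python
import os

# NCCL reads these standard environment variables at initialization time.
os.environ["NCCL_DEBUG"] = "INFO"        # print NCCL's own diagnostics
os.environ["NCCL_DEBUG_SUBSYS"] = "ALL"  # include all subsystems (init, net, ...)

# ... then start the vLLM engine / Ray workers as usual; note that with Ray,
# the variables must be set in each worker's environment, not only the driver.
```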