Transformers: self-attention, components, computational optimizations, architecture variants, and the acceleration and interpretability of the attention computation. Large language models: fundamentals, pretraining, prompt engineering, fine-tuning, preference fine-tuning, model compression. Applications: evaluation frameworks, classic tasks, general-purpose tasks. Having worked through it, I find the title Super Study Guide very fitting; the book reads like a map that touches on every important concept and technique, and for the mathematical formulas it...
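Since self-attention heads that map, a minimal sketch of scaled dot-product self-attention may help anchor the terminology. The single head, the lack of masking, and the toy shapes below are simplifications of my own, not from the book:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape
    (seq_len, d_model), with projections of shape (d_model, d_head)."""
    q = x @ w_q                           # queries
    k = x @ w_k                           # keys
    v = x @ w_v                           # values
    scores = q @ k.T / q.shape[-1] ** 0.5 # pairwise similarity, scaled
    weights = F.softmax(scores, dim=-1)   # each row sums to 1
    return weights @ v                    # weighted sum of values

# toy usage: 4 tokens, model width 8, head width 8
x = torch.randn(4, 8)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 8])
```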
The following is translated, with some omissions, from Sebastian Raschka's Understanding Large Language Models. Original: magazine.sebastianraschka.com. Large language models have captured the public's attention. In just five years, large language models (transformers) have almost completely changed the field of natural language processing. Moreover, they have begun to revolutionize fields such as computer vision and computational biology. The list below is mainly meant to be read in chronological order...
- Scalable Diffusion Models with Transformers; William Peebles et al.
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers; Katherine Crowson et al.
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers; Peng...
Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists...
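The abstract is cut off above, but the mechanism it introduces is segment-level recurrence: hidden states computed for the previous segment are cached, detached from the gradient graph, and reused as extra attention context for the current segment. A single-head, single-layer sketch under those assumptions follows; it omits Transformer-XL's relative positional encodings, and in the full model the cached states come from the previous layer's outputs rather than the layer input:

```python
import torch

def attend_with_memory(x, memory, w_q, w_k, w_v):
    """One attention step where keys/values also cover cached states
    from the previous segment (Transformer-XL-style recurrence)."""
    context = torch.cat([memory, x], dim=0) if memory is not None else x
    q = x @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def process_long_sequence(embedded, segment_len, w_q, w_k, w_v):
    """Walk over a long sequence segment by segment, carrying states
    forward as memory; the memory is detached so gradients do not
    flow across segment boundaries."""
    memory, outputs = None, []
    for start in range(0, embedded.shape[0], segment_len):
        seg = embedded[start:start + segment_len]
        outputs.append(attend_with_memory(seg, memory, w_q, w_k, w_v))
        memory = seg.detach()  # cache this segment for the next one
    return torch.cat(outputs, dim=0)
```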
The authors evaluated the model on word-level and character-level datasets and compared it to other prominent models (RNNs and Transformers). Transformer-XL achieved state-of-the-art (SOTA) results on several different dataset benchmarks: ...
Recent large language models (LLMs), such as ChatGPT, have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains and compute-limited settings has created a burgeoning need for i...
1. Hugging Face's Transformers library (https://github.com/huggingface/transformers)
2. LLaMA Factory — hiyouga/LLaMA-Factory: Unify Efficient Fine-Tuning of 100+ LLMs (github.com). Training methods integrated by LLaMA-Factory include continual pre-training: continuing to optimize the model so it can better handle new tasks and data (see the sketch below).
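A minimal sketch of what continual pre-training amounts to, expressed with the Hugging Face Trainer (a causal-LM objective on new domain text). The model name "gpt2", the toy corpus, and the hyperparameters are placeholders; LLaMA-Factory drives an equivalent loop through its own configs:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; swap in the LLM you are adapting
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# new-domain corpus the model should keep learning from (toy data)
texts = ["Domain-specific document one.", "Domain-specific document two."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
    # mlm=False gives the next-token (causal LM) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```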
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX. Hugging Face provides the most important collection of open-source deep learning model repositories; it is not itself a deep learning framework. Hugging Face aims to expand beyond text to support images, audio, video, object detection, and more.
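The quickest taste of the library is its pipeline abstraction, which picks a default pretrained model per task (downloaded on first use; the exact model and score below are the library's defaults, not something this note specifies):

```python
from transformers import pipeline

# high-level inference API: task in, predictions out
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes NLP models easy to use."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# the same interface extends beyond text, in line with the goal above,
# e.g. pipeline("image-classification") or pipeline("automatic-speech-recognition")
```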
[LLM] BitNet: Scaling 1-bit Transformers for Large Language Models. Current large language models have brought significant improvements across a wide range of tasks. However, hosting large language models is expensive due to high inference cost and energy consumption. As...
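To make the "1-bit" idea concrete, here is an illustrative layer loosely following the paper's BitLinear description: latent full-precision weights are zero-centered, binarized with the sign function, rescaled by their mean absolute value, and trained with a straight-through estimator. The layer sizes are arbitrary, and the paper's activation quantization and normalization details are omitted:

```python
import torch
import torch.nn as nn

class BitLinearSketch(nn.Module):
    """Sketch of a 1-bit linear layer in the spirit of BitNet's BitLinear."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        w = self.weight
        w_centered = w - w.mean()              # zero-mean weights
        beta = w_centered.abs().mean()         # per-tensor absmean scale
        w_bin = torch.sign(w_centered) * beta  # weights become ±beta
        # straight-through estimator: forward uses w_bin,
        # backward flows gradients into the latent weights w
        w_ste = w + (w_bin - w).detach()
        return x @ w_ste.t()

layer = BitLinearSketch(16, 4)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 4])
```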