Intro: Sparse Autoencoder (SAE). TL;DR: it is a very wide linear projection + an activation function + another linear projection (possibly with an added threshold, i.e., JumpReLU), with the loss designed to make the activations sparse. Following https://transformer-circuits.pub/20…
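A minimal sketch of the architecture just described: wide encoder projection, a JumpReLU-style gate with a learnable threshold, and a decoder projection. All dimensions are placeholders, and the sparsity term here is an L1 stand-in (the actual JumpReLU paper trains the threshold with an L0 penalty and straight-through estimators, omitted here for brevity).

```python
import torch
import torch.nn as nn

class JumpReLUSAE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # wide projection, d_hidden >> d_model
        self.decoder = nn.Linear(d_hidden, d_model)
        # learnable per-latent threshold for the JumpReLU gate (assumed one per latent)
        self.log_threshold = nn.Parameter(torch.zeros(d_hidden))

    def forward(self, x: torch.Tensor):
        pre = self.encoder(x)
        threshold = self.log_threshold.exp()
        # JumpReLU: keep a latent's value only if it exceeds its threshold.
        # Note: this hard gate passes no gradient to the threshold; real
        # implementations use straight-through estimators.
        acts = pre * (pre > threshold)
        recon = self.decoder(acts)
        return recon, acts

sae = JumpReLUSAE(d_model=4096, d_hidden=65536)
x = torch.randn(8, 4096)  # stand-in for LLM residual-stream activations
recon, acts = sae(x)
# reconstruction loss + L1 sparsity proxy on the activations
loss = (recon - x).pow(2).mean() + 1e-3 * acts.abs().mean()
```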
I first noticed in October 2023 that Sparse Autoencoders (SAEs) could be used to interpret LLMs. Over the year and a half up to March 2025, I: (1) trained a series of SAE models on Mistral-7b-inst; (2) explored using SAE-derived explanations to improve LLM safety on generation tasks and generalization on classification tasks (e.g., Reward Modeling); (3) contributed to a survey on SAE+LLM. Some might ask why I...
10:59 [Hands-on Neural Networks] PyTorch high-dimensional Tensor dimension operations, with einops
23:03 [Hands-on Transformer] Implementing a Transformer Decoder by hand (encoder-decoder cross-attention)
14:43 [Hands-on Neural Networks] kSparse AutoEncoder: an explicit implementation of sparse activations (SAE on LLM)
16:22 [...
Overview: This post provides a simple PyTorch code example of a sparse autoencoder (Sparse Autoencoder, SAE), and shows how to stack SAEs into a stacked sparse autoencoder (Stacked Sparse Autoencoders, SSAE); see the sketch below. In deep learning, an autoencoder is a type of unsupervised neural network that learns to reconstruct its input.
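A minimal sketch of what such an example might look like: a single SAE layer with an L1 sparsity penalty, trained greedily layer by layer to form an SSAE. Layer sizes, the L1 coefficient, and the training data are all placeholders.

```python
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, d_in: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.decoder(h), h

def train_layer(ae, data, l1_coef=1e-3, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, h = ae(data)
        # reconstruction loss + L1 penalty that pushes hidden codes toward sparsity
        loss = (recon - data).pow(2).mean() + l1_coef * h.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ae

# Greedy layer-wise stacking: each layer trains on the (detached) hidden
# codes of the previous layer, yielding a stacked sparse autoencoder (SSAE).
data = torch.randn(1024, 784)  # placeholder input
sizes = [784, 256, 64]
layers = []
for d_in, d_hid in zip(sizes[:-1], sizes[1:]):
    ae = train_layer(SparseAE(d_in, d_hid), data)
    layers.append(ae)
    with torch.no_grad():
        data = torch.relu(ae.encoder(data))  # feed codes to the next layer
```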
A sparse autoencoder is one of a range of autoencoder architectures, a family of neural networks trained by unsupervised learning. Autoencoders are deep networks that can be used for dimensionality reduction: they learn, via backpropagation, to reconstruct their input from a compressed representation.
9 Dec 2024 · Bart Bussmann, Patrick Leask, Neel Nanda · Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting language model activations by decomposing them into sparse, interpretable features. A popular approach is the TopK SAE, which uses a fixed number of the most active latents per sample to reconstruct the model activations.
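A minimal sketch of the TopK activation the abstract refers to: keep only the k largest pre-activations per sample and zero out the rest. The value of k and the tensor shapes are assumptions.

```python
import torch

def topk_activation(pre: torch.Tensor, k: int) -> torch.Tensor:
    # pre: (batch, d_hidden) encoder pre-activations
    values, indices = torch.topk(pre, k, dim=-1)
    acts = torch.zeros_like(pre)
    # keep only the k largest entries per row (rectified), zero elsewhere
    acts.scatter_(-1, indices, torch.relu(values))
    return acts

pre = torch.randn(4, 16384)
acts = topk_activation(pre, k=32)
assert (acts != 0).sum(dim=-1).max() <= 32  # at most k active latents per sample
```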
Related paper: Sparse Autoencoder Features for Classifications and Transferability.
Unlike traditional stacked autoencoders, the ESGSAE model exploits the complementarity between the original features and the hidden outputs by embedding the original features into the hidden layers. To alleviate the impact of the small-sample problem on the generalization of the proposed ESGSAE ...
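The snippet above is truncated, so the exact ESGSAE construction is not available here. As a hedged illustration only: one simple way to let a hidden layer "see" the original features is to concatenate them with the previous layer's codes before encoding. All names and sizes below are hypothetical, not ESGSAE's actual design.

```python
import torch
import torch.nn as nn

class FeatureEmbeddedLayer(nn.Module):
    def __init__(self, d_orig: int, d_prev: int, d_hidden: int):
        super().__init__()
        # the encoder sees both the original features and the previous codes
        self.encoder = nn.Linear(d_orig + d_prev, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_prev)

    def forward(self, x_orig, h_prev):
        h = torch.relu(self.encoder(torch.cat([x_orig, h_prev], dim=-1)))
        return self.decoder(h), h  # reconstruct the previous layer's codes

layer = FeatureEmbeddedLayer(d_orig=784, d_prev=256, d_hidden=128)
x = torch.randn(32, 784)       # original features
h_prev = torch.randn(32, 256)  # codes from the previous layer
recon_prev, h = layer(x, h_prev)
```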