sparse+autoencoder+llm

2025-06-09 00:07:21

Meaning

How Sparse Autoencoders Are Reshaping LLM Interpretability - Zhihu

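I first noticed in October 2023 that Sparse Autoencoders (SAEs) could be used to interpret LLMs. In the year and a half from then to March 2025, I: (1) trained a series of SAE models based on Mistral-7b-inst; (2) explored how SAE-based explanations can improve the safety of LLMs on generation tasks and their generalization on classification tasks (e.g., Reward Modeling); (3) contributed to an SAE+LLM survey. Some people ...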
...of Sparse Autoencoders for LLM Interpretability (sparse autoencoders)...

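There is a further layer of mismatch: our subjective interpretability evaluation is only a proxy for our real goal, "how does this model work". Some important concepts in LLMs may not be easy to interpret, and if we blindly optimize for interpretability we may overlook them. For a more detailed discussion of SAE evaluation methods, and of evaluation using SAEs trained on board-game models, see my blog post "Evaluating Sparse Autoencoders with B...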
...kSparse AutoEncoder: explicit implementation of sparse activation (SAE on LLM)_哔哩...

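10:59 [Hands-on neural networks] PyTorch high-dimensional Tensor dimension operations and handling, einops
23:03 [Hands-on Transformer] Manually implementing a Transformer Decoder (cross attention, encoder-decoder cross attention)
14:43 [Hands-on neural networks] kSparse AutoEncoder: explicit implementation of sparse activation (SAE on LLM)
16:22 [...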
...Across Large Language Models via Sparse Autoencoders |...

Since comparing features across LLMs is challenging due to polysemanticity, in which LLM neurons often correspond to multiple unrelated features rather than to distinct concepts, sparse autoencoders (SAEs) have been employed to disentangle LLM neurons into SAE features corresponding to distinct ...
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Paper tables with annotated results for Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
...Large Language Model With KAN-based Sparse Autoencoders

KAN-LLaMA: An Interpretable Large Language Model With KAN-based Sparse Autoencoders. Topics: sparse-autoencoders, kolmogorov-arnold-networks, llm-interpretability.
sparse · GitHub Topics · GitHub

Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-...
[Sparse AutoEncoder] Extending SAEs to Multiple Layers - Zhihu

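Sparse AutoEncoder (SAE), TLDR: a very wide linear projection + an activation function + another linear projection (possibly with an added threshold, e.g. JumpReLU), with the loss designed so that the activations become sparse. According to transformer-circuits.pub, an LLM's latent space is itself highly polysemantic, e.g. a single high-dimensional vector encodes information from several distinct human-level meanings ...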
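
The structure described in this snippet (wide linear projection, thresholded activation, linear projection back, sparsity-inducing loss) can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation from the linked post: the names `sae_forward`, `sae_loss`, `jump_relu`, the sizes `d_model` and `d_sae`, the random weights, and the choice of an L1 penalty as the sparsity term are all assumptions made for the example. Only the forward pass and the objective are shown; no training is performed.

```python
import random


def linear(weights, bias, x):
    """Compute weights @ x + bias, with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def jump_relu(pre_acts, threshold=0.0):
    """Pass a pre-activation through only if it exceeds the threshold.

    With threshold=0.0 this reduces to a plain ReLU.
    """
    return [a if a > threshold else 0.0 for a in pre_acts]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real SAE would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def sae_forward(x, params, threshold=0.0):
    """Encode x into a wide latent vector, then decode it back."""
    pre_acts = linear(params["W_enc"], params["b_enc"], x)
    latents = jump_relu(pre_acts, threshold)
    recon = linear(params["W_dec"], params["b_dec"], latents)
    return latents, recon


def sae_loss(x, latents, recon, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes latents to zero."""
    mse = sum((r - xi) ** 2 for r, xi in zip(recon, x)) / len(x)
    l1 = sum(abs(a) for a in latents)
    return mse + l1_coeff * l1


rng = random.Random(0)
d_model, d_sae = 8, 64  # hypothetical sizes; the latent is much wider
params = {
    "W_enc": init_matrix(d_sae, d_model, rng),
    "b_enc": [0.0] * d_sae,
    "W_dec": init_matrix(d_model, d_sae, rng),
    "b_dec": [0.0] * d_model,
}

# Stand-in for one LLM activation vector.
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
latents, recon = sae_forward(x, params, threshold=0.05)
loss = sae_loss(x, latents, recon)
n_active = sum(1 for a in latents if a != 0.0)
```

In a trained SAE, the weights would be optimized so that only a handful of the `d_sae` latents are nonzero for any given input, which is what makes individual latents candidates for interpretation.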
[Domain Papers] Summary of Papers on Sparse Networks in Transformers/LLMs - Zhihu

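Preface: this article summarizes work on sparse networks in Transformers/LLMs, covering: LLM/language models, VLM/vision-language models, Prompt, Agent, CoT/chain of thought, MoE/mixture of experts, CLIP/image-language models, RAG/retrieval-augmented generation, SSM/state space models, M…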
Paper tables with annotated results for Sparse Autoencoder...

Paper tables with annotated results for Sparse Autoencoder Features for Classifications and Transferability
