The neuron_explainer.models.autoencoder module implements a sparse autoencoder trained on the GPT-2 small model's activations. The autoencoder's purpose is to expand the MLP layer activations into a larger number of dimensions, providing an overcomplete basis of the MLP activation space. The lear...
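To make the idea of an overcomplete expansion concrete, here is a minimal sketch of a sparse autoencoder forward pass in plain Python. It is not the code of neuron_explainer.models.autoencoder: the class name TinySparseAutoencoder, the tied initialization, the ReLU encoder, and the L1 sparsity penalty are all assumptions chosen for illustration, and the dimensions are toy values rather than GPT-2 small's.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySparseAutoencoder:
    """Hypothetical toy autoencoder with more latents than inputs."""

    def __init__(self, n_inputs, n_latents, seed=0):
        rng = random.Random(seed)
        # Encoder weights: one row per latent direction.
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
                      for _ in range(n_latents)]
        # Assumption: the decoder starts as the transpose of the encoder.
        self.w_dec = [[self.w_enc[j][i] for j in range(n_latents)]
                      for i in range(n_inputs)]
        self.b_enc = [0.0] * n_latents
        self.b_dec = [0.0] * n_inputs

    def encode(self, x):
        """Map an activation vector to non-negative latent activations."""
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, z):
        """Reconstruct the activation vector from the latents."""
        return [r + b for r, b in zip(matvec(self.w_dec, z), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        """Mean squared reconstruction error plus an assumed L1 penalty."""
        z = self.encode(x)
        x_hat = self.decode(z)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(v) for v in z)


# Toy usage: 4 input dimensions expanded into 16 latents.
sae = TinySparseAutoencoder(n_inputs=4, n_latents=16)
activations = [1.0, -0.5, 0.25, 2.0]
latents = sae.encode(activations)
reconstruction = sae.decode(latents)
```

Because there are more latents than inputs, the decoder's columns form an overcomplete set of directions in the input space; a sparsity penalty such as the assumed L1 term pushes each input to be reconstructed from only a few of them.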
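arxiv: HINT: Hierarchical Neuron Concept Explainer github: https://github.com/AntonotnaWang/HINT Abstract: A major approach to interpreting deep networks is to associate neurons with human-understandable concepts. This paper studies hierarchical concepts, inspired by the hierarchical cognitive process of humans, and proposes the Hierarchical Neuron Concept Explainer (HINT), which, at low cost and in a scalable manner, effectively establishes between neurons and hierarchical concepts the...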