Improving Neural Network Subspaces

In spite of the success of deep learning, we know relatively little about the many possible solutions to which a trained network can converge. Networks generally converge to some local minimum, a region in parameter space where the loss function increases in every direction, ...
In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full rank, and are ther...
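The excerpt above does not spell out how "utilization of the available space" is measured, but one common proxy is the singular value spectrum of each weight matrix. The following is a minimal sketch under that assumption (the entropy-based effective rank used here is one standard choice, not necessarily the excerpt's definition), written for a PyTorch model:

```python
# Sketch: inspect how much of the available space a layer's learned weights
# occupy by looking at the singular value spectrum of each 2D weight matrix.
import torch
import torch.nn as nn

def effective_rank(weight: torch.Tensor) -> float:
    """Entropy-based effective rank of a 2D weight matrix (one common proxy)."""
    s = torch.linalg.svdvals(weight.detach())
    p = s / s.sum()
    entropy = -(p * torch.log(p + 1e-12)).sum()
    return float(torch.exp(entropy))

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
for name, param in model.named_parameters():
    if param.dim() == 2:  # weight matrices only, skip biases
        full = min(param.shape)
        print(f"{name}: shape={tuple(param.shape)}, "
              f"effective rank ~ {effective_rank(param):.1f} / {full}")
```

A spectrum that decays slowly (effective rank close to the full rank) is consistent with the observation that most learned weights appear to be full rank.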
In this paper, we propose compressing deep models by learning lower-dimensional subspaces from their latent representations while incurring minimal loss of performance. We build on the premise that deep convolutional neural networks extract many redundant features to learn new ...
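The excerpt does not give the exact algorithm, but the core idea of finding a lower-dimensional subspace of redundant latent features can be sketched with a simple PCA-style fit and projection; the function names and dimensions below are illustrative, not the paper's:

```python
# Hedged sketch: fit a low-dimensional subspace to latent representations and
# replace d-dimensional features with their k-dimensional coordinates,
# trading a small reconstruction error for a more compact representation.
import numpy as np

def fit_subspace(latents: np.ndarray, k: int):
    """latents: (n_samples, d). Returns mean and top-k principal directions."""
    mean = latents.mean(axis=0)
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt[:k]                        # basis: (k, d)

def compress(latents, mean, basis):
    return (latents - mean) @ basis.T          # (n, k) subspace coordinates

def decompress(codes, mean, basis):
    return codes @ basis + mean                # approximate (n, d) reconstruction

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 512))               # stand-in for latent features
mean, basis = fit_subspace(z, k=64)
codes = compress(z, mean, basis)
z_hat = decompress(codes, mean, basis)
print("relative reconstruction error:",
      np.linalg.norm(z - z_hat) / np.linalg.norm(z))
```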
Subspace representations increase the efficiency of algorithms by mapping the original data to a low-dimensional space while preserving its useful features. Based on subspace representation, FSCIL projects new-class data into the subspace spanned by base- or old-class features, thereby enabling...
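As an illustration only (the excerpt does not specify the FSCIL procedure in detail), projecting new-class data into a subspace built from base-class features can look like the following; the basis size `k` and feature dimensions are assumptions:

```python
# Illustrative sketch: build an orthonormal basis for the subspace spanned by
# base-class features and project few-shot new-class features onto it.
import numpy as np

def subspace_basis(base_features: np.ndarray, k: int) -> np.ndarray:
    """base_features: (n_base, d). Returns a (d, k) orthonormal basis."""
    _, _, vt = np.linalg.svd(base_features, full_matrices=False)
    return vt[:k].T

def project(features: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonal projection of (n, d) features onto span(basis)."""
    return features @ basis @ basis.T

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 256))     # features of base (old) classes
novel = rng.normal(size=(5, 256))      # few-shot features of a new class
B = subspace_basis(base, k=64)
novel_in_subspace = project(novel, B)  # new-class data expressed with old directions
```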
The self-attention sub-layer consists of multiple attention heads, which enable ResInf to concurrently integrate information from different representation subspaces while accounting for diverse positional contexts. For attention head j, taking the input \(\mathbf{e}_i^p\) as...
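A generic sketch of what one such head computes is given below; it uses standard scaled dot-product attention, and the tensor `e` stands in for the embeddings \(\mathbf{e}_i^p\) mentioned above. The exact formulation in ResInf may differ, and the projection matrices `w_q`, `w_k`, `w_v` and dimensions are assumptions:

```python
# Sketch of a single attention head: project the input into one
# d_head-dimensional representation subspace and attend over positions.
import torch

def attention_head(e: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """e: (seq_len, d_model); w_*: (d_model, d_head)."""
    q, k, v = e @ w_q, e @ w_k, e @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5      # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)    # attention over positions
    return weights @ v                         # (seq_len, d_head)

d_model, d_head, seq_len = 64, 16, 10
e = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
out_j = attention_head(e, w_q, w_k, w_v)       # output of head j
```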
The process projects the input into multiple subspaces, allowing the model to attend to different parts of the input simultaneously. The transformer architecture typically includes feed-forward neural networks (FFNNs) in each layer. These FFNNs introduce non-linearity to the model and enable it...
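For the feed-forward part described above, a minimal sketch of the standard position-wise formulation (two linear layers with a ReLU in between; dimensions are illustrative) is:

```python
# Sketch of the position-wise feed-forward sub-layer: the ReLU between the
# two linear maps is what introduces the non-linearity mentioned above.
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    def __init__(self, d_model: int = 64, d_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),  # expand
            nn.ReLU(),                     # non-linearity
            nn.Linear(d_hidden, d_model),  # project back
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Applied independently at every position of the sequence.
        return self.net(x)

ffn = PositionwiseFFN()
y = ffn(torch.randn(10, 64))               # (seq_len, d_model)
```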
In this work, we introduced Subspace Adaptation Prior (SAP), a novel meta-learning algorithm that jointly learns a good neural network initialization and good parameter subspaces (or subsets of operations) in which new tasks can be learned within a few gradient descent updates from a few data...
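The general mechanism of adapting within a learned parameter subspace can be sketched as follows. This is not the exact SAP algorithm; `theta0`, the basis `B`, and `task_loss` are placeholders for the meta-learned initialization, the learned subspace, and the few-shot task objective:

```python
# Hedged sketch: keep the meta-learned initialization fixed and take a few
# gradient steps only on low-dimensional coordinates c, so the task-specific
# parameters stay in the learned subspace theta0 + B @ c.
import torch

d, k = 1000, 10                            # full parameter dim, subspace dim
theta0 = torch.randn(d)                    # meta-learned initialization (placeholder)
B = torch.randn(d, k) / d ** 0.5           # meta-learned subspace basis (placeholder)

def task_loss(theta: torch.Tensor) -> torch.Tensor:
    # Placeholder loss; in practice this is the few-shot task objective.
    return ((theta - 1.0) ** 2).mean()

c = torch.zeros(k, requires_grad=True)     # subspace coordinates to adapt
opt = torch.optim.SGD([c], lr=0.1)
for _ in range(5):                         # "a few gradient descent updates"
    opt.zero_grad()
    loss = task_loss(theta0 + B @ c)
    loss.backward()
    opt.step()

theta_task = theta0 + B @ c.detach()       # adapted parameters for the new task
```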
High Variance of Neural Network Models

Training deep neural networks can be very computationally expensive. Very deep networks trained on millions of examples may take days, weeks, and sometimes months to train. Google’s baseline model […] was a deep convolutional neural network […] that had...
However, these methods perform unsatisfactorily on datasets where the intrinsic subspaces of samples are not independent or intersect significantly. The main reason is that minimizing the reconstruction loss in CNNs cannot guarantee the discriminative ability of the learned latent representations [...
In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step on its local ...
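The core computation described in the excerpt, measuring similarity between client data subspaces via principal angles, can be sketched as follows. This follows the standard definition of principal angles between subspaces; the clustering step and data layout used by the paper are omitted, and the shapes below are assumptions:

```python
# Sketch: each client keeps the top-k left singular vectors of its local data
# matrix (truncated SVD); the server measures similarity between two clients
# via the principal angles between those k-dimensional subspaces.
import numpy as np

def client_subspace(data: np.ndarray, k: int) -> np.ndarray:
    """data: (d, n_samples) local data matrix. Returns a (d, k) orthonormal basis."""
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :k]

def principal_angles(u1: np.ndarray, u2: np.ndarray) -> np.ndarray:
    """Principal angles (radians) between span(u1) and span(u2)."""
    sigma = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

rng = np.random.default_rng(2)
x1 = rng.normal(size=(100, 300))   # client 1 local data (features x samples)
x2 = rng.normal(size=(100, 300))   # client 2 local data
u1, u2 = client_subspace(x1, 3), client_subspace(x2, 3)
print("principal angles:", principal_angles(u1, u2))
```

Small principal angles indicate that two clients' data occupy nearly the same subspace, i.e. similar distributions, which is the signal the method uses to group clients.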