In this work, we make progress in this direction by introducing MultiMix, which interpolates an arbitrary number $n$ of tuples, each of length $m$, with one vector $\lambda$ per tuple. On sequence data, we further extend to dense interpolation and loss computation over all spatial ...
To construct Hiera’s full 56×56 position embedding, we tile the window embedding and interpolate the global embedding, then add the results (see Fig. 3). This is similar to the learned behavior of absolute embeddings (Fig. 2), but we can now interpolate properly (Fig. 1e): tile the...