Exploring Alternatives to Softmax Function. doi:10.5220/0010502000810086. Kunal Banerjee, C Vishak Prasad, Rishi Raj Gupta, Karthik Vyas, H Anushree, Biswajit Mishra. SCITEPRESS - Science and Technology Publications. Symposium/Workshop on Electronic Design, Test and Applications...
Recently, the most compelling methods for learning representations without labels have been unsupervised contrastive learning [55, 34, 73, 13, 12], which significantly outperformed other pretext task-based alternatives [43, 26, 18, 54]. With a similar idea to exemplar l...
An empirical comparison of the original BERT-style objective to these three alternatives is shown in Table 5. We find that in our setting, all of these variants perform similarly. The only exception was that dropping corrupted tokens completely produced a small improvement in the GLUE score ...
(Hu et al. [13] explored alternatives to the dot product, but these alternatives operated on scalar weights that were likewise shared across channels.) This construction does not adapt the attention weights at different channels. Although this can be mitigated to some extent by introducing ...
In that case, the 1D-CNN models (CNN64, CNN64, and CNN64_64) offer realistic alternatives to the FCN since they offer high test accuracies (as shown in Table 2), minimal false alarm rates (as shown in Figure 10a–c), and comparatively ...