Moreover, higher learning rates made shallower architectures such as VGG highly unstable and prevented them from converging. Comparing our results with those reported by other authors was difficult because the other methods used either different evaluation criteria or different datasets. ...
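One common way to mitigate this kind of instability is to lower the base learning rate and add a brief warm-up before the usual decay schedule. The sketch below shows what such a configuration might look like in PyTorch; the model choice, learning-rate values, and schedule lengths are illustrative assumptions, not the settings used in our experiments.

```python
import torch
import torchvision.models as models

# Illustrative configuration only: the values below are hypothetical
# and not the hyperparameters used in our experiments.
vgg = models.vgg16(weights=None)

# A modest base learning rate with momentum and weight decay.
optimizer = torch.optim.SGD(
    vgg.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4
)

# Short linear warm-up (5 epochs), then standard step decay.
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5),
        torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1),
    ],
    milestones=[5],
)

# Per-epoch usage (training loop omitted):
#   for epoch in range(num_epochs):
#       ...train one epoch...
#       scheduler.step()
```

The warm-up keeps the effective learning rate small during the first few epochs, when plain (non-residual) networks are most prone to diverging, and the subsequent step decay follows the schedule commonly used for VGG-style training.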