Figure 7. Depiction of an attribute localization module [111]. The input is a combined feature vector, and the output is a prediction for a single attribute. This module is used in this work. Training relies on the deep supervision method [115], in which ground-truth data are used ...
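The caption's two ideas can be illustrated together: a module that turns a combined feature vector into the probability of one attribute, and deep supervision, where each branch's prediction is compared directly with the ground-truth label. The following is a minimal sketch, not the method of [111] or [115]. It assumes a binary attribute, a linear scorer followed by a sigmoid inside each module, binary cross-entropy as the per-branch loss, and a plain sum over branches. The names `attribute_module`, `bce`, and `deep_supervision_loss` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical per-branch parameters: (weights, bias) of an assumed linear scorer.
Branch = Tuple[Sequence[float], float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attribute_module(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Map a combined feature vector to the probability of ONE attribute.

    Assumption: the module is reduced to a linear scorer plus sigmoid for illustration.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(logit)


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction p against label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def deep_supervision_loss(branch_features: Sequence[Sequence[float]],
                          branches: Sequence[Branch], y: int) -> float:
    """Deep supervision (assumed form): every branch is supervised by the same
    ground-truth label y, and the per-branch losses are summed."""
    if len(branch_features) != len(branches):
        raise ValueError("need one feature vector per branch")
    total = 0.0
    for feats, (weights, bias) in zip(branch_features, branches):
        total += bce(attribute_module(feats, weights, bias), y)
    return total


if __name__ == "__main__":
    # Three hypothetical branches, e.g. features taken from different depths of a backbone.
    feats = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]
    params = [([1.0, 1.0], 0.0)] * 3
    print(deep_supervision_loss(feats, params, y=1))
```

Summing the branch losses means every branch receives a gradient from the label directly rather than only through the final output, which is the motivation usually given for deep supervision; how the branch outputs are combined at inference is not specified in the caption and is left out here.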