Exemplar captions generated by LSTM-R without geometrical relationship (W/O Geo-Rel), by LSTM-R with geometrical relationship (W/ Geo-Rel), and taken from human annotations. Words in the same color indicate exact semantic matches of OCR tokens between the generated captions and the human-provided...