As an alternative, we directly train models for entry-level categories from data where people have provided entry-level labels – in the form of nouns present in visually descriptive image captions. We postulate that these nouns represent examples of entry-level labels because they have been...
The online destination image refers to an online portrayal of the knowledge, collective beliefs, ideas, feelings, and overall impression people hold about a destination [57]. There are three basic dimensions of a destination image: cognitive, emotional (also called the affective image), and the overall ...
The hidden Markov model (HMM) was used to model the generation process of captions. Although the sentences generated by this method are both readable and relevant to the given images, some of the predicted nouns or verbs are wrong, since the detected objects may be mistaken. 2.2. Retrieval...
According to the theory of Baron and Herslund, English tends to have names for collective concepts such as "chairs" and "bowls", whereas French more or less consistently lacks names for collective concepts and instead has different names for different chairs and bowls. This ...
There are provided methods and systems for providing viewers of a digital image with information about identifiable and scenes within the image. In an embodiment, digital images, up
The conceptual framework of vocabulary knowledge for the present study is largely based on the collective strength of some definitions discussed above (e.g., Chapelle, 1998; Henriksen, 1999; Nation, 2001; Qian, 1998, 1999). This framework comprises the following dimensions: (a) vocabulary size;...
In Proceedings of the 43rd annual meeting on association for computational linguistics (pp. 427–434). Association for Computational Linguistics. Gupta, A., & Davis, L. S. (2008). Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In Computer vision–...
the image is disregarded, posing caption evaluation as a purely linguistic task similar to machine translation (MT) evaluation. However, because we exploit the semantic structure of scene descriptions and give primacy to nouns, our approach is better suited to evaluating computer-generated image ...