Authors
Mainak Jas and Devi Parikh
Abstract
For some images, descriptions written by multiple people are consistent with each other. But for other images, descriptions across people vary considerably. In other words, some images are specific (they elicit consistent descriptions from different people) while other images are ambiguous. Applications involving images and text can benefit from an understanding of which images are specific and which ones are ambiguous. For instance, consider text-based image retrieval. If a query description is moderately similar to the caption (or reference description) of an ambiguous image, that query may be considered a decent match to the image. But if the image is very specific, a moderate similarity between the query and the reference description may not be sufficient to retrieve the image.
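To make the retrieval use case concrete, here is a minimal sketch, assuming each image carries a reference description and a precomputed specificity score in [0, 1]. The exponent-based scoring rule and the toy word_overlap similarity are hypothetical illustrations of the idea, not the method the paper evaluates.

def rank_images(query, images, sim):
    # images: list of (image_id, reference_description, specificity in [0, 1]).
    # For an ambiguous image (specificity near 0) the raw similarity s is
    # used, so a moderate match can still retrieve it. For a very specific
    # image (specificity near 1) the score is sharpened toward s**2, which
    # penalizes moderate similarities but leaves near-perfect ones intact.
    scored = []
    for image_id, reference, spec in images:
        s = sim(query, reference)
        scored.append((s ** (1.0 + spec), image_id))
    return sorted(scored, reverse=True)

def word_overlap(a, b):
    # Toy similarity: Jaccard overlap of word sets (illustrative only).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

images = [("specific.jpg", "a man in a red jacket skis down a steep slope", 0.9),
          ("ambiguous.jpg", "an outdoor scene on a cloudy day", 0.2)]
print(rank_images("a person skiing down a slope", images, word_overlap))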
In this paper, we introduce the notion of image specificity. We present two mechanisms to measure specificity given multiple descriptions of an image: an automated measure and a measure that relies on human judgement. We analyze image specificity with respect to image content and properties to better understand what makes an image specific. We then train models to automatically predict the specificity of an image from image features alone without requiring textual descriptions of the image. Finally, we show that modeling image specificity leads to improvements in a text-based image retrieval application.
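As a sketch of what an automated measure could look like, the snippet below scores an image's specificity as the mean pairwise similarity among its descriptions, using cosine similarity over bag-of-words vectors as a simple stand-in for the word-level similarity the paper builds on. The function names and the bag-of-words representation are assumptions for illustration, not the paper's exact formulation.

from collections import Counter
from itertools import combinations
from math import sqrt

def cosine_sim(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def specificity(descriptions):
    # Mean pairwise similarity among the descriptions of one image:
    # high when people describe the image consistently (specific),
    # low when the descriptions vary (ambiguous).
    bags = [Counter(d.lower().split()) for d in descriptions]
    pairs = list(combinations(bags, 2))
    return sum(cosine_sim(a, b) for a, b in pairs) / len(pairs)

consistent = ["a dog runs on the beach",
              "a dog running on the beach",
              "the dog runs along the beach"]
varied = ["a dog runs on the beach",
          "waves crash near some rocks",
          "a sunny day outdoors"]
print(specificity(consistent) > specificity(varied))  # True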
Dataset browser
We provide a dataset browser that allows the reader to explore the concepts described in the paper and the effects associated with them.
BibTeX
@inproceedings{jas2015specificity,
  author    = {Mainak Jas and Devi Parikh},
  title     = {{Image Specificity}},
  year      = {2015},
  booktitle = {{IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}}
}