This document discusses generating text descriptions from parsed images. It describes how an Image to Text (I2T) system works in three steps: 1) detecting objects in an image with an image parsing model, 2) mapping the detected objects to text labels using an ontology, and 3) generating a natural language description of the image from the labels and their relationships. As an example, boat movements are extracted from a video and described in text. References are provided for the image parsing technique, the ontology language used for mapping labels, and the lexical database used.
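
The three steps above can be illustrated with a minimal Python sketch. It is not the I2T implementation: the detector is a stub that returns fixed results, the ontology is a toy dictionary rather than an OWL model, and every name here (Detection, ONTOLOGY, parse_image, map_to_labels, generate_description) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    raw_class: str  # class name emitted by the (hypothetical) image parser
    region: str     # coarse position in the frame, e.g. "left" or "right"


# Hypothetical toy ontology: parser class name -> human-readable text label.
ONTOLOGY = {
    "boat_small": "boat",
    "dock_pier": "dock",
}


def parse_image(image_id: str) -> List[Detection]:
    """Step 1 (stub): a real system would run an image parsing model here."""
    return [
        Detection("boat_small", "left"),
        Detection("dock_pier", "right"),
        Detection("unknown_blob", "top"),  # has no entry in the ontology
    ]


def map_to_labels(detections: List[Detection]) -> List[Tuple[str, str]]:
    """Step 2: keep only detections the ontology can name."""
    return [
        (ONTOLOGY[d.raw_class], d.region)
        for d in detections
        if d.raw_class in ONTOLOGY
    ]


def generate_description(labeled: List[Tuple[str, str]]) -> str:
    """Step 3: fill a fixed sentence template with labels and positions."""
    if not labeled:
        return "No known objects were detected."
    phrases = [f"a {label} on the {region}" for label, region in labeled]
    return "The image shows " + " and ".join(phrases) + "."


if __name__ == "__main__":
    print(generate_description(map_to_labels(parse_image("frame_001"))))
```

Keeping the three stages as separate functions mirrors the pipeline described above, so the stub detector or the dictionary could in principle be swapped for a real parser or ontology without touching the sentence template.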
Yao, Benjamin Z. et al. (2010). I2T: Image Parsing to Text Description. Cited January 14, 2013. Available from: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5487377
McGuinness, Deborah L. and van Harmelen, Frank. (2012). OWL Web Ontology Language. Cited January 14, 2013. Available from: http://www.w3.org/TR/owl-features/
Miller, G. A. (2012). WordNet: A lexical database for English. Cited January 14, 2013. Available from: http://wordnet.princeton.edu/