ABSTRACT
Experiments on visually grounded, definite reference production often manipulate simple visual scenes in the form of grids filled with objects, for example, to test how speakers are affected by the number of objects that are visible. For that manipulation, speech onset times were found to increase along with domain size, at least when speakers refer to nonsalient target objects that do not pop out of the visual domain. This finding suggests that even in the case of many distractors, speakers perform object-by-object scans of the visual scene. The current study investigates whether this systematic processing strategy can be explained by the simplified nature of the scenes that were used, and whether different strategies can be identified for photo-realistic visual scenes. To this end, we conducted a preregistered experiment that manipulated domain size and saturation, replicated the measures of speech onset times, and recorded eye movements to measure speakers' viewing strategies more directly. Using controlled photo-realistic scenes, we find (1) that speech onset times increase linearly as more distractors are present; (2) that larger domains elicit relatively fewer fixation switches back and forth between the target and its distractors, mainly before speech onset; and (3) that speakers fixate the target relatively less often in larger domains, mainly after speech onset. We conclude that careful object-by-object scans remain the dominant strategy in our photo-realistic scenes, to a limited extent combined with low-level saliency mechanisms. A relevant direction for future research would be to employ less controlled photo-realistic stimuli that do allow for interpretation based on context.
Subjects
Eye Movements, Speech, Humans, Male, Female, Adult, Young Adult, Visual Perception, Attention, Photic Stimulation
ABSTRACT
When referring to a target object in a visual scene, speakers are assumed to consider certain distractor objects to be more relevant than others. The current research predicts that the way in which speakers arrive at a set of relevant distractors depends on how they perceive the distance between the objects in the scene. It reports on the results of two language production experiments, in which participants referred to target objects in photo-realistic visual scenes. Experiment 1 manipulated three factors that were expected to affect perceived distractor distance: two manipulations of perceptual grouping (region of space and type similarity), and one of presentation mode (2D vs. 3D). In line with most previous research on visually-grounded reference production, an offline measure of visual attention was taken here: the occurrence of overspecification with color. The results showed effects of region of space and type similarity on overspecification, suggesting that distractors that are perceived as being in the same group as the target are more often considered relevant distractors than distractors in a different group. Experiment 2 verified this suggestion with a direct measure of visual attention, eye tracking, and added a third manipulation of grouping: color similarity. For region of space in particular, the eye movement data indeed showed patterns in the expected direction: distractors within the same region as the target were fixated more often, and longer, than distractors in a different region. Color similarity was found to affect overspecification with color, but not gaze duration or the number of distractor fixations. The expected effects of presentation mode (2D vs. 3D) were likewise not convincingly borne out by the data. Taken together, these results provide direct evidence for the close link between scene perception and language production, and indicate that perceptual grouping principles can guide speakers in determining the distractor set during reference production.
ABSTRACT
This paper investigates developmental changes in children's processing of redundant information in definite object descriptions. In two experiments, children of two age groups (6 or 7, and 9 or 10 years old) were presented with pictures of sweets. In the first experiment (pairwise comparison), two identical sweets were shown, and one of these was described with a redundant modifier. After the description, the children had to indicate the sweet they preferred most in a forced-choice task. In the second experiment (graded rating), only one sweet was shown, which was described with a redundant color modifier in half of the cases (e.g., "the blue sweet") and in the other half of the cases simply as "the sweet." This time, the children were asked to indicate on a 5-point rating scale to what extent they liked the sweets. In both experiments, the results showed that the younger children had a preference for the sweets described with redundant information, while redundant information did not have an effect on the preferences for the older children. These results imply that children are learning to distinguish between situations in which redundant information carries an implicature and situations in which this is not the case.
ABSTRACT
In two experiments, we investigate to what extent various visual saliency cues in realistic visual scenes cause speakers to overspecify their definite object descriptions with a redundant color attribute. The results of the first experiment demonstrate that speakers are more likely to redundantly mention color when visual clutter is present in a scene than when it is not. In the second experiment, we found that distractor type and distractor color affect redundant color use: Speakers are most likely to overspecify if there is at least one distractor object present that has the same type as the target referent but a different color. Reliable effects of distractor distance were not found. Taken together, our results suggest that certain visual saliency cues guide speakers in determining which objects in a visual scene are relevant distractors, and which are not. We argue that this is problematic for algorithms that aim to generate human-like descriptions of objects (such as the Incremental Algorithm), since these generally select properties that help to distinguish a target from all objects that are present in a scene.
Subjects
Attention/physiology, Theoretical Models, Visual Pattern Recognition/physiology, Visual Perception/physiology, Adolescent, Adult, Algorithms, Cues (Psychology), Female, Humans, Male, Photic Stimulation, Young Adult
ABSTRACT
When speakers describe objects with atypical properties, do they include these properties in their referring expressions, even when that is not strictly required for unique referent identification? Based on previous work, we predict that speakers mention the color of a target object more often when the object is atypically colored than when it is typically colored. Taking literature from object recognition and visual attention into account, we further hypothesize that this behavior is proportional to the degree to which a color is atypical, and depends on whether color is a highly diagnostic feature of the referred-to object's identity. We investigate these expectations in two language production experiments, in which participants referred to target objects in visual contexts. In Experiment 1, we find a strong effect of color typicality: less typical colors for target objects predict higher proportions of referring expressions that include color. In Experiment 2 we manipulated objects with more complex shapes, for which color is less diagnostic, and we find that the color typicality effect is moderated by color diagnosticity: it is strongest for high-color-diagnostic objects (i.e., objects with a simple shape). These results suggest that the production of atypical color attributes results from a contrast with stored knowledge, an effect which is stronger when color is more central to object identification. Our findings offer evidence for models of reference production that incorporate general object knowledge, in order to be able to capture these effects of typicality on determining the content of referring expressions.
ABSTRACT
This study investigates to what extent the amount of variation in a visual scene causes speakers to mention the attribute color in their definite target descriptions, focusing on scenes in which this attribute is not needed for identification of the target. The results of our three experiments show that speakers are more likely to redundantly include a color attribute when the scene variation is high as compared with when this variation is low (even if this leads to overspecified descriptions). We argue that these findings are problematic for existing algorithms that aim to automatically generate psychologically realistic target descriptions, such as the Incremental Algorithm, as these algorithms make use of a fixed preference order per domain and do not take visual scene variation into account.
Subjects
Color Perception, Color, Language, Cues (Psychology), Female, Humans, Male, Photic Stimulation/methods, Psycholinguistics, Young Adult
ABSTRACT
In a recent article published in this journal (van Deemter, Gatt, van der Sluis, & Power, 2012), the authors criticize the Incremental Algorithm (a well-known algorithm for the generation of referring expressions due to Dale & Reiter, 1995, also in this journal) because of its strong reliance on a pre-determined, domain-dependent Preference Order. The authors argue that there are potentially many different Preference Orders that could be considered, while often no evidence is available to determine which is a good one. In this brief note, however, we suggest (based on a learning curve experiment) that finding a Preference Order for a new domain may not be so difficult after all, as long as one has access to a handful of human-produced descriptions collected in a semantically transparent way. We argue that this is due to the fact that it is both more important and less difficult to get a good ordering of the head than of the tail of a Preference Order.
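To make the role of the Preference Order concrete, the following is a minimal Python sketch of the core loop of the Incremental Algorithm as it is characterized in the abstracts above: attributes are tried in a fixed order, and an attribute is kept only if it rules out at least one remaining distractor. The entity representation (plain dictionaries), the function name, and the example scene are assumptions made for illustration and do not come from the cited papers. The sketch also simplifies the published algorithm, for instance by not forcing the object type into every description.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an entity is a mapping from attribute to value.
Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: Sequence[Entity],
    preference_order: Sequence[str],
) -> Tuple[Dict[str, str], bool]:
    """Select attributes of `target` in a fixed preference order.

    Returns the selected attribute-value pairs and a flag telling whether
    all distractors were ruled out (i.e. the description is distinguishing).
    """
    description: Dict[str, str] = {}
    remaining: List[Entity] = list(distractors)

    for attribute in preference_order:
        if not remaining:
            break  # target is already uniquely identified
        value = target.get(attribute)
        if value is None:
            continue  # target has no value for this attribute
        # Keep the attribute only if it excludes at least one distractor.
        if any(d.get(attribute) != value for d in remaining):
            description[attribute] = value
            remaining = [d for d in remaining if d.get(attribute) == value]

    return description, not remaining


# Illustrative scene (invented for this example).
target = {"type": "chair", "color": "red", "size": "large"}
distractors = [
    {"type": "chair", "color": "blue", "size": "large"},
    {"type": "table", "color": "red", "size": "small"},
]

print(incremental_algorithm(target, distractors, ["type", "color", "size"]))
print(incremental_algorithm(target, distractors, ["size", "color", "type"]))
```

The two calls differ only in the Preference Order, yet they select different attribute sets for the same target, which is the dependence that the note discusses. Because the loop considers every distractor in the scene and never revisits an earlier choice, it has no way to treat some distractors as more relevant than others.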