1. A Brief Description
The visual world paradigm builds on two seminal studies, by Cooper (1974) and by Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy (1995). In a typical eye-tracking study using this paradigm, an eye tracker records participants’ eye movements to objects or pictures in the visual workspace while they produce or comprehend spoken language describing the concurrent visual world. The paradigm is highly versatile: it can be used with a wide range of populations, including those who cannot read or cannot give overt behavioral responses, such as preliterate children, elderly adults, and patients. More importantly, it is extremely sensitive to fine-grained manipulations of the speech signal, and it can be used to study the online processing of most topics in language comprehension at multiple levels, such as fine-grained acoustic-phonetic features, the properties of words, and linguistic structures.
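As a rough illustration of how eye movements recorded in this paradigm are often summarized, the sketch below computes the proportion of gaze samples falling on each area of interest (AOI) within consecutive time bins. It is a minimal example under stated assumptions: the function name `fixation_proportions`, the `(time_ms, aoi)` sample format, the 100 ms bin width, and the AOI labels are all hypothetical and do not come from any of the studies cited here.

```python
from collections import Counter, defaultdict


def fixation_proportions(samples, bin_ms=100):
    """Return {bin_start_ms: {aoi: proportion of samples in that bin}}.

    `samples` is a list of (time_ms, aoi) pairs, where time_ms is the time
    since the onset of the spoken word (an assumed, hypothetical format).
    """
    counts = defaultdict(Counter)
    for time_ms, aoi in samples:
        bin_start = (time_ms // bin_ms) * bin_ms  # left edge of the time bin
        counts[bin_start][aoi] += 1

    proportions = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        proportions[bin_start] = {
            aoi: n / total for aoi, n in counts[bin_start].items()
        }
    return proportions


# Hypothetical gaze samples from one trial: (time in ms, AOI label)
samples = [
    (0, "distractor"), (50, "distractor"),
    (100, "competitor"), (150, "target"),
    (200, "target"), (250, "target"),
]
props = fixation_proportions(samples, bin_ms=100)
for bin_start, by_aoi in props.items():
    print(bin_start, by_aoi)
```

In practice, such proportions would be aggregated over trials and participants before any statistical analysis; the sketch only shows the binning step for a single trial.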
2. Recent Applications
- Groot, Huettig, & Olivers (2017)
On all trials, participants memorized a spoken word for a verbal recognition test at the end of the trial. During the retention period, they performed a visual search task. On critical trials, the search target was absent. For example, on a critical trial the word to remember was “banana”. Participants then saw four objects displayed on the screen: one that was semantically related to the word (e.g., a monkey), one that was visually related (e.g., a canoe), and two that were unrelated (e.g., a hat and a tambourine). In the visual search stage, participants were asked to search for the banana (the template condition) or the figurine (the accessory condition). The authors observed that participants’ eye movements differed significantly between the accessory and template conditions, suggesting that language-induced attentional biases are subject to task requirements.
- Saryazdi & Chambers (2018)
To explore the effects of image realism, the researchers conducted two eye-tracking studies using the visual world paradigm. Each test display consisted of four objects, such as a cigarette, a banana, an earring, and an apple. The same objects were presented both as photographs and as clipart images. The two experiments differed in whether the spoken sentences were noun-biased (Experiment 1), such as “John will move the apple/banana”, or verb-biased (Experiment 2), such as “John will move/peel the apple/banana”. The researchers found a modest benefit for clipart stimuli during real-time processing, but only for noun-driven mappings; that is, the effect of realism was observed in Experiment 1 but not in Experiment 2.
- Kreysa, Nunnemann, & Knoeferle (2018)
The authors monitored participants’ eye movements to mentioned characters while they listened to transitive sentences. They varied whether speaker gaze, a depicted action, neither, or both of these visual cues were available, as well as whether both cues were deictic (Experiment 1) or only speaker gaze was (Experiment 2). Speaker gaze affected eye movements during comprehension as early as a single deictic action depiction did, but significantly earlier than non-deictic action depictions; conversely, depicted actions, but not speaker gaze, positively affected later recall of sentence content.
- Thothathiri, Asaro, Hsu, & Novick (2018)
Figuring out who did what to whom is a critical component of sentence comprehension. This so-called thematic role assignment is influenced by both syntactic and semantic cues. Conflict between these cues can result in temporary consideration of multiple incompatible interpretations during real-time sentence processing. The authors conducted two studies to test whether the resolution of syntax-semantics conflict can be expedited by the online engagement of cognitive control processes that are routinely used to regulate behavior across domains.
In both experiments, critical Stroop-sentence pairs consisted of a Stroop trial (trial n-1) followed by a sentence trial (trial n). In the Stroop task, participants viewed words on a computer screen and indicated the font color of each word, which either matched the word’s meaning (congruent, e.g., blue) or conflicted with it (incongruent, e.g., red). On sentence comprehension trials, participants heard a sentence and selected a picture from four options; the sentence was either congruent, such as “the rabbit was chased by the fox”, or incongruent, such as “the fox was chased by the rabbit”.
The results showed that a preceding incongruent Stroop trial can facilitate the resolution of syntax-semantics conflict in the sentence comprehension trial, as reflected in (a) fewer looks to a picture illustrating the competing but incorrect interpretation (Experiment 1), and (b) steeper growth in looks to a picture illustrating the correct interpretation (Experiment 2).
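The trial-pairing logic described for this study (a Stroop trial at position n-1 immediately followed by a sentence trial at position n) can be sketched as follows. This is only an illustrative sketch: the dictionary keys `task` and `condition`, the function name `critical_pairs`, and the example trial sequence are hypothetical and are not taken from the authors’ materials.

```python
def critical_pairs(trials):
    """Return (stroop_condition, sentence_condition) for every Stroop trial
    that is immediately followed by a sentence trial."""
    pairs = []
    # zip(trials, trials[1:]) walks over adjacent trials (n-1, n)
    for prev, curr in zip(trials, trials[1:]):
        if prev["task"] == "stroop" and curr["task"] == "sentence":
            pairs.append((prev["condition"], curr["condition"]))
    return pairs


# Hypothetical trial sequence for one participant
trials = [
    {"task": "stroop", "condition": "incongruent"},
    {"task": "sentence", "condition": "incongruent"},
    {"task": "stroop", "condition": "congruent"},
    {"task": "stroop", "condition": "congruent"},
    {"task": "sentence", "condition": "congruent"},
]
pairs = critical_pairs(trials)
print(pairs)
```

Grouping sentence trials by the condition of the preceding Stroop trial in this way is what allows looks on incongruent sentences to be compared after congruent versus incongruent Stroop trials.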
- Yamashiro & Vouloumanos (2018)
Speech allows humans to communicate and to navigate the social world. The authors reported a visual world experiment exploring whether infants, like adults, process communicative events while the events are occurring. In their experiment, infants saw a sequence of three trials in a pre-recorded video of a third-party communicative interaction. During the familiarization trials, the Communicator looked at two novel objects and grasped the target object. During the action segment of the test trial, the Communicator could no longer reach the objects, so she vocalized to the Listener using speech or a cough. The Listener then selected either the target object or the non-target object. In the still image segment, the final image of the test trial froze for the remainder of the trial. The areas of interest used in all trials are shown on the still image from the test trial. The results showed that by 12 months, infants, like adults, immediately evaluated the Communicator’s speech, but not her cough, as communicative, and recognized that the Listener should select the target object only when the Communicator spoke.
- McMurray, Danelz, Rigler, & Seedorff (2018)
The authors reported a visual world study exploring the development of speech categorization from childhood through adolescence. Participants from three age groups (7–8, 12–13, and 17–18 years) heard a token from either a b/p or an s/ʃ continuum spanning two words (beach/peach, ship/sip) and selected its referent from a screen containing four pictures of potential lexical candidates. Eye movements to each object were monitored as a measure of how strongly participants committed to each candidate as perception unfolded in real time. The results showed an ongoing sharpening of speech categories through age 18, which was particularly apparent during the early stages of real-time perception.
- Lowder & Ferreira (2018)
Participants’ eye movements to a display consisting of a reparandum (e.g., a cat), a repair (e.g., a dog), and two unrelated distractors (e.g., a plant and a dishtowel) were monitored as they listened to the test sentences.
The contextual plausibility of the misspoken word and the certainty with which the speaker uttered this word were systematically manipulated:
- Every Saturday, Bill likes to grab a book and sit on the couch with his cat, uh I mean his dog, where they spend the afternoon. (Plausible-Certain)
- Every Saturday, Bill likes to grab a Frisbee and go to the park with his cat, uh I mean his dog, where they spend the afternoon. (Implausible-Certain)
- Every Saturday, Bill likes to grab a book and sit on the couch with his uh cat, uh I mean his dog, where they spend the afternoon. (Plausible-Uncertain)
- Every Saturday, Bill likes to grab a Frisbee and go to the park with his uh cat, uh I mean his dog, where they spend the afternoon. (Implausible-Uncertain)
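The four example sentences above form a 2 × 2 design (plausibility of the misspoken word × speaker certainty). A minimal sketch of how such items could be generated by crossing the two factors is shown below; the names `CONTEXTS`, `FILLERS`, and `build_items` are hypothetical, and the sentence frame is simply copied from the examples above rather than from the authors’ stimulus scripts.

```python
from itertools import product

# Factor 1: context that makes "cat" plausible or implausible
CONTEXTS = {
    "Plausible": "grab a book and sit on the couch",
    "Implausible": "grab a Frisbee and go to the park",
}
# Factor 2: a filled pause before the reparandum signals speaker uncertainty
FILLERS = {"Certain": "", "Uncertain": "uh "}


def build_items():
    """Cross the two factors and return {condition label: sentence}."""
    items = {}
    for (plaus, context), (cert, filler) in product(
        CONTEXTS.items(), FILLERS.items()
    ):
        sentence = (
            f"Every Saturday, Bill likes to {context} with his {filler}cat, "
            "uh I mean his dog, where they spend the afternoon."
        )
        items[f"{plaus}-{cert}"] = sentence
    return items


items = build_items()
for label, sentence in items.items():
    print(f"{label}: {sentence}")
```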
Results showed that listeners immediately exploited these cues to generate top-down expectations regarding the speaker’s communicative intention. Crucially, listeners used these expectations to constrain the bottom-up speech input and mentally correct perceived speech errors, even before the speaker initiated the correction.
3. References
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6(1), 84–107. doi:10.1016/0010-0285(74)90005-x
Groot, F. de, Huettig, F., & Olivers, C. N. L. (2017). Language-induced visual and semantic biases in visual search are subject to task requirements. Visual Cognition, 25(1-3), 225–240. doi:10.1080/13506285.2017.1324934
Kreysa, H., Nunnemann, E. M., & Knoeferle, P. (2018). Distinct effects of different visual cues on sentence comprehension and later recall: The case of speaker gaze versus depicted actions. Acta Psychologica, 188, 220–229. doi:10.1016/j.actpsy.2018.05.001
Lowder, M. W., & Ferreira, F. (2018). I see what you meant to say: Anticipating speech errors during online sentence processing. Journal of Experimental Psychology: General. doi:10.1037/xge0000544
McMurray, B., Danelz, A., Rigler, H., & Seedorff, M. (2018). Speech categorization develops slowly through adolescence. Developmental Psychology, 54(8), 1472–1491. doi:10.1037/dev0000542
Saryazdi, R., & Chambers, C. G. (2018). Mapping language to visual referents: Does the degree of image realism matter? Acta Psychologica, 182, 91–99. doi:10.1016/j.actpsy.2017.11.003
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634. doi:10.1126/science.7777863
Thothathiri, M., Asaro, C. T., Hsu, N. S., & Novick, J. M. (2018). Who did what? A causal role for cognitive control in thematic role assignment during sentence comprehension. Cognition, 178, 162–177. doi:10.1016/j.cognition.2018.05.014
Yamashiro, A., & Vouloumanos, A. (2018). How do infants and adults process communicative events in real time? Journal of Experimental Child Psychology, 173, 268–283. doi:10.1016/j.jecp.2018.04.011