The eye has repeatedly come up in the book as an example of what natural selection can accomplish. This chapter delves into how the eye works, and Pinker explains its abilities partly by discussing illusions. We take advantage of optical illusions all the time, and studying how they work has been a fruitful way to investigate vision. Illusions exploit the fact that human vision is an "ill-posed problem." As discussed in Chapter 1, such problems (also called inverse problems) have a known outcome (there is something we are seeing) and some known inputs (light activating the rods and cones in our retinas), but we must work out how to get from the inputs to the outcome. Doing so requires assumptions about the environment and knowledge of typical situations to determine what we are most likely seeing. Among these assumptions are that surfaces have evenly spaced markings or patterns, that objects tend to sit parallel or at right angles to the ground, and that objects have regular silhouettes. When these assumptions appear to be violated, we conclude that it is a matter of perspective rather than that the object actually violates them. Because vision is not a perfect representation of what is in front of us, it can be exploited to create optical illusions.
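One way to make "ill-posed" concrete is with a toy version of the problem (a sketch for illustration, not an example from the book's own text): the light reaching the eye from a patch is roughly the product of the surface's reflectance and the illumination falling on it, so a single measured luminance is consistent with many reflectance-illumination pairs, and only an added assumption, such as a single illumination level shared by neighboring patches, makes the answer unique.

```python
# Toy illustration of an ill-posed (inverse) problem in vision:
# observed luminance = surface reflectance * illumination.
# One measurement alone cannot be inverted uniquely; an assumed scene
# property (here, one shared illumination level) makes it solvable.

def candidate_explanations(luminance, step=0.1):
    """All (reflectance, illumination) pairs consistent with one luminance."""
    pairs = []
    r = step
    while r <= 1.0 + 1e-9:
        pairs.append((round(r, 2), round(luminance / r, 2)))
        r += step
    return pairs

def solve_with_assumption(luminances, assumed_illumination):
    """Assume every patch shares one illumination level; recover reflectances."""
    return [lum / assumed_illumination for lum in luminances]

if __name__ == "__main__":
    print(len(candidate_explanations(0.3)), "explanations for a single patch")
    # With the shared-illumination assumption, the answer becomes unique.
    print(solve_with_assumption([0.3, 0.6, 0.15], assumed_illumination=0.75))
```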
One type of illusion, the stereogram, takes advantage of stereoscopic vision, our ability to perceive depth because we have two eyes. The slightly different images presented to the right and left eyes tell us the angles and distances of perceived objects. We can perceive some depth with only one eye, but then we rely on motion and on knowledge of the scene being observed, which isn't always reliable. A stereogram presents one image to the right eye and a slightly different image to the left eye, and the brain constructs depth from the pair. However, the eyes naturally converge or diverge to aim at the visual scene (nearer objects require more-crossed eyes), and the lens thickens or flattens to focus on objects that are closer or farther away, respectively. These two responses are coupled, so stereograms can't be viewed naturally. By using prisms, glass dividers, and other physical manipulations, inventors have created devices that uncouple the two responses, allowing stereograms to be seen. A few natural stereograms exist, such as the repeating patterns on wallpaper: The repetition leads the brain to match a pattern seen by one eye with a neighboring copy seen by the other, fusing them into a single image that appears to pop out from the wall at a different depth.
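The geometry behind stereoscopic depth can be captured with the standard textbook relationship (not something the summary spells out): for two eyes separated by a baseline B and an effective focal distance f, an object whose images land at horizontally shifted positions in the two eyes (a disparity d) lies at a depth of roughly f·B/d, so nearer objects produce larger disparities. A minimal sketch with assumed, made-up numbers:

```python
# Minimal sketch of depth-from-disparity (standard two-camera geometry,
# not a quotation from the book): depth = focal_length * baseline / disparity.
# Nearer objects shift more between the two eyes, i.e. larger disparity.

def depth_from_disparity(disparity_m, baseline_m=0.065, focal_m=0.017):
    """Estimate distance to a point from its binocular disparity.

    baseline_m:  interocular distance (~6.5 cm, an assumed typical value)
    focal_m:     effective focal length of the eye (~1.7 cm, assumed)
    disparity_m: horizontal offset between the two retinal images, in metres
    """
    if disparity_m <= 0:
        raise ValueError("zero disparity corresponds to an infinitely far point")
    return focal_m * baseline_m / disparity_m

if __name__ == "__main__":
    for d in (0.002, 0.001, 0.0005):  # shrinking disparity -> growing distance
        print(f"disparity {d*1000:.1f} mm -> depth {depth_from_disparity(d):.2f} m")
```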
In daily life, we reconcile input from complex scenes using stereoscopic vision and many other visual abilities. The brain must make a few assumptions to turn this input into useful and accurate images. First, every mark is in only one place at a time. Second, a dot seen by one eye will match only one dot seen by the other eye. Third, two patches of input that are next to each other most likely come from a single smooth, cohesive object; neighboring patches are unlikely to come from items that are far apart or drastically different. To resolve the various inputs into a cohesive image, we also use constraint satisfaction, as discussed in Chapter 2: Guesses that violate any of our assumptions or don't fit with the other guesses are discarded until a single solution remains.
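A hypothetical miniature of this kind of constraint satisfaction, in the spirit of the matching constraints listed above (not a model of the brain's actual procedure): given the positions of a few dots seen by the left and right eyes, uniqueness is enforced by considering only one-to-one pairings, and smoothness is enforced by preferring the pairing whose disparities vary least between neighboring dots.

```python
# Hypothetical miniature of stereo matching by constraint satisfaction.
# Uniqueness: each left-eye dot pairs with exactly one right-eye dot
# (enforced by searching over permutations). Smoothness: neighboring dots
# should end up with similar disparities (scored and minimized below).
from itertools import permutations

def best_matching(left_dots, right_dots):
    """Return the pairing of dots with the smoothest disparities, plus its cost."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(right_dots))):
        disparities = [right_dots[j] - left_dots[i] for i, j in enumerate(perm)]
        # Smoothness cost: total change in disparity between neighboring dots.
        cost = sum(abs(disparities[k + 1] - disparities[k])
                   for k in range(len(disparities) - 1))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

if __name__ == "__main__":
    left = [1.0, 2.0, 3.0, 4.0]        # dot positions seen by the left eye
    right = [1.3, 2.3, 3.3, 4.3]       # the same dots shifted by a common disparity
    print(best_matching(left, right))  # smoothest pairing: the identity matching
```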
Stereoscopic vision develops in infants by about four months of age, which is when they can see stereograms. Other creatures develop stereoscopic vision earlier, but the ability typically emerges after a period of input from the outside world, and full development, especially in humans, requires that input. One hypothesis is that because human eyes grow and migrate to a wider spacing on the face that isn't predetermined, the visual system must be able to keep adjusting to the new eye positions until they reach their final locations. Creatures such as rabbits, which do not have long to reach adulthood, often have fully developed vision at birth. They have skills like stereoscopic vision earlier, but they also lack the apparatus to adjust to a different or changing environment; human infants, by contrast, have some built-in flexibility toward the visual environment. Part of the development of stereoscopic vision has to do with neuronal wiring at birth: Neurons are not connected to input from a single eye right away but develop an affinity for one eye over time. Without this specialization, stereoscopic vision doesn't exist.
Specialization is also how the brain brings all the pieces of information in an image together. There are "experts" in the brain that process lighting, shape, and reflectance. These experts are neurons that respond to particular elements of an image, and each contributes its analysis to an executive that, knowing each expert's role, pieces the individual inputs together into a whole image.
One issue with vision is how we deal with constantly changing input. As we move, or as the scene in front of us moves, we register a new version of what we are seeing, and the body needs that information to move around, avoid injury, or reach a goal. When the rest of the body uses visual information, the information must be accurate and not already stale by the time it is processed. One way we help this processing along is by using reference frames. The first reference frame is ourselves: We use our own position as a reference point, and the objects around us can be described in terms of their relation to our body. If they are moving, we register that movement as change relative to our fixed frame; if our own frame is moving, we know that and can adjust accordingly. Our up-down reference also has an interesting relation to gravity. We don't constantly recompute it: When we lie down, for example, we don't treat the world as having turned sideways, because gravity remains a constant reference for up and down. We can trust it will stay the same (unless we go to space).
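One way to make the reference-frame idea concrete (a sketch under assumed coordinates and names, not anything from the text): a location given in world coordinates can be re-expressed relative to the viewer's own position and heading by subtracting the body's position and rotating by its orientation, and the same point gets a new egocentric description whenever the body moves.

```python
# Sketch of an egocentric (body-centered) reference frame: re-describe a
# world point relative to the observer's position and heading. The coordinates
# and numbers here are illustrative assumptions, not from the book.
import math

def world_to_body(point, body_pos, body_heading_deg):
    """Express a 2D world point in the observer's body-centered frame.

    body_heading_deg: direction the body faces, measured from the world x-axis.
    """
    dx, dy = point[0] - body_pos[0], point[1] - body_pos[1]
    theta = math.radians(body_heading_deg)
    # Rotate the offset into the body's frame (inverse of the body's rotation).
    forward = dx * math.cos(theta) + dy * math.sin(theta)
    leftward = -dx * math.sin(theta) + dy * math.cos(theta)
    return (forward, leftward)

if __name__ == "__main__":
    cup = (3.0, 4.0)                           # a fixed object in world coordinates
    print(world_to_body(cup, (0.0, 0.0), 0))   # observer at the origin, facing +x
    print(world_to_body(cup, (1.0, 1.0), 90))  # observer moved and turned: same cup,
                                               # new body-centered description
```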
Another important point is that shapes in the world are not all identical, and yet we recognize all suitcases as suitcases despite differences in size or design. We store shape information in a more general form, described by geon theory, and use that general information to assess what object we are most likely seeing. We can use references within the shape itself (such as a suitcase's handle sitting on one of its long, thin sides) to treat all suitcases as suitcases, even when they are shifted in our visual field or designed a little differently.
Geons (the simple forms we use to understand the components of more complex shapes) do not explain everything. They can't handle changes in the angle from which an object is viewed. Initial theories proposed that people store versions of objects from every angle so they can recognize an object whenever they come across it, but the sheer number of representations each object would require is astronomical. More recent research indicates that we store several representations at the angles we are most likely to encounter and mentally rotate an image that doesn't fit any of those representations until it matches one.
Geons also cannot account for mirror images: We can't rotate mirror images until they match, and yet we always recognize both our left and right hands as hands. This issue is tied to our preferential processing of the up-down and front-back aspects of images. We practically ignore left-right, and this lack of processing may stem from our being symmetrical in the left-right plane: We would look the same to a predator approaching from the right or from the left, and we assume that most other things are built that way as well. To process mirror images, Cooper and Shepard showed, we mentally rotate an image until it is upright and then determine which way it is pointing, using our own bodies as a reference. Humans are slightly asymmetrical (we have a dominant hand), and this asymmetry gives us a reference for mirror images whenever we need one.
Pinker and Michael Tarr conducted a study to examine these theories of shape recognition. They showed people similar but slightly different arrangements of lines at various orientations. After familiarizing participants with a few orientations, they presented the shapes at 24 different angles to see how people handled the new orientations. If people formed simple geon descriptions, orientation shouldn't matter; if they stored specific views and rotated them, reaction time should increase the farther a shape strayed from the nearest stored view. It turns out people use both strategies: Shapes with similar sides are stored as geons and recognized easily in any orientation, while more complex shapes are stored as multiple views and mentally rotated. One interesting finding involved mirror images: Orientation mattered for standard images but not for mirror images, which took the same amount of time to identify whether they appeared at a familiar angle or upright. The conclusion was that people flip a mirror image through the third dimension to match it to the standard version. This flipping happened with 2D images but not 3D ones, since people can't use a fourth dimension to flip a 3D shape. Mental rotation is indeed a process we use for identifying shapes.
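The stored-views-plus-rotation account can be expressed as a simple timing model (an illustrative assumption about the shape of the relationship, with made-up constants rather than the study's fitted values): predicted recognition time grows roughly linearly with the angular distance between the probe's orientation and the nearest stored view.

```python
# Illustrative model of view-based recognition with mental rotation:
# predicted response time = base time + rotation rate * angular distance
# to the nearest stored orientation. The constants are placeholders,
# not values reported by Tarr and Pinker.

def angular_distance(a, b):
    """Smallest rotation (in degrees) between two orientations."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def predicted_rt(probe_deg, stored_views_deg, base_ms=600.0, ms_per_deg=3.0):
    """Time to recognize a shape shown at probe_deg, given stored view angles."""
    nearest = min(angular_distance(probe_deg, v) for v in stored_views_deg)
    return base_ms + ms_per_deg * nearest

if __name__ == "__main__":
    stored = [0, 120, 240]                 # orientations seen during familiarization
    for probe in (0, 30, 60, 130, 300):
        print(probe, "->", predicted_rt(probe, stored), "ms")
```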
Vision is such a vital sense that humans take it for granted; few people appreciate the complexity of seeing even a simple picture, let alone moving or distant objects. In Chapter 4, Pinker pries apart the complexity of vision and some of the ways researchers have studied it. Many of the book's themes appear in vision alone. We see the mind's flexibility in the way it offers two solutions for identifying objects presented at an angle; this flexibility matters because we don't know in advance how objects will enter our visual field, yet they are just as important to identify regardless of orientation. We see iteration in the way the mind builds an understanding of what it sees by first processing individual elements of the scene (boundaries, colors, lighting) and then putting those elements together. We see elegance in the way the visual system performs these steps seamlessly, giving us uninterrupted, clear vision of the world around us, and in the way the mind has solved visual problems such as movement, depth, and orientation.
One key point from Chapter 4 is that humans use mental imagery. Research has shown that the visual cortices are activated when we imagine scenes or objects, yet mental imagery is somehow different from real experience. Even though mental images produce similar or even greater activation, they remain distinct from real experiences: We don't want to confuse reality with imagination or hallucination, and yet we have a system that processes both through the same machinery with similar intensity. This mirrors Pinker's contention that what makes us similar is more important than what makes us different. The fact that we process imaginary images and real images with the same structures speaks to human experience and likely plays an important role in how we use our minds, even if we don't yet understand that role.
We place an odd limitation on our visual processing: We assume the world is like us, symmetrical in the left-right plane. A common thread throughout the book is that we start from what we know and build new knowledge from new experiences. Our own bodies are an excellent reference to start from, and they are arguably a newborn infant's only reference. Still, it seems odd that we reduce processing of the left-right plane so much that even as adults we struggle with things that aren't symmetrical along it. Of the possible trade-offs, though, this is likely the best one to make, since more creatures are symmetrical left to right than top to bottom or front to back. As Pinker notes, vision is an ill-posed problem, and assumptions must be made before it can be solved. With our vision limited to three dimensions, it may be that one dimension must be assumed constant (or more constant) to let us solve the other two. Had we developed under different gravity or deep in the ocean, we might not have sacrificed left-right processing.