
Brian Christian

The Alignment Problem: Machine Learning and Human Values

Nonfiction | Book | Adult | Published in 2020


Part 1 Chapter Summaries & Analyses

Part 1: “Prophecy”

Part 1, Chapter 1 Summary: “Representation”

Chapter 1 presents some of the precursor models on which current AI systems are built—such as the perceptron (an early neural network), the AlexNet image-recognition network, and the word-embedding models behind modern language tools—together with the ethical concerns these machine learning systems raise.

In 1958, Frank Rosenblatt introduced the “perceptron” during a demonstration organized by the Office of Naval Research in Washington, D.C. This device could learn from its mistakes through adjustments after each error. The perceptron, a basic neural network, determined the position of colored squares on flashcards based solely on the binary data received from its camera. Rosenblatt’s presentation showcased the perceptron’s potential to learn and adapt through experience, described as a “self-induced change in the wiring diagram” (18). This idea pointed to the gap in understanding neural networks at the time the perceptron was introduced. The challenge of achieving “suitably connected” networks, a concept envisioned by earlier theorists like McCulloch and Pitts but not fully realized in practical applications, was eventually met by Rosenblatt, who supplied “a model architecture” (18) containing adjustable parameters that could be modified by an “optimization algorithm or training algorithm” (18).
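The learning rule the chapter describes, adjusting the machine’s connections only after a mistake, can be sketched in a few lines of Python. The code below is illustrative rather than a reproduction of Rosenblatt’s hardware: The binary “retina” data, the left-versus-right task, and all parameters are invented stand-ins for the flashcard demonstration.

```python
import numpy as np

# Minimal perceptron: weights change only when a prediction is wrong,
# mirroring the "self-induced change in the wiring diagram" idea.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20))  # binary "retina" inputs (toy data)
# Label: is the left half of the retina more lit than the right half?
y = (X[:, :10].sum(axis=1) > X[:, 10:].sum(axis=1)).astype(int)

w = np.zeros(X.shape[1])  # adjustable parameters ("wiring")
b = 0.0
lr = 1.0

for epoch in range(100):
    errors = 0
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)
        if pred != target:               # learn only from mistakes
            update = lr * (target - pred)
            w += update * xi
            b += update
            errors += 1
    if errors == 0:                      # no mistakes: training set learned
        break

print(f"training passes used: {epoch + 1}, errors in final pass: {errors}")
```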

Rosenblatt’s demonstration presented the perceptron as a foundational model for future machine learning systems, emphasizing its ability to form connections and make deductions from basic binary inputs, much as the human brain does. However, the limitations of Rosenblatt’s model were soon recognized by other researchers in the field. For example, Marvin Minsky and Seymour Papert, working at MIT, pointed out the perceptron’s inability to perform complex pattern recognition, a critical drawback that would limit its applicability. Amid this criticism, the dissolution of research groups, and the withdrawal of financial support, the field of neural networks stalled in the 1970s.

Several decades later, in 2012, University of Toronto student Alex Krizhevsky, under the guidance of Geoffrey Hinton and in collaboration with Ilya Sutskever, developed a deep neural network that revolutionized image recognition. Utilizing a massive set of images from ImageNet (an organized image repository developed by Fei-Fei Li, then a Princeton professor) and the computational power of GPUs (graphics processors popularized by the gaming industry), Krizhevsky built a model that significantly surpassed existing benchmarks in image-recognition accuracy. This model, known as AlexNet, employed innovative techniques such as data augmentation and dropout to improve performance. Despite a grueling process marked by constant trial and error over weeks of continuous operation, their efforts culminated in the successful presentation of AlexNet at the ImageNet Large Scale Visual Recognition Challenge workshop, in front of the field’s leading researchers.

Following the success of AlexNet in image recognition, the large-scale deployment of such models began to uncover fundamental issues. For example, in 2015, Jacky Alciné encountered racial bias in Google Photos’s labeling system when it mistakenly tagged photos of him and a Black friend as “gorillas.” Alciné posted his findings on Twitter, prompting Google to quickly adjust the software and eventually remove the problematic label entirely. The incident showed that inadequate or biased training data can lead to discriminatory outcomes in AI systems.

Christian contextualizes these issues in image processing through a historical example that predates modern image-recognition tools. He turns to Frederick Douglass, the American abolitionist and author who escaped slavery at a young age, and what Douglass recognized as the power of photography to counter the racist caricatures prevalent in 19th-century America—unlike paintings, which usually exaggerated racial features. Over time, however, photography’s standardization failed to accommodate diverse skin tones, leading to biased practices in color calibration like Kodak’s Shirley card (28), which emphasized white skin. These biases persist in modern photography and film, demonstrating systemic racial bias in visual media.

Machine learning systems are built on training data that acts as a foundational Shirley card, shaping their outputs. As Christian discusses, if this data lacks representation of certain groups or real-world scenarios, the systems can perpetuate biases: They disproportionately represent the majority, skewing predictions and decisions against minorities. Joy Buolamwini, an undergraduate student at Georgia Tech and later an Oxford and MIT researcher, encountered facial recognition technology that failed to detect darker skin tones, including her own. Motivated by these experiences and the widespread potential for bias in deployed systems, Buolamwini focused her research on these discrepancies, analyzing and highlighting the significant racial and gender biases in commonly used datasets and advocating for more equitable and representative data in AI systems. When she shared her results with various tech firms, only IBM responded positively, taking her findings seriously and committing to reducing its error rates for dark-skinned subjects.

Word-embedding models also rely on large datasets, which are often biased. Christian discusses models such as Google’s word2vec and Stanford’s GloVe, which are used to enhance functions such as search result ranking, language translation, and consumer response. These models generate word vectors by predicting nearby words, capturing rich, real-world semantic relationships. For instance, they can encode geographical and linguistic relationships, associating “Czech” with “koruna” or completing analogies such as “big” is to “bigger” as “cold” is to “colder” (37).
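The analogy behavior described here can be reproduced with publicly available pretrained vectors. The sketch below assumes the gensim library is installed and can download a small pretrained GloVe model; the specific words queried are illustrative choices, not examples taken from the book.

```python
# Exploring word-vector analogies with pretrained GloVe vectors via gensim.
# Downloads a small model on first run; its vocabulary is lowercased.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# "big" is to "bigger" as "cold" is to ...? (expecting "colder" near the top)
print(vectors.most_similar(positive=["bigger", "cold"], negative=["big"], topn=3))

# The classic instance of the same vector arithmetic: king - man + woman ~ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```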

Despite their utility, these models have also unintentionally embedded social biases, particularly gender stereotypes, which could perpetuate discrimination in applications like CV screening. For example, traditional biases could influence a system to prefer male candidates over female candidates due to the model’s associations between gender and professions or names. The evidence of these biases has sparked significant concern among researchers, leading to research aimed at mitigating bias in word embeddings while retaining their beneficial attributes.

Efforts to reduce the bias (called “debiasing”) inherent in these models involve identifying and adjusting vectors that reflect problematic biases without undermining the model’s overall utility. This process, however, is complex and requires continuous refinement to balance accuracy with fairness. The challenges associated with debiasing have prompted interdisciplinary collaboration, blending techniques from machine learning with insights from social sciences to refine these models responsibly.
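One family of debiasing approaches works by estimating a “bias direction” in the vector space and removing it from words that should be neutral. The numpy sketch below illustrates only that geometric idea, using random vectors as stand-ins for trained embeddings; real methods (such as Bolukbasi et al.’s hard debiasing) use many definitional word pairs and choose carefully which words to neutralize.

```python
import numpy as np

def bias_direction(vecs, pair=("he", "she")):
    """Estimate a bias axis from a definitional word pair."""
    d = vecs[pair[0]] - vecs[pair[1]]
    return d / np.linalg.norm(d)

def neutralize(v, direction):
    """Remove the component of v that lies along the bias direction."""
    return v - (v @ direction) * direction

# Illustrative random vectors stand in for trained embeddings.
rng = np.random.default_rng(1)
vecs = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse"]}

d = bias_direction(vecs)
for word in ["doctor", "nurse"]:
    before = vecs[word] @ d
    after = neutralize(vecs[word], d) @ d
    print(f"{word}: projection on gender axis {before:+.3f} -> {after:+.3f}")
```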

Part 1, Chapter 2 Summary: “Fairness”

This chapter discusses the historical and evolving use of numerical models to replace human judgment in criminal justice, focusing on the parole system. The chapter opens with the account of Hinton Clabaugh, the chairperson of the Parole Board of Illinois in 1927, and Ernest Burgess, a sociologist from the University of Chicago. Clabaugh commissioned a study by prominent universities to assess the parole system, which had become unpopular and was perceived as overly lenient. Ernest Burgess played an essential role in this effort, collecting data on parolees to identify factors predicting parole success or failure, and categorizing them into eight restrictive social categories. Burgess developed one of the earliest predictive models in criminal justice, aiming to scientifically improve parole decisions. He proposed using “summary sheets” for parolees to help the Parole Board quickly assess risks based on various factors, introducing a more objective approach than the subjective judgments previously used. His work suggested that human behavior had predictable elements that could enhance parole decision-making.

The adoption of Burgess’s scientific approach was initially slow, but it gained traction over the decades. By 1951, Illinois had published a “Manual of Parole Prediction” (54), documenting the progress made over 20 years of using statistical models for parole decisions. These tools were intended to make parole decisions more consistent and fairer, reducing reliance on potentially biased human judgment.

However, other states were slow to adopt them; by 1970, only two US states used such tools. Interest grew after Tim Brennan, a Scottish-born statistician working for Unilever, moved from applying statistical models to market products to a university research environment, where he addressed first educational concerns and later the consistency of prison inmate classification. Together with Dave Wells, another researcher in the field, Brennan founded the company Northpointe, aiming to reform the justice system using statistical models. By 2000, half of US states used statistical models for parole decisions, and tools like COMPAS, developed by Brennan and Wells, had become standard by 2011. However, concerns about the ethical implications and fairness of such predictive tools began to surface, leading to critical scrutiny and debates about the reliance on algorithms in the justice system.

One journalist who covered the ethical implications of statistical risk assessment systems was Julia Angwin of the investigative journalism outlet ProPublica. Shocked by the lack of substantiation behind data-driven tools like COMPAS used nationwide for criminal risk assessment, Angwin investigated the systems’ reliability and biases, discovering racial disparities in risk assessments, namely a bias against Black defendants. Northpointe defended its system by pointing to calibration, meaning that its risk scores correspond to the same reoffense rates regardless of a defendant’s race, yet the disparity ProPublica identified remained: Black defendants were more likely to be incorrectly flagged as high risk.

Questions like those raised by ProPublica’s diagnosis of COMPAS also occupied Cynthia Dwork, a Harvard computer scientist known for pioneering “differential privacy,” a method of analyzing data while protecting individuals’ privacy, who researched how fairness in data could address systemic issues like racism and sexism. Working with computer scientist Amos Fiat, Dwork explored how “fairness” could be mathematically defined and implemented in technology. Dwork’s engagement with fairness in computer science led to a broader exploration of the ways data is used and its societal implications, work she continued together with other researchers in the field, such as Moritz Hardt and Helen Nissenbaum.

Also in response to ProPublica’s findings about COMPAS’s racial biases, Cornell University computer scientist Jon Kleinberg and economist Sendhil Mullainathan used machine learning to examine pretrial detention decisions, comparing algorithmic predictions with those of human judges and probing the algorithms’ racial biases. Around the same time, Alexandra Chouldechova developed a visual dashboard for analyzing risk-assessment tools and began examining the different notions of fairness they might satisfy.

This research spurred a significant academic response, with multiple papers analyzing the compatibility of different fairness definitions. Kleinberg and his team demonstrated that satisfying both ProPublica’s fairness criteria and COMPAS’s calibration simultaneously is mathematically impossible unless both groups reoffend at the same rates. Chouldechova reached a similar conclusion: A calibrated tool cannot achieve equal false positive and false negative rates across groups with different recidivism rates.
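A small simulation makes the tension concrete. The numbers below are invented, not drawn from COMPAS or from the book: A score that is perfectly calibrated in both groups still produces different false positive rates as soon as the groups’ underlying reoffense rates differ.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(n, share_high_score):
    """Assign each person a risk score of 0.8 or 0.2; reoffense occurs
    with probability equal to the score, so the score is calibrated."""
    scores = np.where(rng.random(n) < share_high_score, 0.8, 0.2)
    reoffend = rng.random(n) < scores
    flagged = scores >= 0.5                 # the "high risk" label
    fpr = flagged[~reoffend].mean()         # flagged among non-reoffenders
    return scores, reoffend, fpr

# Two hypothetical groups that differ only in how many members score high.
for name, share in [("group A", 0.5), ("group B", 0.1)]:
    scores, reoffend, fpr = simulate(200_000, share)
    calibration = reoffend[scores == 0.8].mean()   # ~0.8 in both groups
    print(f"{name}: base rate {reoffend.mean():.2f}, "
          f"P(reoffend | score 0.8) = {calibration:.2f}, "
          f"false positive rate = {fpr:.2f}")
```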

These findings suggest an inherent challenge in using machine learning for judicial predictions, highlighting the complexities and potential biases embedded in risk-assessment tools. Researchers in the field, such as Moritz Hardt, emphasize that these algorithms require continuous scrutiny and adjustment, including from perspectives outside computer science, to address and mitigate discrimination effectively. The broader academic and policy discussions triggered by these analyses continue to explore how best to balance technical fairness with broader societal justice.

Part 1, Chapter 3 Summary: “Transparency”

Chapter 3 discusses the application of neural network models in the field of medicine and its implications. In the 1990s, Rich Caruana (at the time a graduate student at Carnegie Mellon) embarked on a critical project to predict patient survival rates for pneumonia using neural networks. This project, supervised by Tom Mitchell, aimed at improving hospital decisions on whether to treat pneumonia patients as inpatients or outpatients. The research involved a large interdisciplinary team and utilized a dataset of 15,000 patients, employing various machine-learning models. Caruana’s neural network emerged as the most accurate, surpassing other models and traditional statistical methods.

However, despite the neural network’s superior performance, Caruana and his team decided against its deployment due to the discovery of misleading correlations within simpler, rule-based models that were easier to interpret. For instance, one rule-based model suggested treating asthma patients as low-risk outpatients because historical data showed they had better survival rates. However, it was revealed that this was due to asthma patients receiving more attentive in-hospital treatments, including ICU care.
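A toy simulation clarifies how such a rule can arise. All numbers below are invented for illustration: Asthma raises the underlying risk, but because asthma patients already receive ICU-level care, their observed outcomes are better, and a model trained only on outcomes would score them as lower risk.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
asthma = rng.random(n) < 0.1

# Hypothetical numbers: baseline death risk 10%, doubled by asthma,
# then cut by 80% because asthma patients get aggressive in-hospital care.
risk = np.where(asthma, 0.20, 0.10)
treated_aggressively = asthma              # hospitals already escalate their care
risk = np.where(treated_aggressively, risk * 0.2, risk)
died = rng.random(n) < risk

print("observed death rate, asthma:    ", round(died[asthma].mean(), 3))   # ~0.04
print("observed death rate, no asthma: ", round(died[~asthma].mean(), 3))  # ~0.10
# An outcome-only model would therefore label asthma patients as lower risk.
```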

This discovery led to a broader discussion about the transparency and safety of using neural networks in healthcare. Caruana feared that the neural network, although powerful, might also learn similarly misleading correlations that were not as easily detectable as in rule-based systems. The potential for neural networks to accidentally learn and act on incorrect or harmful correlations without clear insight into their decision-making process posed significant risks. Thus, the decision to use a simpler and more interpretable model was a precaution against unintended and potentially dangerous outcomes based on opaque decision-making processes.

Caruana’s experience underlined the critical issue in AI and machine learning: the trade-off between accuracy and interpretability. It highlighted the importance of understanding the basis of a model’s decisions, particularly in fields like healthcare where decisions have significant consequences.

Other researchers in the machine learning community have grown increasingly concerned with the opacity of neural networks, often described as “black boxes,” and with their pervasive use in critical decision-making sectors, including defense, healthcare, and finance. The opacity of machine learning models became notably pressing as they began to play a larger role in strategic military and intelligence operations, spurring the development of models that users could more easily understand and trust.

Simultaneously, the European Union was advancing the General Data Protection Regulation (GDPR), which mandated that decisions made by algorithms, including loan approvals or parole denials, must be explainable to those affected. This regulation sparked significant concern among tech leaders about the feasibility of providing clear explanations for decisions derived from complex neural networks.

Discussing the transparency and necessity of complex AI models, Christian refers to the work of Robyn Dawes in the mid-20th century. Dawes, initially an ethics student, shifted to psychology because of his skepticism about the efficacy of intuitive methods like the Rorschach test. His career pivot was also influenced by a clinical error in which a patient with a genetic condition was misdiagnosed as having a psychiatric delusion. The incident propelled him toward “mathematical psychology,” in which he checked expert clinical judgment against simple mathematical models. Straightforward statistical models often surpassed more complex, intuitive clinical assessments, suggesting that simpler models or even actuarial methods might frequently be preferable for decision-making.

Dawes and his colleagues went on to investigate why simple linear models were so effective in decision-making. Studies showed that even models mimicking a single expert’s judgment often outperformed the experts themselves. Their work pointed to a key insight: Effective modeling lies in identifying the right variables to consider and then simply adding them up. The useful information, in other words, resides not in any individual’s decision-making but in the established practices of the field.
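Dawes called these “improper” linear models: Choose relevant variables, standardize them, give them equal weights, and add. The numpy sketch below uses synthetic data (all numbers invented) to show that such a unit-weight sum can track an outcome nearly as well as a regression fitted to the same data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
# Three predictors that genuinely relate to an outcome, plus noise.
X = rng.normal(size=(n, 3))
true_w = np.array([1.5, 1.0, 0.5])
y = X @ true_w + rng.normal(scale=2.0, size=n)

# Unit-weight model: standardize each predictor, sign it by its correlation
# with the outcome, then simply sum the columns.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
signs = np.sign([np.corrcoef(Xz[:, j], y)[0, 1] for j in range(X.shape[1])])
unit_pred = Xz @ signs

# Compare against ordinary least squares fitted on the same data.
design = np.column_stack([np.ones(n), X])
ols_w, *_ = np.linalg.lstsq(design, y, rcond=None)
ols_pred = design @ ols_w

print(f"correlation with outcome, unit weights: {np.corrcoef(unit_pred, y)[0, 1]:.3f}")
print(f"correlation with outcome, fitted OLS:   {np.corrcoef(ols_pred, y)[0, 1]:.3f}")
```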

Carrying Dawes’s work further into present times, Cynthia Rudin, a computer scientist at Duke University, championed the use of simple, interpretable models over complex ones, particularly in the criminal justice system. In 2018, she demonstrated that a straightforward model could effectively predict recidivism, rivaling the more complex COMPAS system. Her approach advocates for models that are grounded in empirical data rather than clinical intuition. Rudin’s work extends to healthcare, where she criticizes existing models for being overly influenced by subjective clinical judgments rather than robust data. She emphasizes the importance of creating tools that enhance clarity and decision-making in practical settings, suggesting that even with vast amounts of data, simple models can often outperform their more complex counterparts.

As Christian explains, humans have visible sclera (the whites of the eyes), which evolutionary biologists interpret as evidence of an evolutionary need for cooperation: Visible eyes let others follow the direction of our attention. In machine learning, the concept of “saliency” reflects this trait by identifying which parts of an image a system focuses on, improving researchers’ understanding of its decision-making process. Saliency maps often deliver surprising insights, such as systems focusing on unexpected or irrelevant areas of their training images, which challenge assumptions about what the models have learned. In medical applications, networks paired with saliency methods have improved diagnostic accuracy and transparency, assisting in complex disease detection with high precision. For example, a network retrained on a vast dataset of skin conditions outperformed dermatologists, proving its utility in enhancing global diagnostic capabilities and acting as a reliable second opinion. Such systems, however, require careful implementation to avoid misclassifications due to unexpected data cues.
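One common way to compute saliency is to take the gradient of a class score with respect to the input pixels: Large gradients mark the pixels the prediction is most sensitive to. The sketch below assumes PyTorch is installed and uses a tiny untrained network as a stand-in for a real classifier, so the resulting map is meaningless in itself; with a trained model, the bright regions would show where the network “looks.”

```python
import torch
import torch.nn as nn

# Gradient saliency: which input pixels most affect the model's output?
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),           # two classes, purely illustrative
)
model.eval()

image = torch.rand(1, 3, 64, 64, requires_grad=True)   # stand-in input image
score = model(image)[0, 1]                              # score for class 1
score.backward()                                        # gradients w.r.t. pixels

saliency = image.grad.abs().max(dim=1).values[0]        # per-pixel importance
print("saliency map shape:", tuple(saliency.shape))
print("most salient pixel (row, col):", divmod(int(saliency.argmax()), saliency.shape[1]))
```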

Saliency methods, however, reveal where a model looks, not what happens inside the “black box” of the network itself. Focusing on this problem, Matthew Zeiler and Rob Fergus at NYU developed deconvolution to visualize intermediate network layers, revealing how models process visual inputs. Their insights demonstrate both the potential and the limitations of these models and led to further innovations in machine learning visualization. Techniques like Google’s DeepDream allow for both artistic exploration and critical insight into how models perceive and categorize inputs, highlighting the importance of transparency and the ongoing challenge of understanding AI’s decision-making processes. Researchers such as Been Kim, a graduate student at MIT, advocate for incorporating human-computer interaction and cognitive science to enhance AI’s usability and trustworthiness. Such work emphasizes the necessity of developing AI that communicates in human terms, with the ultimate goal of making AI systems more interpretable and accountable.
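Feature-visualization techniques in the DeepDream family invert the question: Rather than asking where the model looks, they synthesize an input that maximally excites a chosen internal feature by running gradient ascent on the image itself. The sketch below, again assuming PyTorch, applies the idea to a single untrained convolutional layer as a stand-in for a layer inside a real network; the channel index, step count, and learning rate are arbitrary choices.

```python
import torch
import torch.nn as nn

# Gradient ascent on the input image to excite one feature channel.
layer = nn.Conv2d(3, 16, kernel_size=5, padding=2)      # stand-in for a real layer
image = torch.rand(1, 3, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(100):
    optimizer.zero_grad()
    activation = layer(image)[0, 7]        # pick one feature channel
    loss = -activation.mean()              # ascend on its mean activation
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0.0, 1.0)             # keep pixels in a valid range

print("mean activation of channel 7 after ascent:",
      float(layer(image)[0, 7].mean()))
```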

Part 1 Analysis

Christian’s three examples of applied machine learning mark the development of predictive models in different fields: Frank Rosenblatt’s perceptron, Ernest Burgess’s predictive models used in criminal justice, and Rich Caruana’s work on neural networks for predicting pneumonia survival. Christian utilizes each example to provide insight into the evolution, potential, and ethical considerations of artificial intelligence systems.

Christian establishes the historical progression of early AI models and frameworks in a variety of fields to provide context for contemporary Interdisciplinary Approaches to AI Development and Implementation. The perceptron introduced by Frank Rosenblatt, an early form of neural network capable of learning from its errors to improve performance over time, represents a foundational step in machine learning, demonstrating a machine’s ability to adapt through experience by adjusting its weights—a concept still at the core of modern deep learning. Rosenblatt’s perceptron laid the groundwork for more sophisticated neural network architectures and machine learning frameworks. Despite its simplicity and its limitations in handling complex pattern recognition, as noted by critics like Marvin Minsky, the perceptron marked a significant leap in understanding and leveraging computational models for learning tasks.

In the realm of criminal justice, Ernest Burgess’s work with predictive models for parole decisions exemplifies the early use of statistical methods to enhance decision-making processes traditionally dominated by human judgment. By analyzing data to predict parole outcomes, Burgess introduced a scientific method that could potentially offer more objective and consistent decisions compared to the subjective assessments previously used. Similarly, Rich Caruana’s work on predicting pneumonia survival rates using neural networks represents a modern approach to machine learning in healthcare. Despite the neural network’s superior accuracy over other models, the decision not to deploy it due to potential risks from non-transparent decision-making processes points to the difficulty of integrating advanced AI systems into sensitive fields like healthcare, setting up Christian’s thematic engagement with Ethical Implications of AI Use.

Across the three cases, Christian establishes a recurring motif in his book: the trade-off between a model’s complexity and its interpretability. Frank Rosenblatt’s perceptron, though groundbreaking, was limited by its simple linear nature, which could not solve problems requiring the comprehension of more complex patterns. This limitation reflects a fundamental challenge in AI: As models become more complex and capable, they often become less interpretable. In the context of criminal justice, the shift from human judgment to numerical models like those developed by Burgess aimed at reducing bias and improving the consistency of parole decisions. Christian’s nuanced discussion notes that these models also risk oversimplifying the complexities of human behavior, potentially producing decisions that fail to account for anything beyond the data.

Rich Caruana’s project on pneumonia showed how advanced neural networks, despite their accuracy, could obscure insights into how decisions are made, which could lead to potentially dangerous outcomes if not properly understood and monitored. Christian highlights the decision to opt for simpler, more interpretable models, as advocated by Robyn Dawes and Cynthia Rudin, to emphasize an ongoing challenge in AI development: balancing performance with the ability to understand and control the model’s decision-making process.

Christian focuses his book on the ethical implications of deploying AI systems in high-stakes domains like healthcare and criminal justice to provide a nuanced picture of the ongoing challenges involved in AI development. Rosenblatt’s perceptron, though primarily a technological demonstration, already hinted at the ethical and societal questions that would become more pressing in contemporary times. For example, the perceptron’s limited capabilities raised the question of how far a model is constrained by the data it has access to.

Through the example of Ernest Burgess’s predictive model for parole decisions, which was shown to perpetuate and exacerbate existing disparities within the criminal justice system, Christian highlights more explicit issues of fairness and the potential for systematic biases encoded within the models themselves. The reliance on numerical models to make decisions about human freedom calls for further work on the transparency and accountability of AI systems used in legal contexts.

Rich Caruana’s decision to forgo the deployment of a more accurate but less interpretable neural network in favor of simpler models emphasizes the necessity of a prudent approach to AI deployment in healthcare. The potential for AI to make decisions based on spurious correlations or incomplete understandings of medical conditions raises significant ethical concerns about patient safety and the trustworthiness of AI systems.

Through the evolution of machine learning from Rosenblatt’s perceptron to modern neural networks used in healthcare, Christian portrays both the rapid advancements and the growing set of concerns raised by the implementation of AI technology. Each example reflects the ongoing dialogue between technological capabilities and theoretical understanding, highlighting the need for a balanced approach to AI development and use. Such theoretical understanding, however, relies on knowledge that computer scientists alone do not have; these examples therefore also point to the field’s need for input and criticism from other disciplines, such as sociology, philosophy, and psychology.
