Lessons from the lab
Teaching contentious classics: Sherif, Milgram and Harlow revisited
CAROL TAVRIS
The fiftieth anniversary last year of the Milgram experiments on obedience to authority provides a good vantage point from which to consider the eternal dilemma for instructors and textbook writers: how much time should we devote to the classics, and how should we teach them? In every generation, certain studies or approaches rise to prominence. But once they get planted in our books and lectures, they tend to become rooted there. Over time it gets harder to decide how much to prune – let alone whether it's time to uproot them altogether. We stop looking at the original studies closely, let alone critically; they just sit there in our courses like grand historical monuments. However, it's good to reexamine them for two important reasons: one is for our own sake, to refresh our memories and rethink their contributions; the other is for our students' sake. Students today are as eager to reject unflattering or counterintuitive portrayals of humanity as students were decades ago. Teaching the classics therefore means finding new ways of persuading students that these findings do apply to them despite the errors or limitations of the original studies.
The relationship between cultural events and research is a two-way street: an event may stimulate research, and research may influence the larger culture. The story of Kitty Genovese – murdered in New York in 1964, while thirty-eight witnesses allegedly watched from their windows and failed to call for help – still occupies a prominent position in social psychology, where it launched a long and productive line of experimental studies on bystander apathy, deindividuation and intervention. Thanks to two recent books and a critical reassessment that appeared in the American Psychologist in 2007, though, we now know that almost all of the details reported at the time were wrong. Turning Genovese's death into a story of urban alienation was largely the work of A. M. Rosenthal at the New York Times, who wanted the image of the "38 witnesses doing nothing" to be in the story's headline, where it quickly went the contemporary equivalent of viral. In fact, most of the neighbours who heard her screams could not see her – let alone watch her murder from their windows – and thought it was just another drunken domestic fight. As it turns out, only three neighbours understood the attack for what it was and failed to respond. Three is too many; but a newspaper story about three craven witnesses would not have been as shocking.
Should teachers eliminate the Kitty Genovese story out of embarrassment that we got it wrong or weren't sceptical enough? Not necessarily. Although the specifics of the story were wrong, its essence was true. The accurate Genovese story, however, raises an additional social and psychological lesson for students: what cautions does it offer for thinking critically about an equally sensational crime today? What does it reveal about the societal sources of anxiety that generate an urban legend? America at the time was undergoing political assassinations, race riots, the Vietnam War, and rising crime rates – Kitty Genovese's was one of 636 murders in New York that year. People were frightened, and the headline resonated.
With Kitty Genovese, an oversimplified but emotionally compelling narrative led to good research. It often works the other way, of course: good research can lead to an oversimplified and emotionally compelling narrative. Consider the case of Walter Mischel's "marshmallow study" of four-year-old children's ability to delay gratification. (Children were given a choice between eating one marshmallow right then or waiting a little while and getting two.) The part that got so much recent public attention was that children who had resisted temptation turned out years later to score as many as 210 points higher on their SATs than their less patient peers. Bingo!
The marshmallow study captured the public imagination just as Kitty Genovese had, but with a brighter, happier moral – one that suits our current cultural concerns. We can all see those kids in our mind's eye, and imagine what we would have done, faced with a now-or-later choice. And it has such an all-American, Calvinist moral: delay gratification and heaven will be thine. Yet Matthew Bourne's analysis in the New York Times revealed that one reason this story resonated so much was that inconvenient details about the actual scientific findings had been pruned away. Mischel's original studies focused on 653 children, all of them attending the Bing Nursery School at Stanford, California, for children of professors and graduate students. The studies weren't originally designed to look at long-term outcomes; that idea occurred to Mischel much later, when he asked his own children, who had attended the Bing school, how his research subjects were faring in college. Of the original 653, he tracked down 185, of whom just ninety-four provided their SAT scores.
So we have a problem with the representativeness of the sample – originally and in the follow-up. How would that affect the results? An article in Cognition in 2012 by Celeste Kidd, Holly Palmeri and Richard N. Aslin showed that some children were more likely to eat the first marshmallow when, by virtue of their previous history, they had reason to doubt the researcher's promise to come back with a second one. For children raised in an unstable environment, they wrote, "the only guaranteed treats are the ones you have already swallowed", while children raised in a more stable environment might be willing to wait a few more minutes, confident that a second treat will materialize. And not only the stability of the environment matters: I'm an only child, so the effects of siblings didn't occur to me until a friend said, "Try the marshmallow study on those of us who have three siblings. You grabbed what you could, or else one of them got it".
To offer these caveats about the way the Mischel findings are so often told is not to debunk the excellent original study but to make students more excited about it – to alert them not only to what it found, but also to what it failed to find. The study and its popular response illustrate how, in science, each finding generates new questions; the importance of critical thinking in coming up with alternative hypotheses; and the importance of not oversimplifying. An ability to resist temptation is one factor among many that shapes our lives, but if you are a child from an unstable family, living in a tough environment, in poor health, it might not be the most important. It might not be a stable character trait. It might not even cross domains of experience.
Some of social psychology's great historical monuments – studies with a long-lasting influence – could not be replicated today: Muzafer Sherif's "Robbers Cave" study, Stanley Milgram's obedience experiments, and Harry Harlow's wire-and-cloth mother studies.
Between 1949 and 1954, Sherif and his colleagues used a Boy Scout camp in Oklahoma to test their theories of the origin and reduction of intergroup animosity and prejudice. They randomly assigned the twelve-year-old boys to one of two groups, the Eagles or the Rattlers, and set up competitive activities that quickly generated "us–them" thinking and mutual hostility. Later, to undo the warfare they had thus created, they set up situations in which the boys had to work together to achieve some shared goal.
When I revisited Sherif's original papers, I found that the study was not as rigorous as I had remembered. Most of the conclusions, Sherif wrote, "were reached on the basis of observational data" – confirmed by "sociometric choices and stereotype ratings". He said: "Observations made after several superordinate goals were introduced showed a sharp decrease in name-calling and derogation of the out-group common . . . in the contact situations without superordinate goals". (One of the pleasures of going back to read original studies is the unexpected discovery: the name-calling is so charmingly outdated. In 1949, boys put each other down by saying things like "all of them are stinkers" and calling their enemies "smart alecks".) Sherif did provide some numbers and percentages, a few chi squares, but this was a field study, with all of the uncontrollable variables that field studies can generate, and as "science" it would not meet today's standards. Was everything all hunky-dory for the Eagles and Rattlers afterwards? The number of boys favourable towards the out-group improved, but the majority of boys in each group apparently maintained their hostility towards each other.
Yet Robbers Cave was and remains important for its central ideas. At the time, most psychologists did not understand, and most laypeople don't understand even today, that simply putting two competing, hostile groups together in the same room to, say, watch a movie, won't reduce their antagonism; that competitive situations generate hostility and stereotyping of the out-group; and that competition and hostility can be reversed, at least modestly, through cooperation in pursuit of shared goals. That's the story of Robbers Cave: it was true then, and it's true now.
In fact, just as the Kitty Genovese case spurred bystander-intervention experiments, Robbers Cave generated a great deal of research into the importance of superordinate goals. When Elliot Aronson went into the newly desegregated but hostile classrooms in Austin, Texas, where African American, Mexican American, and Anglo children were at war with each other, Sherif's findings strongly influenced his design of the "jigsaw" classroom. But Elliot went one better, using an experimental intervention and a control group. What a great coda to the Robbers Cave story – a direct link from Eagles and Rattlers, a made-up antipathy, to interethnic warfare in American schools, which is all too real and persistent.
Teaching the lessons of Stanley Milgram's experiments is far more complicated than teaching Sherif's. Again, the cultural context of the times is crucial. In 1961, when Adolf Eichmann was claiming at his trial that he was "only following orders" in the murder of Jews during the Holocaust, Milgram began his effort to determine how many Americans would obey an authority figure when directly ordered to harm another human being.
Participants came to the Yale lab thinking they were part of an experiment on the effects of punishment on learning, and were instructed to administer increasing levels of shock to a "learner". The learner was a confederate of Milgram who did not receive any shocks, but played his part convincingly: as the study continued, he shouted in pain and pleaded to be released, according to an arranged script. To almost everyone's surprise at the time, some two-thirds of the participant "teachers" administered what they thought were the highest levels of shock, even though many were experiencing difficulty or distress doing so. Milgram's experiment produced a firestorm of protest about the possible psychological harm inflicted on the unwitting participants, and as a result, it could never be done today in its extreme version.
Some people hated the method and others the message, but the Milgram study has never faded from public attention and debate about it continues. Gina Perry, an Australian journalist whose book, Behind the Shock Machine (2012), aimed to discredit Milgram and his findings, interviewed everyone she could find who was connected to the original study, along with Milgram's critics and defenders. She also went through the archives of Milgram's voluminous unpublished papers.
Reinvestigations almost invariably yield some useful discoveries. Perry found violations of the research protocol: over time, the man playing the experimenter began to drift off-script, urging reluctant subjects to keep going longer than he was supposed to. He pressed some people eight or nine times; one woman, twenty-six times. To my own dismay, I learned that Milgram committed what researchers, even then, would have considered a serious breach of ethics: he did not fully debrief subjects at the end of each experiment. They did meet the "learner" to shake hands and be assured that he was fine, but they were not told that all those escalating levels of shocks were completely fake, because Milgram was afraid the word would get out and invalidate future participants' behaviour. It was almost a year before subjects were given a full explanation. Some never got it; some never understood what the whole thing was about.
For critics like Perry, these flaws are reason enough to kick Milgram off his pedestal and out of our textbooks. I disagree. I think we need to give the Milgram experiments the prominent position we do, and for the same reason they mattered when they were originally conducted. When I first read about Milgram's experiments in graduate school, I remember thinking, "Very clever, but why do we need them? Wasn't Nazi Germany evidence enough of obedience to authority?" But that was Milgram's point: in the early 1960s, Americans – and American psychologists – deeply believed in national character. Germans obeyed Hitler, it was widely assumed, because obedience was in the German psyche: look at all those high scores on the Authoritarian scale. It could never happen here.
For me, reading Perry's criticisms made it all the clearer why the Milgram experiments deserve their prominence. "Deep down, something about Milgram makes us uneasy", she writes. There is indeed something that makes everyone uneasy: the evidence that situations have power over our behaviour. This is a difficult message, and most students – indeed, most people – have trouble accepting it. "I would never have pulled those levers!" we cry. "I would have told that experimenter what a . . . stinker he is!" Perry insists that people's personalities and histories influence their actions. But Milgram never disputed that fact; his own research found that many participants resisted. Milgram wrote: "There is a tendency to think that everything a person does is due to the feelings or ideas within the person. However, scientists know that actions depend equally on the situation in which a man finds himself". Notice the "equally" in that sentence; many critics, like Perry, don't.
One of the original subjects in the experiments, called Bill, tried to explain to Perry why the studies were so valuable, and why he did not regret participating, although he was one of those who went on to the end. He hadn't thought about the experiment for twenty years, he said, until he began dating a psychology professor. She, thrilled to have met a living link to the experiment, asked him to speak to her class. "Well", Bill told Perry, "you would have thought Adolf Hitler walked in the room. I never really thought about it that way, you know?" Bill told the students, who were silently sitting in judgment on him: "It's very easy to sit back and say, 'I'd never do this or that' or 'Nobody could ever get me to do anything like that.' Well, guess what? Yes, they can".
That, of course, is the moral of the story. But the wall of hostility that Bill felt from the students means that they, along with critics like Perry, had failed to grasp that moral. They were reading about the experiment, seeing the films – and not understanding that they themselves might have been Bill.
My final classic is Harry Harlow's demonstration of the importance of contact comfort. Harlow took infant rhesus monkeys away from their mothers and raised them with a "wire mother", a forbidding construction of wires with a milk bottle connected to it, and a "cloth mother", a similar construction but one covered in foam rubber and terry cloth. At the time, it was widely believed (by psychologists, if not mothers) that babies become attached to their mother simply because mothers provide food. But Harlow's baby monkeys ran to the terry-cloth mother whenever they were frightened or startled, and clinging to it calmed them down. They went to the wire mother only for milk, and immediately abandoned it.
Every textbook tells this story, along with heartbreaking photos of the infant monkeys clinging to their cloth mother when a scary moving toy is put into their cage. Wasn't this discovery, like Milgram's, something "we all knew" – in this case, that infants need contact comfort even more than they need food if they are to flourish? Didn't we have enough data from René Spitz and John Bowlby's observations of abandoned infants warehoused in orphanages?
Well, no, apparently we didn't. As Deborah Blum describes in Love at Goon Park (2002), most American psychologists at the time were under the influence of either behaviourism or psychoanalysis, two apparently opposed philosophies that nonetheless shared a key belief: that the origin of a baby's attachment to the mother was through food. Behaviourists believed that healthy child development required positive reinforcement: the mother satisfies the baby's hunger drive; the baby becomes conditioned to associate the mother with food; mother and breast are equated. Interestingly, it was the Freudian view as well: no mother need be present, only a breast. "Love has its origin", Freud wrote, "in attachment to the satisfied need for nourishment." Why would cuddling be necessary? For the eminent behaviourist John Watson, cuddling was coddling.
But whereas Milgram's findings need constant reiteration in every generation, there is nothing surprising in Harlow's any more. One might say that the very success of his research makes teaching it unnecessary: no one would argue against Harlow's findings, as many students and laypeople always want to do with Milgram's. Adult humans could choose to walk out of Milgram's experiment at any point, and a third of them did. But the monkeys were captives, tortured by their isolation. In recent decades, psychologists have learned that "torture" is not an exaggeration to describe the experience of isolation for any primate. And to torture infants? It's horrible. But the fact that so many people think it is horrible now – and didn't then – is an extraordinary story in itself. How have we extended the moral circle to include other primates?
In 1973, as a young editor at Psychology Today, I interviewed Harlow. I walked through his lab with our photographer, Rod Kamitsuka, and looked aghast at an array of monkeys, each cowering alone in its own cage, electrodes on their heads. When Rod took a picture of one, it became wildly excited and fearful, careering around its tiny cage trying to escape. Rod and I were devastated, but Harlow was amused by us. "I study monkeys", Harlow said, "because they generalize better to people than rats do, and I have a basic fondness for people." I asked him what he thought of his critics who said that taking infants from their mothers was cruel and that the results did not justify the cruelty. He replied: "I think I am a soft-hearted person but I never developed a fondness for monkeys. Monkeys do not develop affection for people. And I find it impossible to love an animal that doesn't love back". Today, that sounds like lame moral reasoning: the fact that animals don't care for us is no justification for torturing them.
When I revisited Harlow's work, however, I was reminded of how many pioneering discoveries he made, most of them lost in the telling of the main story of contact comfort. He also demonstrated that monkeys use tools, solve problems, and learn because they are curious or interested in something, not just to get food or other rewards. He showed the importance of contact with peers, which can even overcome the detrimental effects of maternal deprivation. And he created a nuclear family unit for some of the monkeys and found that under those conditions, rhesus males became doting fathers – something they don't do in the wild.
Harlow was hardly the first to demonstrate the power of "mother love" and the necessity of contact comfort – and the devastation that ensues when an infant is untouched and unloved. Was experimenting with monkeys, by raising them in isolation with only wire or cloth mothers and causing them anguish that no observer could fail to see, essential to make the same point that Bowlby and Spitz had made? I don't know. What Harlow did, like Milgram, was to make his case dramatic, compelling, and scientifically incontrovertible. His evidence was based not on anecdote or observation, however persuasive, but on empirical, replicated data. As Blum showed, that is what it took to begin to undermine a scientific world view in which the need for touch and cuddling – physical expressions of mother love – had been so deeply ignored.
Harlow's work is a great contribution to psychology: it shows not only how we thought about mothers, but also how we thought about monkeys. It shows how dominant psychological perspectives influence our lives – in his day, behaviourism or psychoanalysis; in our day, genetics and brain chemistry – seeping into the questions we ask and the studies we do. The classics are living history, and we are not at the end of history by any means.
––––––––––––––––––––––––––––––––
This is an edited version of a lecture given at the Association for Psychological Science's annual convention on May 24.