Saturday, November 30, 2013

The Forgotten Role of Individual Differences in the Stanford Prison Experiment

The Stanford Prison Experiment did NOT show that strong situations overpower personality.
The Stanford Prison Experiment (SPE) is one of the most famous, or indeed infamous, studies in the history of psychology. The dramatic and horrifying results of the SPE have been used to draw rather sweeping conclusions about human nature and the psychology of evil. For example, the SPE supposedly illustrates the power of an abusive situation to induce good people to do evil things. In particular, Phil Zimbardo has argued that the study shows that strong situational forces can override individual differences in personality and moral values so that the latter count for very little. Indeed, he has even claimed that virtually anybody at all who was put into a situation where they had power over others, such as guards have over prisoners, would act in a tyrannical and abusive way. Furthermore, the results of the SPE have been applied to prisoner abuse in Abu Ghraib. The influence of the SPE on psychology is all the more remarkable considering the obvious limitations of the study, such as its small sample size and the ad hoc way in which the experiment was conducted. Closer examination shows that the design of the SPE did not provide an adequate test of the role of individual differences in a simulated prison, and that no satisfactory account of the individual differences in behaviour shown by participants has been offered. Therefore, the popularly accepted conclusion that the SPE shows that “situational power triumphs over individual power in certain contexts” (Zimbardo, 2007) is quite unfounded.

Prisoner abuse at Abu Ghraib has been compared to events at Stanford. What are the real lessons? 

Dispositions vs. situations - opposed or complementary?
 The details of the SPE are fairly well-known and are explained in detail on Zimbardo’s website. Susan Krauss Whitbourne also provides a nice accessible summary of the study on her blog. When the study was first published, the stated rationale was to critique the “dispositional hypothesis” of why prison life is so deplorable (Haney, Banks, & Zimbardo, 1973). Briefly, the “dispositional hypothesis” supposedly blames the “nature” of the people who administer the prison system (e.g. the guards) and the “nature” of the people who populate it (the prisoners). That is, when guards act in a brutal manner it is because they are brutal people. Alternatively, prisoners are seen as naturally aggressive people unable to control their impulses, and therefore repressive measures are needed to control them. According to Haney et al., this dispositional hypothesis has been invoked both by those who defend the status quo (poor conditions in prisons are due to evil prisoners) and by critics of the system (poor conditions in prisons are due to sadistic guards). Supposedly, such simplistic explanations draw attention away from the complex social, economic, and political causes that really underlie this deplorable situation, and which are too difficult to change without radical social upheaval. A few years ago, a research paper proposed that self-selection might have influenced the outcomes of the SPE, because the sort of people who would willingly volunteer for a study on prison life might have distinctive personality traits that might predispose them to abusive behaviour (Carnahan & McFarland, 2007).[1] Haney and Zimbardo (2009) responded to this by attacking the influence of what they call “persistent dispositionalism” in psychology – “explaining context-driven socially problematic behavior in largely individualistic, trait-based terms, no matter how much evidence has been amassed to the contrary”.

The alternative hypothesis that Haney et al. present is a “situationist” one: the claim that powerful and oppressive social situational forces, such as occur in a prison, over-ride individual differences in personality and moral values, and induce ordinary decent people to act in abhorrent ways. Haney et al. (1973) attacked the idea that prisoner abuse is due to “bad seeds” and suggested instead that the prison system consists of “bad soil” that can corrupt anyone. In a more recent talk, Zimbardo explained his belief about prisoner abuse at Abu Ghraib: “I believed our soldiers were good apples that someone had put into a very bad barrel in that prison dungeon.”

Prisoner abuse at Abu Ghraib: good apples in a bad barrel? Did they not choose to behave the way they did? 

Personality is revealed rather than suppressed by situations

Note the apparent dichotomy here. A person’s behaviour in a situation such as a mock prison or even a real one is supposed to be due either to their internal dispositions or the external features of the situation, but not both. I think this is a false dichotomy that has led to extreme and unfounded conclusions. Furthermore, it appears to be a straw man argument. When Haney et al. (1973) originally discussed the “dispositional hypothesis” they did not cite any references to show that this is a real hypothesis taken seriously by any genuine scholars. Perhaps certain naïve laypeople believe in it, but whether actual social scientists and psychologists do is not clear. Similarly, when Haney and Zimbardo (2009) attack “persistent dispositionalism” they seem to invoke a decades-old misconception that personality psychologists believe that behaviour can be understood primarily as a function of a person’s traits without serious consideration of the context of the person’s behaviour. On the contrary, personality psychologists have long maintained that a person’s behaviour is a function of both the features of the person and the features of the situation, not just one or the other. That is, personality psychologists argue that people generally make choices about how to behave in order to meet their needs within the constraints and opportunities inherent in particular situations.[2] Regarding the SPE in particular, Haney and Zimbardo criticised the authors who had argued that self-selection could have influenced the SPE’s outcome for supposedly preferring “dispositionalist” explanations over situational ones. Those authors responded by acknowledging that features of the situation had a powerful influence on the behaviour of the participants (McFarland & Carnahan, 2009). What they were arguing was that traits might influence a person’s decision to participate in such a situation in the first place.
Furthermore, they also argued that being in such a situation with people with similar personality traits would tend to amplify whatever tendencies one already had to be abusive. However, Zimbardo (2007) has argued for a more extreme situationist view, claiming that “a large body of evidence in social psychology supports the concept that situational power triumphs over individual power in certain contexts” and that bad situations can cause “good” people to do “evil” things. However an alternative view of the power of situations is that they provide opportunities that can reveal rather than suppress individual differences (Krueger, 2008). That is, put two different people with different desires in the same situation, and they will respond in accordance with their personal preferences, within whatever constraints are imposed by the demands of the situation. Let’s examine the actual findings of the SPE and see which view of situational power finds more support.

What really happened at Stanford

The SPE study sample consisted of 21[3] men who had been selected from a large pool of 75 volunteers based on psychological assessments to ensure their mental stability and lack of criminal history. One day prior to the study these 21 were assessed on ten different personality trait tests and then randomly assigned to the role of guard or prisoner – 11 to the former, 10 to the latter. On the whole, it seems, the guards were pretty mean, and the prisoners became demoralised by their situation. Five of the prisoners had such adverse psychological reactions that they had to be released early. So far, sounds like a big win for the situationist account, right? Participants acted the way they did based on their situationally defined roles, so the situation had a strong influence on their behaviour. However, I don’t think anyone is actually denying that situations influence behaviour. Zimbardo’s claim is that “situational power triumphs over individual power.” If this were the case, then we would expect little or no variation in the way participants behaved in their respective roles as prisoners or guards. Did this really happen though?

According to the original report by Haney et al. there actually were notable individual differences in how the prisoners and guards behaved.
“Some guards were tough but fair (“played by the rules”), some went far beyond their roles to engage in creative cruelty and harassment, while a few were passive and rarely instigated any coercive control over the prisoners” (p. 81).
Apparently about a third of the guards (so about 3 or 4) were actively cruel, while those described as “passive” by Haney et al. have been described elsewhere as “good guards from the prisoner’s point of view since they did them small favors and were friendly”. Furthermore, although five prisoners broke down under the stress of being abused, the other five were more resilient.

Clothes make the man?

The role of personality traits - at first acknowledged, then later dismissed

The original report by Haney et al. does acknowledge that personality traits could moderate the effect of social situational variables, allaying or intensifying the latter’s effects. That is, individual differences in participants could influence how they respond to the perceived demands of their assigned role. When discussing the limitations of their study they even go so far as to admit that they could not adequately test whether a dispositional or a situational account provides a better explanation of their results and state that “We cannot say that personality differences do not have an important effect on behavior in situations such as the one reported here.” They acknowledge that a stronger test would involve comparing two conditions where participants were pre-selected for having more extreme personality traits. I suppose one way to do this would be to set up two mock prisons for comparison, one featuring people selected for above-average kindness and compassion, the other one populated only with narcissists and psychopaths. If there were no differences in the behaviour shown in the two conditions (!) this would provide strong evidence that personality traits are not an important influence on behaviour in such a situation. However, they lacked the resources to perform such an experiment, which (hardly surprisingly) has not been done to this day.

In their more recent article though, Haney and Zimbardo (2009) summarily dismissed the role of individual differences, arguing that the precautions and controls they used in their original study were sufficient to lay to rest “any trait-based explanations of our findings” (emphasis added). Specifically, participants were assessed on a number of personality traits and found to score within the normal range for the general population. Additionally, guards and prisoners did not differ on any of these traits. And finally, these personality measures did not predict variations in behaviour within either the prisoner group or the guard group. Supposedly, these precautions should be enough to settle the matter for good.

On its face, the assertion that the results from a single study of 21 people can permanently lay to rest “any” trait-based explanations seems to me like a breathtakingly bold dismissal that flies in the face of usual scientific practice. Most scientists would normally consider a single small study like this just the beginning of enquiry into the matter, not the end of it. Haney and Zimbardo offer no explanation of why individual differences occurred in people who were exposed to the same situation, yet claim that they have enough evidence to dismiss “any” trait-based explanation at all based on their statistical analysis of 21 people. Let’s examine the merits of their “precautions and controls.”

Weak arguments about strong situations

The first argument is that participants did not differ from the general population on their personality traits, and were therefore a fair sample of “normal” individuals. Eight of these measures comprised the Comrey Personality scales. According to a critique by McFarland and Carnahan (2009) none of these traits have ever been linked to abusive and aggressive behaviour. If this is correct, they would have been of no use in assessing whether the participants were “normal” with respect to their propensity to be abusive in a situation where they held power over others. The other two traits measured were authoritarianism and Machiavellianism (the propensity to manipulate others for one’s own gain), which would appear to be theoretically relevant to abusive behaviour. The original report by Haney et al. is actually silent on how their participants compared to the normal population on these measures. For some reason that is not made clear, the researchers used a non-standard scoring method for Machiavellianism that makes comparisons with the general population not possible. Carnahan and McFarland (2007) pointed out that participants actually did score higher on authoritarianism than the general population and their scores were actually comparable to those found in a study of actual prisoners in San Quentin. Haney and Zimbardo argued that the actual difference from the norm was fairly small, so whether it was enough to contribute to the actual behaviour of participants in their study was a moot point. Still, the matter has hardly been “laid to rest.” 

The second argument is that participants assigned to the prisoner and guard roles did not differ significantly on their personality traits. Apart from the minuscule sample size involved, which I will address shortly, I am tempted to respond “So what?” Prisoners and guards were effectively in two different situations with differing opportunities and faced different challenges. For example, some of the guards disturbed the prisoners’ sleep by banging on their cell doors. The prisoners obviously did not have the opportunity to reciprocate this treatment, because the guards went home at the end of their shifts. So the prisoners could not engage in such abusive behaviour even if they had felt inclined to do so, because the opportunity was simply not there. As I have argued earlier, personality theorists propose that individual differences are relevant to how people respond to their circumstances, not that individual differences somehow allow people to transcend these circumstances and behave however they feel like.

Haney and Zimbardo’s third argument is that the behaviour of individuals within their respective roles of prisoner or guard could not be predicted from their personality scores. They do not deny that there were individual differences in behaviour, just that they could not predict them. I think this is their weakest argument of all. Remember that there were 11 guards and 10 prisoners. The guards’ behaviour in particular sorted them into three distinct types: good guards, tough but fair guards, and mean guards. So this means that in order to perform a statistical analysis we would have to compare three subgroups consisting of 3 to 4 individuals each to determine if there were significant differences in their personality traits. Statistically this is laughable. A basic principle of statistics is that significant differences between groups can only be detected reliably if the sample sizes are adequately large, and the sample sizes in the SPE are so small as to be completely inadequate for the purpose. Now, let’s say that I was a researcher who wanted to test the hypothesis that individual differences in personality traits could predict behaviour in an experimental situation such as a mock prison. (Let’s also assume that I knew in advance what personality traits were relevant to the outcomes concerned.) I could actually estimate in advance what sort of sample size I would need in order to have a reasonable chance of finding a significant result, if a real effect existed. Using a procedure known as power analysis, I can calculate that if personality traits had a medium-sized effect on behaviour (i.e. about average compared to most effects in psychology) resulting in three different behavioural subgroups I would need about 50 or so participants per subgroup (so 150 in total) to have an 80% chance of detecting a statistically significant effect if one actually existed. 
Even if the effects of personality were actually much larger than average, I would still need about 22 participants per subgroup, so 66 in total. Remember that these numbers refer only to the number of guards. Presumably we would need an equivalent number of prisoners as well. This means that I could anticipate in advance that I would need a sample of between 132 and 300 participants to have a reasonable chance of getting a significant result. If for some reason I then decided to settle for a grand sample of 21 people - which would give me less than a 9% chance of finding a statistically significant result assuming a medium sized effect, and about a 15% chance assuming a large one - I would look rather foolish, as such a tiny sample would not allow me to test my hypothesis in anything like a conclusive way. Haney et al. quite obviously did not have anywhere near enough statistical power to predict individual behavior from measured personality traits, so the fact that they could not do so reflects a defect in their methodology rather than some deep truth about the power of situations to overwhelm individual differences.
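
For readers who want to check this kind of calculation themselves, here is a minimal sketch of a power analysis in Python, using only the standard library. It rests on assumptions that are mine and are not spelled out in the post: a one-way ANOVA across exactly three equal-sized behavioural subgroups, a 5% significance level, and Cohen's conventional effect sizes (f = 0.25 for "medium", f = 0.40 for "large"). The function names `anova3_power` and `n_per_group_needed` are made up for illustration. Under those assumptions the results should land close to the figures quoted above, with the low-power figures corresponding to the 11 guards, but a different choice of test or effect-size convention would shift them.

```python
import math


def anova3_power(n_total, effect_f, alpha=0.05, max_terms=500):
    """Power of a one-way ANOVA F-test with exactly three groups.

    effect_f is Cohen's f (0.25 = "medium", 0.40 = "large" by convention).
    This shortcut only works for three groups (two numerator df).
    """
    df2 = n_total - 3  # denominator degrees of freedom
    if df2 <= 0:
        raise ValueError("need more than three participants in total")
    b = df2 / 2.0
    # With two numerator df, P(F > x) = (1 + 2x/df2) ** (-df2/2), so the
    # critical value at significance level alpha has a closed form.
    f_crit = b * (alpha ** (-1.0 / b) - 1.0)
    y = 2.0 * f_crit / (2.0 * f_crit + df2)
    half_lambda = effect_f ** 2 * n_total / 2.0  # half the noncentrality

    # The noncentral F distribution is a Poisson-weighted mixture. For two
    # numerator df, each component's upper tail is alpha times a finite sum,
    # which is accumulated incrementally in `inner`.
    pois = math.exp(-half_lambda)  # Poisson weight for j = 0
    term = 1.0
    inner = 1.0
    total = pois * inner
    for j in range(1, max_terms):
        pois *= half_lambda / j
        term *= (b + j - 1.0) / j * y
        inner += term
        total += pois * inner
    return alpha * total


def n_per_group_needed(effect_f, target_power=0.80, alpha=0.05):
    """Smallest equal group size reaching target_power (effect_f must be > 0)."""
    n = 2
    while anova3_power(3 * n, effect_f, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    for label, f in (("medium", 0.25), ("large", 0.40)):
        print(label, "effect: participants needed per subgroup =",
              n_per_group_needed(f))
        print(label, "effect: power with only 11 guards =",
              round(anova3_power(11, f), 3))
```

Setting `effect_f` to zero returns the significance level itself, which is a handy sanity check: with no real effect, the only "significant" results are false positives.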

In spite of these limitations, the original report on the SPE does note a number of non-significant trends for personality traits to predict behaviour. Specifically, prisoners who stayed until the end of the study, compared to the five who left early, scored higher on extraversion, conformity, empathy, and authoritarianism (Haney et al., 1973). A reasonable interpretation of this finding is that personality traits deserve further investigation with a larger sample to determine if these trends are robust or not. Haney et al. admitted in 1973 that the SPE was not actually designed to test the hypothesis that personality traits would predict individual differences in behaviour. Yet in spite of these inadequacies in the study design, Haney and Zimbardo argued in 2009 that “any trait-based explanations” of why participants behaved the way they did can be dismissed without any further consideration. This seems disingenuous as well as unreasonable.


Conclusions: the importance of choice
In summary, the purpose of the SPE was supposed to be to demonstrate that powerful situational forces could over-ride individual dispositions and choices, leading good people to do bad things simply because of the role they found themselves in. If this were true, then participants in the study should have acted in a uniform way depending on their role. However, this was not the case. Participants acted like individuals, showing that they still had the capacity to make choices within the constraints of their situations. Furthermore, the study was not even designed to provide a fair assessment of the influence of personality traits in such a situation because the sample size was nowhere near large enough to justify any definite conclusions. Far from demonstrating that individual differences do not matter in how people behave in a strong situation, the study’s results illustrate that even in undeniably tough situations people still have the capacity to make choices and that these choices matter.


Footnotes

[1] This blog post mentions some interesting results of this study regarding self-selection.
[2] To be fair, Zimbardo has stated, for example on his blog, that he believes that behaviour is a function of both individual differences and situational factors. However, many of his published remarks indicate that he sees dispositional and situational factors as competing with each other to explain behaviour. Personality psychologists see this “competition” hypothesis as being based on a false dichotomy. See this blog post by David Funder, for example, for an explanation of why this dichotomy is not valid.
[3] The sample was originally 24. Two were asked to remain on standby and one withdrew before the study began. 




This article also appears on Psychology Today on my blog Unique - Like Everybody Else.

© Scott McGreal. Please do not reproduce without permission. Brief excerpts may be quoted as long as a link to the original article is provided. Any version of this article appearing on sites other than Eye on Psych or my blog at Psychology Today has been ripped off without my consent.

Follow up articles critiquing situationism that discuss the SPE
Challenging the "Banality" of Evil and of Heroism, Part 1 and Part 2. This pair of articles refutes Zimbardo's claim that heroic and evil acts are equally "banal" outcomes of situational factors and that qualities within a person are of no real importance. 

Further interesting reading
Don’t blame Milgram by David Funder – debunks the popular claim that Milgram’s obedience studies show that the “power of the situation” overwhelms the “power of the person”.

References
Carnahan, T., & McFarland, S. (2007). Revisiting the Stanford Prison Experiment: Could participant self-selection have led to the cruelty? Personality and Social Psychology Bulletin, 33(5), 603-614. PMID: 17440210
Haney, C., Banks, C., & Zimbardo, P. G. (1973). Interpersonal dynamics in a simulated prison. International Journal of Criminology and Penology, 1, 69-97.
Haney, C., & Zimbardo, P. G. (2009). Persistent Dispositionalism in Interactionist Clothing: Fundamental Attribution Error in Explaining Prison Abuse. Personality and Social Psychology Bulletin, 35(6), 807-814. doi: 10.1177/0146167208322864
Krueger, J. I. (2008). Lucifer's last laugh. The American Journal of Psychology, 121, 335-341.
McFarland, S., & Carnahan, T. (2009). A Situation's First Powers Are Attracting Volunteers and Selecting Participants: A Reply to Haney and Zimbardo (2009). Personality and Social Psychology Bulletin, 35(6), 815-818. doi: 10.1177/0146167209334781
Zimbardo, P. G. (2007). The Lucifer Effect: Understanding How Good People Turn Evil (1st ed.). New York: Random House.