## Notes to Experimental Moral Philosophy

1. Others prominently expressing concern about the bearing of experimental results such as these on philosophers’ reliance on moral intuitions include Kwame Anthony Appiah (2008) and Peter Singer (2005).

2. This and related research are discussed in more detail in subsection 2.3.

3. Mediation analysis attempts to determine whether one variable (the predictor) affects a second variable (the outcome) by influencing a third, *mediating* variable (Baron & Kenny 1986). Structural equation modeling allows the analyst to assess and compare various models relating predictors, outcomes, mediators, and moderators (Kline 2005).
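
As a rough illustration of the mediation idea, the following Python sketch simulates data in which a predictor affects an outcome only through a mediator, then estimates the usual regression paths by ordinary least squares. The variable names, the simulated effect sizes (0.5 and 0.4), and the helper functions are assumptions made for this example; they are not drawn from Baron & Kenny (1986) or from any study discussed here.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def slope(xs, ys):
    """OLS slope of ys regressed on xs (intercept included)."""
    return cov(xs, ys) / cov(xs, xs)


def two_predictor_slopes(x1, x2, ys):
    """OLS slopes of ys regressed on x1 and x2 together (intercept included)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, ys), cov(x2, ys)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical data: the predictor influences the outcome only via the mediator.
rng = random.Random(0)
n = 2000
predictor = [rng.gauss(0, 1) for _ in range(n)]
mediator = [0.5 * x + rng.gauss(0, 1) for x in predictor]
outcome = [0.4 * m + rng.gauss(0, 1) for m in mediator]

total_effect = slope(predictor, outcome)        # path c
a_path = slope(predictor, mediator)             # path a
direct_effect, b_path = two_predictor_slopes(   # paths c' and b
    predictor, mediator, outcome
)
indirect_effect = a_path * b_path               # the mediated portion, a * b

print(f"total effect c     = {total_effect:.3f}")
print(f"direct effect c'   = {direct_effect:.3f}")
print(f"indirect effect ab = {indirect_effect:.3f}")
```

With least-squares estimates, the total effect equals the direct effect plus the indirect effect, so a small direct effect alongside a sizable indirect effect is the pattern one would expect under mediation. A real analysis would also need standard errors or a bootstrap for the indirect effect, which this sketch omits.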

4. See Nadelhoffer (2004, 2006); Knobe & Mendlow (2004); Knobe (2004a, 2004b, 2007); Pettit & Knobe (2009); Tannenbaum, Ditto, & Pizarro (2007); Beebe & Buckwalter (2010); Beebe & Jensen (2012); Alfano, Beebe, & Robinson (2012); Robinson, Stey, & Alfano (2013).

5. Such scales are named for their inventor, Rensis Likert [pronounced “LICK-urt”] (1932). The participant is presented with a statement and then asked to agree or disagree with it on a numeric scale. Commonly, scales run from 1 to 7, 1 to 5, −3 to 3, or −2 to 2. Almost always, the endpoints are labeled ‘strongly disagree’ and ‘strongly agree’. Quite often, the midpoint is labeled ‘neither agree nor disagree’. Sometimes other points on the scale are labeled as well.

6. The idea that seemingly predictive and explanatory concepts might also have a normative component is not entirely original with Knobe; Bernard Williams pointed out that virtues and vices have such a dual nature (1985, 129).

7. Owen Flanagan (1991) considered some of the same evidence before Doris and Harman, but he was reluctant to draw the pessimistic conclusions they did about virtue ethics.

8. When it comes to explaining variance in behavior, the basic idea is that the statistical analysis of experimental results yields a correlation between a personality variable (such as extroversion) and a behavioral variable (such as an act of helping). Correlations range from −1 to +1. A correlation of 0 means that the individual variable is of literally no use in predicting the behavioral outcome; a correlation of 1 means that the individual variable is a perfect positive predictor; a correlation of −1 means that the individual variable is a perfect negative predictor. Actual correlations tend to be between −.3 and +.3. The amount of variance explained by a given predictor variable is the square of the correlation between that variable and the behavior in question. So, for instance, if extroversion is correlated with helping behavior at .25, then extroversion explains 6.25% of the variance in helping behavior. Although this is only one, rather simplistic, measure of explanatory power, personality variables do not look better on other measures, such as Cohen’s \(d\), \(\eta^2\), or partial-\(\eta^2\).
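
The arithmetic in this note can be checked with a short Python sketch. The function name and the sample correlations other than .25 are chosen here purely for illustration.

```python
def variance_explained(r):
    """Proportion of variance explained by a predictor correlated at r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return r ** 2


# The extroversion example from the note, plus two illustrative values.
for r in (0.10, 0.25, 0.30):
    share = variance_explained(r)
    print(f"r = {r:+.2f} explains {share:.2%} of the variance")
```

Because the correlation is squared, a negative correlation explains as much variance as a positive one of the same size, and correlations in the typical range of −.3 to +.3 never explain more than 9% of the variance.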

9. Merritt (2000) was the first to suggest that the situationist critique could be handled by offloading some of the responsibility for virtue onto the social environment in something like this way.

10. One might hope that philosophical reflection on ethics would promote moral behavior. Eric Schwitzgebel has recently begun to investigate whether professional ethicists behave better morally than their non-ethicist philosophical peers, and claims that, on most measures, the two groups are indistinguishable (Schwitzgebel 2009; Schwitzgebel & Rust 2010; Schwitzgebel et al. 2011).

11. See, for instance, Diener, Scollon, & Lucas (2003).

12. See Schimmack & Oishi (2005) for a critical reply, which argues that chronically accessible information is a much better predictor of life satisfaction responses than temporarily accessible information, such as how many dates one went on last week.

13. See May (2014) for a criticism of these findings, and Kelly (2011, especially chapter 1) for a comprehensive literature review.

14. The scandal over replication has (rightly or wrongly) assumed such proportions recently that John Doris has taken to calling it “Repligate”.

15. There is also an ongoing controversy surrounding null-hypothesis significance testing (NHST). In a nutshell, the problem is that a \(p\)-value is a conditional probability, but not the conditional probability that one might expect. A \(p\)-value is the probability that a result at least as extreme as the one in hand would have been observed given the null hypothesis, i.e., given that nothing interesting is happening (no positive correlations, no negative correlations, no interaction effects, and so on). This is sometimes inverted by sloppy researchers and interpreters, who gloss the \(p\)-value as the probability of the null hypothesis given the observation. Symbolically, the difference is between P(observation | null) and P(null | observation). The latter, more desirable, conditional probability can be estimated using Bayesian statistical analysis, but seldom is (and there are controversies surrounding Bayesian analysis, especially the arbitrariness of prior probabilities). For an introduction to these problems, see Abelson (1997), Cohen (1994), and Wagenmakers et al. (2012).
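
The gap between P(observation | null) and P(null | observation) can be made concrete with Bayes' theorem. The Python sketch below is a simplification: it treats the observation as a single discrete event rather than a tail area, and every number in it (the prior of 0.9 and the two likelihoods) is an assumption invented for illustration.

```python
def posterior_null(p_obs_given_null, p_obs_given_alt, prior_null):
    """P(null | observation) by Bayes' theorem, for two exhaustive hypotheses."""
    prior_alt = 1.0 - prior_null
    evidence = p_obs_given_null * prior_null + p_obs_given_alt * prior_alt
    return p_obs_given_null * prior_null / evidence


# Hypothetical inputs: the observation has probability 0.05 under the null
# and 0.5 under the alternative, and the null starts out at probability 0.9.
p_null_given_obs = posterior_null(
    p_obs_given_null=0.05,
    p_obs_given_alt=0.5,
    prior_null=0.9,
)
print(f"P(observation | null) = 0.05")
print(f"P(null | observation) = {p_null_given_obs:.3f}")
```

Under these assumed inputs the null hypothesis remains close to even odds after the observation, even though the observation would be quite improbable if the null were true. Different priors yield different posteriors, which is the arbitrariness worry mentioned in the note.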

17. In a recent critique of this kind of fallacious statistical thinking, Peter Austin, Muhammad Mamdani, David Juurlink, and Janet Hux (2006) describe statistical arguments purporting to show that Canadian patients’ astrological signs were often correlated with their pathologies. For instance, using the same statistical techniques favored by many experimental philosophers, one would be led to conclude that Geminis are 30% more likely to be alcoholics \((p \lt 0.02)\), Scorpios have an 80% higher risk of developing leukemia \((p \lt 0.05)\), and Virgo women suffer 40% more from excessive vomiting during pregnancy \((p \lt 0.04)\). These are presumably statistical anomalies, not indicators of genuine health risks.
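
One way such anomalies can arise is from running many significance tests at once. The Python sketch below computes the chance of at least one false positive across a family of tests, on the simplifying assumption that the tests are independent; the figure of 20 tests is hypothetical and is not taken from Austin et al. (2006).

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive among independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / num_tests


alpha = 0.05
num_tests = 20  # hypothetical number of sign-by-diagnosis comparisons

print(f"familywise error rate: {familywise_error_rate(alpha, num_tests):.3f}")
print(f"Bonferroni threshold:  {bonferroni_threshold(alpha, num_tests):.4f}")
```

With 20 independent tests at the .05 level, the chance of at least one spurious "significant" result is roughly 64%, so a handful of striking associations is unsurprising even when no real effects exist. The Bonferroni correction shown here is one standard, conservative remedy among several.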

18. We are here indebted to Chris Heathwood.