## Supplement 3. Further Topics in Causal Inference

This supplement briefly surveys some more advanced topics in causal inference and points to some references. Two recent review articles, Eberhardt 2017 and Spirtes & Zhang 2016, survey many of these topics.

Portability: We are often interested in exporting a causal inference made in one context to a novel context. For example, we may conduct an experiment establishing that a certain development program, such as micro-lending, is successful in one country. We may then be interested in whether the same program will be successful in a different country, with different social institutions. Bareinboim and Pearl (2013, 2014) explore conditions in which this is possible.

Inferences from sample data: We have largely focused on what can be inferred about causal structure if one knows the true probability distribution. In practice, of course, we must make inferences from finite sample data. This raises particular problems for causal inference. Standard statistical methods allow us to reject the hypothesis that two variables are probabilistically independent, but they never allow us to accept such a hypothesis (or reject probabilistic dependence). This raises the issue of how we might confirm a causal hypothesis that entails a relation of probabilistic independence between variables. Strategies include assigning Bayesian prior probabilities and updating (Claassen & Heskes 2012; Cooper & Herskovits 1992; Geiger & Heckerman 1994), using cost functions that weigh goodness of fit with data against complexity (Hyttinen et al. 2014; Triantafillou & Tsamardinos 2015), and learning-theoretic approaches (Schulte et al. 2010).
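
The asymmetry between rejecting and accepting independence can be illustrated with a small sketch. It assumes, purely for illustration, that the two variables are jointly normal (so that zero correlation amounts to independence) and uses the standard Fisher z-transformation of a sample correlation. The function names, the sample correlation of 0.1, and the two sample sizes are hypothetical choices, not taken from any of the works cited above.

```python
import math

def fisher_z_pvalue(r: float, n: int) -> float:
    """Two-sided p-value for the null hypothesis of zero correlation.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is
    approximately standard normal under the null hypothesis.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def rejects_independence(r: float, n: int, alpha: float = 0.05) -> bool:
    """True if the test rejects independence at significance level alpha."""
    return fisher_z_pvalue(r, n) < alpha

# The same weak sample correlation at two hypothetical sample sizes.
for n in (50, 1000):
    print(n, fisher_z_pvalue(0.1, n), rejects_independence(0.1, n))
```

With the smaller sample the test fails to reject independence, while with the larger sample the same correlation leads to rejection. Failing to reject is therefore not evidence that the variables are independent; it may only reflect a sample too small to detect a weak dependence.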

Relational causal models: As mentioned above, causal inference frequently requires that we make inferences about a probability distribution from sample data. The most commonly employed methods assume that sampling is independent. This independence assumption can be violated when individuals in the sample population stand in certain kinds of causal or non-causal relationships. For instance, suppose that we are conducting an epidemiological study about risk factors for a particular disease. If some of the subjects in our study have come in contact with one another, they may spread the disease among themselves. This is a case of inter-unit causation. Or suppose that we wish to study the factors that influence which academic authors are widely cited. If two of the subjects in our study are co-authors, their citation rates will not be independent. In this case, the relationship between the subjects in the study population is non-causal. Representative work on this topic includes Maier et al. 2010; Shalizi & Thomas 2011; Schulte & Khosravi 2012; and Maier et al. 2013.

Combining evidence from different studies: Sometimes we have data available from different studies. These studies may employ different methods; e.g., some may involve experimental interventions while others are observational. They may investigate overlapping but non-identical sets of variables. And they may reach incompatible conclusions. Strategies for drawing inferences about causal structure from diverse studies involve deriving constraints from individual studies and searching for causal models that optimally satisfy the constraints. See, e.g., Hyttinen et al. 2014; Tillman & Eberhardt 2014; and Bareinboim & Pearl 2015.

Time series: Often we are interested in tracking the state of a system over a period of time. In econometrics, we may be interested in changes in inflation, interest rates, unemployment, and government spending from month to month or year to year. Unemployment rates in one month may affect inflation in the next month, and they may also affect unemployment in the following month. To represent such a system, we can have multiple copies of the same set of variables: e.g., we might have different variables for the rate of unemployment in May, and for the rate in June. Then we could look for causal relationships among these time-indexed variables. But complications can arise if we do not have separate observations of each time period. For instance, perhaps we only have quarterly data on unemployment, inflation, and so on, while unemployment has an impact on inflation rates within one month. Or perhaps we only have aggregate data that combines observations from multiple time periods. Another problem is that there may be latent common causes that are changing over the course of the time scales that we are observing. See Eichler 2012 for an overview of these issues, and Danks & Plis 2014; Gong et al. 2015; Hyttinen et al. 2016; and Gong et al. 2017 for some recent approaches.
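
The problem with quarterly observations of a monthly process can be seen in a minimal sketch. It assumes a hypothetical linear model in which each month's state is a fixed matrix `A` applied to the previous month's state, plus noise; the three variables and all coefficients are invented for illustration and do not come from the cited papers. Observing only every third month then corresponds to the matrix `A` multiplied by itself three times, which can contain apparent direct influences that are absent at the monthly scale.

```python
NAMES = ["unemployment", "inflation", "interest"]

# Hypothetical monthly dynamics: A[i][j] is the effect of variable j in one
# month on variable i in the next month (noise terms are omitted here).
A = [
    [0.5, 0.0, 0.0],  # unemployment depends only on its own past
    [0.4, 0.5, 0.0],  # inflation depends on unemployment and its own past
    [0.0, 0.4, 0.5],  # interest depends on inflation and its own past
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def edges(m, tol=1e-12):
    """Cross-variable influences (cause, effect) with a nonzero coefficient."""
    n = len(m)
    return {(NAMES[j], NAMES[i])
            for i in range(n) for j in range(n)
            if i != j and abs(m[i][j]) > tol}

# Effective dynamics when only every third month is observed.
A3 = matmul(A, matmul(A, A))

print("monthly edges:  ", sorted(edges(A)))
print("quarterly edges:", sorted(edges(A3)))
```

At the monthly scale unemployment influences interest rates only through inflation, but in the quarterly matrix a coefficient from unemployment to interest rates appears, because the mediating month is never observed. The sketch ignores the further complication that the accumulated noise terms may become correlated across variables.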

Dynamic systems: In physics and other sciences, we are often interested in modeling the evolution of a system over time. The state of the system at a time is represented by a set of variables, and the way in which these variables change over time is represented by a set of differential equations. These systems can also be represented using causal models with discrete temporal stages. One question we may ask about these systems is whether they will evolve toward a stable equilibrium. Another question is how the causal relations governing the evolution of the system relate to the causal relations governing the equilibrium state. For a simple example, increasing the rate at which water pours into a cup at time $$t$$ may affect the amount of water present in the cup at time $$t + 1$$; but increasing the rate at which water pours into the cup over a period of time will not affect the volume of water that is present at equilibrium (when the flow of water in equals the flow of water out), which is determined solely by the capacity of the cup. Modeling such dynamic systems raises a number of conceptual and technical challenges. In particular, it requires greater flexibility in the way we represent interventions. For instance, we must distinguish between interventions that change the value of a variable at one time, and interventions that fix the value of the variable at all times. Also, the variables characterizing dynamic systems typically include time derivatives, or discrete differences, of other variables. For instance, to model the trajectory of a body in classical physics, we must represent both its position and its velocity at each time. This means that it will not be possible to intervene on all of the variables independently: an intervention that fixes the position of the body over time will also set the velocity of the body to zero. So our representation will need to encode information about which variables are mathematically related to others as derivatives or integrals, and we will need a mechanism to capture the effects of interventions on such variables. Work on these problems was pioneered by Denver Dash; see, for example, Dash & Druzdzel 2001. See also Mooij et al. 2013 for a more recent discussion.
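
The cup example can be sketched with discrete temporal stages. The update rule below is an assumed toy model, not one drawn from the cited literature: at each step the inflow is added to the current volume, and anything above the cup's capacity spills over. The capacity, the inflow rates, and the function name are hypothetical.

```python
def simulate(rate, capacity=250.0, steps=100, initial_volume=0.0):
    """Volume of water in the cup at each discrete time step.

    Assumed dynamics: volume[t + 1] = min(capacity, volume[t] + rate).
    Once the cup is full, outflow (overflow) equals inflow.
    """
    volumes = [initial_volume]
    for _ in range(steps):
        volumes.append(min(capacity, volumes[-1] + rate))
    return volumes

# Two sustained interventions that fix the inflow rate at all times.
slow = simulate(rate=10.0)
fast = simulate(rate=20.0)

print("volume after one step:", slow[1], fast[1])
print("volume at equilibrium:", slow[-1], fast[-1])
```

In this toy model the inflow rate makes a difference to the volume at the next step, but both runs settle at the same equilibrium volume, which depends only on the capacity. The dependence of volume on inflow rate that holds in the transient dynamics thus disappears from the equilibrium description.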

Cycles: Actual causation is usually assumed to be asymmetric: if $$C$$ causes $$E$$ then $$E$$ does not cause $$C$$. But at the general level, there can be cycles. For example, in familiar supply and demand models of economics, the price of a good (such as an iPhone) affects the level of demand for that good; and the level of demand for the good affects its price. If we add time indices to the variables, it will be possible to eliminate the cycle. For instance, if Apple drops the price of an iPhone at noon on a Monday, that will cause demand to increase starting on Monday afternoon. If demand for iPhones suddenly increases, that may cause Apple to increase the price shortly afterward. However, if we collect data on prices and demand levels over a period of time, it may not be possible to separate out the variables in this way. In this case, we would represent the causal relationships under investigation with a graph that includes cycles. If we assume that the underlying causal relationships are linear, we can still infer a good deal about models with cycles. But the general case poses serious challenges. See Hyttinen et al. 2013b; Neal 2000; Pearl & Dechter 1996; and Spirtes 1995.
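
A minimal sketch of a linear model with a cycle may help fix ideas. It assumes two hypothetical linear equations, one in which price depends on demand and one in which demand depends on price; the coefficients and constants are invented. The sketch shows only how such a model determines equilibrium values and responds to an intervention; it is not one of the inference procedures discussed in the works cited above.

```python
# Hypothetical linear cyclic model (all numbers are illustrative):
#   price  = A_COEF * demand + E_PRICE
#   demand = B_COEF * price  + E_DEMAND
A_COEF, B_COEF = 0.5, -0.8
E_PRICE, E_DEMAND = 10.0, 100.0

def equilibrium():
    """Solve both equations simultaneously (requires A_COEF * B_COEF != 1)."""
    det = 1.0 - A_COEF * B_COEF
    price = (E_PRICE + A_COEF * E_DEMAND) / det
    demand = (E_DEMAND + B_COEF * E_PRICE) / det
    return price, demand

def unrolled(steps=200, price=0.0, demand=0.0):
    """Time-indexed version: each variable depends on the other's prior value."""
    for _ in range(steps):
        price, demand = A_COEF * demand + E_PRICE, B_COEF * price + E_DEMAND
    return price, demand

def demand_given_do_price(fixed_price):
    """Intervening on price removes the price equation; demand still responds."""
    return B_COEF * fixed_price + E_DEMAND

print("equilibrium (price, demand):", equilibrium())
print("after unrolling in time:    ", unrolled())
print("demand when price is set to 50:", demand_given_do_price(50.0))
```

With these coefficients the time-indexed, acyclic version converges to the same values as the simultaneous solution of the cyclic equations. That convergence depends on the feedback being weak enough and is not guaranteed in general.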

Macro-variables: A sample of a gas consists of a huge number of molecules, each of which has a position and momentum (if we ignore quantum mechanics). But we can predict many features of the behavior of the gas using macro-variables like pressure and temperature. Methods exist for determining when it is possible to construct macro-variables for use in causal inference. See Chalupka et al. 2017 for methods and applications.

In addition to these topics in causal inference, there is much important work on the use of Bayes nets for computations. Pearl 1988 is a technical work on Bayes nets and other graphical representations of probability, although it is not focused on causation. Neapolitan 2004 is a textbook that treats Bayes nets in causal and noncausal contexts. Neapolitan & Jiang 2016 is a short overview of this topic.