Associationist Theories of Thought

First published Tue Mar 17, 2015; substantive revision Wed Jun 24, 2020

Associationism is one of the oldest, and, in some form or another, most widely held theories of thought. Associationism has been the engine behind empiricism for centuries, from the British Empiricists through the Behaviorists and modern day Connectionists. Nevertheless, “associationism” does not refer to one particular theory of cognition per se, but rather a constellation of related though separable theses. What ties these theses together is a commitment to a certain arationality of thought: a creature’s mental states are associated because of some facts about its causal history, and having these mental states associated entails that bringing one of a pair of associates to mind will, ceteris paribus, ensure that the other also becomes activated.

1. What is Associationism?

Associationism is a theory that connects learning to thought based on principles of the organism’s causal history. Since its early roots, associationists have sought to use the history of an organism’s experience as the main sculptor of cognitive architecture. In its most basic form, associationism has claimed that pairs of thoughts become associated based on the organism’s past experience. So, for example, a basic form of associationism (such as Hume’s) might claim that the frequency with which an organism has come into contact with Xs and Ys in its environment determines the frequency with which thoughts about Xs and thoughts about Ys will arise together in the organism’s future.

Associationism’s popularity is in part due to how many different masters it can serve. In particular, associationism can be used as a theory of learning (e.g., as in behaviorist theorizing), a theory of thinking (as in Jamesian “streams of thought”), a theory of mental structures (e.g., as in concept pairs), and a theory of the implementation of thought (e.g., as in connectionism). All these theories are separable, but share a related, empiricist-friendly core. As used here, a “pure associationist” will refer to one who holds associationist theories of learning, thinking, mental structure, and implementation. The “pure associationist” is a somewhat idealized position, one that no particular theorist may have ever held, but many have approximated to differing degrees (e.g., Locke 1690/1975; Hume 1738/1975; Thorndike 1911; Skinner 1953; Hull 1943; Churchland 1986, 1989; Churchland and Sejnowski 1990; Smolensky 1988; Elman 1991; Elman et al. 1996; McClelland et al. 2010; Rydell and McConnell 2006; Fazio 2007).

Outside of these core uses of associationism the movement has also been closely aligned with a number of different doctrines over the years: empiricism, behaviorism, anti-representationalism (i.e., skepticism about the necessity of representational realism in psychological explanation), gradual learning, and domain-general learning. All of these theses are dissociable from core associationist thought (see section 7). While one can be an associationist without holding those theses, some of those theses imply associationism to differing degrees. These extra theses’ historical and sociological ties to associationism are strong, and so will be intermittently discussed below.

2. Associationism as a Theory of Mental Processes: The Empiricist Connection

Empiricism is a general theoretical outlook, which tends to offer a theory of learning to explain as much of our mental life as possible. From the British empiricists through Skinner and the behaviorists (see the entry on behaviorism) the main focus has been arguing for the acquisition of concepts (for the empiricists, “Ideas”; for the behaviorists, “responses”) through learning. However, the mental processes that underwrite such learning are almost never themselves posited to be learned.[1] So winnowing down the number of mental processes one has to posit limits the amount of innate machinery with which the theorist is saddled. Associationism, in its original form as in Hume (1738/1975), was put forward as a theory of mental processes. Associationists attempt to answer the question of how many mental processes there are by positing only a single mental process: the ability to associate ideas.[2]

Of course, thinkers execute many different types of cognitive acts, so if there is only one mental process, the ability to associate, that process must be flexible enough to accomplish a wide range of cognitive work. In particular, it must be able to account for learning and thinking. Accordingly, associationism has been utilized on both fronts. We will first discuss the theory of learning and then, after analyzing that theory and seeing what is putatively learned, we will return to the associationist theory of thinking.

3. Associationism as a Theory of Learning

In one of its senses, “associationism” refers to a theory of how organisms acquire concepts, associative structures, response biases, and even propositional knowledge. It is commonly acknowledged that associationism took hold after the publishing of John Locke’s Essay Concerning Human Understanding (1690/1975).[3] However, Locke’s comments on associationism were terse (though fertile), and did not address learning to any great degree. The first serious attempt to detail associationism as a theory of learning was given by Hume in the Treatise of Human Nature (1738/1975).[4] Hume’s associationism was, first and foremost, a theory connecting how perceptions (“Impressions”) determined trains of thought (successions of “Ideas”). Hume’s empiricism, as enshrined in the Copy Principle,[5] demanded that there were no Ideas in the mind that were not first given in experience. For Hume, the principles of association constrained the functional role of Ideas once they were copied from Impressions: if Impressions IM1 and IM2 were associated in perception, then their corresponding Ideas, ID1 and ID2 would also become associated. In other words, the ordering of Ideas was determined by the ordering of the Impressions that caused the Ideas to arise.

Hume’s theory then needs to analyze what types of associative relations between Impressions mattered for determining the ordering of Ideas. Hume’s analysis consisted of three types of associative relations: cause and effect, contiguity, and resemblance. If two Impressions instantiated one of these associative relations, then their corresponding Ideas would mimic the same instantiation.[6] For instance, if Impression IM1 was cotemporaneous with Impression IM2, then (ceteris paribus) their corresponding Ideas, ID1 and ID2, would become associated.

As stated, Hume’s associationism was mostly a way of determining the functional profile of Ideas. But we have not yet said what it is for two Ideas to be associated (for that see section 4). Instead, one can see Hume’s contribution as introducing a very influential type of learning—associative learning—for Hume’s theory purports to explain how we learn to associate certain Ideas. We can abstract away from Hume’s framework of ideas and his account of the specific relations that underlie associative learning, and state the theory of associative learning more generally: if two contents of experiences, X and Y, instantiate some associative relation, R, then those contents will become associated, so that future activations of X will tend to bring about activations of Y. The associationist then has to explain what relation R amounts to. The Humean form of associative learning (where R is equated with cause and effect, contiguity, or resemblance) has been hugely influential, informing the accounts of those such as Jeremy Bentham, J.S. Mill, and Alexander Bain (see, e.g., the entries on John Stuart Mill and 19th Century Scottish Philosophy).[7]

Associative learning didn’t hit its stride until the work of Ivan Pavlov, which spurred the subsequent rise of the behaviorist movement in psychology. Pavlov introduced the concept of classical conditioning as a modernized version of associative learning. For Pavlov, classical conditioning was in part an experimental paradigm for teaching animals to learn new associations between stimuli. The general method of learning was to pair an unconditioned stimulus (US) with a novel stimulus. An unconditioned stimulus is just a stimulus that instinctively, without training, provokes a response in an organism. Since this response is not itself learned, the response is referred to as an “unconditioned response” (UR). In Pavlov’s canonical experiment, the US was a meat powder, as the smell of meat automatically brought about salivation (UR) in his canine subjects. The US is then paired with a neutral stimulus, such as a bell. Over time, the contiguity between the US and the neutral stimulus causes the neutral stimulus to provoke the same response as the US. Once the bell starts to provoke salivation, the bell has become a “conditioned stimulus” (CS) and the salivating, when prompted by the bell alone, a “conditioned response” (CR). The associative learning here is learning to form new stimulus-response pairs between the bell and the salivation.[8]
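The pairing procedure just described can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Pavlov's own model: the saturating update rule, the learning rate `rate`, and the response `threshold` are hypothetical choices made here for illustration.

```python
def condition(pairings, rate=0.2):
    """Return the bell -> salivation strength after each CS+US pairing.

    Assumption: every contiguous presentation of bell and meat powder moves
    the strength a fixed fraction of the remaining distance toward 1.0.
    """
    strength = 0.0
    history = []
    for _ in range(pairings):
        strength += rate * (1.0 - strength)
        history.append(strength)
    return history


def elicits_response(strength, threshold=0.5):
    """The bell counts as a conditioned stimulus once its strength reaches the threshold."""
    return strength >= threshold


history = condition(4)
print([elicits_response(s) for s in history])
```

With these assumed parameters the bell alone starts to provoke the response only on the fourth pairing; different values of `rate` or `threshold` would shift that point.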

Classical conditioning is a fairly circumscribed process. It is a “stimulus substitution” paradigm where one stimulus can be swapped for another to provoke a response.[9] However, the responses that are provoked are supposed to remain unchanged; all that changes is the stimulus that gets associated with the response. Thus, classical conditioning seemed to some to be too restrictive to explain the panoply of novel behavior organisms appear to execute.[10]

Edward Thorndike’s research with cats in puzzle boxes broadened the theory of associative learning by introducing the notion of consequences to associative learning. Thorndike expanded the notion of associative learning beyond instinctual behaviors and sensory substitution to genuinely novel behaviors. Thorndike’s experiments initially probed, e.g., how cats learned to lift a lever to escape the “puzzle boxes” (the forerunner of “Skinner boxes”) that they were trapped in. The cats’ behaviors, such as attempting to lift a lever, were not themselves instinctual behaviors like the URs of Pavlov’s experiments. Additionally, the cats’ behaviors were shaped by the consequences that they brought on. For Thorndike it was because lifting the lever caused the door to open that the cats learned the connection between the lever and the door. This new view of learning, operant conditioning (for the organism is “operating” on its environment), was not merely the passive learning of Pavlov, but a species-nonspecific, general, active theory of learning.

This research culminated in Thorndike’s famous “Law of Effect” (1911), the first canonical psychological law of associationist learning. It asserted that responses that are accompanied by the organism feeling satisfied will, ceteris paribus, be more likely to be associated with the situation in which the behavior was executed, whereas responses that are accompanied with a feeling of discomfort to the animal will, ceteris paribus, make the response less likely to occur when the organism encounters the same situation.[11] The greater the positive or negative feelings produced, the greater the likelihood that the behavior will be evinced. To this Thorndike added the “Law of Exercise”, that responses to situations will, ceteris paribus, be more connected to those situations in proportion to the frequency of past pairings between situation and response. Thorndike’s paradigm was popularized and extended by B.F. Skinner (see, e.g., Skinner 1953) who stressed the notion not just of consequences but of reinforcement as the basis of forming associations. For Skinner, a behavior would get associated with a situation according to the frequency and strength of reinforcement that would arise as a consequence of the behavior.
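The two laws can be read as separate contributions to a single update of the bond between a situation and a response. The sketch below is one hypothetical way to write that down: the additive form, the `satisfaction` scale from -1 to 1, and both rate parameters are assumptions, not anything Thorndike specified.

```python
def update_bond(strength, satisfaction, effect_rate=0.1, exercise_rate=0.02):
    """Update a situation/response bond after one pairing.

    satisfaction: assumed scale, > 0 for satisfaction, < 0 for discomfort.
    """
    # Law of Exercise: every pairing strengthens the bond a little.
    strength += exercise_rate
    # Law of Effect: the consequence strengthens or weakens the bond.
    strength += effect_rate * satisfaction
    # Keep the bond within the assumed range [0, 1].
    return max(0.0, min(1.0, strength))


rewarded = punished = 0.5
for _ in range(3):
    rewarded = update_bond(rewarded, satisfaction=1.0)
    punished = update_bond(punished, satisfaction=-1.0)
print(rewarded > 0.5 > punished)
```

In this toy version a satisfying consequence makes the response more likely in that situation and an uncomfortable one makes it less likely, while mere repetition adds a small frequency-based increment.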

Since the days of Skinner, associative learning has come in many different variations. But what all varieties should share with their historical predecessors is that associative learning is supposed to mirror the contingencies in the world without adding additional structure to them (see section 9 for some examples of when supposedly associative theories smuggle in extra structure). The question of what contingencies associative learning detects (that is, one’s preferred analysis of what the associative relation R is), is up for debate and changes between theorists.

The final widely shared, though less central, property of associative learning concerns the domain generality of associative learning. Domain generality’s prevalence among associationists is due in large part to their traditional empiricist allegiances: excising domain-specific learning mechanisms constrains the amount of innate mental processes one has to posit. Thus it is no surprise to find that both Hume and Pavlov assumed that associative learning could be used to acquire associations between any contents, regardless of the types of contents they were. For example, Pavlov writes,

Any natural phenomenon chosen at will may be converted into a conditioned stimulus. Any ocular stimulus, any desired sound, any odor, and the stimulation of any portion of the skin, whether by mechanical means or by the application of heat or cold never failed to stimulate the salivary glands. (Pavlov 1906: 615)

For Pavlov the content of the CS doesn’t matter. Any content will do, as long as it bears the right functional relationship in the organism’s learning history. In that sense, the learning is domain general—it matters not what the content is, just the role it plays (for more on this topic, see section 9.4).[12]

4. Associationism as a Theory of Mental Structure

Associative learning amounts to a constellation of related views that interprets learning as associating stimuli with responses (in operant conditioning), or stimuli with other stimuli (in classical conditioning), or stimuli with valences (in evaluative conditioning).[13] Associative learning accounts raise the question: when one learns to associate contents X and Y because, e.g., previous experiences with Xs and Ys instantiated R, how does one store the information that X and Y are associated?[14] A highly contrived sample answer to this question would be that a thinker learns an explicitly represented unconscious conditional rule that states “when a token of x is activated, then also activate a token of y”. Instead of such a highly intellectualized response, associationists have found a natural (though by no means necessary, see section 4.2) complementary view that the information is stored in an associative structure.

An associative structure describes the type of bond that connects two distinct mental states.[15] An example of such a structure is the associative pair salt/pepper.[16] The associative structure is defined, in the first instance, functionally: if X and Y form an associative structure, then, ceteris paribus, activations of mental state X bring about mental state Y and vice versa without the mediation of any other psychological states (such as an explicitly represented rule telling the system to activate a concept because its associate has been activated).[17] In other words, saying that two concepts are associated amounts to saying that there is a reliable, psychologically basic causal relation that holds between them—the activation of one of the concepts causes the activation of the other. So, saying that someone harbors the structure salt/pepper amounts to saying that activations of salt will cause activations of pepper (and vice versa) without the aid of any other cognitive states.

Associative structures are most naturally contrasted with propositional structures. A pure associationist is opposed to propositional structures—strings of mental representations that express a proposition—because propositionally structured mental representations have structure over and above the mere associative bond between two concepts. Take, for example, the associative structure green/toucan. This structure does not predicate green onto toucan. If we know that a mind has an associative bond between green and toucan, then we know that activating one of those concepts leads to the activation of the other. A pure associative theory rules out predication, for propositional structures aren’t just strings of associations. “Association” (in associative structures) just denotes a causal relation among mental representations, whereas predication (roughly) expresses a relation between things in the world (or intentional contents that specify external relations). Saying that someone has an associative thought green/toucan tells you something about the causal and temporal sequences of the activation of concepts in one’s mind; saying that someone has the thought there is a green toucan tells you that a person is predicating greenness of a particular toucan (see Fodor 2003: 91–94, for an expansion of this point).

Associative structures needn’t just hold between simple concepts. One might have reason to posit associative structures between propositional elements (see section 5) or between concepts and valences (see section 8). But none of the preceding is meant to imply that all structures are associative or propositional; there are other representational formats that the mind might harbor (e.g., analog magnitudes or iconic structures; see Camp 2007; Quilty-Dunn forthcoming). For instance, not all semantically related concepts are harbored in associative structures. Semantically related concepts may in fact also be directly associated (as in doctor/nurse) or they may not (as in horse/zebra; see Perea and Rosa 2002). The difference in structure is not just a theoretical possibility, as these different structures have different functional profiles: for example, conditioned associations appear to last longer than semantic associations do in subjects with dementia (Glosser and Friedman 1991).

4.1 Associative Symmetry

The analysis of associative structures implies that, ceteris paribus, associations are symmetric in their causal effects: if a thinker has a bond between salt/pepper, then salt should bring about pepper just as well as pepper brings about salt (for extensive discussion of the symmetry point see Quilty-Dunn and Mandelbaum 2019). But all else is rarely equal. For example, behaviorists such as Thorndike, Hull, and Skinner knew that the order of learning affected the causal sequence of recall: if one is always hearing “salt and pepper” then salt will be more poised to activate pepper than pepper to activate salt. So, included in the ceteris paribus clause in the analysis of associative structures is the idealization that the learning of the associative elements was equally well randomized in order.

Similarly, associative symmetry is violated when there are differing amounts of associative connections between the individual associated elements. For example, in the green/toucan case, most thinkers will have many more associations stemming from green than stemming from toucan. Suppose we have a thinker that only associates toucan with green, but associates green with a large host of other concepts (e.g., grass, vegetables, tea, kermit, seasickness, moss, mold, lantern, ireland, etc). In this case one can expect that toucan will more quickly activate green than green will activate toucan, for the former bond will have its activation strength less weakened amongst other associates than the latter will.
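The asymmetry in the green/toucan case can be made concrete with a small sketch. It assumes, purely for illustration, that a fixed amount of activation is divided equally among a concept's associates; the equal split and the particular associate lists are hypothetical.

```python
# Outgoing associations for one imagined thinker (illustrative data only).
associations = {
    "toucan": ["green"],
    "green": ["toucan", "grass", "vegetables", "tea", "moss"],
}


def activation_passed(source, target, graph, total=1.0):
    """Activation reaching `target` when `source` fires.

    Assumption: the source's activation is split evenly among its associates.
    """
    neighbors = graph.get(source, [])
    if target not in neighbors:
        return 0.0
    return total / len(neighbors)


print(activation_passed("toucan", "green", associations))
print(activation_passed("green", "toucan", associations))
```

Under that assumption toucan passes all of its activation to green, while green passes only a fifth of its activation to toucan, which is the direction of asymmetry the paragraph predicts.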

4.2 Activation Maps of Associative Structure

An associative activation map (sometimes called a “spreading activation” map, Collins and Loftus 1975) is a mapping for a single thinker of all the associative connections between concepts.[18] There are many ways of operationalizing associative connections. In the abstract, a psychologist will attempt to probe which concepts (or other mental elements) activate which other concepts (or elements). Imagine a subject who is asked to say whether a string of letters constitutes a word or not, which is the typical goal given to subjects in a “lexical decision task”. If a subject has just seen the word “mouse”, we assume that the concept mouse was activated. If the subject is then quicker to say that, e.g., “cursor” is a word than the subject is to say that “toaster” is, then we can infer that cursor was primed, and is thus associatively related to mouse, in this thinker. Likewise, if we find that “rodent” is also responded to more quickly, then we know that rodent is associatively related to mouse. Using this procedure, one can generate an associative mapping of a thinker’s mind. Such a mapping would constitute a mapping of the associative structures one harbors. However, to be a true activation map (a true mapping of which concepts facilitate which), the mapping would also need to include information about the violations of symmetry between concepts.
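The inference from response times to associative links can be sketched as follows. Everything specific here is an assumption made for illustration: the reaction times are invented, the comparison against an unprimed baseline is one possible operationalization, and the 30 ms cutoff `min_speedup_ms` is arbitrary.

```python
def primed_associates(baseline_ms, primed_ms, min_speedup_ms=30):
    """Return targets answered faster after the prime, with the speedup in ms."""
    return {
        target: baseline_ms[target] - primed_ms[target]
        for target in primed_ms
        if baseline_ms[target] - primed_ms[target] >= min_speedup_ms
    }


# Invented lexical-decision times for one subject (milliseconds).
baseline = {"cursor": 620, "rodent": 610, "toaster": 630}
after_mouse = {"cursor": 560, "rodent": 555, "toaster": 628}

# The map is keyed by the prime, so each entry is directed: mouse -> target.
activation_map = {"mouse": primed_associates(baseline, after_mouse)}
print(activation_map)
```

Because entries are stored per prime, testing "cursor" as a prime for "mouse" would produce a separate entry, which is where violations of symmetry would show up.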

4.3 Relation Between Associative Learning and Associative Structures

The British Empiricists desired to have a thoroughgoing pure associationist theory, for it allowed them to lessen the load of innate machinery they needed to posit. Likewise, the behaviorists also tended to want a pure associationist theory (sometimes out of a similar empiricist tendency, other times because they were radical behaviorists like Skinner, who banned all discussion of mental representations). Pure associationists tend to be partial to a connection that Fodor (2003) refers to as “Bare-Boned Association”. The idea is that the current strength of an associative connection between X and Y is determined, ceteris paribus, by the frequency of the past associations of X and Y. As stated, Bare-Boned Association assumes that associative structures encode, at least implicitly, the frequency of past associations of X and Y, and the strength of that associative bond is determined by the organism’s previous history of experiencing Xs and Ys.[19] In other words, the learning history of past associations determines the current functional profile of the corresponding associative structures.[20]
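A frequency-determined bond of this kind can be sketched in a few lines. The sketch assumes, for illustration only, that experience arrives as discrete episodes, that co-occurrence within an episode is the relevant associative relation, and that strength is simply the share of episodes containing both contents; none of these details is fixed by Bare-Boned Association itself.

```python
from collections import Counter
from itertools import combinations


def association_strengths(episodes):
    """Map each unordered pair of contents to the share of episodes containing both."""
    counts = Counter()
    for episode in episodes:
        # Sort so that each pair is always stored under the same key.
        for pair in combinations(sorted(set(episode)), 2):
            counts[pair] += 1
    return {pair: n / len(episodes) for pair, n in counts.items()}


# A hypothetical learning history of four episodes.
history = [
    {"salt", "pepper"},
    {"salt", "pepper"},
    {"salt", "sugar"},
    {"salt", "pepper", "sugar"},
]
print(association_strengths(history))
```

On this history salt/pepper ends up as the strongest bond because it has the most frequent past pairings, and nothing other than the learning history enters the calculation.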

Although the picture sketched above, where associative learning eventuates in associative structure, is appealing for many, it is not forced upon one, as there is no a priori reason to bar any type of structure from arising from a particular type of learning. One may, for example, gain propositional structures from associative learning (see Mitchell et al. 2009 and Mandelbaum 2016 for arguments that this is more than a mere logical possibility). This can happen in two ways. In the first, one may gain an associative structure that has a proposition as one of its associates. Assume that every time one’s father came home he immediately made dinner. In such a case one might associate the proposition daddy is home with the concept dinner (that is one might acquire: daddy is home/dinner). However, one might also just have a propositional structure result from associative learning. If every time one’s father came home he made dinner, then one might just end up learning if daddy is home then dinner will come soon, which is a propositional structure.

4.4 Extinction and Counterconditioning

There is a different, tighter relationship between associative learning and associative structures concerning how to modulate an association. Associative theorists, especially from Pavlov onward, have been clear on the functional characteristics necessary to modulate an already created association. There have been two generally agreed upon routes: extinction and counterconditioning. Suppose that, through associative learning, you have learned to associate a CS with a US. How do we break that association? Associationists have posited that one breaks an associative structure via two different types of associative learning (/unlearning). Extinction is the name for one such process. During extinction one decouples the external presentation of the CS and the US by presenting the CS without the US (and sometimes the US without the CS). Over time, the organism will learn to disconnect the CS and US.

Counterconditioning names a similar process to extinction, though one which proceeds via a slightly different method. Counterconditioning can only occur when an organism has an association between a mental representation and a valence, as acquired in an evaluative conditioning paradigm. Suppose that one associates ducks with a positive valence. To break this association via counterconditioning one introduces ducks not with a lack of positive valence (as would happen in extinction) but with the opposite valence, a negative valence. Over multiple exposures, the initial representation/valence association weakens, and is perhaps completely broken.[21]
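The procedural difference between the two routes can be shown with a toy model of a representation/valence bond. This is a sketch under strong assumptions: the bond is a single number between -1 and 1, each exposure moves it a fixed fraction toward whatever valence it is currently paired with, and the starting value and `rate` are invented. Collapsing the bond into one number also builds in the assumption that the old association is overwritten rather than overlaid.

```python
def present(strength, paired_valence, rate=0.3):
    """One exposure: move the bond toward the valence it is now paired with."""
    return strength + rate * (paired_valence - strength)


def run(strength, paired_valence, trials):
    """Apply the same exposure repeatedly and return the final bond."""
    for _ in range(trials):
        strength = present(strength, paired_valence)
    return strength


ducks = 0.8  # assumed positive bond acquired earlier

# Extinction: ducks presented with no valence at all.
extinguished = run(ducks, paired_valence=0.0, trials=3)
# Counterconditioning: ducks presented with the opposite valence.
countered = run(ducks, paired_valence=-0.8, trials=3)
print(extinguished, countered)
```

In this toy model both procedures weaken the positive bond, but counterconditioning weakens it faster and can push it below zero, whereas extinction only lets it decay toward zero.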

How successful extinction and counterconditioning are, and how they work, is the source of some controversy, and there is some reason to see both methods as highly ineffectual (Bouton 2004). Although the traditional view is that extinction breaks associative bonds, it is an open empirical question whether extinction proceeds by breaking the previously created associative bonds, or whether it proceeds by leaving that bond alone but creating new, more salient (and perhaps context-specific) associations between the CS and other mental states (Bouton 2002, Bendana and Mandelbaum forthcoming). Additionally, reinstatement, the spontaneous reappearance of an associative bond after seemingly successful extinction, has been observed in many contexts (see, e.g., Dirikx et al. 2004 for reinstatement of fear in humans).[22]

One fixed point in this debate is that one reverses associative structures via these two types of associative learning/unlearning, and only via these two pathways. What one does not do is try to break an associative structure by using practical or theoretical reasoning. If you associate salt with pepper, then telling you that salt has nothing to do with pepper or giving you very good reasons not to associate the two (say, someone will give you $50,000 for not associating them) won’t affect the association. This much has at least been clear since Locke. In the Essay concerning Human Understanding, in his chapter “Of the Association of Ideas” (Book II, chapter XXXIII), he writes,

When this combination is settled, and while it lasts, it is not in the power of reason to help us, and relieve us from the effects of it. Ideas in our minds, when they are there, will operate according to their natures and circumstances. And here we see the cause why time cures certain affections, which reason, though in the right, and allowed to be so, has not power over, nor is able against them to prevail with those who are apt to hearken to it in other cases. (2.33.13)

Likewise, say one has just eaten lutefisk and then vomited. The smell and taste of lutefisk will then be associated with feeling nauseated, and no amount of telling one that they shouldn’t be nauseated will be very effective. Say the lutefisk that made one vomit was covered in poison, so that we know that the lutefisk wasn’t the root cause of the sickness.[23] Having this knowledge won’t dislodge the association. In essence, associative structures are functionally defined as being fungible based on counterconditioning, extinction, and nothing else. Thus, assuming one sees counterconditioning and extinction as types of associative learning, we can say that associative learning does not necessarily eventuate in associative structures, but associative structures can only be modified by associative learning.

5. Associative Transitions

So far we’ve discussed learning and mental structures, but have yet to discuss thinking. The pure associationist will want a theory that covers not just acquisition and cognitive structure, but also the transition between thoughts. Associative transitions are a particular type of thinking, akin to what William James called “The Stream of Thought” (James 1890). Associative transitions are movements between thoughts that are not predicated on a prior logical relationship between the elements of the thoughts that one connects. In this sense, associative transitions are contrasted with computational transitions as analyzed by the Computational Theory of Mind (Fodor 2001; Quilty-Dunn and Mandelbaum 2018, 2019; see the entry on Computational Theory of Mind). CTM understands inferences as truth preserving movements in thought that are underwritten by the formal/syntactic properties of thoughts. For example, inferring the conclusion in modus ponens from the premises is possible just based on the form of the major and minor premise, and not on the content of the premises. Associative transitions are transitions in thought that are not based on the logico-syntactic properties of thoughts. Rather, they are transitions in thought that occur based on the associative relations among the separate thoughts.

Imagine an impure associationist model of the mind, one that contains both propositional and associative structures. A computational inference might be one such as inferring you are a g from the thoughts if you are an f, then you are a g, and you are an f. However, an associative transition is just a stream of ideas that needn’t have any formal, or even rational, relation between them, such as the transition from this coffee shop is cold to russia should annex idaho, without there being any intervening thoughts. This transition could be subserved merely by one’s association of idaho and cold, or it could happen because the two thoughts have tended to co-occur in the past, and their close temporal proximity caused an association between the two thoughts to arise (or for many other reasons). Regardless of the etiology, the transition doesn’t occur on the basis of the formal properties of the thoughts.[24]

According to this taxonomy, talk of an “associative inference” (e.g., Anderson et al. 1994; Armstrong et al. 2012) is a borderline oxymoron. The easiest way to give sense to the idea of an associative inference is for it to involve transitions in thought that began because they were purely inferential (as understood by the computational theory of mind) but then became associated over time. For example, at first one might make the modus ponens inference because a particular series of thoughts instantiates the modus ponens form. Over time the premises and conclusion of that particular token of a modus ponens argument become associated with each other through their continued use in that inference and now the thinker merely associates the premises with the conclusion. That is, the constant contiguity between the premises and the conclusion occurred because the inference was made so frequently, but the inference was originally made so frequently not because of the associative relations between the premises and conclusion, but because of the form of the thoughts (and the particular motivations of the thinker). This constant contiguity then formed the basis for an associative linkage between the premises and the conclusion.[25]

As was the case for associative structures, associative transitions in thought are not just a logical possibility. There are particular empirical differences associated with associative transitions versus inferential transitions. Associative transitions tend to move across different content domains, whereas inferential transitions tend to stay on a more focused set of contents. These differences have been seen to result in measurable differences in mood: associative thinking across topics bolsters mood when compared to logical thinking on a single topic (Mason and Bar 2012).

6. Associative Instantiation

The associationist position so far has been neutral on how associations are to be implemented. Implementation can be seen at a representational (that is, psychological) level of explanation, or at the neural level. A pure associationist picture would posit an associative implementation base at one, or both, of these levels.[26]

The most well-known associative instantiation base is a class of networks called Connectionist networks (see the entry on connectionism). Connectionist networks are sometimes pitched at the psychological level (see, e.g., Elman 1991; Elman et al. 1996; Smolensky 1988). This amounts to the claim that models of algorithms embedded in the networks capture the essence of certain mental processes, such as associative learning. Other times connectionist networks are said to be models of neural activity (“neural networks”). Connectionist networks consist in sets of nodes, generally input nodes, hidden nodes, and output nodes. Input nodes are taken to be analogs of sensory neurons (or sub-symbolic sensory representations), output nodes the analog of motor neurons (or sub-symbolic behavioral representations), and hidden nodes are stand-ins for all other neurons.[27] The network consists in these nodes being connected to each other with varying strengths. The topology of the connections gives one an associative mapping of the system, with the associative weights understood as the differing strengths of connections. On the psychological reading, these associations are functionally defined; on the neurological reading, they are generally understood to be representing synaptic conductance (and are the analogs of dendrites).[28] Prima facie, these networks are purely associative and do not contain propositional elements, and the nodes themselves are not to be equated with single representational states (such as concepts; see, e.g., Gallistel and King 2009).

However, a connectionist network can implement a classical Turing machine architecture (see, e.g., Fodor and McLaughlin 1990; Chalmers 1993). Many, if not most, of the adherents of classical computation, for example proponents of CTM, think that the brain is an associative network, one which implements a classical computational program. Some adherents of CTM do deny that the brain runs an associative network (see, e.g., Gallistel and King 2009, who appear to deny that there is any scientific level of explanation that association is intimately involved in), but they do so on separate empirical grounds and not because of any logical inconsistency with an associative brain implementing a classical mind.

When discussing an associative implementation base it is important to distinguish questions of associationist structure from questions of representational reality. Connectionists have often been followers of the Skinnerian anti-representationalist tradition (Skinner 1938). Because of the distributed nature of the nodes in connectionist networks, the networks have tended to be analyzed as associative stimulus/response chains of subsymbolic elements. However, the question of whether connectionist networks have representations which are distributed in patterns of activity throughout different nodes of the network, or whether connectionist networks are best understood as containing no representational structures at all, is orthogonal to both the question of whether the networks are purely associative or computational, and whether the networks can implement classical architectures.

7. Relation between the Varieties of Association and Related Positions

These four types of associationism share a certain empiricist spiritual similarity, but are logically, and empirically, separable. The pure associationist who wants to posit the smallest number of domain-general mental processes will theorize that the mind consists of associative structures acquired by associative learning which enter into associative transitions and are implemented in an associative instantiation base. However, many hybrid views are available and frequently different associationist positions become mixed and matched, especially once issues of empiricism, domain-specificity, and gradual learning arise. Below is a partial taxonomy of where some well-known theorists lie in terms of associationism and these other, often related doctrines.

Prinz (2002) and Karmiloff-Smith (1995) are examples of empiricist non-associationists. It is rare to find an associationist who is a nativist, but plenty of nativists have aspects of associationism in their own work. For example, even the arch-nativist Jerry Fodor maintains that intramodular lexicons contain associative structures (Fodor 1983). Similarly, there are many non-behaviorist (at least non-radical, analytic, or methodological behaviorist) associationists, such as Elman (1991), Smolensky (1988), Baeyens (De Houwer and Baeyens 2001), and modern day dual process theorists such as Evans and Stanovich (2013). It is quite difficult to find a non-associationist behaviorist, though Tolman approximates one (Tolman 1948). Elman and Smolensky also qualify as representationalist associationists, and Van Gelder (1995) as an anti-representationalist non-associationist. Karmiloff-Smith (1995) can be interpreted as, for some areas of learning, a proponent of gradual learning without being associationist (some might also read contemporary Bayesian theorists, e.g., Tenenbaum et al. 2011 and Chater et al. 2006, as holding a similar position for some areas of learning). Rescorla (1988) and Heyes (2012) claim to be associationists who are pro step-wise, one-shot learning (though Rescorla sees his project as a continuation of the classical conditioning program, while others see his data as grist for the anti-associationist, pro-computationalist mill; see Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019). Lastly, Tenenbaum and his contemporary Bayesian colleagues sometimes qualify as holding a domain-general learning position without it being associationist.[29]

8. Associationism in Social Psychology

Since the cognitive revolution, associationism’s influence has mostly died out in cognitive psychology and psycholinguistics. This is not to say that all aspects of associative theorizing are dead in these areas; rather, they have just taken on much smaller, more peripheral roles (for example, it has often been suggested that mental lexicons are structured, in part, associatively, which is why lexical decision tasks are taken to be facilitation maps of one’s lexicon). In other areas of cognitive psychology (for example, the study of causal cognition), associationism is no longer the dominant theoretical paradigm, but vestiges of associationism still persist (see Shanks 2010 for an overview of associationism in causal cognition). Associationism is also still alive in the connectionist literature, as well as in the animal cognition tradition.

But the biggest contemporary stronghold of associationist theorizing resides in social psychology, an area which has traditionally been hostile to associationism (see, e.g., Asch 1962, 1969). The ascendance of associationism in social psychology has been a fairly modern development, and has caused a revival of associationist theories in philosophy (e.g., Madva and Brownstein 2019). The two areas of social psychology that have seen the greatest renaissance of associationism are the implicit attitude and dual-process theory literature. However, in the late 2010s social psychology has begun to take a critical look at associationist theories (e.g., Mann et al. 2019).

8.1 Implicit Attitudes

Implicit attitudes are generally operationally defined as the attitudes tested on implicit tests such as the Implicit Association Test (Greenwald et al. 1998), the Affect Misattribution Procedure (Payne et al. 2005), the Sorting Paired Features Task (Bar-Anan et al. 2009) and the Go/No-Go Association Task (Nosek and Banaji 2001). Implicit attitudes are contrasted with explicit attitudes, attitudes operationalized as the ones being probed when one gives an explicit response like a marking on a Likert scale, feeling thermometer, or in free report. Such operationalizations leave open the question of whether there are any natural kinds to which explicit and implicit attitudes refer. In general, implicit attitudes are characterized as being mental representations that are unavailable for explicit report and inaccessible to consciousness (cf. Hahn et al. 2014; Berger 2020).

The default position among social psychologists is to treat implicit attitudes as if they are associations among mental representations (Fazio 2007), or among pairs of mental representations and valences. In particular, they treat implicit attitudes as associative structures which enter into associative transitions. Recently this issue has come under much debate. In an ever expanding series of studies De Houwer and his collaborators have sought to show that associative learning is, at base, relational, propositional contingency learning; i.e., that all putatively associative learning is in fact a nonautomatic learning process that generates and evaluates propositional hypotheses (Mitchell et al. 2009; De Houwer 2009, 2011, 2014, 2019; Hughes et al. 2019). Other researchers have also approached the question using learning as the entrance point to the debate, demonstrating that non-associative acquisition creates stronger attitudes than associative acquisition (Hughes et al. 2019). For example, one might demonstrate that learning through merely reading an evaluative statement creates a stronger implicit attitude than repeated associative exposures (Kurdi and Banaji 2017, 2019; Mann et al. 2019). Other researchers have championed propositional models not based on learning, but instead based on how implicit attitudes change regardless of how they are acquired. For instance, Mandelbaum (2016) argued that logical/evidential interventions modulate implicit attitudes in predictable ways (e.g., using double negations that cancel each other out), while others have used diagnosticity to show that implicit attitudes update in a non-associationistic, propositional way (e.g., after reading a story about a man who broke into a building and appeared to ransack it, you learn that he jumped in to save people from a fire and immediately change your opinion of the man from negative to positive; Mann and Ferguson 2015; Mann et al. 2017; Van Dessel et al. 2019).
(For more on implicit attitudes see the entry on implicit bias).

8.2 Dual Process Theories

Associative structures and transitions are widely implicated in a particular type of influential dual-process theory. Though there are many dual-process theories in social psychology (see, e.g., the papers in Chaiken and Trope 1999, or the discussion in Evans and Stanovich 2013), the one most germane to associationism is also the most popular. It originates from work in the psychology of reasoning and is often also invoked in the heuristics and biases tradition (see, e.g., Kahneman 2011). It has been developed by many different psychological theorists (Sloman 1996; Smith and Decoster 2000; Wilson et al. 2000; Evans and Stanovich 2013) and, in parts, taken up by philosophers too (see, e.g., Gendler 2008; Frankish 2009; see also some of the essays in Evans and Frankish 2009).

The dual-process strain most relevant to the current discussion posits two systems, one evolutionarily ancient intuitive system underlying unconscious, automatic, fast, parallel and associative processing, the other an evolutionarily recent reflective system characterized by conscious, controlled, slow, “rule-governed” serial processes (see, e.g., Evans and Stanovich 2013). The ancient system, sometimes called “System 1”, is often understood to include a collection of autonomous, distinct subsystems, each of which is recruited to deal with distinct types of problems (see Stanovich 2011 for a discussion of “TASS—the autonomous set of systems”). Although theories differ on how System 1 interacts with System 2,[30] the theoretical core of System 1 is the claim that its processing is essentially associative. As in the implicit attitude debate, dual systems models have recently come under fire (see Kruglanski 2013; Osman 2013; Mandelbaum 2016; De Houwer 2019), though they remain very popular.

9. Criticisms of Associationism

Associationism has been a dominant theme in mental theorizing for centuries. As such, it has garnered an appreciable amount of criticism.

9.1 Learning Curves

The basic associative learning theories imply, either explicitly or implicitly, slow, gradual learning of associations (Baeyens et al. 1995). The learning process can be summarized in a learning curve which plots the frequency (or magnitude) of the conditioned response as a function of the number of reinforcements (Gallistel et al. 2004: 13124). Mappings between CRs and USs are gradually built up over numerous trials (in the lab) or experiences (in the world). Gradual, slow learning has come under fire from a variety of areas (see sections 9.3 and 9.4.1). However, here we just focus on the behavioral data. In a series of works re-analyzing animal behavior, Gallistel (Gallistel et al. 2004; Gallistel and King 2009) has argued that although group-level learning curves do display the properties of being negatively accelerated and gradually developing, these curves are misleading because no individual’s learning curve has these properties. Gallistel has argued that learning for individuals is generally step-like, rapid, and abrupt. An individual’s learning from a low-level of responding to asymptotic responding is very quick. Sometimes, the learning is so quick that it is literally one-shot learning. For example, after analyzing multiple experiments of animal learning of spatial location Gallistel writes

The learning of a spatial location generally requires but a single experience. Several trials may, however, be required to convince the subject that the location is predictable from trial to trial. (Gallistel et al. 2004: 13130)

Gallistel argues that the reason the group learning curves look smooth and gradual is that there are large individual differences between subjects in the onset latency of their step-wise curves (Gallistel et al. 2004: 13125); in other words, different animals take different amounts of time for the learning to commence. The differences between individual subjects’ learning curves are determined by when the steps begin and not by the speed of the individual animal’s learning process. All individuals appear to show rapid rises in learning, but since each begins its learning at a different time, averaging over the group makes the rapid step-wise learning look like slow, gradual learning (Gallistel et al. 2004: 13124).
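The averaging argument can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not a reconstruction of Gallistel's analysis: each simulated subject jumps from no responding to asymptotic responding in a single trial, and the onset trial is drawn uniformly at random. The function names, the onset range, and the number of subjects are all hypothetical choices made for illustration. The shape of the averaged curve depends on the assumed distribution of onsets, so the sketch shows only that abrupt individual curves can yield a gradual group curve.

```python
import random


def individual_curve(onset, n_trials, low=0.0, high=1.0):
    """Step-like learner: responds at `low` before `onset` and at `high` afterwards."""
    return [high if trial >= onset else low for trial in range(n_trials)]


def group_average(curves):
    """Average the response level across subjects, trial by trial."""
    n_subjects = len(curves)
    n_trials = len(curves[0])
    return [sum(c[t] for c in curves) / n_subjects for t in range(n_trials)]


def largest_jump(curve):
    """Largest trial-to-trial increase in a single curve."""
    return max(curve[t + 1] - curve[t] for t in range(len(curve) - 1))


# Assumption: onsets are uniformly distributed between trials 5 and 50.
rng = random.Random(0)
n_trials = 60
onsets = [rng.randint(5, 50) for _ in range(200)]
curves = [individual_curve(onset, n_trials) for onset in onsets]
group_curve = group_average(curves)

# Every individual goes from floor to ceiling in one step ...
individual_jumps = [largest_jump(c) for c in curves]
# ... while the group curve never rises by more than a small fraction per trial.
group_jump = largest_jump(group_curve)

print("largest individual jump:", max(individual_jumps))
print("largest group-level jump:", round(group_jump, 3))
```

On this toy model the only difference between subjects is when the step occurs, which is the feature Gallistel identifies as responsible for the smooth appearance of group curves.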

9.2 The Problem of Predication

The problem of predication is, at its core, a problem of how an associative mechanism can result in the acquisition of subject/predicate structures, structures which many theorists believe appear in language, thought, and judgment. The first major discussion of the problem appears in Kant (1781/1787), but variants of the basic Kantian criticism can be seen across the contemporary literature (see, e.g., Chomsky 1959; Fodor and Pylyshyn 1988; Fodor 2003; Mandelbaum 2013a; for the details of the Kantian argument see the entry on Kant’s Transcendental Argument).

For a pure associationist, association is “semantically transparent” (see Fodor 2003), in that it purports to add no additional structure to thoughts. When a simple concept, X, and a simple concept, Y, become associated, one acquires the associative structure X/Y. But X/Y has no additional structure on top of their contents. Knowing that X and Y are associated amounts to knowing a causal fact: that activating Xs will bring about the activation of Ys and vice versa. However, so the argument goes, some of our thoughts appear to have more structure than this: the thought birds fly predicates the property of flying onto birds. The task for the associationist is to explain how associative structures can distinguish a thinker who has a single (complex) thought birds fly from a thinker who conjoins two simple thoughts in an associative structure where one thought, birds, is immediately followed by another, fly. As long as the two simple thoughts are reliably causally correlated so that, for a thinker, activations of birds regularly bring about fly, then that thinker has the associative structure birds/fly. Yet it appears that thinker hasn’t yet had the thought birds fly. The problem of predication is explaining how a purely associative mechanism could eventuate in complex thoughts. In Fodor’s terms the problem boils down to how association, a causal relation among mental representations, can effect predication, a relation among intentional contents (Fodor 2003).

A family of related objections to associationism can be interpreted as variations on this theme. For example, problems of productivity, compositionality, and systematicity for associationist theorizing appear to be variants of the problem of predication (for more on these specific issues see the entries on the Language of Thought Hypothesis and on compositionality). If association doesn’t add any additional structure to the mental representations that get associated, then it is hard to see how it can explain the compositionality of thought, which relies on structures that specify relations among intentional contents. Compositionality requires that the meaning of a complex thought is determined by the meanings of its simple constituents along with their syntactic arrangements. The challenge to associationism is to explain how an associative mechanism can give rise to the syntactic structures necessary to distinguish a complex thought like birds fly from the temporal succession of two simple thoughts birds and fly. Since the compositionality of thought is posited to undergird the productivity of thought (thinkers’ abilities to think novel sentences of arbitrary lengths, e.g., green birds fly, giant green birds fly, cuddly giant green birds fly, etc.), associationism has problems explaining productivity.

Systematicity is the thesis that there are predictable patterns among which thoughts a thinker is capable of entertaining. Thinkers that can entertain thoughts of certain structures can always entertain distinct thoughts that have related structure. For instance, any thinker who can think a complex thought of the form “X transitive verb Y” can think “Y transitive verb X”.[31] Systematicity entails that we won’t find any thinker that can only think one of those two thoughts, in which case we could not find a person who could think audrey wronged max, but not max wronged audrey. Of course, these two thoughts have very different effects in one’s cognitive economy. The challenge for the associationist is to explain how the associative structure audrey/wronged/max can be distinguished from the structure max/wronged/audrey, while capturing the differences in those thoughts’ effects.

Associationists have had different responses to the problem. Some have denied that human thought is actually compositional, productive, and systematic, and other non-associationists have agreed with this critique. For example, Prinz and Clark claim “concepts do not compose most of the time” (2002: 62), and Johnson (2004) argues that the systematicity criterion is wrongheaded (see Aydede 1997 for extended discussion of these issues). Rumelhart et al. offer a connectionist interpretation of “schemata”, one which is intended to cover some of the phenomena mentioned in this section (Rumelhart et al. 1986). Others have worked to show that classical conditioning can indeed give rise to complex associative structures (Rescorla 1988). In defense of the associationist construal of complex associations Rescorla writes,

Clearly, the animals had not simply coded the RH [complex] compound in terms of parallel associations with its elements. Rather they had engaged in some more hierarchical structuring of the situation, forming a representation of the compound and using it as an associate. (Rescorla 1988: 156)

Whether or not associationism has the theoretical tools to explain such complex compounds by itself is still debated (see, e.g., Fodor 2003; Mitchell 2009; Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019).

9.3 Word Learning

Multiple issues in the acquisition of the lexicon appear to cause problems for associationism. Some of the most well known examples are reviewed below (for further discussion of word learning and associationism see Bloom 2000).

9.3.1 Fast Mapping

Children learn words at an incredible rate, acquiring around 6,000 words by age 6 (Carey 2010: 184). If gradual learning is the rule, then words too should be learned gradually across this time. However, this does not appear to be the case. Susan Carey discovered the phenomenon of “fast mapping”, which is one-shot learning of a word (Carey 1978a, 1978b; Carey and Bartlett 1978). Her most influential example investigated children’s acquisition of “chromium” (a color word referring to olive green). Children were shown one of two otherwise identical objects, which only differed in color and asked, “Can you get me the chromium tray, not the red one, the chromium one” (recited in Carey 2010: 2). All of the children handed over the correct tray at that time. When the children were later tested in differing contexts, more than half remembered the referent of “chromium”. These findings have been extended—for example, Markson and Bloom (1997) showed that they are not specific to the remembering of novel words, but also hold for novel facts.

Fast mapping poses two problems for associationism. The first is that the learning of a new word did not develop slowly, as would be predicted by proponents of gradual learning. The second is that in order for the word learning to proceed, the mind must have been aided by additional principles not given by the environment. Some of these principles such as Markman’s (1989) taxonomic, whole object, and mutual exclusivity constraints, and Gleitman’s syntactic bootstrapping (Gleitman et al. 2005), imply that the mind does add structure to what is learned. Consequently, the associationist claim that learning is just mapping external contingencies without adding structure is imperiled.

9.3.2 Syntactic Category Learning

“Motherese”, the name of the type of language that infants generally hear, consists of simple sentences such as “Nora want a bottle?” and “Are you tired?”. These sentences almost always contain a noun and a verb. Yet, the infant’s vocabulary massively over-represents nouns in the first 100 words or so, while massively under-representing the verbs (never mind adjectives or adverbs, which almost never appear in the first 100 words infants produce; see, e.g., Goldin-Meadow, Seligman, and Gelman 1976). Even more surprising is that the over-representation of nouns to verbs holds even though

the incidence of each word (that is, the token frequency) is higher for the verbs than for the nouns in the common set used by mothers. (Snedeker and Gleitman 2004: 259, citing data from Sandhoffer, Smith, and Luo 2000)

Moreover, children hear a preponderance of determiners (“the” and “a”) but don’t produce them (Bloom 2000). These facts are not specific to English, but hold cross-culturally (see, e.g., Caselli et al. 1995). The disparity between the variation of the syntactic categories infants receive as input and produce as output is troublesome to associationism, insofar as associationism is committed to the learned structures (and the behaviors that follow from them) merely patterning what is given in experience.

9.4 Against the Contiguity Analysis of Associationism

Contiguity has been a central part of associationist analyses since the British Empiricists. In the experimental literature, the problem of figuring out the parameters needed for acquiring an association due to the contiguity of its relata has sometimes been termed the problem of the “Window of Association” (e.g., Gallistel and King 2009). Every associationist theory has to specify the temporal window within which two properties must be instantiated in order for those properties to be associated.[32] A related problem for contiguity theorists is that if the domain generality of associative learning is desired, then the window needs to be homogeneous across content domains. The late 1960s saw persuasive attacks on domain generality, as well as the necessity and sufficiency of the contiguity criterion in general.

9.4.1 Against the Necessity of Contiguity

Research on “taste aversions” and “bait-shyness” provided a variety of problems with contiguity in the associative learning tradition of classical conditioning. Garcia observed that a gustatory stimulus (e.g., drinking water or eating a hot dog) but not an audiovisual stimulus (a light and a sound) would naturally become associated with feeling nauseated. For instance, Garcia and Koelling (1966) paired an audiovisual stimulus (a light and a sound) with a gustatory stimulus (flavored water). The two stimuli were then paired with the rats receiving radiation, which made the rats feel nauseated. The rats associated the feeling of nausea with the water and not with the sound, even though the sound was contiguous with the water. Moreover, the delay between ingesting the gustatory stimulus and feeling nauseated could be quite long, with the feeling not coming on until 12 hours later (Roll and Smith 1972), and the organism needn’t even be conscious when the negative feeling arises (for a review, see Seligman 1970; Garcia et al. 1974). The temporal delay shows that the CS (the flavored water) needn’t be contiguous with the US (the feeling of nausea) in order for learning to occur, thus showing that contiguity isn’t necessary for associative learning.

Garcia’s work also laid bare the problems with the domain general aspect of associationism. In the above study the rat was prepared to associate the nausea with the gustatory stimulus, but would not associate it with the audiovisual stimulus. However, if one changes the US from feeling nauseated to receiving shocks in perfect contiguity with the audiovisual and gustatory stimuli, then the rats will associate the shocks with the audiovisual stimulus but not with the gustatory stimulus. That is, rats are prepared to associate audiovisual stimuli with the shock but are contraprepared to associate the shocks with the gustatory stimulus. Thus, learning does not seem to be entirely domain general (for similar content specificity effects in humans, see Baeyens et al. 1990).[33]

Lastly, “the Garcia effect” has also been used to show problems in the learning curve (see section 9.1). “Taste aversions” are the phenomena whereby an organism gets sick from ingesting the stimulus and the taste (or odor, Garcia et al. 1974) of that stimulus gets associated with the feeling of sickness. As anyone who has had food poisoning can attest, this learning can proceed in a one-shot fashion, and needn’t have a gradual rise over many trials (taste aversions have also been observed in humans, see, e.g., Bernstein and Webster 1980; Bernstein 1985; Logue et al. 1981; Rozin 1986).

9.4.2 Against the Sufficiency of Contiguity

Kamin’s famous blocking experiments (1969) showed that not all contiguous structures lead to classical conditioning. A rat that has already learned that CS1 predicts a US, will not learn that a subsequent CS2 predicts the US, if the CS2 is always paired with the CS1. Suppose that a rat has learned that a light predicts a shock because of the constant contiguity of the light and shock. After learning this, the rat has a sound introduced which only arises in conjunction with the light and the shock. As long as the rat had previously learned that the light predicts the shock, it will not learn that the sound does (as can be seen on later trials that have the sound alone). In sum, having learned that the CS1 predicts the US blocks the organism from learning that the CS2 predicts the US.[34] So even though CS2 is perfectly contiguous with the US, the association between CS2 and the US remains unlearned, thus serving as a counterexample to sufficiency of contiguity.[35]

Similarly, Rescorla (1968) demonstrated that a CS can appear only when the US appears and yet still have the association between them be unlearnable. If a tone is arranged to sound only when there are shocks, but there are still shocks when there are no tones (that is, the CS only appears with the US, but the US sometimes appears without the CS), no associative learning between the CS and the US will occur. Instead, subjects (in Rescorla 1968, rats) will only learn a connection between the shock and the experimental situation (e.g., the room in which the experiment is carried out).

In large part because of the problems discussed in 9.4, many classical conditioning theorists gave up the traditional program. Some, like Garcia, appeared to give up the classical theoretical framework altogether (Garcia et al. 1974); others, such as Rescorla and Wagner, tried to usher the framework into the modern era (see Rescorla and Wagner 1972; Rescorla 1988), where conditioning is seen as sensitive to base rates and driven by informational pick-up.[36] Whether this movement is interpreted as a substantive revision of classical conditioning (Rescorla 1988; Heyes 2012) or a wholesale abandoning of it (Gallistel and King 2009) is debatable.
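To see how an error-driven revision of classical conditioning can accommodate blocking, consider a minimal sketch of the update rule usually attributed to Rescorla and Wagner (1972). The entry itself does not state the rule, so what follows is the standard textbook form with a single learning rate, and it should be read as an assumption: on each trial, every cue that is present changes its associative strength by a fraction of the difference between the outcome and the summed strength of all cues present. The function name, the learning rate, and the number of trials are hypothetical choices made for illustration.

```python
def rescorla_wagner(trials, learning_rate=0.3, asymptote=1.0):
    """Run the textbook Rescorla-Wagner update over a sequence of trials.

    Each trial is a pair (cues_present, us_present). Returns a dict that maps
    each cue to its final associative strength.
    """
    strength = {}
    for cues, us_present in trials:
        # The prediction is the summed strength of every cue present on the trial.
        prediction = sum(strength.get(cue, 0.0) for cue in cues)
        # Prediction error: what happened minus what was expected.
        error = (asymptote if us_present else 0.0) - prediction
        # All present cues share the same error term.
        for cue in cues:
            strength[cue] = strength.get(cue, 0.0) + learning_rate * error
    return strength


# Blocking design: the light alone is paired with shock first,
# then light and sound together are paired with shock.
blocking_trials = [({"light"}, True)] * 50 + [({"light", "sound"}, True)] * 50
# Control design: light and sound together from the start.
control_trials = [({"light", "sound"}, True)] * 50

blocked = rescorla_wagner(blocking_trials)
control = rescorla_wagner(control_trials)

print("sound after pretraining on light:", round(blocked["sound"], 4))
print("sound without pretraining:", round(control["sound"], 4))
```

In the blocking condition the light already predicts the shock by the time the sound is introduced, so the error term is close to zero and the sound acquires almost no strength, even though it is perfectly contiguous with the US. Whether this kind of rule still counts as associative is exactly what is disputed in the literature cited above.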

9.5 Coextensionality

The Rescorla experiment also demonstrates another problem in associative theorizing: the question of why some property is singled out as a CS as opposed to different, equally contemporaneously instantiated properties. Put a different way, one needs a principle to say what the “same situation” amounts to in generalizations such as Thorndike’s laws. For instance, if a CS and a US, say a tone and a shock, are perfectly paired so that they are either both present or both absent, the organism won’t associate the location in which it received shocks (e.g., the experimental setting) with getting shocked; it will just associate the tone with the shocks. But in the condition where the US occurs without the CS, but the CS does not occur without the US, the organism will gain an association between the shocks and the location. However, in both cases the location is present on every trial.[37] In contrast to shocks, x-ray radiation, when used as a US, never appears to become associated with location, even if they are always perfectly paired (Garcia et al. 1972).[38]

The problem of saying which properties become associated when multiple properties are coinstantiated sometimes goes by the name the “Credit Assignment Problem” (see, e.g., Gallistel and King 2009).[39] Some would argue that this problem is a symptom of a larger issue: trying to use extensional criteria to specify intentional content (see, e.g., Fodor 2003). Associationists need a criterion to specify which of the coextensive properties will in fact be learned, and which not.

An additional worry stems from the observation that sometimes the lack of a property being instantiated is an integral component of what is learned. To deal with the problem of missing properties, contemporary associationists have introduced an important element to the theory: inhibition. For example, if a US and a CS only appear when the other is absent, the organism will learn a negative relationship holds between them; that is, the organism will learn that the absence of the CS predicts the US.[40] Here the CS becomes a “conditioned inhibitor” of the US. Inhibition, using associations as modulators and not just activators, is a central part of current associationist thinking. For example, in connectionist networks, inhibition is implemented by the activation of certain nodes inhibiting the activation of other nodes. Connection weights can be positive or negative, with the negative weight standing in for the inhibitory strength of the association.
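The role of negative weights can be made concrete with a toy unit. The sketch below is illustrative only and is not drawn from any particular connectionist model: the node names, the weight values, and the choice of a logistic activation function are all assumptions. It shows a single output node whose activation is raised by an input with a positive weight and lowered by an input with a negative weight.

```python
import math


def logistic(x):
    """Squash a net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def output_activation(inputs, weights):
    """Activation of one output node given input activations and connection weights."""
    net_input = sum(weights[name] * value for name, value in inputs.items())
    return logistic(net_input)


# Hypothetical weights: one excitatory connection, one inhibitory connection.
weights = {"excitatory_cue": 2.0, "inhibitory_cue": -3.0}

# Only the excitatory node is active.
excited = output_activation({"excitatory_cue": 1.0, "inhibitory_cue": 0.0}, weights)
# Both nodes are active: the negative weight pulls the output down.
inhibited = output_activation({"excitatory_cue": 1.0, "inhibitory_cue": 1.0}, weights)

print("output with excitatory input only:", round(excited, 3))
print("output with inhibitory input added:", round(inhibited, 3))
```

Here the inhibitory connection does not merely fail to activate the output node; it actively suppresses activation that the excitatory connection would otherwise produce, which is the sense in which associations act as modulators.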


  • Anderson, J., K. Spoehr, and D. Bennett, 1994, “A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network”, in Neural Networks for Knowledge Representation and Inference, D. Levine and M. Aparicio IV (eds.), East Sussex: Psychology Press, pp. 311–335.
  • Armstrong, K., S. Kose, L. Williams, A. Woolard, and S. Heckers, 2012, “Impaired Associative Inference in Patients with Schizophrenia”, Schizophrenia Bulletin, 38(3): 622–629.
  • Asch, S., 1962, “A Problem in the Theory of Associations”, Psychologische Beitrage, (6): 553–563.
  • –––, 1969, “A Reformulation of the Problem of Association”, American Psychologist, 24(2): 92–102.
  • Aydede, M., 1997, “Language of Thought: The Connectionist Contribution”, Minds and Machines, 7(1): 57–101.
  • Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez, 1990, “Flavor-Flavor and Color-Flavor Conditioning in Humans”, Learning and Motivation, 21(4): 434–455.
  • Baeyens, F., P. Eelen, and G. Crombez, 1995, “Pavlovian Associations are Forever: On Classical Conditioning and Extinction”, Journal of Psychophysiology, 9(2): 127–141.
  • Bar-Anan, Y., B. Nosek, and M. Vianello, 2009, “The Sorting Paired Features Task: A Measure of Association Strengths”, Experimental Psychology, 56(5): 329–343.
  • Bates, E. and B. MacWhinney, 1987, “Competition, Variation, and Language Learning”, in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 157–193.
  • Bendana, J. and E. Mandelbaum, forthcoming, “The Fragmentation of Belief”, in D. Kindermann, C. Borgoni, and A. Onofri (eds.), The Fragmented Mind, Oxford: Oxford University Press.
  • Berger, J., 2020, “Implicit attitudes and awareness”, Synthese, 197(3): 1291–1312.
  • Bernstein, I. and M. Webster, 1980, “Learned Taste Aversions in Humans”, Physiology and Behavior, 25(3): 363–366.
  • Bernstein, I., 1985, “Learned Food Aversions in the Progression of Cancer and its Treatment”, in N. Braveman and P. Bronstein, (eds.), Experimental Assessments and Clinical Applications of Conditioned Food Aversions, New York: New York Academy of Sciences, pp. 365–80.
  • Black, W. and W. Prokasy (eds.), 1972, Classical Conditioning II: Current Research and Theory, New York: Appleton-Century-Crofts.
  • Bloom, P., 2000, How Children Learn the Meanings of Words, Cambridge, MA: MIT Press.
  • Bouton, M., 2002, “Context, Ambiguity, and Unlearning: Sources of Relapse after Behavioral Extinction”, Biological Psychiatry, 52(10): 976–986.
  • –––, 2004, “Context and Behavioral Processes in Extinction”, Learning and Memory, 11(5): 485–494.
  • Brett, L., W. Hankins, and J. Garcia, 1976, “Prey-Lithium Aversions. III: Buteo hawks”, Behavioral Biology, 17(1): 87–98.
  • Camp, L., 2007, “Thinking with Maps”, Philosophical Perspectives, 21(1): 145–182.
  • Carey, S., 1978a, “Less May Never Mean More”, in R. Campbell and P. Smith, (eds.), Recent Advances in the Psychology of Language, New York: Plenum Press, pp. 109–132.
  • –––, 1978b, “The Child as Word Learner”, in J. Bresnan, G. Miller, and M. Halle, (eds.), Linguistic Theory and Psychological Reality, Cambridge, MA: MIT Press, pp. 264–293.
  • –––, 2010, “Beyond Fast Mapping”, Language Learning and Development, 6(3): 184–205.
  • Carey, S. and E. Bartlett, 1978, “Acquiring a Single New Word”, Proceedings of the Stanford Child Language Conference, 15: 17–29.
  • Caselli, M.C., E. Bates, P. Casadio, J. Fenson, L. Fenson, L. Sanderl, and J. Weir, 1995, “A Cross-linguistic Study of Early Lexical Development”, Cognitive Development, 10(2): 159–199.
  • Chaiken, S. and Y. Trope (eds.), 1999, Dual-Process Theories in Social Psychology, New York: Guilford Press.
  • Chalmers, D., 1993, “Connectionism and Compositionality: Why Fodor and Pylyshyn Were Wrong”, Philosophical Psychology, 6(3): 305–319.
  • Chater, N., 2009, “Rational Models of Conditioning”, Behavioral and Brain Sciences, 32(2): 204–205.
  • –––, J. Tenenbaum, and A. Yuille, 2006, “Probabilistic Models of Cognition: Conceptual Foundations”, Trends in Cognitive Sciences, 10(7): 287–291.
  • Chomsky, N., 1959, “A Review of B.F. Skinner’s Verbal Behavior”, Language, 35(1): 26–58.
  • Churchland, P., 1986, “Some Reductive Strategies in Cognitive Neurobiology”, Mind, 95(379): 279–309.
  • –––, 1989, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, Cambridge, MA: MIT.
  • Churchland, P. and T. Sejnowski, 1990, “Neural Representation and Neural Computation”, Philosophical Perspectives, 4: 343–382.
  • Collins, A. and E. Loftus, 1975, “A Spreading-Activation Theory of Semantic Processing”, Psychological Review, 82(6): 407–428.
  • Danks, D., 2013, “Moving from Levels and Reduction to Dimensions and Constraints”, Proceedings of the 35th Annual Conference of the Cognitive Science Society, 35: 2124–2129.
  • De Houwer, J., 2009, “The Propositional Approach to Associative Learning as an Alternative for Association Formation Models”, Learning & Behavior, 37(1): 1–20.
  • –––, 2011, “Evaluative Conditioning: A Review of Procedure Knowledge and Mental Process Theories”, in T. Schachtman and S. Reilly (eds.), Associative Learning and Conditioning Theory: Human and Non-Human Applications, New York: Oxford University Press, pp. 399–416.
  • –––, 2014, “A Propositional Model of Implicit Evaluation”, Social and Personality Psychology Compass, 8(7): 342–353.
  • –––, 2018, “Propositional Models of Evaluative Conditioning”, Social Psychological Bulletin, 13(2): 1–21.
  • –––, 2019, “Moving Beyond System 1 and System 2: Conditioning, Implicit Evaluation, and Habitual Responding Might Be Mediated by Relational Knowledge”, Experimental Psychology, 66(4): 257–265.
  • De Houwer, J., S. Thomas, and F. Baeyens, 2001, “Association Learning of Likes and Dislikes: A Review of 25 years of Research on Human Evaluative Conditioning”, Psychological Bulletin, 127(6): 853–869.
  • Dehaene, S., 2011, The Number Sense: How the Mind Creates Mathematics, Oxford: Oxford University Press.
  • Diaz, E., G. Ruis, and F. Baeyens, 2005, “Resistance to Extinction of Human Evaluative Conditioning Using a Between-Subjects Design”, Cognition and Emotion, 19(2): 245–268.
  • Dickinson, A., D. Shanks, and J. Evenden, 1984, “Judgment of Act-Outcome Contingency: The role of Selective Attribution”, The Quarterly Journal of Experimental Psychology, 36(1): 29–50.
  • Dirikx, T., D. Hermans, D. Vansteenwegen, F. Baeyens, and P. Eelen, 2004, “Reinstatement of Extinguished Conditioned Responses and Negative Stimulus Valence as a Pathway to Return of Fear in Humans”, Learning and Memory, 11: 549–54.
  • Elman, J., 1991, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure”, Machine learning, 7(2–3): 195–225.
  • Elman, J., E. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett, 1996, Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: MIT Press.
  • Evans, G., 1982, The Varieties of Reference, J. McDowell (ed.), Oxford: Clarendon Press.
  • Evans, J., and K. Frankish (eds.), 2009, In Two Minds: Dual Processes and Beyond, Oxford: Oxford University Press.
  • –––, and K. Stanovich, 2013, “Dual-Process Theories of Higher Cognition: Advancing the Debate”, Perspectives on Psychological Science, 8(3): 223–241.
  • Fazio, R., 2007, “Attitudes as Object-Evaluation Associations of Varying Strength”, Social Cognition, 25(5): 603–637.
  • Festinger, L. and J. Carlsmith, 1959, “Cognitive Consequences of Forced Compliance”, The Journal of Abnormal and Social Psychology, 58(2): 203–210.
  • Field, A. and G. Davey, 1999, “Reevaluating Evaluative Conditioning: A Nonassociative Explanation of Conditioning Effects in the Visual Evaluative Conditioning Paradigm”, Journal of Experimental Psychology: Animal Behavior Processes, 25(2): 211–224.
  • Fodor, J., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
  • –––, 2001, The Mind Doesn’t Work that Way, Cambridge, MA: MIT Press.
  • –––, 2003, Hume Variations, Oxford: Clarendon Press.
  • Fodor, J., and B. McLaughlin, 1990, “Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work”, Cognition, 35(2): 183–204.
  • Fodor, J., and Z. Pylyshyn, 1988, “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition, 28(1–2): 3–71.
  • Frankish, K., 2009, “Systems and Levels: Dual-System Theories and the Personal-Subpersonal Distinction”, in Evans and Frankish 2009: pp. 89–107.
  • Gagliano, M., V. Vyazovsky, A. Borbely, M. Grimonprez, and M. Depczynski, 2016, “Learning by Association in Plants”, Scientific Reports, 6(38427): 1–8.
  • Gallistel, C., S. Fairhurst, and P. Balsam, 2004, “The Learning Curve: Implications of a Quantitative Analysis”, Proceedings of the National Academy of Sciences of the United States of America, 101(36): 13124–13131.
  • Gallistel, C., and A. King, 2009, Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, West Sussex: Wiley Blackwell.
  • Garcia, J., 1981, “Tilting at the Paper Mills of Academe”, American Psychologist, 36(2): 149–158.
  • Garcia, J., R. Kovner, and K. Green, 1970, “Cue Properties vs Palatability of Flavors in Avoidance Learning”, Psychonomic Science, 20(5): 313–314.
  • Garcia, J., B. McGowan, and K. Green, 1972, “Biological Constraints on Conditioning II”, in Black and Prokasy 1972: pp. 3–27.
  • Garcia, J., W. Hankins, and K. Rusiniak, 1974, “Behavioral Regulation of the Milieu Interne in Man and Rat”, Science, 185(4154): 824–831.
  • Garcia, J. and R.A. Koelling, 1966, “Relationship of Cue to Consequence in Avoidance Learning”, Psychonomic Science, 4: 123–124.
  • Gendler, T., 2008, “Alief and Belief”, Journal of Philosophy, 105(10): 634–63.
  • Gleitman, L., K. Cassidy, R. Nappa, A. Papafragou, and J. Trueswell, 2005, “Hard Words”, Language Learning and Development, 1(1): 23–64.
  • Glosser, G. and R. Friedman, 1991, “Lexical but not Semantic Priming in Alzheimer’s Disease”, Psychology and Aging, 6(4): 522–27.
  • Goldin-Meadow, S., M. Seligman, and S. Gelman, 1976, “Language in the Two-Year Old”, Cognition, 4(2): 189–202.
  • Greenwald, A., D. McGhee, and J. Schwartz, 1998, “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test”, Journal of Personality and Social Psychology, 74(6): 1464–1480.
  • Hahn, A., C. Judd, H. Hirsch, and I. Blair, 2014, “Awareness of Implicit Attitudes”, Journal of Experimental Psychology: General, 143(3): 1369–1392.
  • Heyes, C., 2012, “Simple Minds: A Qualified Defence of Associative Learning”, Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603): 2695–2703.
  • Hughes, S., Y. Ye, P. Van Dessel, and J. De Houwer, 2019, “When people co-occur with good or bad events: Graded effects of relational qualifiers on evaluative conditioning”, Personality and Social Psychology Bulletin, 45(2): 196–208.
  • Hull, C., 1943, Principles of Behavior, New York: Appleton-Century-Crofts.
  • Hume, D., 1738, A Treatise of Human Nature, L.A. Selby-Bigge (ed.), 2nd ed., revised by P.H. Nidditch, Oxford: Clarendon Press, 1975.
  • James, W., 1890, The Principles of Psychology (Vol. 1), New York: Holt.
  • Johnson, K., 2004, “On the Systematicity of Language and Thought”, Journal of Philosophy, 101(3): 111–139.
  • Kahneman, D., 2011, Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
  • Kamin, L., 1969, “Predictability, Surprise, Attention, and Conditioning”, in B. Campbell and R. Church (eds.), Punishment and Aversive Behavior, New York: Appleton-Century-Crofts, pp. 279–296.
  • Kant, I., 1781/1787, Critique of Pure Reason, in P. Guyer and A. Wood (eds.), Critique of Pure Reason, New York: Cambridge University Press.
  • Karmiloff-Smith, A., 1995, Beyond Modularity: A Developmental Perspective on Cognitive Science, Cambridge, MA: MIT Press/Bradford Books.
  • Kruglanski, A., 2013, “Only One? The Default Interventionist Perspective as a Unimodel—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 242–247.
  • Kurdi, B., and M. Banaji, 2017, “Repeated evaluative pairings and evaluative statements: How effectively do they shift implicit attitudes?”, Journal of Experimental Psychology: General, 146(2): 194–213.
  • –––, 2019, “Attitude change via repeated evaluative pairings versus evaluative statements: Shared and unique features”, Journal of Personality and Social Psychology, 116(5): 681–703.
  • Locke, J., 1690, An Essay Concerning Human Understanding, in Peter H. Nidditch (ed.), An Essay Concerning Human Understanding, Oxford: Clarendon Press, 1975.
  • Logue, A., I. Ophir, and K. Strauss, 1981, “The Acquisition of Taste Aversion in Humans”, Behavioral Research and Therapy, 19(4): 319–33.
  • Luka, B., and L. Barsalou, 2005, “Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension”, Journal of Memory and Language, 52: 444–467.
  • Lycan, W., 1990, “The Continuity of the Levels of Nature”, in W. Lycan (ed.), Mind and Cognition: A Reader, Cambridge: Basil Blackwell, pp. 77–96.
  • Madva, A., and M. Brownstein, 2018, “Stereotypes, Prejudice, and the Taxonomy of the Implicit Social Mind”, Nous, 52(3): 611–644.
  • Mandelbaum, E., 2013a, “Against Alief”, Philosophical Studies, 165(1): 197–211.
  • –––, 2013b, “Numerical Architecture”, Topics in Cognitive Science, 5(2): 367–386.
  • –––, 2016, “Attitude, Inference, Association: On the Propositional Structure of Implicit Attitudes”, Nous, 50(3): 629–658.
  • –––, 2017, “Seeing and Conceptualizing: Modularity and the Shallow Contents of Vision”, Philosophy and Phenomenological Research, 97(2): 267–283.
  • –––, 2019, “Troubles with Bayesianism: An Introduction to the Psychological Immune System”, Mind & Language, 34(2): 141–157.
  • Mann, T., and M. Ferguson, 2015, “Can we undo our first impressions? The role of reinterpretation in reversing implicit evaluations”, Journal of Personality and Social Psychology, 108(6): 823–849.
  • –––, 2017, “Reversing implicit first impressions through reinterpretation after a two-day delay”, Journal of Experimental Social Psychology, 68: 122–127.
  • Mann, T., B. Kurdi, and M. Banaji, 2019, “How effectively can implicit evaluations be updated? Using evaluative statements after aversive repeated evaluative pairings”, Journal of Experimental Psychology: General, doi: 10.1037/xge0000701.
  • Markman, E., 1989, Categorization and Naming in Children: Problems of Induction, Cambridge, MA: MIT Press.
  • Markson, L. and P. Bloom, 1997, “Evidence Against a Dedicated System for Word Learning in Children”, Nature, 385(6619): 813–815.
  • Marr, D., 1982, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, NY: W.H. Freeman and Co.
  • Mason, M. and M. Bar, 2012, “The Effect of Mental Progression on Mood”, Journal of Experimental Psychology: General, 141(2): 217–221. doi:10.1037/a0025035
  • McClelland, J., M. Botvinick, D. Noelle, D. Plaut, T. Rogers, M. Seidenberg, and L. Smith, 2010, “Letting Structure Emerge: Connectionist and Dynamic Systems Approaches to Cognition”, Trends in Cognitive Sciences, 14(8): 348–356.
  • Minsky, M., 1963, “Steps toward Artificial Intelligence”, in E. Feigenbaum and J. Feldman (eds.), Computers And Thought, New York, NY: McGraw-Hill, pp. 406–450.
  • Mitchell, C., J. De Houwer, and P. Lovibond, 2009, “The Propositional Nature of Human Associative Learning”, Behavioral and Brain Sciences, 32(2): 183–246.
  • Nosek, B. and M. Banaji, 2001, “The Go/No-Go Association Task”, Social Cognition, 19(6): 625–66.
  • Osman, M., 2013, “A Case Study Dual-Process Theories of Higher Cognition—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 248–252.
  • Pavlov, I., 1906, “The Scientific Investigation of the Psychical Faculties or Processes in the Higher Animals”, Science, 24(620): 613–619.
  • –––, 1927, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex, Oxford: Oxford University Press.
  • Payne, B., C. Cheng, O. Govorun, and B. Stewart, 2005, “An Inkblot for Attitudes: Affect Misattribution as Implicit Measurement”, Journal of Personality and Social Psychology, 89(3): 277–293.
  • Perea, M. and E. Rosa, 2002, “The Effects of Associative and Semantic Priming in the Lexical Decision Task”, Psychological Research, 66(3): 180–194.
  • Prinz, J., 2002, Furnishing the Mind: Concepts and their Perceptual Basis, Cambridge, MA: MIT Press.
  • ––– and A. Clark, 2004, “Putting Concepts to Work: Some Thoughts for the 21st Century”, Mind & Language, 19(1): 57–69.
  • Quilty-Dunn, J., forthcoming, “Perceptual Pluralism”, Nous, 1–41.
  • Quilty-Dunn, J. and E. Mandelbaum, 2018, “Inferential Transitions”, Australasian Journal of Philosophy, 96(3): 532–547.
  • –––, 2019, “Non-Inferential Transitions: Imagery and Association”, in T. Chan and A. Nes (eds.), Inference and Consciousness, New York: Routledge, pp. 151–171.
  • Rescorla, R., 1968, “Probability of Shock in the Presence and Absence of CS in Fear Conditioning”, Journal of Comparative and Physiological Psychology, 66(1): 1–5.
  • –––, 1988, “Pavlovian Conditioning: It’s Not What You Think It Is”, American Psychologist, 43(3): 151–160.
  • Rescorla, R., and A. Wagner, 1972, “A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement”, in Black and Prokasy 1972, pp. 64–99.
  • Roll, D. and J. Smith, 1972, “Conditioned Taste Aversion in Anesthetized Rats”, in M. Seligman and J. Hager (eds.), Biological Boundaries of Learning, New York: Appleton-Century-Crofts, pp. 98–102.
  • Rozin, P., 1986, “One-Trial Acquired Likes and Dislikes in Humans: Disgust as a US, Food Predominance, and Negative Learning Predominance”, Learning and Motivation, 17(2): 180–189.
  • Rumelhart, D., P. Smolensky, J. McClelland, and G. Hinton, 1986, “Sequential Thought Processes in PDP Models”, in J. McClelland and D. Rumelhart (eds.), Parallel Distributed Processing Vol. 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models, Cambridge, MA: MIT Press, pp. 7–57.
  • Rusiniak, K., W. Hankins, J. Garcia, and L. Brett, 1979, “Flavor-illness Aversions: Potentiation of Odor by Taste in Rats”, Behavioral and Neural Biology, 25(1): 1–17.
  • Rydell, R. and A. McConnell, 2006, “Understanding Implicit and Explicit Attitude Change: A Systems of Reasoning Analysis”, Journal of Personality and Social Psychology, 91(6): 995–1008.
  • Sandhoffer, C., L. Smith, and J. Luo, 2000, “Counting Nouns and Verbs in the Input: Differential Frequencies, Different Kinds of Learning?”, Journal of Child Language, 27(3): 561–585.
  • Seligman, M., 1970, “On the Generality of the Laws of Learning”, Psychological Review, 77(5): 406–418.
  • Shanks, D., 2010, “Learning: From Association to Cognition”, Annual Review of Psychology, 61: 273–301.
  • Skinner, B., 1938, The Behavior of Organisms: An Experimental Analysis, Oxford: Appleton-Century.
  • –––, 1953, Science and Human Behavior, New York: Simon and Schuster.
  • Sloman, S., 1996, “The Empirical Case for Two Systems of Reasoning”, Psychological Bulletin, 119(1): 3–22.
  • Smith, E. R. and J. DeCoster, 2000, “Dual-Process Models in Social and Cognitive Psychology: Conceptual Integration and Links to Underlying Memory Systems”, Personality and Social Psychology Review, 4(2): 108–131.
  • Smith, J. and D. Roll, 1967, “Trace Conditioning with X-rays as an Aversive Stimulus”, Psychonomic Science, 9(1): 11–12.
  • Smolensky, P., 1988, “On the Proper Treatment of Connectionism”, Behavioral and Brain Sciences, 11(1): 1–23.
  • Snedeker, J. and L. Gleitman, 2004, “Why it is Hard to Label Our Concepts”, in D. Hall and S. Waxman (eds.), Weaving a Lexicon, Cambridge, MA: MIT Press, pp. 257–294.
  • Stanovich, K., 2011, Rationality and the Reflective Mind, New York: Oxford University Press.
  • Tenenbaum, J., C. Kemp, T. Griffiths, and N. Goodman, 2011, “How to Grow a Mind: Statistics, Structure, and Abstraction”, Science, 331(6022): 1279–1285.
  • Thorndike, E., 1911, Animal intelligence: Experimental studies, New York: Macmillan.
  • Todrank, J., D. Byrnes, A. Wrzesniewski, and P. Rozin, 1995, “Odors can Change Preferences for People in Photographs: A Cross-Modal Evaluative Conditioning Study with Olfactory USs and Visual CSs”, Learning and Motivation, 26(2): 116–140.
  • Tolman, E., 1948, “Cognitive Maps in Rats and Men”, Psychological Review, 55(4): 189–208.
  • Van Dessel, P., Y. Ye, and J. De Houwer, 2019, “Changing deep-rooted implicit evaluation in the blink of an eye: negative verbal information shifts automatic liking of Gandhi”, Social Psychological and Personality Science, 10(2): 266–273.
  • Van Gelder, T., 1995, “What Might Cognition Be, If not Computation?”, The Journal of Philosophy, 91(7): 345–381.
  • Vansteenwegen, D., G. Francken, B. Vervliet, A. De Clercq, and P. Eelen, 2006, “Resistance to Extinction in Evaluative Conditioning”, Journal of Experimental Psychology: Animal Behavior Processes, 32(1): 71–79.
  • Wilson, T., S. Lindsey, and T. Schooler, 2000, “A Model of Dual Attitudes”, Psychological Review, 107(1): 101–26.


Helpful feedback was received from Michael Brownstein, Bryce Huebner, Zoe Jenkin, Jake Quilty-Dunn, Shaun Nichols, and Susanna Siegel who are hereby thanked for their efforts.

Copyright © 2020 by
Eric Mandelbaum
