Associationist Theories of Thought
Associationism is one of the oldest and, in some form or another, most widely held theories of thought. Associationism has been the engine behind empiricism for centuries, from the British Empiricists through the Behaviorists and modern-day Connectionists. Nevertheless, “associationism” does not refer to one particular theory of cognition per se, but rather to a constellation of related though separable theses. What ties these theses together is a commitment to a certain arationality of thought: a creature’s mental states are associated because of some facts about its causal history, and having these mental states associated entails that bringing one of a pair of associates to mind will, ceteris paribus, ensure that the other also becomes activated.
- 1. What is Associationism?
- 2. Associationism as a Theory of Mental Processes: The Empiricist Connection
- 3. Associationism as a Theory of Learning
- 4. Associationism as a Theory of Mental Structure
- 5. Associative Transitions
- 6. Associative Instantiation
- 7. Relation between the Varieties of Association and Related Positions
- 8. Associationism in Social Psychology
- 9. Criticisms of Associationism
- 10. Associationism and Reinforcement Learning
- Bibliography
- Academic Tools
- Other Internet Resources
- Related Entries
1. What is Associationism?
Associationism is a theory that connects learning to thought based on principles of the organism’s causal history. Since its early roots, associationists have sought to use the history of an organism’s experience as the main sculptor of cognitive architecture. In its most basic form, associationism has claimed that pairs of thoughts become associated based on the organism’s past experience. So, for example, a basic form of associationism (such as Hume’s) might claim that the frequency with which an organism has come into contact with Xs and Ys in its environment determines the frequency with which thoughts about Xs and thoughts about Ys will arise together in the organism’s future.
Associationism’s popularity is in part due to how many different functions it can subserve. In particular, associationism can be used as a theory of learning (e.g., as in behaviorist theorizing), a theory of thinking (as in Jamesian “streams of thought”), a theory of mental structures (e.g., as in concept pairs), and a theory of the implementation of thought (e.g., as in connectionism). All these theories are separable, but share a related, empiricist-friendly core. As used here, a “pure associationist” will refer to one who holds associationist theories of learning, thinking, mental structure, and implementation. The “pure associationist” is a somewhat idealized position, one that no particular theorist may have ever held, but many have approximated to differing degrees (e.g., Locke 1690/1975; Hume 1738/1975; Thorndike 1911; Skinner 1953; Hull 1943; Churchland 1986, 1989; Churchland and Sejnowski 1990; Smolensky 1988; Elman 1991; Elman et al. 1996; McClelland et al. 2010; Rydell and McConnell 2006; Fazio 2007; Demeter 2021; Buckner 2023).
Outside of these core uses of associationism the movement has also been closely aligned with a number of different doctrines over the years: empiricism, behaviorism, anti-representationalism (i.e., skepticism about the necessity of representational realism in psychological explanation), gradual learning, and domain-general learning. All of these theses are dissociable from core associationist thought (see section 7). While one can be an associationist without holding those theses, some of those theses imply associationism more than others. These extra theses’ historical and sociological ties to associationism are strong, and so will be intermittently discussed below.
2. Associationism as a Theory of Mental Processes: The Empiricist Connection
Empiricism is a general theoretical outlook, which offers a theory of learning to explain as much of our mental life as possible. From the British empiricists through Skinner and the behaviorists (see the entry on behaviorism) the main focus has been arguing for the acquisition of concepts (for the empiricists “Ideas”, for the behaviorists “responses”) through learning. However, the mental processes that underwrite such learning are almost never themselves posited to be learned.[1] So winnowing down the number of mental processes one has to posit limits the amount of innate machinery with which the theorist is saddled. Associationism, in its original form in Hume (1738/1975), was put forward as a theory of mental processes. Associationists attempt to answer the question of how many mental processes there are by positing, ideally, only a single mental process: the ability to associate ideas.[2]
Of course, thinkers execute many different types of cognitive acts, so if there is only one mental process, the ability to associate, that process must be flexible enough to accomplish a wide range of cognitive work. In particular, it must be able to account for learning and thinking. Accordingly, associationism has been utilized on both fronts. We will first discuss the theory of learning and then, after analyzing that theory and seeing what is putatively learned, we will return to the associationist theory of thinking.
3. Associationism as a Theory of Learning
In one of its senses, “associationism” refers to a theory of how organisms acquire concepts, associative structures, response biases, and even propositional knowledge. It is commonly acknowledged that associationism took hold after the publication of John Locke’s Essay Concerning Human Understanding (1690/1975).[3] However, Locke’s comments on associationism were terse (though fertile), and did not address learning to any great degree. The first serious attempt to detail associationism as a theory of learning was given by Hume in the Treatise of Human Nature (1738/1975).[4] Hume’s associationism was, first and foremost, a theory connecting how perceptions (“Impressions”) determined trains of thought (successions of “Ideas”). Hume’s empiricism, as enshrined in the Copy Principle,[5] demanded that there were no Ideas in the mind that were not first given in experience. For Hume, the principles of association constrained the functional role of Ideas once they were copied from Impressions: if Impressions IM1 and IM2 were associated in perception, then their corresponding Ideas, ID1 and ID2, would also become associated. In other words, the ordering of Ideas was determined by the ordering of the Impressions that caused the Ideas to arise.
Hume’s theory then needed an analysis of which types of associative relations between Impressions mattered for determining the ordering of Ideas. His analysis consisted of three types of associative relations: cause and effect, contiguity, and resemblance. If two Impressions instantiated one of these associative relations, then their corresponding Ideas would mimic the same instantiation.[6] For instance, if Impression IM1 was contemporaneous with Impression IM2, then (ceteris paribus) their corresponding Ideas, ID1 and ID2, would become associated.
As stated, Hume’s associationism was mostly a way of determining the functional profile of Ideas. But we have not yet said what it is for two Ideas to be associated (for that see section 4). Instead, one can see Hume’s contribution as introducing a very influential type of learning—associative learning—for Hume’s theory purports to explain how we learn to associate certain Ideas. We can abstract away from Hume’s framework of ideas and his account of the specific relations that underlie associative learning, and state the theory of associative learning more generally: if two representations of X and Y instantiate some associative relation, R, then those representations will become associated, so that future activations of X will tend to bring about activations of Y, and do so directly (i.e., without any intermediate computations). The associationist then has to explain what relation R amounts to. The Humean form of associative learning (where R is equated with cause and effect, contiguity, or resemblance) has been hugely influential, informing the accounts of those such as Jeremy Bentham, J.S. Mill, and Alexander Bain (see, e.g., the entries on John Stuart Mill and 19th Century Scottish Philosophy).[7]
Associative learning didn’t hit its stride until the work of Ivan Pavlov, which spurred the subsequent rise of the behaviorist movement in psychology. Pavlov introduced the concept of classical conditioning as a modernized version of associative learning. For Pavlov, classical conditioning was in part an experimental paradigm for teaching animals to learn new associations between stimuli. The general method of learning was to pair an unconditioned stimulus (US) with a novel stimulus. An unconditioned stimulus is just a stimulus that instinctively (i.e., without training) provokes a response in an organism. Since this response is not itself learned, the response is referred to as an “unconditioned response” (UR). In Pavlov’s canonical experiment, the US was a meat powder, as the smell of meat automatically brought about salivation (UR) in his canine subjects. The US is then paired with a neutral stimulus, such as a bell. Over time, the contiguity between the US and the neutral stimulus causes the neutral stimulus to provoke the same response as the US. Once the bell starts to provoke salivation, the bell has become a “conditioned stimulus” (CS) and the salivating, when prompted by the bell alone, a “conditioned response” (CR). The associative learning here is learning to form a new stimulus-response pair between the bell and the salivation.[8]
Classical conditioning is a fairly circumscribed process. It is a “stimulus substitution” paradigm where one stimulus can be swapped for another to provoke a response.[9] However, the responses that are provoked are supposed to remain unchanged; all that changes is the stimulus that gets associated with the response. Thus, classical conditioning seemed to some to be too restrictive to explain the panoply of novel behavior organisms appear to execute.[10]
Edward Thorndike’s research with cats in puzzle boxes broadened the theory of associative learning by introducing the notion of consequences to associative learning. Thorndike expanded the notion of associative learning beyond instinctual behaviors and sensory substitution to genuinely novel behaviors. Thorndike’s experiments initially probed, e.g., how cats learned to lift a lever to escape the “puzzle boxes” (the forerunner of “Skinner boxes”) that they were trapped in. The cats’ behaviors, such as attempting to lift a lever, were not themselves instinctual behaviors like the URs of Pavlov’s experiments. Additionally, the cats’ behaviors were shaped by the consequences that they brought on. For Thorndike it was because lifting the lever caused the door to open that the cats learned the connection between the lever and the door. This new view of learning, operant conditioning (for the organism is “operating” on its environment), was not merely the passive learning of Pavlov, but a species-nonspecific, general, active theory of learning.
This research culminated in Thorndike’s famous “Law of Effect” (1911), the first canonical psychological law of associationist learning. It asserted that responses that are accompanied by the organism feeling satisfied will, ceteris paribus, make the response more likely to occur when the organism encounters the same situation, whereas responses that are accompanied by a feeling of discomfort will, ceteris paribus, make the response less likely to occur when the organism encounters the same situation.[11] The greater the positive or negative feelings produced, the greater the likelihood that the behavior will be evinced. To this Thorndike added the “Law of Exercise”, that responses to situations will, ceteris paribus, be more associated with those situations in proportion to the frequency of past pairings between situation and response. Thorndike’s paradigm was popularized and extended by B.F. Skinner (see, e.g., Skinner 1953) who stressed the notion not just of consequences but of reinforcement as the basis of forming associations. For Skinner, a behavior would get associated with a situation according to the frequency and strength of reinforcement that would arise as a consequence of the behavior.
Since the days of Skinner, associative learning has come in many different variations. But what all varieties should share with their historical predecessors is that associative learning is supposed to mirror the contingencies in the world without adding additional structure (see section 9 for some examples of when supposedly associative theories smuggle in extra structure). The question of what contingencies associative learning detects (that is, one’s preferred analysis of what the associative relation R is), is up for debate and changes between theorists.
The final widely shared, though less central, property of associative learning concerns the domain generality of associative learning. Domain generality’s prevalence among associationists is due in large part to their traditional empiricist allegiances: excising domain-specific learning mechanisms constrains the number of innate mental processes one has to posit. Thus it is no surprise to find that both Hume and Pavlov assumed that associative learning could be used to acquire associations between any contents, regardless of the types of contents they were. For example, Pavlov writes,
Any natural phenomenon chosen at will may be converted into a conditioned stimulus. Any ocular stimulus, any desired sound, any odor, and the stimulation of any portion of the skin, whether by mechanical means or by the application of heat or cold never failed to stimulate the salivary glands. (Pavlov 1906: 615)
For Pavlov the content of the CS doesn’t matter. Any content will do, as long as it bears the right functional relationship in the organism’s learning history. In that sense, the learning is domain general—it matters not what the content is, just the role it plays (for more on this topic, see section 9.4).[12]
4. Associationism as a Theory of Mental Structure
As a theory of learning, associationism amounts to a constellation of related views that interpret learning as associating stimuli with responses (in operant conditioning), or stimuli with other stimuli (in classical conditioning), or stimuli with valences (in evaluative conditioning).[13] Associative learning accounts raise the question: when one learns to associate contents X and Y because, e.g., previous experiences with Xs and Ys instantiated R, how does one store the information that X and Y are associated?[14] A highly contrived sample answer to this question would be that a thinker learns an explicitly represented unconscious conditional rule that states “when a token of x is activated, then also activate a token of y”. Instead of such a highly intellectualized response, associationists have found a natural (though by no means necessary, see section 4.2) complementary view that the information is stored in an associative structure.
An associative structure describes the type of bond that connects two distinct mental states.[15] An example of such a structure is the associative pair salt/pepper.[16] The associative structure is defined, in the first instance, functionally: if X and Y form an associative structure, then, ceteris paribus, activations of mental state X bring about mental state Y and vice versa without the mediation of any other psychological states (such as an explicitly represented rule telling the system to activate a concept because its associate has been activated).[17] In other words, saying that two concepts are associated amounts to saying that there is a reliable, psychologically basic causal relation that holds between them—the activation of one of the concepts causes the activation of the other. So, saying that someone harbors the structure salt/pepper amounts to saying that activations of salt will cause activations of pepper (and vice versa) without the aid of any other cognitive states.
Associative structures are most naturally contrasted with propositional structures. The key distinction is that ‘association’ denotes a causal relation among mental representations, whereas ‘predication’ expresses a relation between things in the world (or intentional contents that specify external relations). A pure associationist is opposed to propositional structures—strings of mental representations that express a proposition—because propositionally structured mental representations have structure over and above the mere associative bond between two concepts. Take, for example, the associative structure green/toucan. This structure does not predicate green onto toucans; it merely indicates that activating one of those concepts leads to the activation of the other. A pure associative theory rules out predication, for propositional structures aren’t merely strings of associations. Saying that someone has an associative thought green/toucan tells you something about the causal and temporal sequences of the activation of concepts in one’s mind; saying that someone has the thought that is a green toucan tells you that the person is predicating greenness of a particular toucan (see Fodor 2003: 91–94, for an expansion of this point).
Associative structures needn’t just hold between simple concepts. One might have reason to posit associative structures between propositional elements (see section 5) or between concepts and valences (see section 8). But none of the preceding is meant to imply that all structures are associative or propositional—there are other representational formats that the mind might harbor (e.g., analog magnitudes or iconic structures; see Camp 2007; Quilty-Dunn 2020). For instance, not all semantically related concepts are harbored in associative structures. Semantically related concepts may in fact also be directly associated (as in doctor/nurse) or they may not (as in horse/zebra; see Perea and Rosa 2002). The difference in structure is not just a theoretical possibility, as these different structures have different functional profiles: for example, conditioned associations appear to last longer than semantic associations do in subjects with dementia (Glosser and Friedman 1991).
4.1 Associative Symmetry
The analysis of associative structures implies that, ceteris paribus, associations are symmetric in their causal effects: if a thinker has a bond between salt/pepper, then salt should bring about pepper just as well as pepper brings about salt (for extensive discussion of the symmetry point see Quilty-Dunn and Mandelbaum 2019). But all else is rarely equal. For example, behaviorists such as Thorndike, Hull, and Skinner knew that the order of learning affected the causal sequence of recall: if one is always hearing “salt and pepper” then salt will be more poised to activate pepper than pepper to activate salt. So, included in the ceteris paribus clause in the analysis of associative structures is the idealization that the order in which the associated elements were learned was suitably randomized.
Similarly, associative symmetry is violated when there are differing amounts of associative connections between the individual associated elements. For example, in the green/toucan case, most thinkers will have many more associations stemming from green than stemming from toucan. Suppose we have a thinker that only associates toucan with green, but associates green with a large host of other concepts (e.g., grass, vegetables, tea, kermit, seasickness, moss, mold, lantern, ireland, etc). In this case one can expect that toucan will more quickly activate green than green will activate toucan, for the activation spreading from toucan is divided among fewer competing associates than the activation spreading from green.
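The fan-effect reasoning above can be made concrete with a toy model in which a concept’s unit of activation is divided evenly among its associates. The graph, the even-division rule, and the particular concept lists are illustrative assumptions, not a specific published model.

```python
# Toy illustration of asymmetric activation: a concept with many associates
# passes less activation to any one of them than a concept with few.

associates = {
    "toucan": ["green"],
    "green": ["toucan", "grass", "tea", "moss", "mold", "ireland"],
}

def spread(source, target):
    """Activation passed from source to target: a total activation of 1.0
    divided evenly among the source's associates; 0.0 if unlinked."""
    links = associates.get(source, [])
    return 1.0 / len(links) if target in links else 0.0

print(spread("toucan", "green"))  # 1.0  -- toucan activates green at full strength
print(spread("green", "toucan"))  # ~0.17 -- green's activation is diluted six ways
```

Even with a single symmetric link between the two concepts, the difference in fan-out alone yields the asymmetry described in the text.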
4.2 Activation Maps of Associative Structure
An associative activation map (sometimes called a “spreading activation” map, Collins and Loftus 1975) is a mapping for a single thinker of all the associative connections between concepts.[18] There are many ways of operationalizing associative connections. In the abstract, a psychologist will attempt to probe which concepts (or other mental elements) activate which other concepts (or elements). Imagine a subject who is asked to say whether a string of letters constitutes a word or not, which is the typical goal given to subjects in a “lexical decision task”. If a subject has just seen the word “mouse”, we assume that the concept mouse was activated. If the subject is then quicker to say that, e.g., “cursor” is a word than the subject is to say that “toaster” is, then we can infer that cursor was primed, and is thus associatively related to mouse, in this thinker. Likewise, if we find that “rodent” is also responded to more quickly, then we know that rodent is associatively related to mouse. Using this procedure, one can generate an associative mapping of a thinker’s mind. Such a mapping would constitute a mapping of the associative structures one harbors. However, to be a true activation map—a true mapping of what concepts facilitate what—the mapping would also need to include information about the violations of symmetry between concepts.
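The inference from lexical-decision data to an activation map can be sketched as follows. The reaction times, the `margin` criterion, and the function names are invented for illustration; real priming studies use statistical tests over many trials rather than a fixed millisecond cutoff.

```python
# Sketch of building a (partial) activation map from priming data:
# a target counts as associated with the prime "mouse" when its primed
# reaction time is substantially faster than its unprimed baseline.
# All numbers below are invented for illustration.

baseline_rt = {"cursor": 620, "rodent": 630, "toaster": 615}  # ms, unprimed
primed_rt = {"cursor": 540, "rodent": 555, "toaster": 612}    # ms, after "mouse"

def associated_with_prime(target, margin=30):
    """Treat a speed-up larger than `margin` ms as evidence of association."""
    return baseline_rt[target] - primed_rt[target] > margin

activation_map = {"mouse": [t for t in baseline_rt if associated_with_prime(t)]}
print(activation_map)  # {'mouse': ['cursor', 'rodent']}
```

As the text notes, a full activation map would also have to record asymmetries, e.g., by probing each pair in both prime-target directions.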
4.3 Relation Between Associative Learning and Associative Structures
The British Empiricists desired to have a thoroughgoing pure associationist theory, for it allowed them to lessen the load of innate machinery they needed to posit. Likewise, the behaviorists also tended to want a pure associationist theory (sometimes out of a similar empiricist tendency, other times because they were radical behaviorists like Skinner, who banned all discussion of mental representations). Pure associationists tend to be partial to a connection that Fodor (2003) refers to as “Bare-Boned Association”. The idea is that the current strength of an associative connection between X and Y is determined, ceteris paribus, by the frequency of the past associations of X and Y. As stated, Bare-Boned Association assumes that associative structures encode, at least implicitly, the frequency of past associations of X and Y, and the strength of that associative bond is determined by the organism’s previous history of experiencing Xs and Ys.[19] In other words, the learning history of past associations determines the current functional profile of the corresponding associative structures.[20]
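Bare-Boned Association, as glossed above, can be given a minimal computational rendering in which associative strength just is relative co-occurrence frequency. The counting scheme below is an illustrative assumption; note that relativizing strength to each element’s own exposure count also yields the symmetry violations of section 4.1.

```python
# Toy rendering of "Bare-Boned Association": the current strength of the
# x/y bond tracks how often x and y have co-occurred, relative to how
# often x has been experienced at all.

from collections import Counter

cooccurrences = Counter()
exposures = Counter()

def experience(x, y=None):
    """Record one experience of x, optionally co-occurring with y."""
    exposures[x] += 1
    if y is not None:
        exposures[y] += 1
        cooccurrences[frozenset((x, y))] += 1

def strength(x, y):
    """Strength of the x/y bond: co-occurrences relative to exposures to x."""
    if exposures[x] == 0:
        return 0.0
    return cooccurrences[frozenset((x, y))] / exposures[x]

for _ in range(8):
    experience("salt", "pepper")
experience("salt")                    # salt encountered alone once
print(strength("salt", "pepper"))     # 8/9, slightly weaker than...
print(strength("pepper", "salt"))     # 1.0, since pepper never appeared alone
```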
Although the picture sketched above, where associative learning eventuates in associative structure, is appealing for many, it is not forced upon one, as there is no a priori reason to bar any type of structure from arising from a particular type of learning. One may, for example, gain propositional structures from associative learning (see Mitchell et al. 2009 and Mandelbaum 2016 for arguments that this is more than a mere logical possibility). This can happen in two ways. In the first, one may gain an associative structure that has a proposition as one of its associates. Assume that every time one’s father came home he immediately made dinner. In such a case one might associate the proposition daddy is home with the concept dinner (that is one might acquire: daddy is home/dinner). However, one might also just have a propositional structure result from associative learning. If every time one’s father came home he made dinner, then one might just end up learning if daddy is home then dinner will come soon, which is a propositional structure.
4.4 Extinction and Counterconditioning
There is a different, tighter relationship between associative learning and associative structures concerning how to modulate an association. Associative theorists, especially from Pavlov onward, have been clear on the functional characteristics necessary to modulate an already created association. There have been two generally agreed upon routes: extinction and counterconditioning. Suppose that, through associative learning, you have learned to associate a CS with a US. How do we break that association? Associationists have posited that one breaks an associative structure via two different types of associative learning (/unlearning). Extinction is the name for one such process. During extinction one decouples the external presentation of the CS and the US by presenting the CS without the US (and sometimes the US without the CS). Over time, the organism will learn to disconnect the CS and US.
Counterconditioning names a similar process to extinction, though one which proceeds via a slightly different method. Counterconditioning can only occur when an organism has an association between a mental representation and a valence, as acquired in an evaluative conditioning paradigm. Suppose that one associates ducks with a positive valence. To break this association via counterconditioning one introduces ducks not with a lack of positive valence (as would happen in extinction) but with the opposite valence, a negative valence. Over multiple exposures, the initial representation/valence association weakens, and is perhaps completely broken.[21]
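The difference between the two unlearning routes can be sketched with a simple error-correcting update on a representation/valence bond. The update rule, the learning rate, and the trial counts are illustrative assumptions rather than a specific published model.

```python
# Toy contrast between extinction and counterconditioning on a
# representation/valence bond (e.g., ducks, starting strongly positive).

def trial(valence_bond, presented_valence, rate=0.25):
    """Nudge the stored valence toward the valence presented on this trial.
    Extinction presents a neutral pairing (0.0); counterconditioning
    presents the opposite valence (e.g., -1.0)."""
    return valence_bond + rate * (presented_valence - valence_bond)

bond = 1.0                 # ducks start strongly positive
extinguished = bond
countered = bond
for _ in range(6):
    extinguished = trial(extinguished, 0.0)   # ducks with no valence
    countered = trial(countered, -1.0)        # ducks paired with negative valence
print(round(extinguished, 2))  # 0.18  -- weakened toward neutral
print(round(countered, 2))     # -0.64 -- pushed past neutral
```

On this toy rule, counterconditioning reverses the bond faster than extinction merely dampens it, since each trial presents a larger discrepancy between the stored and presented valence.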
How successful extinction and counterconditioning are, and how they work, is the source of some controversy, and some reason to see both methods as highly ineffectual (Bouton 2004). Although the traditional view is that extinction breaks associative bonds, it is an open empirical question whether extinction proceeds by breaking the previously created associative bonds, or whether it proceeds by leaving that bond alone but creating new, more salient (and perhaps context-specific) associations between the CS and other mental states (Bouton 2002, Bendaña and Mandelbaum 2021). Additionally, reinstatement, the spontaneous reappearance of an associative bond after seemingly successful extinction, has been observed in many contexts (see, e.g., Dirikx et al. 2004 for reinstatement of fear in humans).[22]
One fixed point in this debate is that one reverses associative structures via these two types of associative learning/unlearning, and only via these two pathways. What one does not do is try to break an associative structure by using practical or theoretical reasoning. If you associate salt with pepper, then telling you that salt has nothing to do with pepper or giving you very good reasons not to associate the two (say, someone will give you $50,000 for not associating them) won’t affect the association. This much has at least been clear since Locke. In the Essay Concerning Human Understanding, in his chapter “Of the Association of Ideas” (chapter XXXIII) he writes,
When this combination is settled, and while it lasts, it is not in the power of reason to help us, and relieve us from the effects of it. Ideas in our minds, when they are there, will operate according to their natures and circumstances. And here we see the cause why time cures certain affections, which reason, though in the right, and allowed to be so, has not power over, nor is able against them to prevail with those who are apt to hearken to it in other cases. (2.33.13)
Likewise, say one has just eaten lutefisk and then vomited. The smell and taste of lutefisk will then be associated with feeling nauseated, and no amount of being told that one shouldn’t feel nauseated will be very effective. Say the lutefisk that made one vomit was covered in poison, so that we know that the lutefisk wasn’t the root cause of the sickness. Having this knowledge won’t dislodge the association. In essence, associative structures are functionally defined as being alterable based on counterconditioning, extinction, and nothing else. Thus, assuming one sees counterconditioning and extinction as types of associative learning, we can say that associative learning does not necessarily eventuate in associative structures, but associative structures can only be modified by associative learning.
5. Associative Transitions
So far we’ve discussed learning and mental structures, but have yet to discuss thinking. The pure associationist will want a theory that covers not just acquisition and cognitive structure, but also the transition between thoughts. Associative transitions are a particular type of thinking, akin to what William James called “The Stream of Thought” (James 1890). Associative transitions are movements between thoughts that are not predicated on a prior logical relationship between the elements of the thoughts that one connects. In this sense, associative transitions are contrasted with computational transitions as analyzed by the Computational Theory of Mind (Fodor 2001; Quilty-Dunn and Mandelbaum 2018, 2019; Quilty-Dunn et al. 2023; see the entry on Computational Theory of Mind). CTM understands inferences as truth preserving movements in thought that are underwritten by the formal/syntactic properties of thoughts. For example, inferring the conclusion in modus ponens from the premises is possible based just on the form of the major and minor premises, and not on the content of the premises. Associative transitions are transitions in thought that are not based on the logico-syntactic properties of thoughts. Rather, they are transitions in thought that occur based on the associative relations among the separate thoughts.
Imagine an impure associationist model of the mind, one that contains both propositional and associative structures. A computational inference might be one such as inferring you are a g from the thoughts if you are an f, then you are a g, and you are an f. However, an associative transition is just a stream of ideas that needn’t have any formal, or even rational, relation between them, such as the transition from this coffee shop is cold to russia should annex idaho, without there being any intervening thoughts. This transition could be subserved merely by one’s association of idaho and cold, or it could happen because the two thoughts have tended to co-occur in the past, and their close temporal proximity caused an association between the two thoughts to arise (or for many other reasons). Regardless of the etiology, the transition doesn’t occur on the basis of the formal properties of the thoughts.[23]
According to this taxonomy, talk of an “associative inference” (e.g., Anderson et al. 1994; Armstrong et al. 2012) is a borderline oxymoron. The easiest way to give sense to the idea of an associative inference is for it to involve transitions in thought that began because they were purely inferential (as understood by the computational theory of mind) but then became associated over time. For example, at first one might make the modus ponens inference because a particular series of thoughts instantiates the modus ponens form. Over time the premises and conclusion of that particular token of a modus ponens argument become associated with each other through their continued use in that inference and now the thinker merely associates the premises with the conclusion. That is, the constant contiguity between the premises and the conclusion occurred because the inference was made so frequently, but the inference was originally made so frequently not because of the associative relations between the premises and conclusion, but because of the form of the thoughts (and the particular motivations of the thinker). This constant contiguity then formed the basis for an associative linkage between the premises and the conclusion.[24]
As was the case for associative structures, associative transitions in thought are not just a logical possibility. There are particular empirical differences associated with associative transitions versus inferential transitions (see section 6 of Quilty-Dunn et al. 2023). Associative transitions tend to move across different content domains, whereas inferential transitions tend to stay on a more focused set of contents. These differences have been seen to result in measurable differences in mood: associative thinking across topics bolsters mood when compared to logical thinking on a single topic (Mason and Bar 2012).
6. Associative Instantiation
The associationist position so far has been neutral on how associations are to be implemented. Implementation can be analyzed at a representational (that is, psychological) level of explanation, or at the neural level. A pure associationist picture would posit an associative implementation base at one, or both, of these levels.[25]
The most well-known associative instantiation base is a class of networks called Connectionist networks (see the entry on connectionism and section 10 below). Connectionist networks are sometimes pitched at the psychological level (see, e.g., Elman 1991; Elman et al. 1996; Smolensky 1988). This amounts to the claim that models of algorithms embedded in the networks capture the essence of certain mental processes, such as associative learning. Other times connectionist networks are said to be models of neural activity (“neural networks”). Connectionist networks consist in sets of nodes, generally input nodes, hidden nodes, and output nodes. Input nodes are taken to be analogs of sensory neurons (or sub-symbolic sensory representations), output nodes the analog of motor neurons (or sub-symbolic behavioral representations), and hidden nodes are stand-ins for all other neurons.[26] The network consists in these nodes being connected to each other with varying strengths. The topology of the connections gives one an associative mapping of the system, with the associative weights understood as the differing strengths of connections. On the psychological reading, these associations are functionally defined; on the neurological reading, they are generally understood to be representing synaptic conductance (and are the analogs of dendrites).[27] Prima facie, these networks are purely associative and do not contain propositional elements, and the nodes themselves are not to be equated with single representational states (such as concepts; see, e.g., Gallistel and King 2009).
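This node-and-weight architecture can be made concrete with a toy feedforward network (a minimal sketch for exposition only: the node counts, weight values, and logistic activation function are illustrative assumptions, not any particular published model):

```python
import math

def logistic(x):
    # A common squashing function for node activation.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_in_hidden, w_hidden_out):
    # Activation spreads from input nodes through hidden nodes to
    # output nodes; each weight is the strength of one association.
    hidden = [logistic(sum(i * w for i, w in zip(inputs, ws)))
              for ws in w_in_hidden]
    return [logistic(sum(h * w for h, w in zip(hidden, ws)))
            for ws in w_hidden_out]

# Two input nodes, two hidden nodes, one output node.
w_in_hidden = [[0.8, -0.4],   # weights into hidden node 1
               [0.3, 0.9]]    # weights into hidden node 2
w_hidden_out = [[1.2, -0.7]]  # weights into the single output node

print(forward([1.0, 0.0], w_in_hidden, w_hidden_out))
```

Nothing in this toy network is a propositional structure: the only relations in it are weighted causal links between nodes, with negative weights playing the inhibitory role discussed in section 9.5.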
However, a connectionist network can implement a classical Turing machine architecture (see, e.g., Fodor and McLaughlin 1990; Chalmers 1993). Many, if not most, of the adherents of classical computation, for example proponents of CTM, think that the brain is an associative network, one which implements a classical computational program. Some adherents of CTM do deny that the brain runs an associative network (see, e.g., Gallistel and King 2009, who appear to deny that there is any scientific level of explanation that association is intimately involved in), but they do so on separate empirical grounds and not because of any logical inconsistency with an associative brain implementing a classical mind.
When discussing an associative implementation base it is important to distinguish questions of associationist structure from questions of representational reality. Connectionists have often been followers of the Skinnerian anti-representationalist tradition (Skinner 1938). Because of the distributed nature of the nodes in connectionist networks, the networks have tended to be analyzed as associative stimulus/response chains of subsymbolic elements. However, the question of whether connectionist networks have representations which are distributed in patterns of activity throughout different nodes of the network, or whether connectionist networks are best understood as containing no representational structures at all, is orthogonal to both the question of whether the networks are purely associative or computational, and whether the networks can implement classical architectures.
7. Relation between the Varieties of Association and Related Positions
These four types of associationism share a certain empiricist spiritual similarity, but are logically, and empirically, separable. The pure associationist who wants to posit the smallest number of domain-general mental processes will theorize that the mind consists of associative structures acquired by associative learning which enter into associative transitions and are implemented in an associative instantiation base. However, many hybrid views are available and frequently different associationist positions become mixed and matched, especially once issues of empiricism, domain-specificity, and gradual learning arise. Below is a partial taxonomy of where some well-known theorists lie in terms of associationism and these other, often related doctrines.
Prinz (2002) and Karmiloff-Smith (1995) are examples of empiricist non-associationists. It is rare to find an associationist who is a nativist, but plenty of nativists have aspects of associationism in their own work. For example, even the arch-nativist Jerry Fodor maintains that intramodular lexicons contain associative structures (Fodor 1983). Similarly, there are many non-behaviorist (at least non-radical, analytic, or methodological behaviorist) associationists, such as Elman (1991), Smolensky (1988), Baeyens (De Houwer and Baeyens 2001), and modern day dual process theorists such as Evans and Stanovich (2013). It is quite difficult to find a non-associationist behaviorist, though Tolman approximates one (Tolman 1948). Elman and Smolensky also qualify as representationalist associationists, and Van Gelder (1995) as an anti-representationalist non-associationist. Karmiloff-Smith (1995) can be interpreted as, for some areas of learning, a proponent of gradual learning without being associationist (some might also read contemporary Bayesian theorists, e.g., Tenenbaum et al. 2011 and Chater et al. 2006, as holding a similar position for some areas of learning). Rescorla (1988) and Heyes (2012) claim to be associationists who endorse step-wise, one-shot learning (though Rescorla sees his project as a continuation of the classical conditioning program, others see his data as grist for the anti-associationist, pro-computationalist mill; see Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019). Lastly, Tenenbaum and his contemporary Bayesian colleagues sometimes qualify as holding a domain-general learning position without it being associationist, though they are no foes of innate content as they build many aspects of core cognition into their theoretical basis (see Tenenbaum et al. 2011; Carey 2009; Spelke 2022).[28]
8. Associationism in Social Psychology
Since the cognitive revolution, associationism’s influence has mostly died out in cognitive psychology and psycholinguistics. This is not to say that all aspects of associative theorizing are dead in these areas; rather, they have just taken on much smaller, more peripheral roles (for example, it has often been suggested that mental lexicons are structured, in part, associatively, which is why lexical decision tasks are taken to be facilitation maps of one’s lexicon). In other areas of cognitive psychology (for example, the study of causal cognition, see Gerstenberg et al. 2021), associationism is no longer the dominant theoretical paradigm, but vestiges of associationism still persist (see Shanks 2010 for an overview of associationism in causal cognition). Associationism is also still alive in the connectionist literature, as well as in the animal cognition tradition.
But the biggest contemporary stronghold of associationist theorizing resides in social psychology, an area which has traditionally been hostile to associationism (see, e.g., Asch 1962, 1969). The ascendance of associationism in social psychology has been a fairly modern development, and has caused a revival of associationist theories in philosophy (e.g., Gendler 2008). The two areas of social psychology that have seen the greatest renaissance of associationism are the implicit attitude and dual-process theory literatures. However, since the late 2010s social psychology has begun to take a critical look at associationist theories (e.g., Mann et al. 2019; Kurdi and Dunham 2021; Kurdi and Mandelbaum 2023).
8.1 Implicit Attitudes
Implicit attitudes are generally operationally defined as the attitudes tested on implicit tests such as the Implicit Association Test (Greenwald et al. 1998), the Affect Misattribution Procedure (Payne et al. 2005), the Sorted Paired Feature Task (Bar-Anan et al. 2009), and the Go/No-Go Association Task (Nosek and Banaji 2001). Implicit attitudes are contrasted with explicit attitudes, attitudes operationalized as the ones probed when one gives an explicit response such as a marking on a Likert scale, a feeling thermometer, or a free report. Such operationalizations leave open the question of whether there are any natural kinds to which explicit and implicit attitudes refer. In general, implicit attitudes are characterized as mental representations that are unavailable for explicit report and inaccessible to consciousness (Morris and Kurdi 2023; cf. Hahn et al. 2014; Berger 2020).
The default position among social psychologists is to treat implicit attitudes as if they are associations among mental representations (Fazio 2007), or among pairs of mental representations and valences. In particular, they treat implicit attitudes as associative structures which enter into associative transitions. Recently this issue has come under much debate. In an ever expanding series of studies, De Houwer and his collaborators have sought to show that associative learning is, at base, relational, propositional contingency learning; i.e., that all putatively associative learning is in fact a nonautomatic learning process that generates and evaluates propositional hypotheses (Mitchell et al. 2009; De Houwer 2009, 2011, 2014, 2019; Hughes et al. 2019). Other researchers have also approached the question using learning as the entrance point to the debate, demonstrating that non-associative acquisition creates stronger attitudes than associative acquisition does (Hughes et al. 2019). For example, one might demonstrate that learning through merely reading an evaluative statement creates a stronger implicit attitude than repeated associative exposures do (Kurdi and Banaji 2017, 2019; Mann et al. 2019). Other researchers have championed propositional models based not on learning, but instead on how implicit attitudes change regardless of how they are acquired. For instance, Mandelbaum (2016) argued that logical/evidential interventions modulate implicit attitudes in predictable ways (e.g., two negations canceling each other out), while others have used diagnosticity to show that implicit attitudes update in a non-associationistic, propositional way (e.g., after reading a story about a man who broke into a building and appeared to ransack it, you learn that he jumped in to save people from a fire and immediately change your opinion of the man from negative to positive; Mann and Ferguson 2015; Mann et al. 2017; Van Dessel et al. 2019).
Perhaps the most probing work in this area is that of Benedek Kurdi and colleagues, who have pitted associative against propositional models in both acquisition (Kurdi and Banaji) and change (Kurdi and Dunham 2021), finding very little work for associative models to accomplish. (For more on implicit attitudes see the entry on implicit bias.)
8.2 Dual Process Theories
Associative structures and transitions are widely implicated in a particular type of influential dual-process theory. Though there are many dual-process theories in social psychology (see, e.g., the papers in Chaiken and Trope 1999, or the discussion in Evans and Stanovich 2013), the one most germane to associationism is also the most popular. It originates from work in the psychology of reasoning and is often also invoked in the heuristics and biases tradition (see, e.g., Kahneman 2011). It has been developed by many different psychological theorists (Sloman 1996; Smith and DeCoster 2000; Wilson et al. 2000; Evans and Stanovich 2013) and, in parts, taken up by philosophers too (see, e.g., Gendler 2008; Frankish 2009; see also some of the essays in Evans and Frankish 2009).
The dual-process strain most relevant to the current discussion posits two systems: one evolutionarily ancient intuitive system underlying unconscious, automatic, fast, parallel, and associative processing; the other an evolutionarily recent reflective system characterized by conscious, controlled, slow, “rule-governed” serial processes (see, e.g., Evans and Stanovich 2013). The ancient system, sometimes called “System 1”, is often understood to include a collection of autonomous, distinct subsystems, each of which is recruited to deal with distinct types of problems (see Stanovich 2011 for a discussion of “TASS—the autonomous set of systems”). Although theories differ on how System 1 interacts with System 2,[29] the theoretical core of System 1 is the claim that its processing is essentially associative. As in the implicit attitude debate, dual-systems models have recently come under sustained critique (see Kruglanski 2013; Osman 2013; Mandelbaum 2016; De Houwer 2019), though they remain very popular.
9. Criticisms of Associationism
Associationism has been a dominant theme in mental theorizing for centuries. As such, it has garnered an appreciable amount of criticism.
9.1 Learning Curves
The basic associative learning theories imply, either explicitly or implicitly, slow, gradual learning of associations (Baeyens et al. 1995). The learning process can be summarized in a learning curve which plots the frequency (or magnitude) of the conditioned response as a function of the number of reinforcements (Gallistel et al. 2004: 13124). Mappings between CSs and USs are gradually built up over numerous trials (in the lab) or experiences (in the world). Gradual, slow learning has come under fire from a variety of areas (see section 9.3 and section 9.4.1). However, here we just focus on the behavioral data. In a series of works re-analyzing animal behavior, Gallistel (Gallistel et al. 2004; Gallistel and King 2009) has argued that although group-level learning curves do display the properties of being negatively accelerated and gradually developing, these curves are misleading because no individual’s learning curve has these properties. Gallistel has argued that learning for individuals is generally step-like, rapid, and abrupt. An individual’s transition from a low level of responding to asymptotic responding is very quick. Sometimes, the learning is so quick that it is literally one-shot learning. For example, after analyzing multiple experiments on animal learning of spatial location, Gallistel writes,
The learning of a spatial location generally requires but a single experience. Several trials may, however, be required to convince the subject that the location is predictable from trial to trial. (Gallistel et al. 2004: 13130)
Gallistel argues that the reason the group learning curves look smooth and gradual is that there are large individual differences between subjects in the onset latency of the step-wise curves (Gallistel et al. 2004: 13125); in other words, different animals take different amounts of time for the learning to commence. The differences between individual subjects’ learning curves are determined by when the steps begin, not by the speed of the individual animal’s learning process. All individuals appear to show rapid rises in learning, but since each begins its learning at a different time, when we average over the group, the rapid step-wise learning looks like slow, gradual learning (Gallistel et al. 2004: 13124).
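Gallistel’s averaging point is easy to reproduce in simulation (a sketch with invented parameters: each simulated subject jumps abruptly from low to asymptotic responding at a randomly drawn onset trial):

```python
import random

random.seed(0)
N_SUBJECTS, N_TRIALS = 100, 60

def step_curve(onset, low=0.05, high=0.95, n=N_TRIALS):
    # Individual learning is abrupt: low responding before the onset
    # trial, asymptotic responding from the onset trial onward.
    return [low if t < onset else high for t in range(n)]

# Onset latency differs widely across subjects.
curves = [step_curve(random.randint(5, 50)) for _ in range(N_SUBJECTS)]

# The group curve averages over subjects at each trial.
group = [sum(c[t] for c in curves) / N_SUBJECTS for t in range(N_TRIALS)]

# Each individual curve changes value exactly once, yet the group
# curve rises smoothly in many small increments.
print([round(group[t], 2) for t in (0, 15, 30, 45, 59)])
```

The group average rises gradually only because the abrupt individual steps occur at different trials, which is precisely the artifact Gallistel diagnoses.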
9.2 The Problem of Predication
The problem of predication is, at its core, a problem of how an associative mechanism can result in the acquisition of subject/predicate structures, structures which many theorists believe appear in language, thought, and judgment. The first major discussion of the problem appears in Kant (1781/1787), but variants of the basic Kantian criticism can be seen across the contemporary literature (see, e.g., Chomsky 1959; Fodor and Pylyshyn 1988; Fodor 2003; Mandelbaum 2013a; for the details of the Kantian argument see the entry on Kant’s Transcendental Argument).
For a pure associationist, association is “semantically transparent” (see Fodor 2003), in that it purports to add no additional structure to thoughts. When a simple concept X and a simple concept Y become associated, one acquires the associative structure X/Y. But X/Y has no additional structure on top of the contents of its constituents. Knowing that X and Y are associated amounts to knowing a causal fact: that activating Xs will bring about the activation of Ys and vice versa. However, so the argument goes, some of our thoughts appear to have more structure than this: the thought birds fly predicates the property of flying onto birds. The task for the associationist is to explain how associative structures can distinguish a thinker who has a single (complex) thought birds fly from a thinker who conjoins two simple thoughts in an associative structure where one thought, birds, is immediately followed by another, fly. As long as the two simple thoughts are reliably causally correlated so that, for a thinker, activation of birds regularly brings about fly, then that thinker has the associative structure birds/fly. Yet it appears that such a thinker hasn’t yet had the thought birds fly. The problem of predication is explaining how a purely associative mechanism could eventuate in complex thoughts. In Fodor’s terms the problem boils down to how association, a causal relation among mental representations, can effect predication, a relation among intentional contents (Fodor 2003).
A family of related objections to associationism can be interpreted as variations on this theme. For example, problems of productivity, compositionality, and systematicity for associationist theorizing appear to be variants of the problem of predication (for more on these specific issues see the entries on the Language of Thought Hypothesis and on compositionality). If association doesn’t add any additional structure to the mental representations that get associated, then it is hard to see how it can explain the compositionality of thought, which relies on structures that specify relations among intentional contents. Compositionality requires that the meaning of a complex thought is determined by the meanings of its simple constituents along with their syntactic arrangements. The challenge to associationism is to explain how an associative mechanism can give rise to the syntactic structures necessary to distinguish a complex thought like birds fly from the temporal succession of two simple thoughts birds and fly. Since the compositionality of thought is posited to undergird the productivity of thought (thinkers’ abilities to think novel sentences of arbitrary lengths, e.g., green birds fly, giant green birds fly, cuddly giant green birds fly, etc.), associationism has problems explaining productivity.
Systematicity is the thesis that there are predictable patterns among the thoughts a thinker is capable of entertaining. Thinkers that can entertain thoughts of certain structures can always entertain distinct thoughts that have related structure. For instance, any thinker who can think a complex thought of the form “X transitive verb Y” can think “Y transitive verb X”.[30] Systematicity entails that we won’t find any thinker who can only think one of those two thoughts; we should not, for example, find a person who could think audrey wronged max but not max wronged audrey. Of course, these two thoughts have very different effects in one’s cognitive economy. The challenge for the associationist is to explain how the associative structure audrey/wronged/max can be distinguished from the structure max/wronged/audrey, while capturing the differences in those thoughts’ effects.
Associationists have had different responses to the problem. Some have denied that human thought is actually compositional, productive, and systematic, and some non-associationists have agreed with this critique. For example, Prinz and Clark claim “concepts do not compose most of the time” (2002: 62), and Johnson (2004) argues that the systematicity criterion is wrongheaded (see Aydede 1997 for extended discussion of these issues). Rumelhart et al. offer a connectionist interpretation of “schemata”, one which is intended to cover some of the phenomena mentioned in this section (Rumelhart et al. 1986). Others have worked to show that classical conditioning can indeed give rise to complex associative structures (Rescorla 1988). In defense of the associationist construal of complex associations Rescorla writes,
Clearly, the animals had not simply coded the RH [complex] compound in terms of parallel associations with its elements. Rather they had engaged in some more hierarchical structuring of the situation, forming a representation of the compound and using it as an associate. (Rescorla 1988: 156)
Whether or not associationism has the theoretical tools to explain such complex compounds by itself is still debated (see, e.g., Fodor 2003; Mitchell 2009; Gallistel and King 2009; Quilty-Dunn and Mandelbaum 2019; Quilty-Dunn et al. 2023). Notably, recent work in deep learning suggests that connectionist models may be capable of exhibiting systematic compositional generalization. For example, Lake and Baroni (2023) found that neural networks trained through meta-learning—a process where models learn how to learn by training on a distribution of related tasks—can acquire human-like systematic generalization abilities, allowing them to correctly interpret novel combinations of familiar elements. While debate continues about whether such models truly capture the nature of human compositionality, these findings challenge the long-standing assumption that connectionist architectures cannot generalize systematically.
9.3 Word Learning
Multiple issues in the acquisition of the lexicon appear to cause problems for associationism. Some of the most well known examples are reviewed below (for further discussion of word learning and associationism see Bloom 2000).
9.3.1 Fast Mapping
Children learn words at an incredible rate, acquiring around 6,000 words by age 6 (Carey 2010: 184). If gradual learning is the rule, then words too should be learned gradually across this time. However, this does not appear to be the case. Susan Carey discovered the phenomenon of “fast mapping”, the one-shot learning of a word (Carey 1978a, 1978b; Carey and Bartlett 1978). Her most influential example investigated children’s acquisition of “chromium” (a color word referring to olive green). Children were shown two otherwise identical objects, which differed only in color, and asked, “Can you get me the chromium tray, not the red one, the chromium one” (recounted in Carey 2010: 2). All of the children handed over the correct tray at that time. When the children were later tested in differing contexts, more than half remembered the referent of “chromium”. These findings have been extended—for example, Markson and Bloom (1997) showed that they are not specific to the remembering of novel words, but also hold for novel facts.
Fast mapping poses two problems for associationism. The first is that the learning of a new word did not develop slowly, as would be predicted by proponents of gradual learning. The second is that in order for the word learning to proceed, the mind must have been aided by additional principles not given by the environment. Some of these principles such as Markman’s (1989) taxonomic, whole object, and mutual exclusivity constraints, and Gleitman’s syntactic bootstrapping (Gleitman et al. 2005), imply that the mind does add structure to what is learned. Consequently, the associationist claim that learning is just mapping external contingencies without adding structure is imperiled.
Recent research complicates the critique that associationist models cannot account for fast mapping. For example, Wang et al. (2025) show that neural networks can develop one-shot word learning abilities through meta-learning—practicing word learning across many examples. Their models achieve efficient word learning without explicit structural constraints, using only human-scale child-directed language. However, this approach still requires training on the specific task of word learning itself, suggesting a middle ground: while pure associationism may be insufficient, structured associative learning through meta-learning might support fast mapping without requiring innate constraints.
9.3.2 Syntactic Category Learning
“Motherese”, the name for the type of language that infants generally hear, consists of simple sentences such as “Nora want a bottle?” and “Are you tired?”. These sentences almost always contain a noun and a verb. Yet the infant’s vocabulary massively over-represents nouns in the first 100 words or so, while massively under-representing verbs (never mind adjectives or adverbs, which almost never appear in the first 100 words infants produce; see, e.g., Goldin-Meadow, Seligman, and Gelman 1976). Even more surprising is that the over-representation of nouns relative to verbs holds even though
the incidence of each word (that is, the token frequency) is higher for the verbs than for the nouns in the common set used by mothers. (Snedeker and Gleitman 2004: 259, citing data from Sandhofer, Smith, and Luo 2000)
Moreover, children hear a preponderance of determiners (“the” and “a”) but don’t produce them (Bloom 2000). These facts are not specific to English, but hold cross-culturally (see, e.g., Caselli et al. 1995). The disparity between the distribution of syntactic categories infants receive as input and the distribution they produce as output is troublesome for associationism, insofar as associationism is committed to learned structures (and the behaviors that follow from them) merely mirroring the patterns given in experience.
9.4 Against the Contiguity Analysis of Associationism
Contiguity has been a central part of associationist analyses since the British Empiricists. In the experimental literature, the problem of figuring out the parameters needed for acquiring an association due to the contiguity of its relata has sometimes been termed the problem of the “Window of Association” (e.g., Gallistel and King 2009). Every associationist theory has to specify the temporal window within which two properties must be instantiated in order for them to become associated.[31] A related problem for contiguity theorists is that if the domain generality of associative learning is desired, then the window needs to be homogeneous across content domains. The late 1960s saw persuasive attacks on domain generality, as well as on the necessity and sufficiency of the contiguity criterion in general.
9.4.1 Against the Necessity of Contiguity
Research on “taste aversions” and “bait-shyness” provided a variety of problems with contiguity in the associative learning tradition of classical conditioning. Garcia observed that a gustatory stimulus (e.g., drinking water or eating a hot dog) but not an audiovisual stimulus (a light and a sound) would naturally become associated with feeling nauseated. For instance, Garcia and Koelling (1966) paired an audiovisual stimulus (a light and a sound) with a gustatory stimulus (flavored water). The two stimuli were then paired with the rats receiving radiation, which made the rats nauseated. The rats associated the feeling of nausea with the water and not with the sound, even though the sound was contiguous with the water. Moreover, the delay between ingesting the gustatory stimulus and feeling nauseated could be quite long, with the feeling not coming on until 12 hours later (Roll and Smith 1972), and the organism needn’t even be conscious when the negative feeling arises. (For a review, see Seligman 1970; Garcia et al. 1974). The temporal delay shows that the CS (the flavored water) needn’t be contiguous with the US (the feeling of nausea) in order for learning to occur, thus showing that contiguity isn’t necessary for associative learning.
Garcia’s work also laid bare the problems with the domain-general aspect of associationism. In the above study the rat was prepared to associate the nausea with the gustatory stimulus, but would not associate it with the audiovisual stimulus. However, if one changes the US from feeling nauseated to receiving shocks in perfect contiguity with the audiovisual and gustatory stimuli, then the rats will associate the shocks with the audiovisual stimulus but not with the gustatory stimulus. That is, rats are prepared to associate audiovisual stimuli with the shock but are contraprepared to associate the shocks with the gustatory stimulus. Thus, learning does not seem to be entirely domain-general (for similar content specificity effects in humans, see Baeyens et al. 1990).[32]
Lastly, “the Garcia effect” has also been used to show problems with the learning curve (see section 9.1). “Taste aversions” are the phenomena whereby an organism gets sick after ingesting a stimulus and the taste (or odor; Garcia et al. 1974) of that stimulus becomes associated with the feeling of sickness. As anyone who has had food poisoning can attest, this learning can proceed in a one-shot fashion, and needn’t involve a gradual rise over many trials (taste aversions have also been observed in humans; see, e.g., Bernstein and Webster 1980; Bernstein 1985; Logue et al. 1981; Rozin 1986).
9.4.2. Against the Sufficiency of Contiguity
Kamin’s famous blocking experiments (1969) showed that not all contiguous structures lead to classical conditioning. A rat that has already learned that CS1 predicts a US will not learn that a subsequent CS2 predicts the US, if the CS2 is always paired with the CS1. Suppose that a rat has learned that a light predicts a shock because of the constant contiguity of the light and shock. After learning this, the rat has a sound introduced which only arises in conjunction with the light and the shock. As long as the rat had previously learned that the light predicts the shock, it will not learn that the sound does (as can be seen on later trials that present the sound alone). In sum, having learned that the CS1 predicts the US blocks the organism from learning that the CS2 predicts the US.[33] So even though CS2 is perfectly contiguous with the US, the association between CS2 and the US remains unlearned, thus serving as a counterexample to the sufficiency of contiguity.[34]
Similarly, Rescorla (1968) demonstrated that a CS can appear only when the US appears and yet the association between them can still be unlearnable. If a tone is arranged to sound only when there are shocks, but there are still shocks when there are no tones (that is, the CS only appears with the US, but the US sometimes appears without the CS), no associative learning between the CS and the US will occur. Instead, subjects (in Rescorla 1968, rats) will only learn a connection between the shock and the experimental situation—e.g., the room in which the experiment is carried out.
In large part because of the problems discussed in section 9.4, many classical conditioning theorists gave up the traditional program. Some, like Garcia, appeared to give up the classical theoretical framework altogether (Garcia et al. 1974); others, such as Rescorla and Wagner, tried to usher the framework into the modern era (see Rescorla and Wagner 1972; Rescorla 1988), where conditioning is seen as sensitive to base rates and driven by informational pick-up.[35] The Rescorla-Wagner model, for example, proposes that learning occurs when there is a discrepancy between what is expected and what actually occurs—known as a prediction error (Rescorla and Wagner 1972). The model explains blocking as follows: once CS1 fully predicts the US, no prediction error occurs when CS2 is added, preventing new learning. It also accounts for the insufficiency of contiguity by showing that mere co-occurrence is less important than the information stimuli provide about outcomes. The model’s emphasis on prediction error has been influential beyond associative learning, informing computational models of dopamine function (Schultz et al. 1997) and contemporary reinforcement learning algorithms (Sutton and Barto 1998; see section 10). The shift from simple contiguity to prediction error illustrates a tension in the evolution of associationism: critics like Fodor and Gallistel argue that adding mechanisms like error correction effectively abandons associationism’s core commitment to parsimony, while defenders see such additions as necessary refinements that preserve the spirit of associative explanation. Whether this movement is interpreted as a substantive revision of classical conditioning (Rescorla 1988; Heyes 2012) or a wholesale abandoning of it (Gallistel and King 2009) is debatable.
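The Rescorla-Wagner rule, and its treatment of Kamin blocking, can be sketched in a few lines (the learning rate and trial counts here are illustrative assumptions, not values from any particular experiment):

```python
def rescorla_wagner(trials, alpha=0.3, lam_present=1.0):
    """Update associative strengths by prediction error.

    Each trial is (set_of_present_CSs, us_present). Every present CS
    changes by alpha * (lambda - total prediction), where lambda is
    lam_present if the US occurs on that trial and 0 otherwise.
    """
    V = {}
    for present, us in trials:
        prediction = sum(V.get(cs, 0.0) for cs in present)
        error = (lam_present if us else 0.0) - prediction
        for cs in present:
            V[cs] = V.get(cs, 0.0) + alpha * error
    return V

# Phase 1: light alone predicts shock. Phase 2: light + tone predict shock.
trials = [({"light"}, True)] * 40 + [({"light", "tone"}, True)] * 40
V = rescorla_wagner(trials)
print(V)  # light near 1.0; tone near 0.0 despite perfect contiguity with the US
```

Because the light alone already drives the total prediction to asymptote, the prediction error is near zero throughout the compound phase, so the tone acquires almost no associative strength despite being perfectly contiguous with the US.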
9.5 Coextensionality
The Rescorla experiment also demonstrates another problem in associative theorizing: the question of why some property is singled out as a CS as opposed to different, equally contemporaneously instantiated properties. Put differently, one needs a principle to say what the “same situation” amounts to in generalizations such as Thorndike’s laws. For instance, if a CS and a US, say a tone and a shock, are perfectly paired so that they are either both present or both absent, the organism won’t associate the location where it received shocks (e.g., the experimental setting) with getting shocked; it will just associate the tone with the shocks. But in the condition where the US occurs without the CS, though the CS does not occur without the US, the organism will acquire an association between the shocks and the location. However, in both cases the location is present on every trial.[36] In contrast to shocks, x-ray radiation, when used as a US, never appears to become associated with location, even when radiation and location are perfectly paired (Garcia et al. 1972).[37]
The problem of saying which properties become associated when multiple properties are coinstantiated sometimes goes by the name the “Credit Assignment Problem” (see, e.g., Gallistel and King 2009, and below in section 10.2.3).[38] Some would argue that this problem is a symptom of a larger issue: trying to use extensional criteria to specify intentional content (see, e.g., Fodor 2003). Associationists need a criterion to specify which of the coextensive properties will in fact be learned, and which not.
An additional worry stems from the observation that sometimes the lack of a property being instantiated is an integral component of what is learned. To deal with the problem of missing properties, contemporary associationists have introduced an important element to the theory: inhibition. For example, if a US and a CS only appear when the other is absent, the organism will learn a negative relationship holds between them; that is, the organism will learn that the absence of the CS predicts the US.[39] Here the CS becomes a “conditioned inhibitor” of the US. Inhibition, using associations as modulators and not just activators, is a central part of current associationist thinking. For example, in connectionist networks, inhibition is implemented by the activation of certain nodes inhibiting the activation of other nodes. Connection weights can be positive or negative, with the negative weight standing in for the inhibitory strength of the association.
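In a connectionist setting, inhibition amounts to allowing connection weights to be negative. A toy illustration (the unit labels and weight values are hypothetical, not drawn from any particular model):

```python
# Toy sketch of inhibition via a negative connection weight. A unit standing
# for the US expectation gets excitation from the context (+0.8) and
# inhibition from the conditioned inhibitor (-0.8); all values hypothetical.
def activate(inputs, weights):
    """Weighted sum of inputs, floored at zero (a simple threshold unit)."""
    net = sum(a * w for a, w in zip(inputs, weights))
    return max(0.0, net)

weights = [0.8, -0.8]                    # [context, conditioned inhibitor]
print(activate([1.0, 0.0], weights))     # context alone → 0.8 (US expected)
print(activate([1.0, 1.0], weights))     # inhibitor present → 0.0 (suppressed)
```

The same associative link thus modulates activity downward rather than upward, which is all that "conditioned inhibition" requires at this level of description.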
Various solutions to the coextensionality problem have been proposed by associationists. Mackintosh (1975) developed a selective attention model in which organisms learn through experience which stimuli are most predictive of important outcomes and dynamically shift attention to these cues. Attention increases to stimuli that are better predictors than other available stimuli and decreases to poorer predictors. This selective attention process helps explain both apparently sudden learning (as attention rapidly shifts) and why only certain co-extensive properties become associated (because attention selectively focuses on the most predictive cues). Pearce’s (1987) configural theory offers another associationist solution. Rather than representing stimuli as separate elements that independently associate with outcomes, Pearce proposed that organisms form representations of entire stimulus configurations. These configural representations can then associate with outcomes, and similar configurations will produce generalization of responding proportional to their similarity. This approach addresses the problem of which co-occurring features become associated by treating stimulus combinations as unique representational wholes.
10. Associationism and Reinforcement Learning
Reinforcement learning (RL) is a computational approach to understanding how agents learn optimal behavior through interaction with their environment. At its core, RL can be seen as formalizing the core problem of associationist theories of learning: how an agent learns to select beneficial actions by associating stimuli and responses based on their experienced consequences. Unlike other machine learning frameworks such as supervised learning, which relies on labeled examples, or unsupervised learning, which finds patterns in unlabeled data, RL involves learning through direct interaction with an environment and feedback about chosen actions. This trial-and-error approach allows agents to discover behavior that maximizes cumulative reward over time, even when the relationship between actions and their long-term consequences is initially unknown.
There is a direct lineage between associationist theories of learning and the development of modern RL. As discussed in section 3, Thorndike’s Law of Effect proposed that organisms learn by repeating behaviors followed by positive outcomes and avoiding those followed by negative outcomes, with the strength of stimulus-response connections depending on the perceived outcomes of the response. This emphasis on trial-and-error learning and the gradual strengthening of successful stimulus-response associations directly influenced early artificial intelligence researchers. In 1948, Alan Turing described one of the earliest designs for implementing trial-and-error learning in a computer—a ‘pleasure-pain system’ whose initially random decisions when faced with an undetermined choice would be canceled by ‘pain’ stimuli and made permanent by ‘pleasure’ stimuli (Turing 1948). Several early mechanical learning devices were inspired by similar ideas. In 1951, Marvin Minsky built the Stochastic Neural Analog Reinforcement Calculator (SNARC)—one of the first artificial neural network machines—in close consultation with Skinner himself (Minsky 1952). SNARC implemented a form of RL using a network of 40 artificial synapses based on Hebbian principles. It simulated rats running through mazes, with each synapse maintaining a probability of signal propagation that could be modified through manually delivered reinforcement. When the simulated rat reached its goal, a mechanical system would strengthen the recently active connections based on operant conditioning. These early efforts to implement mechanical learning by trial-and-error through reinforcement helped establish ideas that would later be formalized in modern RL algorithms.
The development of RL theory has both operationalized and extended associationist principles. While it maintained the fundamental idea that learning occurs through experience-dependent modification of associations, RL also introduced more sophisticated learning mechanisms for addressing challenges that simple associationism struggled to explain, and incorporated insights from other fields such as optimal control theory. In what follows, we will review some of the key innovations introduced by RL, how they relate to the limitations of traditional associationist learning, and their broader philosophical implications.
10.1 An overview of RL
RL models intelligent behavior as an interactive process between an agent and its environment. The agent—which could be anything from a chess-playing program to a robot—learns through direct experience by perceiving aspects of its environment, taking actions, and receiving feedback about their consequences. The environment represents everything external to the agent with which it can interact but whose responses it cannot directly control. When the agent takes an action, the environment responds by transitioning to a new situation and providing evaluative feedback in the form of a reward signal that indicates how well the agent is progressing toward its goals.
The agent’s perception of its environment at any moment is captured by the notion of state. States can be fully observable, where the agent has complete information about its current situation, or partially observable, where some relevant information remains hidden. For instance, in chess, the current board position is fully observable, while an agent exploring a maze can only observe part of the environment. The information available in the current state determines which actions are possible for the agent to take.
The agent’s interactions with its environment occur through actions, such as moving left or right in a maze or selecting moves in chess. After each action, the environment provides the agent with a reward signal—a scalar value that indicates the immediate desirability of the agent’s choice. This reward signal can be sparse (occurring infrequently) or dense (provided frequently), and may be positive or negative. It is both evaluative and sequential: it indicates the desirability of outcomes rather than prescribing correct actions, and the consequences of actions may only become apparent after multiple steps of interaction. The reward signal is fundamental to RL as it defines what constitutes success for the agent: it allows the latter to learn which actions are beneficial without requiring explicit instruction about optimal strategies.
Another important component of RL is the policy, which represents the agent’s strategy for selecting actions in different situations. More formally, it maps the agent’s “perceived” states (i.e., its observations of the environment) to actions, either deterministically (always choosing the same action in a given state) or probabilistically (selecting actions according to learned probabilities). The policy can be implemented through various methods, from simple lookup tables to sophisticated neural networks that can handle complex state representations.[40] The fundamental goal of RL is to discover a policy that maximizes the agent’s accumulated rewards over time.
To make good decisions, the agent needs to evaluate not just immediate rewards but also the long-term consequences of its actions. This evaluation is captured by value functions, which estimate the total rewards the agent can expect to accumulate from a given state or state-action pair when following a particular policy. Value functions account for both immediate rewards and anticipated future rewards, with future rewards typically discounted to reflect their uncertainty and temporal distance. For example, when considering a move in chess, the value function helps the agent assess not just the immediate strength of its position but also its prospects for eventual victory. By learning accurate value functions, the agent can make decisions that optimize for long-term success rather than just immediate advantages.
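The pieces just introduced—states, actions, rewards, and a policy—fit together in a simple interaction loop. The following sketch uses a hypothetical four-position corridor environment (all names, dynamics, and values are ours, not from any standard benchmark):

```python
import random

# Schematic agent-environment loop on a toy corridor: positions 0-3, with
# reward 1 for reaching position 3. Environment and policy are illustrative.
def step(state, action):
    """Environment dynamics: move left (-1) or right (+1); reward at the goal."""
    next_state = max(0, min(3, state + action))
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

def policy(state):
    """A fixed stochastic policy: move right 90% of the time."""
    return +1 if random.random() < 0.9 else -1

random.seed(0)
state, total_reward = 0, 0.0
for t in range(20):                      # one episode of interaction
    action = policy(state)
    state, reward = step(state, action)
    total_reward += reward
    if state == 3:                       # goal reached; the episode ends
        break
print(state, total_reward)
```

A learning agent would use the stream of (state, action, reward) experiences generated by this loop to improve its policy; here the policy is fixed purely for illustration.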
RL has proven remarkably successful across various domains of artificial intelligence, such as game playing and robotic control. For example, RL systems have surpassed human expertise in increasingly complex board games. Earlier RL systems mastered simpler games like backgammon (Tesauro, 1994), while more recent approaches have achieved superhuman performance in chess and Go (Silver et al., 2016, 2018)—the latter being particularly significant given the game’s strategic complexity. This progress has extended to imperfect information games, with systems achieving expert-level performance in poker (Brown et al., 2019) and multiplayer strategy games like StarCraft II (Vinyals et al., 2019). In arcade-style video games, RL agents have learned to play dozens of Atari games at human-level performance or better, using only raw pixel inputs and game scores as feedback (Mnih et al., 2015). In robotics, RL has enabled significant advances in both locomotion and manipulation tasks. Quadrupedal robots have learned to navigate difficult terrain and maintain balance (Lee et al., 2020), while robotic arms have mastered precise manipulation tasks such as in-hand object manipulation (OpenAI et al., 2019).
These achievements suggest that associative learning principles, when implemented in sophisticated computational systems, can give rise to behavior that appears goal-directed and strategic. During its match against Go champion Lee Sedol in March 2016, for example, AlphaGo made a surprising and decisive move (move 37) that no human player would have considered making.[41] This move has been widely discussed as evidence that RL training allows game-playing agents to come up with original strategic decisions that go beyond mimicking human play patterns. In fact, professional human Go players have since improved their own strategies by studying the decision-making process of RL-based Go-playing programs—including win probability calculations and expected optimal move sequences for different possible moves (Shin et al. 2021).
It should be noted, however, that some game-playing systems like AlphaGo combine neural networks with a traditional search algorithm to explore and evaluate possible move sequences before committing to actions. This hybrid architecture suggests that while RL is important for learning strategic patterns, the addition of explicit forward planning through search may be important for enabling creative problem-solving that goes beyond the training data. As such, these systems’ ability to produce original moves largely results from exploring vast possibility spaces within their specialized Go model rather than from the kind of flexible, generalizable reasoning that allows humans and some animals to creatively solve novel problems through understanding abstract causal principles (Halina 2021).
Pure RL methods also traditionally face several challenges. First, RL agents often require massive amounts of learning episodes to learn good policies. To achieve human-level performance on Atari video games, for example, Mnih et al. (2015) had to train their agent on 50 million frames—the equivalent of 38 days of playing time—for every single game. Second, RL agents typically need to be trained separately for each specific task, with limited ability to transfer knowledge between different problems; for instance, an agent trained to excel at one Atari game generally cannot perform well on other games without extensive retraining from scratch. Third, RL systems are often limited to relatively simple and constrained environments like games with well-defined rules and objectives, and have more difficulty handling the multidimensional and unstructured nature of real-world tasks. As we will see, recent research has made significant progress in addressing these challenges with more sophisticated RL methods. For example, RL systems can now achieve human-level performance on many Atari games with less than two hours of play (Schwarzer et al. 2023). Robotics has also made progress in applying RL to real-world tasks by bridging the so-called “sim-to-real” gap, allowing agents trained in simulation to transfer their skills to physical robots (Ju et al. 2022).
10.2 How RL extends classical associationism
RL shares associationism’s fundamental premise that learning occurs through an agent’s interactions with its environment based on its causal history. Just as associationism proposes that mental states become associated through experienced contingencies, RL algorithms learn by forming associations between states, actions, and rewards through repeated environmental interactions. However, RL provides a more precise computational framework for understanding how these associations form and influence behavior. In fact, modern RL algorithms extend associationism in ways that partially address some of the limitations of associationist theories of learning reviewed in section 9.
10.2.1 Prediction and control
Like associationism, RL addresses two fundamental aspects of learning: prediction (learning to anticipate future events) and control (learning appropriate behavioral responses). In associationist theories of learning, these correspond respectively to classical (Pavlovian) conditioning, where organisms learn predictive relationships between stimuli, and instrumental conditioning, where organisms learn to select actions based on their consequences. Reinforcement learning provides precise computational mechanisms that implement and extend both forms of associative learning (Sutton & Barto 2018).
For prediction learning, the RL method known as temporal-difference (TD) learning formalizes how agents learn to anticipate future events based on current stimuli (Sutton, 1988). TD learning allows agents to learn value functions through direct interaction with an environment, without requiring a model of that environment. The key idea is that TD learning updates value estimates based on the difference between temporally successive predictions, rather than waiting for the final outcome. Specifically, TD learning uses the current reward and the estimated value of the next state to update the value estimate of the current state (a process called “bootstrapping”). This means that TD learning can learn online, updating estimates at each time step, rather than having to wait until the end of a learning episode (for related issues see section 10.2.6).
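A tabular TD(0) sketch makes the bootstrapping idea concrete. We assume a hypothetical three-state chain (s0 → s1 → terminal) whose final transition yields reward 1; the learning rate and discount factor are illustrative choices:

```python
# Tabular TD(0) prediction on a toy 3-state chain; alpha and gamma are
# illustrative. State 2 is terminal, so its value stays at 0.
def td0(episodes=200, alpha=0.1, gamma=0.9):
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(episodes):
        # Each episode is the fixed trajectory of (s, s_next, reward) steps.
        for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
            # Bootstrapped update: move V(s) toward r + gamma * V(s_next),
            # without waiting for the episode's final outcome.
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = td0()
print(round(V[0], 2), round(V[1], 2))  # V(1) → 1.0, V(0) → gamma * V(1) = 0.9
```

The update at state 0 relies on the current estimate for state 1 rather than the eventual reward, which is what allows learning to proceed online at every time step.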
Like classical conditioning, TD learning updates predictions when actual outcomes differ from expected ones. However, TD learning goes beyond simple stimulus-stimulus associations by incorporating mechanisms that can bridge temporal gaps between predictive cues and outcomes. This allows TD learning to account for phenomena like second-order conditioning, where previously conditioned stimuli can themselves act as reinforcers—which eluded simpler associationist models.
For control learning, RL implements the associationist principle that behaviors become associated with situations based on their consequences. However, rather than just forming simple stimulus-response associations, RL agents learn value functions that estimate the long-term cumulative reward expected from different actions in different situations. This provides a more sophisticated mechanism for behavioral control that can account for both habitual responses (through model-free learning of action values) and goal-directed behavior (through model-based planning, see section 10.2.6 below).
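Q-learning, a standard model-free control algorithm, shows what learning action values looks like in practice. The toy corridor environment and all parameter values below are illustrative:

```python
import random

# Tabular Q-learning sketch: a 4-position corridor where reaching the
# rightmost position yields reward 1 and ends the episode. All parameters
# (alpha, gamma, epsilon, episode counts) are illustrative.
def greedy(Q, s):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(s, -1)], Q[(s, +1)])
    return random.choice([a for a in (-1, +1) if Q[(s, a)] == best])

def q_learning(episodes=500, alpha=0.2, gamma=0.9, epsilon=0.1):
    Q = {(s, a): 0.0 for s in range(3) for a in (-1, +1)}
    for _ in range(episodes):
        s = 0
        for _ in range(100):             # cap episode length
            # Epsilon-greedy selection: mostly exploit, occasionally explore.
            a = random.choice([-1, +1]) if random.random() < epsilon else greedy(Q, s)
            s_next = max(0, min(3, s + a))
            if s_next == 3:              # goal: terminal update, no bootstrap
                Q[(s, a)] += alpha * (1.0 - Q[(s, a)])
                break
            target = gamma * max(Q[(s_next, -1)], Q[(s_next, +1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

random.seed(1)
Q = q_learning()
# The learned policy prefers moving right (toward the goal) in every state.
print([max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(3)])
```

Unlike a bare stimulus-response association, each entry Q(s, a) estimates long-run cumulative reward, so actions that merely delay the goal end up with strictly lower values.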
10.2.2 Beyond simple contiguity
A central tenet of classical associationism is that temporal contiguity — the close temporal proximity of stimuli or events — is necessary for forming associations. This assumption faced significant empirical challenges, particularly from phenomena like taste aversion learning, where organisms form strong associations despite long delays between stimuli and consequences (see Section 9.4). RL provides several mechanisms that explain how learning can occur without strict temporal contiguity.
TD learning enables learning across temporal gaps by comparing predictions at successive time steps rather than waiting for final outcomes. Unlike classical associationism’s requirement for immediate temporal relationships, TD learning can propagate learning backwards through time by “bootstrapping” from intermediate predictions. This allows the system to bridge temporal gaps that posed problems for traditional associationist theories. Schultz et al. (1997) showed that dopamine neuron activity closely matches TD prediction errors, lending some credibility to TD learning as a biological learning mechanism mediated by dopamine signaling—although this hypothesis remains disputed (Namboodiri 2024).
Eligibility traces provide another mechanism for handling temporal gaps in learning. In classical conditioning, Hull’s notion of “stimulus trace” refers to a short-term memory of a conditioned stimulus that persists in the subject’s mind even after the physical stimulus has ended, allowing learning to occur despite gaps between the conditioned and unconditioned stimuli. In RL, eligibility traces serve as a distinct mechanism that tracks which states or stimuli were recently experienced and are therefore “eligible” for learning updates, without affecting behavioral responses, enabling more efficient learning across temporal delays. Eligibility traces thus create temporally-extended records of past states and actions that can be updated when feedback eventually arrives, acting as a form of temporary memory that allows credit or blame to be assigned to events that occurred significantly earlier in time. This provides an additional computational mechanism for learning problems that are difficult to address under strict contiguity requirements.[42]
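Eligibility traces can be sketched in a few lines of tabular TD(λ). In this illustrative fragment (a toy three-state chain; all parameter values are ours), every recently visited state keeps a decaying trace, so the reward at the end of the chain updates earlier states within a single episode:

```python
# Tabular TD(lambda) sketch on a toy 3-state chain (0 -> 1 -> terminal, with
# reward 1 on the final transition); all parameter values are illustrative.
def td_lambda(episodes=200, alpha=0.1, gamma=0.9, lam=0.8):
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(episodes):
        e = {s: 0.0 for s in V}            # eligibility traces, reset per episode
        for s, s_next, r in [(0, 1, 0.0), (1, 2, 1.0)]:
            delta = r + gamma * V[s_next] - V[s]   # TD error at this step
            e[s] += 1.0                            # mark the current state eligible
            for st in V:
                V[st] += alpha * delta * e[st]     # credit every eligible state
                e[st] *= gamma * lam               # traces decay with time
    return V

V = td_lambda()
# The terminal reward propagates back to state 0 within single episodes.
print(round(V[0], 2), round(V[1], 2))
```

The trace vector `e` is the temporary memory described above: it records which states are still "eligible" for credit when a later TD error arrives.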
10.2.3 The credit assignment problem
Classical associationism faced what we called the “coextensionality problem” (Section 9.5), also known in AI as the “credit assignment problem” (Minsky 1961): when multiple stimuli are present simultaneously, how does the system determine which ones should become associated with subsequent outcomes? This problem manifests both spatially (which of multiple concurrent stimuli matter) and temporally (which past events caused current outcomes). Modern RL provides computational solutions to tackle both of these credit assignment challenges.
TD learning addresses temporal credit assignment by propagating error signals backwards through time based on differences between successive predictions. When an outcome occurs, the system can update not just recent events but also states and actions from further in the past, weighted by their temporal distance through eligibility traces. This provides a principled mechanism for determining which past events contributed to current outcomes, though only for events within the system’s hypothesis space. TD learning doesn’t inherently solve the feature selection aspect of the coextensionality problem—distinguishing genuinely relevant features from spurious correlations. Modern RL typically addresses this through inductive biases that favor simpler hypotheses, though some recent approaches like causal RL attempt to directly identify genuine causal relationships (Bareinboim et al. 2024).
Value functions in RL help solve the simultaneous credit assignment problem by learning to predict the long-term consequences of different states and actions. Through experience, the system learns which aspects of the current situation are predictive of future outcomes, effectively determining which stimuli deserve credit for results. Insofar as TD learning is biologically plausible, this may help explain blocking effects in classical conditioning — when a stimulus fails to acquire associative strength because another stimulus already predicts the outcome perfectly.
10.2.4 Rapid and gradual learning
Traditional associationist theories of learning imply that associations can only be formed through slow, incremental strengthening via repeated exposure to stimulus pairings, in contrast with evidence that individual learning is often rapid and step-like. Modern RL offers a new perspective on this apparent tension between rapid and gradual learning. Rather than viewing them as competing accounts, some RL methods suggest that rapid learning capabilities can emerge from and depend upon slower learning processes. For example, in meta-reinforcement learning, a “slow” outer loop of learning gradually tunes the parameters of a neural network through extensive experience across many related tasks (Schweighofer & Doya 2003). This slow learning process shapes the network’s dynamics to implement a “fast” inner loop of learning that can rapidly adapt to new situations within a familiar task domain. The fast learning capabilities emerge precisely because the slow outer loop has discovered useful regularities and inductive biases that constrain and guide learning in new situations. This is behaviorally similar to how human subjects, after solving many puzzles of a certain type, become increasingly quick at solving new puzzles of the same kind—not because they memorized specific solutions, but because they’ve learned general problem-solving strategies for that domain. The success of meta-RL in modeling flexible behavior challenges the view that associative learning is inherently inflexible and unable to account for rapid adaptation.
Another method that leverages both gradual and fast learning is episodic RL, which draws inspiration from biological episodic memory systems—particularly the hippocampus’s role in memory consolidation through replay (Gershman & Daw 2017). Episodic RL combines traditional RL with an episodic memory system to improve learning efficiency and performance. It allows the agent to store past experiences as discrete episodes, typically represented as sets containing the state, action taken, reward received, and resulting next state. When the agent encounters a new situation, it can draw on past experiences to compute the value of possible actions based on the recorded action values for similar states. While the system can immediately leverage memories to inform decisions in new situations, the effectiveness of this process depends on having gradually learned appropriate representations that make meaningful similarity comparisons possible.[43] The rapid deployment of episodic memories thus builds upon slower processes that shape how experiences are encoded and compared.
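The core computation of episodic RL, valuing a new state by its similarity to stored episodes, can be sketched directly. The feature vectors, recorded returns, and similarity measure below are all hypothetical:

```python
import math

# Toy episodic value estimation: the value of a new state is a similarity-
# weighted average of the returns recorded for the most similar stored
# episodes. States, returns, and the similarity measure are hypothetical.
episodes = [
    ((0.0, 0.0), 0.1),    # (state features, recorded return)
    ((1.0, 1.0), 0.9),
    ((0.9, 1.1), 1.0),
]

def similarity(x, y):
    """Closer states count for more (inverse-distance similarity)."""
    return 1.0 / (1.0 + math.dist(x, y))

def episodic_value(state, k=2):
    # Rank stored episodes by similarity and average the top k, weighted.
    ranked = sorted(episodes, key=lambda ep: similarity(state, ep[0]), reverse=True)[:k]
    total = sum(similarity(state, s) for s, _ in ranked)
    return sum(similarity(state, s) * r for s, r in ranked) / total

# A state near the high-return episodes is estimated to be valuable.
print(round(episodic_value((1.0, 1.0)), 2))
```

The memory lookup is immediate, but, as the text notes, it only works well if the feature representations over which similarity is computed have themselves been learned gradually.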
Episodic RL is typically combined with experience replay, which allows the agent to sample and replay past experiences during training to break the correlation between consecutive training samples and enables the agent to learn more efficiently. This method was instrumental in training RL agents that match human-level performance at Atari games (Mnih et al. 2013, 2015). While basic experience replay samples past experiences at random, more advanced methods prioritize which experiences to replay based on their potential learning value (Schaul et al., 2016).
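A minimal replay buffer captures the mechanism: transitions are stored as they occur and later sampled in random order, breaking the correlation between consecutive training samples (the capacity and batch size below are arbitrary):

```python
import random
from collections import deque

# Minimal experience-replay buffer; capacity and batch size are arbitrary.
class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are evicted

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random minibatch, decorrelating consecutive transitions."""
        return random.sample(list(self.buffer), batch_size)

buffer = ReplayBuffer()
for t in range(100):                  # store a stream of consecutive transitions
    buffer.add(t, +1, 0.0, t + 1)
batch = buffer.sample(8)              # training would use this shuffled minibatch
print(len(batch))
```

Prioritized variants replace the uniform `random.sample` with sampling weighted by each transition's estimated learning value, as in Schaul et al. (2016).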
It has been suggested that meta-RL and episodic RL with experience replay could help explain how organisms can exhibit both rapid, one-shot learning in familiar domains while still requiring extensive experience to master entirely novel types of tasks (Botvinick et al. 2019).[44] Both methods suggest that associative learning can in principle operate simultaneously across multiple timescales, with slower processes laying the groundwork for faster forms of learning. Meta-RL in particular suggests that associative learning principles may play an important role not just in forming specific associations, but in shaping how organisms learn to learn (Sandbrink & Summerfield 2024). When organisms show increasingly rapid learning of new problems within a domain, this may reflect the gradual tuning of learning mechanisms themselves through meta-learning processes, rather than simple stimulus-response associations.
10.2.5 Content specificity
Connectionist models learn from specific input-output mappings in their training data through associative mechanisms. As such, they implement content-specific computations: computations that are faithful to content only because of the specific contents represented at input and output (Shea 2023). For example, a neural network trained to classify images might learn to map certain patterns of edges and textures to the label “dog”, but this tells us nothing about how it should classify images of cats or trees. Likewise, purely associative transitions in psychology are content-specific (Quilty-Dunn & Mandelbaum 2019). By contrast, a non-content-specific computation is a computational process that operates in the same way regardless of the particular content of the representations it takes as input. For example, rules of logical inference work the same way regardless of the specific concepts involved; as such, inferential transitions are non-content-specific.
The ability to perform non-content-specific computations allows for more flexible and generalizable processing, and is traditionally taken to elude connectionist models, including most RL systems. However, Shea (2023) argues that episodic RL systems implement non-content-specific computations. When an episodic RL system encounters a new state, it computes the similarity between that state and all previously stored episodes using the same algorithm, regardless of what specific states are being compared. This is a departure from classical associationism’s reliance on content-specific transitions, where the relationship between two states depends entirely on their specific contents and learning history.
This feature of episodic RL systems explains why they learn more flexibly and efficiently. They can adapt more quickly to new situations and avoid problems like catastrophic forgetting—where newly learned associations overwrite previously learned ones—that can plague simpler neural network architectures relying exclusively on content-specific transitions. It should be noted, however, that episodic RL still relies on similarity-based computations (using a similarity metric to compare vector-based representations), rather than inferential transitions that are sensitive to the constituent structure of representations. While the representation of past experiences in episodic RL may have some compositional structure, it normally lacks the kind of discrete constituent structure often taken to underlie more regimented mental transitions such as logical inference.[45]
10.2.6 Model-based RL
Another important distinction relevant to the reappraisal of associationism is that of model-free and model-based RL. In model-free RL, the agent learns directly from experience without building an explicit model of its environment. This is the typical RL setup we described, in which the agent learns a policy through trial-and-error and updates its estimates based on observed rewards and state transitions. By contrast, model-based RL involves learning an explicit model of the environment, including the transition probabilities between states and the reward function. The agent can then use this model to plan and make decisions (Daw et al., 2005).
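The difference is easy to see in miniature. Given a model of a toy corridor environment (hypothetical dynamics written by hand rather than learned), an agent can compute values purely by planning over the model, with no further environmental interaction:

```python
# Sketch of model-based planning: value iteration over a hand-written model
# of a toy 4-position corridor (state 3 is the terminal goal). All dynamics
# and parameters are illustrative.
def plan(gamma=0.9, sweeps=50):
    actions = (-1, +1)

    def model(s, a):
        """The agent's model: predicted next state and reward."""
        s_next = max(0, min(3, s + a))
        return s_next, (1.0 if s_next == 3 else 0.0)

    V = {s: 0.0 for s in range(4)}
    for _ in range(sweeps):              # repeated simulated backups
        for s in range(3):               # state 3 is terminal
            best = float("-inf")
            for a in actions:
                s_next, r = model(s, a)
                future = 0.0 if s_next == 3 else gamma * V[s_next]
                best = max(best, r + future)
            V[s] = best
    return V

V = plan()
# Values fall off geometrically with distance from the goal.
print(round(V[2], 2), round(V[1], 2), round(V[0], 2))  # → 1.0 0.9 0.81
```

A model-free agent would need many real interactions to reach the same values; here every "experience" is simulated from the internal model, which is what makes model-based methods sample-efficient when the model is accurate.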
The distinction between model-free and model-based RL reflects a fundamental trade-off between tractability and efficiency. Model-free RL is computationally inexpensive, but it is not very sample-efficient or flexible, as agents typically need very large amounts of interaction to learn optimal policies. Model-based methods are more sample-efficient and flexible, as the agent can use its model to simulate experiences and plan ahead without actually taking actions in the environment. However, they may struggle if the learned model is inaccurate or if the environment is too complex to model effectively. There is converging evidence that humans make use of both model-free and model-based RL to balance these computational trade-offs (Lake et al. 2017; Botvinick et al. 2019). On this view, model-based planning can take over from model-free learning to enable flexible adaptation to novel tasks, although with enough training certain skills acquired through model-based RL can become “habituated” as model-free routines to conserve computational resources.
Model-based RL goes beyond associative chaining by leveraging internal structured knowledge of the environment that encodes relationships between states, actions, and outcomes to plan ahead. Some model-based RL systems in AI have a hybrid architecture where the model is built-in rather than learned by a neural network. AlphaGo, for example, combines two neural network components—a “policy network” that selects moves and a “value network” that evaluates board positions—with Monte Carlo tree search (MCTS), a traditional search algorithm which uses the policy network to focus the search on promising moves (Silver et al., 2016). In this system, the rules of Go are encoded as handcrafted features rather than learned. By contrast, some model-based RL systems learn a model of the environment with a neural network. For example, Kaiser et al. (2024) achieved excellent sample-efficiency on Atari games by training a model-based RL system with a “world model”, consisting of a neural network that learns to predict future frames of the game and expected rewards given past frames and possible actions. This “world model” can then be used to simulate the game environment and allow the agent to learn optimal policies much more quickly.
Among the computational innovations introduced by modern RL, model-based methods are probably those that most clearly strain the bare notion of association inherited from classical associationism. On the one hand, model-based RL systems like the Atari-playing network of Kaiser et al. (2024) do fundamentally learn from associations between actions, observations, and rewards. On the other hand, it would be misleading to describe the resulting “world model” as a set of unstructured pairings of representations. A fortiori, hybrid RL systems like AlphaGo that rely on built-in rules contain plenty of explicit structure. While the algorithmic innovations and behavioral successes of RL do address some of the core limitations of associationist theories of learning, they also abandon the latter’s original commitment to simplicity.
Bibliography
- Anderson, J., K. Spoehr, and D. Bennett, 1994, “A Study in Numerical Perversity: Teaching Arithmetic to a Neural Network”, in Neural Networks for Knowledge Representation and Inference, D. Levine and M. Aparicio IV (eds.), East Sussex: Psychology Press, pp. 311–335.
- Armstrong, K., S. Kose, L. Williams, A. Woolard, and S. Heckers, 2012, “Impaired Associative Inference in Patients with Schizophrenia”, Schizophrenia Bulletin, 38(3): 622–629.
- Asch, S., 1962, “A Problem in the Theory of Associations”, Psychologische Beitrage, (6): 553–563.
- –––, 1969, “A Reformulation of the Problem of Association”, American Psychologist, 24(2): 92–102.
- Aydede, M., 1997, “Language of Thought: The Connectionist Contribution”, Minds and Machines, 7(1): 57–101.
- Baeyens, F., P. Eelen, O. Van den Bergh, and G. Crombez, 1990, “Flavor-Flavor and Color-Flavor Conditioning in Humans”, Learning and Motivation, 21(4): 434–455.
- Baeyens, F., P. Eelen, and G. Crombez, 1995, “Pavlovian Associations are Forever: On Classical Conditioning and Extinction”, Journal of Psychophysiology, 9(2): 127–141.
- Bain, A., 1855, The Senses and The Intellect, London: John W. Parker and Son.
- Bar-Anan Y., B. Nosek, and M. Vianello, 2009, “The Sorting Paired Features Task: A Measure of Association Strengths”, Experimental Psychology, 56(5): 329–343.
- Bareinboim, E., J. Zhang, and S. Lee, 2024, “An Introduction to Causal Reinforcement Learning”, Technical Report R-65, CausalAI Lab, Columbia University, New York.
- Bates, E. and B. MacWhinney, 1987, “Competition, Variation, and Language Learning”, in B. MacWhinney (ed.), Mechanisms of Language Acquisition, Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 157–193.
- Bendana, J. and E. Mandelbaum, forthcoming, “The Fragmentation of Belief”, in D. Kindermann, C. Borgoni, and A. Onofri (eds.), The Fragmented Mind, Oxford: Oxford University Press.
- Berger, J., 2020, “Implicit attitudes and awareness”, Synthese, 197(3): 1291–1312.
- Bernstein, I. and M. Webster, 1980, “Learned Taste Aversions in Humans”, Physiology and Behavior, 25(3): 363–366.
- Bernstein, I., 1985, “Learned Food Aversions in the Progression of Cancer and its Treatment”, in N. Braveman and P. Bronstein, (eds.), Experimental Assessments and Clinical Applications of Conditioned Food Aversions, New York: New York Academy of Sciences, pp. 365–80.
- Binz, M., I. Dasgupta, A. K. Jagadish, M. Botvinick, J. X. Wang, and E. Schulz, 2024, “Meta-Learned Models of Cognition”, Behavioral and Brain Sciences, 47: e147.
- Black, W. and W. Prokasy (eds.), 1972, Classical Conditioning II: Current Research and Theory, New York: Appleton-Century-Crofts.
- Bloom, P., 2000, How Children Learn the Meanings of Words, Cambridge, MA: MIT Press.
- Botvinick, M., S. Ritter, J. X. Wang, Z. Kurth-Nelson, C. Blundell, and D. Hassabis, 2019, “Reinforcement Learning, Fast and Slow”, Trends in Cognitive Sciences, 23(5): 408–422.
- Bouton, M., 2002, “Context, Ambiguity, and Unlearning: Sources of Relapse after Behavioral Extinction”, Biological Psychiatry, 52(10): 976–986.
- –––, 2004, “Context and Behavioral Processes in Extinction”, Learning and Memory, 11(5): 485–494.
- Brett, L., W. Hankins, and J. Garcia, 1976, “Prey-Lithium Aversions. III: Buteo hawks”, Behavioral Biology, 17(1): 87–98.
- Brown, N. and T. Sandholm, 2019, “Superhuman AI for Multiplayer Poker”, Science, 365(6456): 885–890.
- Buckner, C., 2023, From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence, Oxford: Oxford University Press.
- Camp, L., 2007, “Thinking with Maps”, Philosophical Perspectives, 21(1): 145–182.
- Carey, S., 1978a, “Less May Never Mean More”, in R. Campbell and P. Smith, (eds.), Recent Advances in the Psychology of Language, New York: Plenum Press, p. 109–132.
- –––, 1978b, “The Child as Word Learner”, in J. Bresnan, G. Miller, and M. Halle, (eds.), Linguistic Theory and Psychological Reality, Cambridge, MA: MIT Press, pp. 264–293.
- –––, 2010, “Beyond Fast Mapping”, Language Learning and Development, 6(3): 184–205.
- Carey, S. and E. Bartlett, 1978, “Acquiring a Single New Word”, Proceedings of the Stanford Child Language Conference, 15: 17–29.
- Caselli, M.C., E. Bates, P. Casadio, J. Fenson, L. Fenson, L. Sanderl, and J. Weir, 1995, “A Cross-linguistic Study of Early Lexical Development”, Cognitive Development, 10(2): 159–199.
- Chaiken, S. and Y. Trope (eds.), 1999, Dual-Process Theories in Social Psychology, New York: Guilford Press.
- Chalmers, D., 1993, “Connectionism and Compositionality: Why Fodor and Pylyshyn Were Wrong”, Philosophical Psychology, 6(3): 305–319.
- Chater, N., 2009, “Rational Models of Conditioning”, Behavioral and Brain Sciences, 32(2): 204–205.
- –––, J. Tenenbaum, and A. Yuille, 2006, “Probabilistic Models of Cognition: Conceptual Foundations”, Trends in Cognitive Sciences, 10(7): 287–291.
- Chomsky, N., 1959, “A Review of B.F. Skinner’s Verbal Behavior”, Language, 35(1): 26–58.
- Churchland, P., 1986, “Some Reductive Strategies in Cognitive Neurobiology”, Mind, 95(379): 279–309.
- –––, 1989, A Neurocomputational Perspective: The Nature of Mind and the Structure of Science, Cambridge, MA: MIT.
- Churchland, P. and T. Sejnowski, 1990, “Neural Representation and Neural Computation”, Philosophical Perspectives, 4: 343–382.
- Collins, A. and E. Loftus, 1975, “A Spreading-Activation Theory of Semantic Processing”, Psychological Review, 82(6): 407–428.
- Danks D., 2013, “Moving from Levels and Reduction to Dimensions and Constraints”, Proceedings of the 35th Annual Conference of the Cognitive Science Society, 35: 2124–2129.
- De Houwer, J., 2009, “The Propositional Approach to Associative Learning as an Alternative for Association Formation Models”, Learning & Behavior, 37(1): 1–20.
- –––, 2011, “Evaluative Conditioning: A Review of Procedure Knowledge and Mental Process Theories”, in T. Schachtman and S. Reilly (eds.), Associative Learning and Conditioning Theory: Human and Non-Human Applications, New York: Oxford University Press, pp. 399–416.
- –––, 2014, “A Propositional Model of Implicit Evaluation”, Social and Personality Psychology Compass, 8(7): 342–353.
- –––, 2018, “Propositional Models of Evaluative Conditioning”, Social Psychological Bulletin, 13(2): 1–21.
- –––, 2019, “Moving Beyond System 1 and System 2: Conditioning, Implicit Evaluation, and Habitual Responding Might Be Mediated by Relational Knowledge”, Experimental Psychology, 66(4): 257–265.
- De Houwer, J., S. Thomas, and F. Baeyens, 2001, “Association Learning of Likes and Dislikes: A Review of 25 years of Research on Human Evaluative Conditioning”, Psychological Bulletin, 127(6): 853–869.
- Dehaene, S., 2011, The Number Sense: How the Mind Creates Mathematics, Oxford: Oxford University Press.
- Demeter, T., 2021, “Fodor’s guide to the Humean mind”, Synthese, 199(1), 5355–5375. doi:10.1007/s11229-021-03028-4
- Diaz, E., G. Ruis, and F. Baeyens, 2005, “Resistance to Extinction of Human Evaluative Conditioning Using a Between-Subjects Design”, Cognition and Emotion, 19(2): 245–268.
- Dickinson, A., D. Shanks, and J. Evenden, 1984, “Judgment of Act-Outcome Contingency: The role of Selective Attribution”, The Quarterly Journal of Experimental Psychology, 36(1): 29–50.
- Dirikx, T., D. Hermans, D. Vansteenwegen, F. Baeyens, and P. Eelen, 2004, “Reinstatement of Extinguished Conditioned Responses and Negative Stimulus Valence as a Pathway to Return of Fear in Humans”, Learning and Memory, 11: 549–54.
- Elman, J., 1991, “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure”, Machine Learning, 7(2–3): 195–225.
- Elman, J., E. Bates, M. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett, 1996, Rethinking Innateness: A Connectionist Perspective on Development, Cambridge, MA: MIT Press.
- Evans, G., 1982, The Varieties of Reference, J. McDowell (ed.), Oxford: Clarendon Press.
- Evans, J., and K. Frankish (eds.), 2009, In Two Minds: Dual Processes and Beyond, Oxford: Oxford University Press.
- –––, and K. Stanovich, 2013, “Dual-Process Theories of Higher Cognition: Advancing the Debate,” Perspectives on Psychological Science, 8(3): 223–241.
- Fazio, R., 2007, “Attitudes as Object-Evaluation Associations of Varying Strength”, Social Cognition, 25(5): 603–637.
- Festinger, L. and J. Carlsmith, 1959, “Cognitive Consequences of Forced Compliance”, The Journal of Abnormal and Social Psychology, 58(2): 203–210.
- Field, A. and G. Davey, 1999, “Reevaluating Evaluative Conditioning: A Nonassociative Explanation of Conditioning Effects in the Visual Evaluative Conditioning Paradigm”, Journal of Experimental Psychology: Animal Behavior Processes, 25(2): 211–224.
- Fodor, J., 1983, The Modularity of Mind, Cambridge, MA: MIT Press.
- –––, 2001, The Mind Doesn’t Work that Way, Cambridge, MA: MIT Press.
- –––, 2003, Hume Variations, Oxford: Clarendon Press.
- Fodor, J., and B. McLaughlin, 1990, “Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work”, Cognition, 35(2): 183–204.
- Fodor, J., and Z. Pylyshyn, 1988, “Connectionism and Cognitive Architecture: A Critical Analysis”, Cognition, 28(1–2): 3–71.
- Frankish, K., 2009, “Systems and Levels: Dual-System Theories and the Personal-Subpersonal Distinction”, in Evans and Frankish 2009: pp. 89–107.
- Gagliano, M., V. Vyazovsky, A. Borbely, M. Grimonprez, and M. Depczynski, 2016, “Learning by Association in Plants”, Scientific Reports, 6(38427): 1–8.
- Gallistel, C., S. Fairhurst, and P. Balsam, 2004, “The Learning Curve: Implications of a Quantitative Analysis”, Proceedings of the National Academy of Sciences of the United States of America, 101(36): 13124–13131.
- Gallistel, C., and A. King, 2009, Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, West Sussex: Wiley Blackwell.
- Garcia, J., 1981, “Tilting at the Paper Mills of Academe”, American Psychologist, 36(2): 149–158.
- Garcia, J., R. Kovner, and K. Green, 1970, “Cue Properties vs Palatability of Flavors in Avoidance Learning”, Psychonomic Science, 20(5): 313–314.
- Garcia, J., B. McGowan, and K. Green, 1972, “Biological Constraints on Conditioning II”, in Black and Prokasy 1972: pp. 3–27.
- Garcia, J., W. Hankins, and K. Rusiniak, 1974, “Behavioral Regulation of the Milieu Interne in Man and Rat”, Science, 185(4154): 824–831.
- Garcia, J. and R. Koelling, 1966, “Relationship of Cue to Consequence in Avoidance Learning”, Psychonomic Science, 4: 123–124.
- Gendler, T., 2008, “Alief and Belief”, Journal of Philosophy, 105(10): 634–63.
- Gleitman, L., K. Cassidy, R. Nappa, A. Papafragou, and J. Trueswell, 2005, “Hard Words”, Language Learning and Development, 1(1): 23–64.
- Glosser, G. and R. Friedman, 1991, “Lexical but not Semantic Priming in Alzheimer’s Disease”, Psychology and Aging, 6(4): 522–27.
- Goldin-Meadow, S., M. Seligman, and S. Gelman, 1976, “Language in the Two-Year Old”, Cognition, 4(2): 189–202.
- Greenwald, A., D. McGhee, and J. Schwartz, 1998, “Measuring Individual Differences in Implicit Cognition: The Implicit Association Test”, Journal of Personality and Social Psychology, 74(6): 1464–1480.
- Hahn, A., C. Judd, H. Hirsch, and I. Blair, 2014, “Awareness of Implicit Attitudes”, Journal of Experimental Psychology: General, 143(3): 1369–1392.
- Heyes, C., 2012, “Simple Minds: A Qualified Defence of Associative Learning”, Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1603): 2695–2703.
- Hughes, S., Y. Ye, P. Van Dessel, and J. De Houwer, 2019, “When people co occur with good or bad events: Graded effects of relational qualifiers on evaluative conditioning”, Personality and Social Psychology Bulletin, 45(2): 196–208.
- Hull, C., 1943, Principles of Behavior, New York: Appleton-Century-Crofts.
- Hume, D., 1738, A Treatise of Human Nature, L.A. Selby-Bigge (ed.), 2nd ed., revised by P.H. Nidditch, Oxford: Clarendon Press, 1975.
- James, W., 1890, The Principles of Psychology (Vol. 1), New York: Holt.
- Johnson, K., 2004, “On the Systematicity of Language and Thought”, Journal of Philosophy, 101(3): 111–139.
- Kahneman, D., 2011, Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
- Kamin, L., 1969, “Predictability, Surprise, Attention, and Conditioning”, in B. Campbell and R. Church (eds.), Punishment and Aversive Behavior, New York: Appleton-Century-Crofts, pp. 279–296.
- Kant, I., 1781/1787, Critique of Pure Reason, in P. Guyer and A. Wood (eds.), Critique of Pure Reason, New York: Cambridge University Press.
- Karmiloff-Smith, A., 1995, Beyond Modularity: A Developmental Perspective on Cognitive Science, Cambridge, MA: MIT Press/Bradford Books.
- Kruglanski, A., 2013, “Only One? The Default Interventionist Perspective as a Unimodel—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 242–247.
- Kurdi, B., and M. Banaji, 2017, “Repeated evaluative pairings and evaluative statements: How effectively do they shift implicit attitudes?”, Journal of Experimental Psychology: General, 146(2): 194–213.
- –––, 2019, “Attitude change via repeated evaluative pairings versus evaluative statements: Shared and unique features”, Journal of Personality and Social Psychology, 116(5): 681–703.
- Locke, J., 1690, An Essay Concerning Human Understanding, P.H. Nidditch (ed.), Oxford: Clarendon Press, 1975.
- Logue, A., I. Ophir, and K. Strauss, 1981, “The Acquisition of Taste Aversion in Humans”, Behaviour Research and Therapy, 19(4): 319–33.
- Luka, B., and L. Barsalou, 2005, “Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension”, Journal of Memory and Language, 52: 444–467.
- Lycan, W., 1990, “The Continuity of the Levels of Nature”, in W. Lycan (ed.), Mind and Cognition: A Reader, Cambridge: Basil Blackwell, pp. 77–96.
- Mandelbaum, E., 2013a, “Against Alief”, Philosophical Studies, 165(1): 197–211.
- –––, 2013b, “Numerical Architecture”, Topics in Cognitive Science, 5(2): 367–386.
- –––, 2016, “Attitude, Inference, Association: On the Propositional Structure of Implicit Attitudes”, Nous, 50(3): 629–658.
- –––, 2017, “Seeing and Conceptualizing: Modularity and the Shallow Contents of Vision”, Philosophy and Phenomenological Research, 97(2): 267–283.
- –––, 2019, “Troubles with Bayesianism: An Introduction to the Psychological Immune System”, Mind & Language, 34(2): 141–157.
- Mann, T., and M. Ferguson, 2015, “Can we undo our first impressions? The role of reinterpretation in reversing implicit evaluations”, Journal of Social and Personality Psychology, 108(6): 823–849.
- –––, 2017, “Reversing implicit first impressions through reinterpretation after a two-day delay”, Journal of Experimental Social Psychology, 68: 122–127.
- Mann, T., B. Kurdi, and M. Banaji, 2019, “How effectively can implicit evaluations be updated? Using evaluative statements after aversive repeated evaluative pairings”, Journal of Experimental Psychology: General, doi:10.1037/xge0000701.
- Markman, E., 1989, Categorization and Naming in Children: Problems of Induction, Cambridge, MA: MIT Press.
- Markson, L. and P. Bloom, 1997, “Evidence Against a Dedicated System for Word Learning in Children”, Nature, 385(6619): 813–815.
- Marr, D., 1982, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, NY: W.H. Freeman and Co.
- Mason, M. and M. Bar, 2012, “The Effect of Mental Progression on Mood”, Journal of Experimental Psychology: General, 141(2): 217–221. doi:10.1037/a0025035
- McClelland, J., M. Botvinick, D. Noelle, D. Plaut, T. Rogers, M. Seidenberg, and L. Smith, 2010, “Letting Structure Emerge: Connectionist and Dynamic Systems Approaches to Cognition”, Trends in Cognitive Sciences, 14(8): 348–356.
- Minsky, M., 1963, “Steps toward Artificial Intelligence”, in E. Feigenbaum and J. Feldman (eds.), Computers And Thought, New York, NY: McGraw-Hill, pp. 406–450.
- Mitchell, C., J. De Houwer, and P. Lovibond, 2009, “The Propositional Nature of Human Associative Learning”, Behavioral and Brain Sciences, 32(2): 183–246.
- Mnih, V., K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, 2013, “Playing Atari with Deep Reinforcement Learning”, Neural Information Processing Systems 2013, Deep Learning Workshop. [Mnih et al. 2013 available online]
- Mnih, V., K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, 2015, “Human-Level Control Through Deep Reinforcement Learning”, Nature, 518(7540): 529–533.
- Nosek, B. and M. Banaji, 2001, “The Go/No-Go Association Task”, Social Cognition, 19(6): 625–66.
- Osman, M., 2013, “A Case Study Dual-Process Theories of Higher Cognition—Commentary on Evans & Stanovich”, Perspectives on Psychological Science, 8(3): 248–252.
- Pavlov, I., 1906, “The Scientific Investigation of the Psychical Faculties or Processes in the Higher Animals”, Science, 24(620): 613–619.
- –––, 1927, Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex, Oxford: Oxford University Press.
- Payne, B., C. Cheng, O. Govorun, and B. Stewart, 2005, “An Inkblot for Attitudes: Affect Misattribution as Implicit Measurement”, Journal of Personality and Social Psychology, 89(3): 277–293.
- Perea, M. and E. Rosa, 2002, “The Effects of Associative and Semantic Priming in the Lexical Decision Task”, Psychological Research, 66(3): 180–194.
- Prinz, J., 2002, Furnishing the Mind: Concepts and their Perceptual Basis, Cambridge, MA: MIT Press.
- –––, and A. Clark, 2004, “Putting Concepts to Work: Some Thoughts for the 21st Century”, Mind & Language, 19(1): 57–69.
- Quilty-Dunn, J., 2020, “Perceptual Pluralism”, Nous, 1–41.
- Quilty-Dunn, J. and E. Mandelbaum, 2018, “Inferential Transitions”, Australasian Journal of Philosophy, 96(3): 532–547.
- –––, 2019, “Non-Inferential Transitions: Imagery and Association”, in T. Chan and A. Nes (eds.), Inference and Consciousness, New York: Routledge, pp. 151–171.
- Rescorla, R., 1968, “Probability of Shock in the Presence and Absence of CS in Fear Conditioning”, Journal of Comparative and Physiological Psychology, 66(1): 1–5.
- –––, 1988, “Pavlovian Conditioning: It’s Not What You Think It Is”, American Psychologist, 43(3): 151–160.
- Rescorla, R. and A. Wagner, 1972, “A Theory of Pavlovian Conditioning: Variations in the Effectiveness of Reinforcement and Nonreinforcement”, in Black and Prokasy 1972, pp. 64–99.
- Roll, D. and J. Smith, 1972, “Conditioned Taste Aversion in Anesthetized Rats”, in M. Hager and J. Seligman (eds.), Biological Boundaries of Learning. New York: Appleton-Century-Crofts, pp. 98–102.
- Rozin, P., 1986, “One-Trial Acquired Likes and Dislikes in Humans: Disgust as a US, Food Predominance, and Negative Learning Predominance”, Learning and Motivation, 17(2): 180–189.
- Rumelhart, D., P. Smolensky, J. McClelland, and G. Hinton, 1986, “Sequential Thought Processes in PDP Models”, in J. McClelland and D. Rumelhart (eds.), Parallel Distributed Processing Vol. 2: Explorations in the Microstructure of Cognition: Psychological and Biological Models, Cambridge, MA: MIT Press, pp. 7–57.
- Rusiniak, K., W. Hankins, J. Garcia, and L. Brett, 1979, “Flavor-illness Aversions: Potentiation of Odor by Taste in Rats”, Behavioral and Neural Biology, 25(1): 1–17.
- Rydell, R. and A. McConnell, 2006, “Understanding Implicit and Explicit Attitude Change: A Systems of Reasoning Analysis”, Journal of Personality and Social Psychology, 91(6): 995–1008.
- Sandhoffer, C., L. Smith, and J. Luo, 2000, “Counting Nouns and Verbs in the Input: Differential Frequencies, Different Kinds of Learning?”, Journal of Child Language, 27(3): 561–585.
- Seligman, M., 1970, “On the Generality of the Laws of Learning”, Psychological Review, 77(5): 406–418.
- Shanks, D., 2010, “Learning: From Association to Cognition”, Annual Review of Psychology, 61: 273–301.
- Skinner, B., 1938, The Behavior of Organisms: An Experimental Analysis, Oxford: Appleton-Century.
- –––, 1953, Science and Human Behavior, New York: Simon and Schuster.
- Sloman, S., 1996, “The Empirical Case for Two Systems of Reasoning”, Psychological Bulletin, 119(1): 3–22.
- Smith, E. R. and J. DeCoster, 2000, “Dual-Process Models in Social and Cognitive Psychology: Conceptual Integration and Links to Underlying Memory Systems”, Personality and Social Psychology Review, 4(2): 108–131.
- Smith, J. and D. Roll, 1967, “Trace Conditioning with X-rays as an Aversive Stimulus”, Psychonomic Science, 9(1): 11–12.
- Smolensky, P., 1988, “On the Proper Treatment of Connectionism”, Behavioral and Brain Sciences, 11(1): 1–23.
- Snedeker, J. and L. Gleitman, 2004, “Why it is Hard to Label Our Concepts”, in D. Hall and S. Waxman (eds.), Weaving a Lexicon, Cambridge, MA: MIT Press, pp. 257–294.
- Stanovich, K., 2011, Rationality and the Reflective Mind, New York: Oxford University Press.
- Tenenbaum, J., C. Kemp, T. Griffiths, and N. Goodman, 2011, “How to Grow a Mind: Statistics, Structure, and Abstraction”, Science, 331(6022): 1279–1285.
- Thorndike, E., 1911, Animal intelligence: Experimental studies, New York: Macmillan.
- Todrank, J., D. Byrnes, A. Wrzesniewski, and P. Rozin, 1995, “Odors can Change Preferences for People in Photographs: A Cross-Modal Evaluative Conditioning Study with Olfactory USs and Visual CSs”, Learning and Motivation, 26(2): 116–140.
- Tolman, E., 1948, “Cognitive Maps in Rats and Men”, Psychological Review, 55(4): 189–208.
- Van Dessel, P., Y. Ye, and J. De Houwer, 2019, “Changing deep-rooted implicit evaluation in the blink of an eye: negative verbal information shifts automatic liking of Gandhi”, Social Psychological and Personality Science, 10(2): 266–273.
- Van Gelder, T., 1995, “What Might Cognition Be, If not Computation?”, The Journal of Philosophy, 91(7): 345–381.
- Vansteenwegen, D., G. Francken, B. Vervliet, A. De Clercq, and P. Eelen, 2006, “Resistance to Extinction in Evaluative Conditioning”, Journal of Experimental Psychology: Animal Behavior Processes, 32(1): 71–79.
- Wilson, T., S. Lindsey, and T. Schooler, 2000, “A Model of Dual Attitudes”, Psychological Review, 107(1): 101–26.
Academic Tools
- How to cite this entry.
- Preview the PDF version of this entry at the Friends of the SEP Society.
- Look up topics and thinkers related to this entry at the Internet Philosophy Ontology Project (InPhO).
- Enhanced bibliography for this entry at PhilPapers, with links to its database.
Other Internet Resources
- John Locke’s Chapter on the Association of Ideas from An Essay Concerning Human Understanding
- David Hume’s A Treatise of Human Nature
- William James “The Stream of Consciousness”
- William James “The Stream of Thought”: (chapter from his Principles of Psychology)
- Edward Thorndike on the Law of Effect (from his book Animal Intelligence)
- Ivan Pavlov’s “Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex”
Acknowledgments
Helpful feedback was received from Michael Brownstein, Cameron Buckner, Bryce Huebner, Zoe Jenkin, Xander Macswan, Griffin Pion, Jake Quilty-Dunn, Shaun Nichols, Soren Schlassa, and Susanna Siegel, who are hereby thanked for their efforts.