Connectionism and Representation
Tuesday, 29 May 2007
Are connectionist networks capable of representing the structure of thought?
I. Introduction
Connectionist, or parallel distributed processing (PDP), networks are typically very good at pattern recognition and at responding to previously encountered stimuli. However, it is less clear that they are suited to the kind of symbol manipulation that is often taken (although, pace Davies 1991, not known a priori) to be a prerequisite for abstract thought and language. According to Fodor (1975), such abilities are explicable only in terms of a computational architecture based upon a symbolic ‘language of thought’ (ibid.) whose structure is substantially similar, if not identical, to that of natural and formal languages. Furthermore, Fodor (1988: 8) claims that the intimate connection between syntax (i.e. formal structure) and semantics (meaning) exhibited by such languages is essential to explaining the causal structure and systematicity of human thought. This commits Fodor to a realist view of the mental, in which beliefs, desires, knowledge, etc. are not just convenient approximations, or ways of talking about underlying neurological processes, but literal descriptions of the causal structure of the human mind. A vivid illustration can be found in the following quotation, in which the fictional Sherlock Holmes recounts how he arrived at his conclusion:
I instantly reconsidered my position … [this] gave rise to the suspicion that the rope was there as a bridge for something passing through the hole … The idea of a snake instantly occurred to me, and when I coupled it with my knowledge that the Doctor was furnished with a supply of the creatures from India I felt that I was probably on the right track.
(Fodor 1987: 13–14)
For Fodor, Holmes is not simply reconstructing his thought processes, but giving a literally accurate account in which the terms ‘suspicion’, ‘idea’, ‘knowledge’, and the feeling ‘that [he] was probably on the right track’ refer to actual states or processes that occurred within his conscious mind as the relevant events unfolded.
Whilst a full consideration of Fodor’s systematicity argument lies outside the scope of this essay, I will challenge his claim that the language of thought is ‘the only game in town’, on the grounds that connectionist networks are also capable of representing the kind of structure necessary for thought and language. I will begin by examining the nature of the representations contained within such networks, and whether they exhibit the features that Fodor supposes are required for linguistic ability, such as compositionality and context independence. I will then examine some strengths and weaknesses of the connectionist paradigm, including the use of complex network architectures. Finally, I will conclude that connectionist networks are capable of representing the structure of thought and language, but are currently at too early a stage of development to carry out advanced cognitive processes such as abstract reasoning. There is, however, no philosophical reason why such capabilities could not be achieved without implementing the sort of ‘language of thought’ that Fodor et al. describe.
II. Local and Distributed Representation
All representation is necessarily of something. In the case of mental representations, the relationship is one of intentionality, or ‘aboutness’, and causal connection. That is, the representation must be caused, directly or indirectly, by what it represents, i.e. by its content. A representation may also resemble its content in structure or appearance, although this may not be immediately obvious, since the relevant information may be encoded in any number of ways (spatially, temporally, and so on). However, as Putnam’s (1981: 1–5) ant, which happens to trace what looks like a caricature of Winston Churchill in the sand, demonstrates, mere resemblance does not suffice for representation. Nor is resemblance necessary: whilst resemblance in the form of isomorphic co-variance may be characteristic of certain classes of representation, such as visual images, a representation need not resemble its content. A word, for example, may represent a complex proposition whose structure is in no way ‘contained within’ its form. Rather, it is the fact that it has content, i.e. that it is used as a symbol, that makes it representational. When it comes to representing complex statements, propositions or thoughts, however, it seems reasonable to expect the representation to possess a complex internal structure like that of sentences, which are built out of simpler component parts such as words and phrases. The classical symbol processing view of the mind, hereafter CSP, takes such forms of composition to be essential to human cognition. Furthermore, Fodor claims that such composition is explicable only through CSP, since CSP alone is based upon a system of interrelated symbols that stand in structural relationships to one another, thus building up more complex representational forms.
Since internal structure and causal connection are taken to be necessary for representing thoughts, I will first examine whether connectionist networks exhibit either of these characteristics. The simplest ‘first generation’ (Clark 2001: 68) networks contain arrays of ‘input’ and ‘output’ units (essentially very simple signal processors) that are connected in such a way that the outputs of one set of units feed the inputs of others. Each unit applies a mathematical function to the weighted sum of its inputs, and thereby controls the activation of units further along in the network. By adjusting these weights through repeated exposure to a suitable training corpus, the network may be ‘trained’ to recognise salient features of external stimuli (ibid. 69). Each pattern of ‘input’ activations produces a distinct pattern of activation in the ‘output’ units, with optional layers of ‘hidden’ units contributing intermediate processing capabilities. Using a learning procedure such as ‘back propagation’ (Clark 1990: 210), connectionist networks can be trained to detect features of an input signal that are not apparent from its surface structure, e.g. sonar echoes indicating the presence of an underwater mine rather than a rock (Churchland 1988: 157–62). Simple first-generation networks tend to exhibit a strong correlation between the presence of such features and the activation of individual units within the network, resulting in a ‘localized representation scheme’ (Rowlands 1994: 486). This is comparable to the way in which CSP represents symbols as distinct and identifiable tokens within the computational system (although there are also important differences, as described below).
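The behaviour of a single unit described above can be made concrete with a minimal sketch in Python. It assumes a logistic (sigmoid) activation function and squared-error gradient descent, which is the one-unit special case of back propagation; the function names, learning rate, number of epochs and the choice of logical AND as a training corpus are illustrative assumptions, not details taken from Clark or Churchland.

```python
import math

def logistic(x):
    """Squashing function applied to a unit's net input."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, bias, inputs):
    """A unit's activation: the logistic of the weighted sum of its inputs."""
    net = bias + sum(w * a for w, a in zip(weights, inputs))
    return logistic(net)

def train_unit(corpus, learning_rate=1.0, epochs=5000):
    """Train one unit by gradient descent on squared error.

    For a single sigmoid unit, back propagation reduces to this delta rule:
    delta = (target - output) * output * (1 - output).
    """
    weights = [0.0] * len(corpus[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in corpus:
            out = unit_output(weights, bias, inputs)
            delta = (target - out) * out * (1.0 - out)
            weights = [w + learning_rate * delta * a for w, a in zip(weights, inputs)]
            bias += learning_rate * delta
    return weights, bias

# Hypothetical training corpus: the unit should come to detect the
# conjunction of its two input features (logical AND).
CORPUS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_unit(CORPUS)
    for inputs, target in CORPUS:
        print(inputs, target, round(unit_output(w, b, inputs), 2))
```

After training, the unit's output is high only for the (1, 1) input, so its activation correlates with a single input feature, which is the localised case discussed in the text.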
Localised activation, however, is far from typical in connectionist networks, especially in more complex network architectures, which tend to distribute information across multiple processing units. This results in ‘locally distributed representations’ (ibid.), which involve a number of different physical units in the recognition of each input feature. Furthermore, in ‘fully distributed’ (ibid.) schemes, the ‘representational’ aspects of the network are spread out across its entire structure, rather than being localised to any single unit or group of units. Since individual units are no longer dedicated to processing particular stimuli, but instead participate in many diverse and physically overlapping sub-networks, it is no longer possible to simply ‘read off’ the network’s representational content by inspecting its fine-grained structure. This poses a problem for the claim that connectionist networks are representational since there is no longer any clear separation between individual representations, or between representations and the underlying computational architecture (the traditional ‘software–hardware’ distinction). Such ‘representations’ are, at best, functional states of the whole network or, at worst, a confused amalgamation of interacting and overlapping elements with no clearly defined representational content.
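The contrast between localised and distributed schemes can be illustrated with a toy sketch, assuming binary activation patterns over a handful of units. The concepts, the number of units and the patterns are hypothetical and chosen only to show why a distributed code cannot be ‘read off’ unit by unit.

```python
# Hypothetical codes for three concepts; unit counts and values are
# invented purely for illustration.

# Localist scheme: one dedicated unit per concept (one-hot vectors).
LOCALIST = {
    "cat": [1, 0, 0],
    "dog": [0, 1, 0],
    "cup": [0, 0, 1],
}

# Distributed scheme: each concept is a pattern over shared units,
# and every unit takes part in more than one pattern.
DISTRIBUTED = {
    "cat": [1, 1, 1, 0],
    "dog": [1, 1, 0, 1],
    "cup": [0, 0, 1, 1],
}

def concepts_using_unit(code, unit):
    """Concepts whose pattern activates the given unit."""
    return [name for name, pattern in code.items() if pattern[unit] > 0]

def overlap(code, a, b):
    """Number of active units shared by two patterns (a crude similarity)."""
    return sum(x * y for x, y in zip(code[a], code[b]))

if __name__ == "__main__":
    for label, code in (("localist", LOCALIST), ("distributed", DISTRIBUTED)):
        n_units = len(next(iter(code.values())))
        print(label)
        for unit in range(n_units):
            print("  unit", unit, "->", concepts_using_unit(code, unit))
    print("cat/dog overlap:", overlap(DISTRIBUTED, "cat", "dog"))  # 2
    print("cat/cup overlap:", overlap(DISTRIBUTED, "cat", "cup"))  # 1
```

In the localist code each unit answers to exactly one concept; in the distributed code no single unit identifies a concept, although the overlap between whole patterns still carries information (here, that ‘cat’ is closer to ‘dog’ than to ‘cup’).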
In analysing connectionist networks, cognitive scientists typically differentiate between their logical and physical structure. Although ‘fully distributed’ representations employ different physical parts of a network to encode what a symbol-based system might represent with a single localised token, the network can also be described as having a particular logical structure. This consists of the hidden structures and correlations between different regions of the network, and can be retrieved only by a detailed mathematical analysis, such as principal component analysis (PCA). Such analysis identifies not only significant clusters of network activity, but also how certain states ‘can promote or impede movement into future states’ (Clark 2001: 71–2), taking into account the sensitivity of second- and third-generation networks to temporal sequence. This moves even further away from the static, localised representations of CSP, towards a dynamic, continuously changing sequence of states in which no individual element can be said to ‘represent’ a particular object, but in which the network as a whole encodes information relating to (and indeed caused by) the appropriate stimuli. Consequently, provided that it contains sufficient logical structure to represent the syntactic and semantic relations necessary for language, there is no reason why a connectionist network cannot represent such content in physically distributed form.1
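As a rough illustration of how PCA exposes logical structure hidden in distributed activity, the following sketch estimates the first principal component of a set of recorded hidden-unit activation vectors by power iteration on their covariance matrix. The activation values are hypothetical toy data, and power iteration is just one simple way of computing the leading component; it assumes the starting vector is not orthogonal to that component.

```python
import math

def mean_centre(rows):
    """Subtract the mean of each column (hidden unit) from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of the row vectors."""
    centred = mean_centre(rows)
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, by power iteration.

    Returns the unit-length direction and the fraction of the total
    variance that it explains.
    """
    cov = covariance(rows)
    d = len(cov)
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the eigenvalue belonging to the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Hypothetical activations of three hidden units over five input patterns:
# units 0 and 1 vary together, unit 2 barely varies at all.
HIDDEN_STATES = [
    [2.0, 2.0, 0.0],
    [1.0, 1.0, 0.1],
    [0.0, 0.0, -0.1],
    [-1.0, -1.0, 0.0],
    [-2.0, -2.0, 0.0],
]

if __name__ == "__main__":
    pc, explained = first_principal_component(HIDDEN_STATES)
    print([round(x, 3) for x in pc], round(explained, 3))
```

The leading component points almost equally along units 0 and 1, showing that the significant dimension of variation is a joint pattern across units rather than the activity of any single unit.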
III. Causal Structure and Compositionality
Having established that connectionist networks satisfy the conditions necessary for representation, I will now turn to the question of whether they contain the kind of structure required to represent human thought and language. Fodor argues that they do not, on two counts. Firstly, connectionist representations (of sentences, for example) are not necessarily composed of representations of the relevant component parts (i.e. words and phrases), as they would be in a symbolic system. Consequently, the productive capacity of thought does not arise as a matter of necessity, or ‘psychological law’ (Fodor 1990: 184), out of the computational architecture, as it does with CSP, but would have to be explained by some other factor; Fodor considers this highly implausible, and finds no such factor in the connectionist account. Secondly, the tokens within a CSP system, which Fodor takes to be straightforwardly correlated with language and psychological concepts (Holmes’s beliefs, suspicions, and so on), play an essential role in any causal description of the system’s behaviour. In a connectionist system, however, such symbols are causally irrelevant, since its behaviour is explicable only in terms of the functioning of individual processing units. Furthermore, Fodor claims that the only way such symbols could play a causal role is if the connectionist network were to implement a classical ‘language of thought’ architecture.2 As we will see, this argument turns upon the precise meaning of the term ‘implement’; it also leaves open the possibility of networks that merely approximate CSP while retaining a fundamentally connectionist causal structure, as discussed in section IV.
As Smolensky (1988) has argued, the activation states of a connectionist network may also be analysed in terms of ‘tensor product representations’, which are mathematical vectors that describe the constituent components of each state. Multiple such vectors may be summed to produce a composite state, e.g. V_black + V_cat = V_black-cat, much as symbolic representations may be combined to create composite representations, e.g. ‘a black cat’. However, unlike CSP tokens, these representations typically exhibit a high degree of context sensitivity (Rowlands op. cit. 487). For example, the representation of a cup of coffee minus the representation of a cup would not give a representation of coffee, but that of coffee in the context of a cup (Smolensky 1991: 207). Coffee in the context of a flask might be represented quite differently by the network, which may have no single context-independent way of representing coffee, but rather a whole family of otherwise unrelated representations of coffee as it appears in different contexts, such as coffee beans or a coffee shop (ibid. 209).3 The use of context-neutral symbols (i.e. words) to represent content is highly characteristic of human thought and language and, Fodor claims, essential to the representational scheme that underlies it. Once again, Fodor’s objection is not that connectionist networks cannot generate context-independent symbols (they can), but that they do not do so as a matter of nomological necessity (Fodor 1990: 202).4 So, whilst advanced connectionist networks undoubtedly contain significant representational structure, it is not necessarily of the kind that facilitates abstract thought.
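The context sensitivity of the coffee example can be shown with a deliberately simple toy. It assumes that each representation is a binary vector over a few hypothetical ‘microfeatures’, loosely modelled on Smolensky’s cup-of-coffee case, and that composition is plain vector addition (superposition) rather than full tensor product binding; the feature names and the flask case are invented for illustration.

```python
# Hypothetical microfeatures; the labels are illustrative assumptions.
FEATURES = [
    "upright container", "porcelain curved surface", "finger-sized handle",
    "metal cylindrical surface", "screw-top lid",
    "hot liquid", "burnt odour",
    "brown liquid contacting porcelain", "brown liquid contacting metal",
]

def encode(active):
    """Binary activation vector: 1 for each named microfeature, else 0."""
    return [1 if f in active else 0 for f in FEATURES]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def decode(vector):
    """Names of the microfeatures with positive activation."""
    return {f for f, a in zip(FEATURES, vector) if a > 0}

CUP = encode({"upright container", "porcelain curved surface", "finger-sized handle"})
CUP_OF_COFFEE = encode({"upright container", "porcelain curved surface",
                        "finger-sized handle", "hot liquid", "burnt odour",
                        "brown liquid contacting porcelain"})
FLASK = encode({"upright container", "metal cylindrical surface", "screw-top lid"})
FLASK_OF_COFFEE = encode({"upright container", "metal cylindrical surface",
                          "screw-top lid", "hot liquid", "burnt odour",
                          "brown liquid contacting metal"})

# "Coffee" extracted by subtraction still carries its context.
coffee_in_cup = subtract(CUP_OF_COFFEE, CUP)
coffee_in_flask = subtract(FLASK_OF_COFFEE, FLASK)

if __name__ == "__main__":
    print(sorted(decode(coffee_in_cup)))
    print(sorted(decode(coffee_in_flask)))
    print(add(CUP, coffee_in_cup) == CUP_OF_COFFEE)  # True
```

The two ‘coffee’ vectors share some features but differ in the ones that encode contact with the container, so neither is a context-free coffee symbol, even though adding the cup back recovers the original composite exactly.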
In order to overcome Fodor’s objection, the connectionist has to show (i) that there is some other reason or mechanism behind the emergence of complex semantic structure, and (ii) that connectionist networks possess a combinatorial ability comparable to that of CSP (the systematicity argument). The first requirement may be met by appeal to environmental selection and other systemic pressures that favour the development of context-neutral over context-dependent representations, e.g. because they confer an advantage in survival or reproductive success. Combined with exposure to an appropriately diverse range of stimuli (existing networks are typically trained for very specific tasks or functions), this may in itself be sufficient to explain the predominance of context-neutral representation in human thought under the connectionist model. However, other approaches are possible. Although the basic architecture of a connectionist network does not necessarily lead to context-neutral representations, certain types of connectionist network do. For example, a network that is organised in terms of ‘roles’ and ‘fillers’ (Smolensky op. cit. 212) will exhibit computational abilities similar to those of CSP, especially when combined or ‘layered’ with conventional context-sensitive networks, as described below. Such an organisation mirrors the capabilities of a primitive von Neumann machine (Copeland 2006), in which symbols are held within a small number of memory locations, or ‘registers’, and may be manipulated and combined in various ways. This addresses the second point above, regarding connectionism’s adequacy for representing and manipulating semantic structure.
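The role/filler idea can be sketched with tensor product binding in the sense usually associated with Smolensky: each filler vector is bound to a role vector by an outer product, and the bindings are summed into a single matrix. This is a minimal sketch under stated assumptions: the roles, fillers and their vector values are hypothetical, and the role vectors are chosen to be orthonormal so that each filler can be recovered exactly, which is what lets the roles behave somewhat like the ‘registers’ mentioned above.

```python
# Hypothetical role vectors, chosen orthonormal so unbinding is exact.
ROLES = {
    "agent":   [1.0, 0.0],
    "patient": [0.0, 1.0],
}
# Hypothetical distributed filler vectors (they overlap with each other).
FILLERS = {
    "john": [1.0, 1.0, 0.0],
    "mary": [0.0, 1.0, 1.0],
    "dog":  [1.0, 0.0, 1.0],
}

def outer(filler, role):
    """Tensor (outer) product binding a filler vector to a role vector."""
    return [[f * r for r in role] for f in filler]

def add_matrices(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def bind(structure):
    """Superimpose all bindings: the sum of outer(filler, role) over the pairs."""
    total = None
    for role, filler in structure.items():
        binding = outer(FILLERS[filler], ROLES[role])
        total = binding if total is None else add_matrices(total, binding)
    return total

def unbind(tensor, role):
    """Recover a role's filler by multiplying the tensor by the role vector.

    Exact only because the role vectors are orthonormal.
    """
    r = ROLES[role]
    return [sum(row[j] * r[j] for j in range(len(r))) for row in tensor]

def nearest_filler(vector):
    """Name of the known filler with the largest dot product with vector."""
    return max(FILLERS, key=lambda name: sum(a * b for a, b in zip(FILLERS[name], vector)))

if __name__ == "__main__":
    t = bind({"agent": "john", "patient": "mary"})
    print(nearest_filler(unbind(t, "agent")), nearest_filler(unbind(t, "patient")))
    print(t == bind({"agent": "mary", "patient": "john"}))  # False
```

Because the two structures bind the same fillers to different roles, they yield different matrices, so the composite is sensitive to structure in roughly the way that the systematicity debate requires, while every component remains a distributed vector.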
IV. Architectural Concerns
The possibility that a connectionist network might be able to emulate CSP raises the question of whether such an arrangement would constitute a truly connectionist model of computation, or whether it would really be a ‘language of thought’ implemented in connectionist hardware (Fodor 1988: 49). The answer hinges upon whether the logical structure of such a network can be described solely in terms of its symbolic structure, or whether the underlying connectionist network merely approximates a CSP architecture. This may seem like a minor distinction, but if the resulting system cannot be explained solely in terms of its symbol processing abilities, then it cannot be said to implement a ‘language of thought’ (Smolensky op. cit. 204, 210, 216). This point cuts both ways. Conventional connectionist networks cannot be explained in purely symbolic terms, since the only causally efficacious entities they contain are the individual processing units, and not the symbols, which are merely approximated by the network. Any description of the network that makes use of such symbols, including Smolensky’s tensor product representations (ibid. 210), is merely a convenient approximation, or at least only one possible view, of the network’s causal structure.
Fodor takes this to be a knock-down argument against connectionism (Fodor 1990: 203), but it may also help to explain another important aspect of human cognition, namely the ability of humans and other animals to act upon poorly defined, vague or incomplete representations. On the connectionist view, such abilities can be explained in terms of the network’s ability to make use of ‘sub-symbolic’ elements representing subtle aspects of the network structure that are not fully articulated (or are unarticulable) in symbolic form. Sherlock Holmes, for example, might draw upon these in the form of the ‘hunches’, intuitions and tacit knowledge that helped lead him to his conclusion, but that play no part in his later, dramatic reconstruction of events, which is fully symbolic. Fodor rightly claims that symbolic architectures can also employ finer levels of granularity than conventional words or concepts (Fodor 1988: 5 fn.), but this fails to address the question of why some symbols should be accessible to conscious introspection and describable in natural language, whilst others remain elusive and unavailable to consciousness. With connectionism, this distinction falls out as a natural consequence of the architecture, much as the systematicity of natural language falls out of the ‘language of thought’ hypothesis: only the symbols that are approximated by the system are translatable into natural language, whilst the rest of its content stays hidden at the ‘sub-symbolic’ level.
A striking demonstration of connectionism’s ability to capture and make use of such ‘sub-symbolic’ structure can be found in a groundbreaking piece of research by David Chalmers (1990). In it, a form of connectionist network known as Recursive Auto-Associative Memory, or RAAM (Pollack 1988), is trained to recognise sentences worded in the passive voice with a reasonable degree of accuracy (over 80%). Its output is fed directly into a second ‘transformation network’, which translates the resulting distributed representations back into the active voice, a task for which the original RAAM network has no competence. The transformation network achieved a success rate of 65% with sentences that the RAAM network had not previously been trained to recognise, and 100% with those it had. Since translation proceeded directly from the distributed representations within the RAAM to the output stage, without passing through any intermediate stage, at no point was any internal symbolic representation or ‘language of thought’ present or required. What is crucial about this experiment is that it demonstrates the ability of connectionist networks to exploit the hidden ‘sub-symbolic’ structure within a distributed representation to perform the sort of transformations usually associated with CSP, without implementing any formal symbol processing mechanism. In other words, it gives the appearance of symbol processing without the need for actual symbols.
Fodor might reject this as a counterexample to the language of thought hypothesis on the grounds that such translation may be conceived of as a one-stage, rather than a two-stage, process, so that no intermediate representation is required. This objection is unconvincing, however, since what the RAAM example shows is that there is no reason in principle why cognitive processes cannot be modelled in this way. This undermines Fodor’s claim that beliefs, desires, knowledge, etc. need to be explicitly represented within the network: provided that the required behaviour is produced, they need not be. Chalmers’ result is particularly remarkable given that connectionism is still at a relatively early stage of development and, by any account, far from being able to exhibit the kind of generalised and highly abstract processes that are characteristic of human thought (Smolensky 1991: 202). It also suggests that the ‘layering’ and interconnection of multiple networks, each specialised for a particular task, may offer a much more powerful and flexible model of human cognition than conventional single-purpose networks, highlighting the need for further research in this area.
V. Conclusion
Connectionist networks are capable of encoding the sort of complex structures necessary for representing thought and language. Unlike CSP architectures, which typically employ highly localised forms of representation, these may be distributed across the entire network. Moreover, their logical and causal structure is only approximated, and not literally described, by the kind of symbolic representations envisaged by Fodor’s ‘language of thought’ hypothesis. Rather, the syntactic (i.e. causal) structure of a connectionist network is related to its physical structure, whereas its semantic structure (i.e. meaning) resides at a higher level of abstraction (Smolensky 1991: 204–5). Connectionist networks are therefore able to mimic the abilities of conventional CSP architectures without literally implementing a ‘language of thought’, casting serious doubt upon Fodor’s claim that only CSP can explain human cognitive ability. On the connectionist view, the folk psychological description recounted by Sherlock Holmes in the quotation above is not a literal description of his thought processes, but a post hoc reconstruction, or approximation, of the causal processes that took place within his brain. Although such approximations necessarily fail to capture the precise causal structure of a connectionist system, they may nevertheless be sufficiently accurate for use in everyday life (although a significant amount of activity may also take place at a lower, ‘sub-symbolic’ level).
This leaves open the question of whether the human mind is itself a kind of connectionist network, an issue on which the current scientific evidence is inconclusive. To answer this question, we should look less to structural differences between connectionist and CSP architectures, and more to computational differences. These include whether the human mind implements the kind of ‘fetch’ and ‘store’ operations that are required for symbolic computation, and whether its operational data and algorithms (the ‘rules’ of the system) are inextricably intertwined, as on the connectionist approach, or distinct, as with CSP. By distinguishing the algorithms from the architecture, we may carry out an empirical analysis of the strengths and weaknesses of both accounts, and thereby answer the a posteriori question of the structure of the mind and its methods of representation.
——————
1 Fodor (op. cit. 40) claims that CSP can also be implemented in a physically distributed system, although whether classical symbol processing algorithms can be suitably decomposed to enable the sort of ‘massive parallelism’ that is possible with connectionism is debatable.
2 Fodor’s (1988: 38–46) claim that such an arrangement would combine many of the benefits of connectionism (increased parallelism, graceful degradation in the event of damage to the network, tolerance of noise and partial input data) with the representational ability of CSP is, again, highly questionable, given the nature of the algorithms involved.
3 Similarities and interrelationships between representational states may, however, still exist without having to be explicitly coded within the network (Clark 1993: 9, cited in Garson 2007: §6).
4 One could level the same objection against symbolic representation, since there is no reason why the symbols for cup of coffee, coffee beans and coffee shop must be composite rather than simple (they could instead be labelled ‘1’, ‘2’ and ‘3’). Nevertheless, symbols are naturally combinatorial in a manner that tensor products are not.
Bibliography
Chalmers, David J. 1990: ‘Syntactic Transformations on Distributed Representations’. Connection Science, 2 (1 & 2), pp. 53–62.
Churchland, Paul M. 1988: Matter and Consciousness. Cambridge, Massachusetts: MIT Press.
Clark, Andy 1990: ‘Connectionism, Competence, and Explanation’. The British Journal for the Philosophy of Science, 41(2), pp. 195–222.
————— 1993: Associative Engines. Cambridge, Massachusetts: MIT Press.
————— 2001: Mindware. Oxford: Oxford University Press.
Copeland, B. Jack 2006: ‘The Modern History of Computing’. The Stanford Encyclopedia of Philosophy, Summer 2006 Edition. Edward N. Zalta (ed.), <http://plato.stanford.edu/archives/sum2006/entries/computing-history/>.
Davies, Martin 1991: ‘Concepts, Connectionism, and the Language of Thought’. In W. Ramsey, S. Stich and D. Rumelhart (eds), Philosophy and Connectionist Theory, pp. 229–57. <http://philrsss.anu.edu.au/~mdavies/papers/lot.pdf> (accessed 24/6/07)
Fodor, Jerry A. 1975: The Language of Thought. Cambridge, Massachusetts: MIT Press.
————— 1987: Psychosemantics. Cambridge, Massachusetts: MIT Press.
Fodor, Jerry A. and Zenon W. Pylyshyn 1988 (preprint): ‘Connectionism and Cognitive Architecture: A Critical Analysis’. <http://citeseer.comp.nus.edu.sg/cache/papers/cs/22408/http:zSzzSzruccs.rutgers.eduzSzpubzSzpaperszSzjaf.pdf/fodor88connectionism.pdf> (accessed 29/3/07)
Fodor, Jerry and Brian P. McLaughlin 1990: ‘Connectionism and the Problem of Systematicity: Why Smolensky’s Solution Doesn’t Work’. Cognition, 35, pp. 183–204.
Garson, James 2007: ‘Connectionism’. In The Stanford Encyclopedia of Philosophy, Spring 2007 Edition. Edward N. Zalta (ed.), <http://plato.stanford.edu/archives/spr2007/entries/connectionism/>.
Loewer, Barry and Georges Rey 1991: Meaning and Mind: Fodor and His Critics. Oxford: Blackwell.
Pollack, J. B. 1988: ‘Recursive Auto-Associative Memory: Devising Compositional Distributed Representations’. In Proceedings of the Tenth Annual Conference of the Cognitive Science Society. Montreal, Canada, pp. 33–9.
Putnam, Hilary 1981: Reason, Truth and History. Cambridge: Cambridge University Press.
Rowlands, Mark 1994: ‘Connectionism and the Language of Thought’. The British Journal for the Philosophy of Science, 45(2), pp. 485–503.
Smolensky, Paul 1988: ‘On the Proper Treatment of Connectionism’. Behavioral and Brain Sciences, 11 (1), pp. 1–74.
————— 1991: ‘Connectionism, Constituency, and the Language of Thought’. In Meaning and Mind, B. Loewer and G. Rey (eds.), pp. 201–28.
Picture: a large metal sculpture of a dandelion clock, part of one of the show gardens at a festival at Westonbirt Arboretum, Gloucestershire, October 2003.