connectionism, symbolic cognition & the past-tense debate
version of Apr 8, 1999
When you want to indicate that something has happened prior to now, you inflect a verb to ``put it in the past tense.'' The exact way to do this is part of your knowledge of morphology.
Assuming that computation = cognition, what kind of computation do people in fact do to work out what the correct inflection is?
Rule-and-rote is a symbolic computational theory that describes the English past tense using just one rule. It also requires some kind of memory to store irregular forms; this memory is a sequence of slots, like a computer memory.
Knowing regular inflection is knowing the regular inflection rule. By contrast, all irregular verbs are memorized. Sometimes this memory fails and the regular inflectional system fires by mistake, resulting in an ``overregularization error,'' e.g. breaked. On this view, only the regular system is productive: only the regular system can spontaneously apply to novel words.
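A minimal sketch of the rule-and-rote idea, under loud assumptions: the three-verb lexicon, the function names, and the `retrieval_failure` parameter are all hypothetical, and the rule works on spelling rather than on sounds. It is meant only to show how a single rule plus a slot memory yields overregularization when retrieval fails.

```python
import random

# Hypothetical toy lexicon of memorized irregular past-tense forms.
IRREGULARS = {"break": "broke", "go": "went", "sing": "sang"}


def regular_past(stem):
    """The single regular rule, applied to spelling: add -ed (or -d after a final e)."""
    return stem + "d" if stem.endswith("e") else stem + "ed"


def rule_and_rote(stem, retrieval_failure=0.0, rng=random):
    """Return the memorized form unless retrieval fails; otherwise apply the rule.

    retrieval_failure is an assumed probability that memory lookup fails.
    """
    stored = IRREGULARS.get(stem)
    if stored is not None and rng.random() >= retrieval_failure:
        return stored
    return regular_past(stem)


print(rule_and_rote("break"))                         # memory succeeds: broke
print(rule_and_rote("break", retrieval_failure=1.0))  # memory fails: breaked
print(rule_and_rote("spling"))                        # novel word, rule only: splinged
```

Because nothing but the rule can apply to a word that is not stored, this sketch can never produce an irregular-style form for a novel stem.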
On this theory, we should expect all unknown words to be inflected by the regular system. But in fact this doesn't happen. When presented with a nonsense word like spling, many people inflect it as splung (Pinker and others).
Inflection immediately starts to follow the rule-like pattern once the learner hits upon the correct rule. This corresponds to an observed developmental stage around age 3 at which all verbs are treated as regular, e.g. goed. This is called the U-shaped learning curve because performance has to go down before it can go back up again.
Rumelhart & McClelland 1986 present a parallel distributed processing model that realizes the intuition ``kids inflect new verbs like the ones they already know.'' Their central goal was to accurately model the U-shaped curve without resorting to explicit computational rules.
| quick review of PDP models: | A set of processing units |
| | A state of activation |
| | An output function for each unit |
| | A pattern of connectivity among units |
| | A propagation rule for propagating patterns of activities through the network of connectivities |
| | An activation rule for combining the inputs impinging on a unit with the current state of that unit to produce a new level of activation for the unit |
| | A learning rule whereby patterns of connectivity are modified by experience |
| | An environment within which the system must operate |
Rumelhart and McClelland used the presence (1) or absence (0) of phonetic features of sounds to define the state of activation in their neural net. They translated each word into a 460-digit string of 1s and 0s. The patterns for the stems were associated with the correctly inflected past-tense forms using a learning method called the ``perceptron learning rule.''
In the perceptron learning rule, a teacher adjusts the connection weights whenever the network gives the wrong answer. The weights on the lines coming into the errant node are increased (if the output value was too low) or decreased (if the output value was too high), depending on how the actual response differed from the target.
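The weight adjustment described above can be sketched as follows. This is a generic textbook perceptron with deterministic threshold units, not a reconstruction of the RM86 network: the 3-bit inputs, 2-bit targets, learning rate, bias terms, and function names are all assumptions chosen to keep the example tiny.

```python
def step(total):
    """Threshold output function: 1 if the summed input is positive, else 0."""
    return 1 if total > 0 else 0


def predict(weights, biases, x):
    """Compute each output unit's binary response to the input pattern x."""
    return [step(b + sum(w * xi for w, xi in zip(row, x)))
            for row, b in zip(weights, biases)]


def train_perceptron(patterns, n_in, n_out, epochs=20, rate=1.0):
    """Train on (input_bits, target_bits) pairs with the perceptron learning rule."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    biases = [0.0] * n_out
    for _ in range(epochs):
        for x, target in patterns:
            output = predict(weights, biases, x)
            for j in range(n_out):
                # +1: output was too low; -1: output was too high; 0: correct.
                error = target[j] - output[j]
                if error:
                    # Only lines carrying activity (x[i] == 1) actually change.
                    for i in range(n_in):
                        weights[j][i] += rate * error * x[i]
                    biases[j] += rate * error
    return weights, biases


# Toy "teacher" data (hypothetical): output 0 copies bit 0,
# output 1 is on when bit 0 or bit 1 is on.
PATTERNS = [
    ([1, 0, 0], [1, 1]),
    ([0, 1, 0], [0, 1]),
    ([0, 0, 1], [0, 0]),
    ([1, 1, 0], [1, 1]),
    ([0, 0, 0], [0, 0]),
]

weights, biases = train_perceptron(PATTERNS, n_in=3, n_out=2)
for x, target in PATTERNS:
    print(x, predict(weights, biases, x), "target", target)
```

Note that the teacher never changes a unit that answered correctly, so learning stops once every pattern is reproduced.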
To simulate the earliest phase of past-tense learning, the model was
first trained on the 10 high-frequency verbs, receiving 10 cycles of
training presentations through the set of 10 verbs. This was enough
to produce quite good performance on these verbs. We take the performance
of the model at this point to correspond to the performance of a child
in Phase 1 of acquisition. To simulate later phases of learning, the
410 medium-frequency verbs were added to the first 10 verbs, and the system
was given 190 more learning trials, with each trial consisting of one
presentation of each of the 420 verbs. The responses of the model early
on in this phase of training correspond to Phase 2 of the acquisition
process; its ultimate performance at the end of 190 exposures to each
of the 420 verbs corresponds to Phase 3.
Pinker & Prince 88 demonstrate that the Rumelhart & McClelland 86 past-tense network has many shortcomings, including these:
Data from an intensive study of children's language acquisition (Brown, 1973) indicate that children's vocabulary does not undergo changes as radical as the changes to the input mix in the RM86 model.
conclusion: RM86 is a single route model which cannot accurately characterize the facts about the acquisition of the English past-tense.
In a series of articles (Pinker 1991; Prasada & Pinker 1993), Steven Pinker describes a ``dual route'' model intended to capture the best parts of the symbolist and connectionist accounts of the past tense.
Regular verbs are computed by a suffixation rule, just as in rule-and-rote theory. Irregular verbs (only) are retrieved from an associative memory.
Subjects inflect nonsense words that sound like irregulars in a way similar to irregulars that they know, rather than always overregularizing.
A continuous effect of similarity has been measured experimentally: subjects
frequently (44%) convert spling to splung (based on string,
sling, et cetera), less often (24%) convert shink to shunk,
and rarely (7%) convert sid to sud.
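A rough sketch of how a dual-route inflector with a similarity-sensitive memory might behave on such items. Everything here is an assumption made for illustration: the three stored verbs, the use of shared spelling at the end of the word as the similarity measure, the overlap threshold of 4, and the function names. It is deterministic, so it does not model the graded percentages quoted above; a graded version would need to turn overlap into a probability.

```python
# Hypothetical associative memory of irregular stem -> past-tense pairs.
# Simplifying assumption: each past form has the same length as its stem.
IRREGULAR_MEMORY = {"string": "strung", "sling": "slung", "cling": "clung"}


def regular_route(stem):
    """Default route: suffixation rule, applied to spelling."""
    return stem + "ed"


def common_suffix_len(a, b):
    """Number of letters the two words share at their ends."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def dual_route(stem, min_overlap=4):
    """Try the associative memory first; fall back on the regular rule."""
    if stem in IRREGULAR_MEMORY:            # known irregular: direct retrieval
        return IRREGULAR_MEMORY[stem]
    # Novel word: find the stored stem it most resembles.
    best = max(IRREGULAR_MEMORY, key=lambda s: common_suffix_len(s, stem))
    n = common_suffix_len(best, stem)
    if n >= min_overlap:                    # similar enough: inflect by analogy
        return stem[:-n] + IRREGULAR_MEMORY[best][-n:]
    return regular_route(stem)              # otherwise the default rule applies


print(dual_route("sling"))   # stored irregular
print(dual_route("spling"))  # analogy with sling/cling
print(dual_route("walk"))    # no similar irregular: regular rule
```

The threshold is arbitrary: lowering it makes the memory route capture more novel words, which is one way to picture similarity as a continuous rather than all-or-none effect.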
If there do exist two different inflectional routes, one for regulars and one for irregulars, then we should find people who are impaired on one but not the other.
SJD's speech is characterised by fluent, usually complete sentences
with occasionally morphological and function word errors, semantic
paraphasias, phonemic paraphasias, and hesitations for word-retrieval.
... the contrast between her performance on the regularly inflected
words and that for the irregularly inflected verbs indicates that
it is the morpho-phonemic structure of the inflected words, and not their
morpho-semantic complexity, that is relevant to this effect.
SJD is a 47-year-old college educated female who suffered a thromboembolic
left-hemisphere stroke in June 1984...
Patients like SJD are taken as evidence supporting a dual route model, because it looks like the two routes can be neurologically damaged independent of one another. The distributed nature of the Rumelhart & McClelland 86 model suggests that any damage (snipping of connections?) would globally make things worse, for all verbs regular or irregular. In a PDP model, no single connection holds all of the network's knowledge about a particular kind of input.
The dual-route versus single-route issue is about the character of the human computation of inflection. Multi-route proponents must prove that each route implements a qualitatively different kind of function. Single-route proponents must show that the architecture of their one mechanism is sufficient to realize all of the different kinds of inflection that we see, while not being so powerful that it fails to say what is so human about the computation.
Philosophy of science question: All of the exceptional cases that the ``default'' route can't handle (i.e. irregular verbs) are lumped into the ``nondefault'' route. The nondefault route must be pretty powerful, since it is doing all the hard work. How is this an explanation?