The University of Queensland Homepage
School of ITEE ITEE Main Website

 Context Free and Context Sensitive Languages

CF & CS Languages

HOME      LINKS     RESEARCH     PUBLICATIONS

Contents

Finding Structure in Time by Elman

Language As a Dynamical System by Elman

On Learning Context Free and Context Sensitive Languages by Boden and Wiles

Context-free and context-sensitive dynamics in recurrent neural networks by Boden and Wiles

Sentence Processing and Linguistic Structure by Tabor

A Constructivist Neural Network Learns the Past Tense of English Verbs by Westermann

Language evolution without natural selection: From vocabulary to syntax in a population of learners by Kirby

Dynamical Models of Sentence Processing by Tabor

Inductive Bias in Context-Free Language Learning by Tonkes et al


Elman – Finding Structure in Time

This article was a preliminary study into the possibility of representing time implicitly (instead of explicitly as a spatial representation) when presenting networks with learning tasks. This was done by using recurrent neural networks (RNNs) with context units, that did not interact with the external environment. Instead, the units stored the previous state (passed to them by the hidden layer), in effect acting as memory. The networks therefore need to learn internal representations that incorporate task and memory demands.

A number of simulations were carried out involving recurrent neural networks. The networks were given progressively more complex tasks, all of which required that the network predict the next symbol that it would be presented with, based on previous input. The architecture of the network was kept relatively static (changes were only made when moving from the initial task of predicting input and output of the XOR function to the more complex language prediction tasks). The networks were not expected to predict the first symbol of a new sentence, but results showed that it was generally able to predict the type of symbol that it would be presented with (for instance the network could predict that the first symbol of a new sentence in this simplified language would be a noun).

Elman closes by indicating that future work lies in looking at the nature of the temporal relationship between internal states – one way of doing this is to look at the trajectories between states. The results also indicate that some tasks apparently change nature (especially the nature of their solution) when considered temporally.

Go to Elman’s article “Finding Structure in Time”:

http://citeseer.nj.nec.com/elman90finding.html

Reference details:

Elman, J.L., 1990, Finding Structure in Time, Cognitive Science 14: 179-211

 

Elman - Language as a Dynamical system

This article starts by discussing some of the theories that exist in the area of language representation. Most agree that language is a static system, where words are part of a lexicon that is assumed to be context free and static – that is, it is a data structure that exists independently of it’s use. Elman’s article claims that the use of a Simple Recurrent Network (SRN) as the language processor will show that language processing is dynamic, rather than static. The need for the SRN arises from the fact that none of the previous theories on language processing have adequately explained or accounted for the role of time in processing.

Two simulations were carried out where 10 000 sentences were presented to a network as the training set. Each word was presented in sequence to the network. The task was to predict the successive word in the sentence – the output from the network was the prediction of the next word. This was compared with the correct word and weights were adjusted using back propagation. The network was generally successful, at a level at which most humans are (i.e. probabilistically, but not 100% correct).

The conclusion mentions speech, and this has not been mentioned anywhere else in the article. It can be inferred that speech is under discussion, when the author writes about discourse – where a person responds to a sentence that someone else has said. The networks that are under discussion in this paper were not able to respond in this sense. The author indicates that this might be an interesting area of further work.

Go to Elman’s paper at Citeseer:

http://citeseer.nj.nec.com/elman95language.html

Reference details for “Language as a dynamical system”:

Elman, J.L., 1995, Language as a dynamical system in (eds) Port, R.F. & van Gelder, T. Mind as Motion: Explorations in the Dynamics of Cognition, pp 195-225, Cambridge MA: MIT Press.

 

Boden & Wiles – On Learning Context Free and Context Sensitive Languages

This article aims to show that the Long Short Term Memory (LSTM) is not the only network that can learn and generalize a context sensitive language – that second order Sequential Cascaded Networks (SCNs) are also able to learn and generalize. The purpose of these simulations is to demonstrate ways of processing recursive languages.

The network is tested on its ability to predict the first letter of the next string that it is presented with. The SCN that was used for the Context Sensitive Language (CSL) had three input and three output units, with two hidden units. The results indicated that only a relatively small percentage of the SCNs were able to learn a CSL. Further simulations were carried out to see if the SCNs could generalize on a Context Free Language (CFL). The simulations showed that SCNs were able to generalize, in a manner that was similar to an SRN.

The simulations showed that both SRNs and SCNs could implement monotonic counters, but these were not able to generalize on the training set. The behaviour of the SRNs when learning a recursive language resembled that of human memory. This article showed two possible ways of dealing with increased input size – monotonic and fractal encoding, which need to be investigated further.

Go to Boden & Wiles’ article:

http://www.hh.se/staff/mibo/publicat.html

Reference details for “On Learning Context Free and Context Sensitive Languages”:

Boden, M. & Wiles, J., 2001, On Learning Context Free and Context Sensitive Languages, IEEE Transactions on Neural Networks, in press.

 

Boden & Wiles – Context-free and context-sensitive dynamics in recurrent neural networks

The aim of this article was to demonstrate how the dynamics of RNNs that can process CFLs can also process some CSLs (which are usually thought to require additional computational resources). Focuses on the dynamics of Sequential Cascaded Network (SCN) to process CFLs (based on damped oscillations). The SCN is a second order network that learns to predict a finite subset of a CSL with good generalization, using BPTT. The article also shows an analysis based on dynamical systems theory, that describes how the network solves the problem.

The SCN operates by propagating outer product between the current input pattern & the previous state, together with the input & state patterns, through a set of weights to the output layer. The network has 3 input, 3 output and 2 state units. The networks task is to predict the next letter in a string. A solution is a net that correctly predicts all strings and generalizes all strings larger than those shown in training. The SCN was able to learn both a CFL and a CSL, and demonstrate some generalization for both types of languages.

This article shows that a CSL can be processed with computational resources that are sufficient for a CFL. The dynamics obtained by the networks trained to predict CFL and CSL are similar to fractal encoding/decoding. The SCN has the competence of processing infinitely deep strings, but performance is inhibited by noise and lack of precision of dynamical components. Also shows the advantages of the SCN versus the Simple Recurrent Network (SRN).

Go to “Context-free and context-sensitive dynamics in recurrent neural networks”:

http://www.hh.se/staff/mibo/publicat.html

Reference details for “Context-free and context-sensitive dynamics in recurrent neural networks”:

Boden, M. & Wiles, J., 2000, Context-free and context-sensitive dynamics in recurrent neural networks, Connection Science 12(3): 197-210.

 

Tabor – Sentence Processing and Linguistic Structure

The aim of this article is to demonstrate the behaviour of the bramble network (BRN) and the dynamical automaton (DA) when presented with sentence processing tasks. Knowing more about this behaviour can help understand the relationship between the topological and metric perspectives of complex systems. The topological view groups states together based on their long term behaviour, and aids understanding of system wide behaviour. The metric view shows the details of short term behaviour.

Two simulations were carried out to demonstrate the behaviours of these types of networks. The first aimed to show that the BRN could learn a language and demonstrate the inverse correlation between word frequency and recognition time that is a characteristic of human behaviour. The BRN was able to learn the language, and also demonstrated the required frequency sensitivity. The second simulation was designed to show the ability of the DA to deal with phase structure (syntax organising mechanism).

The bramble network is able to learn well, and can handle a large amount of background noise by degrading gracefully, in a manner that resembles the way that humans process sentences. However, it is unable to handle the level of complexity that appears in context sensitive languages. The dynamical automaton can handle this level of complexity, but it does not appear to handle noise as well as the bramble network. Further work is suggested in combining the capabilities of the two networks.

Go to “Sentence Processing and Linguistic Structure” at:

http://www.sp.uconn.edu/~ps300vc/papers.html

Reference details for "Sentence Processing and Linguistic Structure":

Tabor, W., 2001, Sentence Processing and Linguistic Structure in Kolen, J.F. & Kremer, S.C. (eds),A Field Guide to Dynamical Recurrent Networks, pp 291-309, New York: IEEE Press.

Network performs a one-in-n prediction task, which requires that the output units be normalized exponential (softmax):

(Above is the hyperbolic tangent function, often used as an activation function)

Matlab code for the hyperbolic tangent function

 

Westermann – A Constructivist Neural Network Learns the Past Tense of English Verbs

The aim of this article was to show that a neural network that constructs hidden units as required by the learning task was able to learn a task that is considered to be non-trivial (i.e. learning the past tense of some regular and irregular English verbs). This task has been attempted previously by other models. However, the network presented in this paper performed better than pervious models, and also modelled the ability of children when learning a language better than previous models.

The constructivist algorithm that was used to create the neural networks was a modified version of the Supervised Growing Neural Gas (SGNG) algorithm that starts with two hidden units. A new hidden unit is inserted when the performance of the network does not improve with the current architecture. The experiments that were carried out used the same data sets as previous experiments on different models.

The results of the experiment showed that this network performed better than others that have previously been applied to the same problem with the same data set. One of the models that the constructivist network is compared against is symbolic (as opposed to this network which is sub symbolic), others are sub symbolic with a fixed architecture. The author claims that the comparison of the networks shows that the most fundamental dichotomy is between fixed architecture and constructivist networks, not symbolic versus sub symbolic, as is traditionally believed.

Go to Westermann’s “A Constructivist Neural Network Learns the Past Tense of English Verbs”:

http://www.anc.ed.ac.uk/publications/

Reference details for “A Constructivist Neural Network Learns the Past Tense of English Verbs”:

Westermann, G., 1997, A constructivist neural network learns the past tense of English verbs. In Proceedings of the GALA '97 Conference on Language Acquisition, pp 393-398, Edinburgh, UK: HCRC.

 

Kirby – Language evolution without natural selection: From vocabulary to syntax in a population of learners

The aim of this article is to show that linguistic ability and evolution is not necessarily solely based on biological evolution. The author suggests that the behaviour shown in the simulations can be explained by the dynamics of the generalizations.

The simulations that were carried out in this paper involved a population of individuals who were given the task of learning a language. Some of the constraints on the system included: knowledge in the population was gained by watching other individual’s behaviour, there was a gradual turnover of the population over time, there was no selection of individuals (to show that biological evolution was not a factor) and the population initially had no communication system at all. For the first couple of hundred generations, very little happened, then there was a leap in the size of the grammar possessed by the population. This was then refined in the last few generations to demonstrate compositionality. Another simulation was run where the population had some initial language skills – a complete language was generated within the first few generations.

The importance of these simulations was that they showed that a simple language-like syntax could be generated from a random population that was not constrained to learn only syntactic languages. These results also showed that biological evolution was not necessary for linguistic skills to be evolved.

Go to Kirby’s “Language evolution without natural selection: From vocabulary to syntax in a population of learners”:

http://citeseer.nj.nec.com/14554.html

Reference details for “Language evolution without natural selection: From vocabulary to syntax in a population of learners”:

Kirby, S., 1997, Language evolution without natural selection: From vocabulary to syntax in a population of learners, Econometrica 65, 5: pp 1181-1193.

 

Tabor – Dynamical Models of Sentence Processing

The aim of this article was to demonstrate that dynamical systems theory can provide a good framework to show details of factors underlying syntactic processing. The authors use a dynamical model called the Visitation Set Gravitation (VSG) (a dynamical processor which transforms representations from the Simple Recurrent Network (SRN - the other part of these simulations) into parse hypotheses) to develop representations. The simulations are used to demonstrate a distinction between strings that are syntactically and semantically incorrect.

The simulation consisted of an SRN and the dynamical processor, VSG. The SRN was a three layer feed forward network with a “context” layer that was used as input into the hidden layer at each time step. The unit consisted of 37 input, 10 hidden and 37 output units. Backpropagation Through Time was used to train the network. The VSG was trained like the SRN. The aim of the simulation was to look for semantic and syntactic incongruity (incorrect strings). The grammar that was used to train the network allowed 15552 grammatical strings.

The network was able to distinguish between grammatical and ungrammatical strings, both syntactically and semantically. The interesting result that arose from the simulation results was that semantic violations were processed much faster than syntactical errors. The conclusion that the authors reached was that this occurred because most semantic violations would have been seen by the network during training (the implication was that not all syntactic violations would have been seen during training).

Go to Tabor’s “Dynamical Models of Sentence Processing”:

http://www.sp.uconn.edu/~ps300vc/papers.html

Reference details for Tabor’s “Dynamical Models of Sentence Processing”:

Tabor, W. and Tanenhaus, M. K., 1999, Dynamical Models of Sentence Processing, Cognitive Science, 23(4): 491-515.

 

Tonkes, Blair and Wiles – Inductive Bias in Context-Free Language Learning

This article looked at the performance of recurrent neural networks (RNNs) at learning prediction tasks for context-free languages (CFLs). Simulations were carried out to look at the effect of biasing learning towards finding a solution, with the aim of improving the performance of an RNN. Improvement is either an increase in efficiency, reliability or consistency (or a combination of these).

Two types of simulation were carried out – in the first, different combinations of weights were fixed to restrict the search space of the algorithm (the algorithm that was used was backpropagation through time (BPTT)). Some of these networks were able to learn and generalise, while others were completely unable to learn a solution at all. The second simulation used evolutionary hill climbing. This algorithm found some solutions that were similar to those found by BPTT, and was also able to find solutions that the BPTT only found with difficulty or did not find at all. This network was also able to generalise very well.

The results of the first simulation showed that the performance of networks trained under BPTT was extremely sensitive to restrictions in the search space – changes that were similar produced results from being able to generalise well to not being able to find a solution at all. It was expected that the evolutionary hill climbing algorithm would be less efficient than the BPTT. However, the hill climbing algorithm actually appeared to be more biased towards finding a solution than BPTT.

Go to “Inductive Bias in Context-Free Language Learning”:

http://www.itee.uq.edu.au/~btonkes/

Reference details for “Inductive Bias in Context-Free Language Learning”:

Tonkes, B., Blair, A. & Wiles, J. (1998). Inductive bias in context-free language learning. In Proceedings of the Ninth Australian Conference on Neural Networks (ACNN98), 52 - 56.