Friday, April 07, 2006

investigating the role of fast vs. slow components in learning

I've been working over the last few days but haven't been able to report on any of it, mainly because I have not made the kind of straightforward progress that is easy to explain.

I've been working on evolving learning on a continuum in the abstract scenario. I have managed to evolve circuits that can 'memorise' features on a continuum without the need to introduce any parameter-changing rules.

How do they memorise the features? The key to answering this question is in the interaction at very different time-scales between the 'neurons' of the circuit.

Currently I am running experiments on this one-shot remembering of features on a continuum, where the delay can vary greatly (between 10 and 100 units of time).

From 20 evolutionary runs using 10-node CTRNNs I observed that there were, roughly, 2 types of evolved behaviors: discrete categorisers and generalisers. The discrete categorising circuits are unable to differentiate most of the features within the continuum; instead they make up 2 or 3 categories for them: either small and large, or small, medium and large. The ones that make 3 categories score better than the ones that only make 2. The generalisers do much better and can distinguish between many more signals, without making any obvious categories a priori.

In terms of the mechanisms available, there are 10 nodes and they can act on very different time-scales. The fastest ones have a time-parameter of 1 while the slowest can be as slow as 140 units of time. This parameter describes how quickly state leaks out of the component. One thing is common among the categorisers: they do not make use of slow-acting neurons, only fast-acting ones, whereas the generalisers make use of both.
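To make the 'leak' reading of the time parameter concrete, here is a minimal Python sketch. It assumes the standard CTRNN equation, tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j + theta_j) + I_i, which the post does not spell out, and forward Euler integration with a step of 0.1. The names `ctrnn_step` and `leak_demo` are illustrative only and are not taken from the actual experiment code.

```python
import math

def sigmoid(x):
    """Standard logistic function used as the node output."""
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, taus, biases, weights, inputs, dt=0.1):
    """One forward-Euler step of an (assumed) standard CTRNN.

    weights[j][i] is the connection from node j to node i.
    """
    n = len(y)
    out = [sigmoid(y[j] + biases[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        net = sum(weights[j][i] * out[j] for j in range(n))
        dy = (-y[i] + net + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y

def leak_demo(tau, duration=10.0, dt=0.1):
    """State of one isolated node, started at 1.0, after `duration` time units."""
    y = [1.0]
    for _ in range(round(duration / dt)):
        y = ctrnn_step(y, [tau], [0.0], [[0.0]], [0.0], dt)
    return y[0]

fast_left = leak_demo(1.0)    # time parameter of the fastest nodes
slow_left = leak_demo(140.0)  # time parameter of the slowest nodes
```

With no input and no recurrent weights, the fast node has lost almost all of its state after 10 units of time, while the slow node still holds most of it. This is one way to see why slow nodes are natural candidates for carrying a feature across a delay.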

To determine the role of fast- versus slow-acting components in this memorisation task, the time parameters of the best evolved agent were studied, and from them we defined which ranges of slow-acting and fast-acting neurons were needed to solve the task. Slow-acting then corresponded to neurons with time parameters in (50, 100), while fast-acting nodes were defined to be in (1, 7).

To observe the role of these different components I am running evolutionary runs in which the circuit is constrained to have different numbers of fast and slow neurons. I am using a 10-node CTRNN, so there are 11 cases. At one extreme is the case where all neurons are restricted to the fast range; then there is the case where one is restricted to be slow and the rest restricted to be fast, and so on until all neurons are restricted to be slow. I am running 10 evolutionary runs, using different seeds, for each case. The intuition is that neither of the extremes does well, and that the optimum for the learning task lies somewhere in between, possibly with more fast neurons than slow neurons. The resulting curve should be interesting.
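The 11 constrained cases can be enumerated as in the sketch below. The fast and slow ranges are the ones given above. The linear mapping from a [0,1] gene into the allowed range is an assumption made for illustration (the actual runs may map genes differently), and `constrained_ranges` and `gene_to_tau` are hypothetical names.

```python
FAST_RANGE = (1.0, 7.0)     # fast-acting time parameters, as defined in the post
SLOW_RANGE = (50.0, 100.0)  # slow-acting time parameters, as defined in the post

def constrained_ranges(n_nodes=10):
    """Return one list of per-node time-parameter bounds for each case.

    Case k has k nodes restricted to the slow range and the rest to the fast range.
    """
    cases = []
    for n_slow in range(n_nodes + 1):
        cases.append([SLOW_RANGE] * n_slow + [FAST_RANGE] * (n_nodes - n_slow))
    return cases

def gene_to_tau(gene, bounds):
    """Map a gene in [0, 1] linearly into the allowed range (an assumption)."""
    lo, hi = bounds
    return lo + gene * (hi - lo)

cases = constrained_ranges()
```

Each entry of `cases` could then be passed to an evolutionary run so that every node's time parameter stays inside its assigned range.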

On the other hand, I have not managed to evolve circuits that can continue to remember the feature after they have had to make a first decision. I am working on changes to the current evolutionary methodology that will lead to success here. One of the changes that I will try might be to add a feedback input (I have to explain why I think this is important but will do so later on). Also I must simply wait longer (more generations) as well as try larger circuits (currently the biggest has been a 10 node one).

On a completely different note, I have started to design the experiments for the embodied version of this memorisation task, loosely inspired by behavioral plasticity observed in C. elegans. A form of learning similar to imprinting has been observed in these organisms. In particular, it has been observed that animals that were cultivated normally with food at temperatures ranging from 15C to 25C migrate to the cultivation temperature on a temperature gradient and move isothermally at that temperature. By contrast, the animals migrate away from the temperature at which they were previously starved. They don't call it imprinting, but it is a very appropriate behavioral paradigm with which to continue studying learning and memory in the embodied version. I will say some more on this soon.

Wednesday, March 15, 2006

results from noise-less experiments: success

This post will describe results from 20 evolutionary runs using 5 and 10 node CTRNNs (10 runs each). Each evolutionary run uses a different seed and stops after 5000 generations [roughly 11 hours on the cluster]. The experimental set-up is as follows: [1] parents and test individuals are picked at random from a uniform distribution (no incremental beta distribution), [2] the random initialisation has been taken off - CTRNNs now begin from the same state, [3] there is a fixed delay at the start and between presentations - there is no longer a random component, [4] Gaussian-weighted evaluation is used, [5] traditional gen-phen mapping is used and [6] the number of trials per fitness evaluation went up from 100 to 200. The main idea of these experiments is to see whether I can evolve agents for the continuum irreversible learning task without noise. There is still a lot of inter-trial variability.

Most (7/10) runs with 5 node CTRNNs end up categorizing ‘just’ two signals – indifferent to the differences between the ones in-between. Some make very strict categories from the inputs, others less so. See this figure for an example of this kind of 'discrete' learning performance. As you will encounter this particular type of figure all over this blog, I will explain it once here. The vertical axis represents the different ‘parent’ signals that the agent can receive. The horizontal axis represents the different ‘test’ signals. Notice that both of these axes provide information about the continuum of that feature. The line along the diagonal, then, represents the trials where the parent and test individual are the same. The colour shading represents the agent's decision to accept the test individual as its parent or not. The colormap goes from blue (i.e. 'this is not my parent') to red (i.e. 'this is my parent').

Out of the 5-node runs, number 10 did much better than the rest. See its performance in this figure. The activation of the nodes and the input for 10 random trials can be seen in this figure. You will probably see this style of figure often around here as well, so I will explain it once. The figure shows all ‘neuronal’ activations over time. The signals at the bottom represent the input presented to the circuit at that particular time. The vertical dashed lines mark the end of one trial and the beginning of the next. At the beginning of each trial the circuit’s state is re-initialised. The activation in the shaded area is that of the ‘designated’ output node, and the black boxes around the end of each trial represent the evaluation period as well as what the ‘correct’ value of the output node should be for the previously presented individuals. This five node CTRNN is using all of its nodes to generate this behaviour, but we cannot tell the ‘strategy’ from this view alone. We can see that the agent makes mistakes, for example in the 5th trial, where two different individuals are presented but the agent classifies them as the same. We can also appreciate from that figure, nevertheless, the relative proximity in ‘appearance’ of these two individuals.

Out of the 10-node runs, 2 did particularly well [runs 3 and 9]. I will show here their performance (3, 9) and the circuit’s activation for 10 random trials (3 and 9), but I will not analyse them much further yet.

It is likely that the agents use the transients and the precise timing of the presentations and delays in this particular scenario. Experiments introducing delays in the in-between periods for these best evolved agents show that delays do affect their performance (see for example how the performance shifts as delays are introduced in one of the 10 node scenarios in this figure). One question of interest is whether we can re-evolve the successful agents to cope with random delays. The same question applies to the random initialisation of the state of the circuit. If we can, then an incremental approach to noise would be rather useful.

Tuesday, March 14, 2006

variants on current experimental set-ups

Evolving to remember and distinguish a feature within a continuum is truly hard for abstract dynamical systems, even if irreversible, at least in comparison with its discrete counterpart. Below I detail a number of variants that I am experimenting with, in order to get something that works.

First of all, by abstract I mean disembodied and not situated in an environment, so that the system itself cannot vary the way the feature is presented to it, at least not directly.

[1] Gradual increment from discrete to continuum during evolution: The ‘zen’ approach would be to pick parents and individuals randomly from a uniform distribution every time, from the beginning of evolutionary time until the end. But it is not easy to be ‘zen’ (particularly when you are not making as much progress as you would like), so we put our engineering hats on. One could think that it would be easier for evolution if the task gradually became ‘more complex’ as the population gets better. This is probably the case most of the time, but I’m not so sure it applies in this particular scenario. As the title suggests, the idea is to evolve agents for the 2 most different parents first and, as the agents get good at this, to present more closely related parents. One way to do this is to draw parents randomly from a distribution that gradually shifts from the binary 2-parent case towards the complete continuum. The Beta distribution is quite ideal for this: it has two parameters, and changing them from 0.1 to 1 causes the distribution to change from ‘almost’ discrete to ‘almost’ uniform.

This figure shows the histogram of random samples taken from a Beta distribution for parameters 0.01, 0.11, 0.21, 0.31, 0.41, 0.51, 0.61, 0.71, 0.81. As you can guess, when the parameter (alpha) reaches 1, the distribution is effectively uniform across the continuum. The motivation for this is to encourage the agent to distinguish between the more different individuals first and later on to learn the smaller differences. This distribution can be used to pick both the parent and the test individual.
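The shift from 'almost' discrete to uniform can be sketched with Python's standard `random.betavariate`. The sketch assumes that both shape parameters of the Beta distribution are set to the same value (the text only mentions a single parameter, alpha) and that samples live on [0,1] before any rescaling to the feature range. `sample_feature` and `outer_fraction` are illustrative names.

```python
import random

def sample_feature(alpha, rng):
    """Draw one value in [0, 1] from a symmetric Beta(alpha, alpha) (assumed form)."""
    return rng.betavariate(alpha, alpha)

def outer_fraction(alpha, n=2000, seed=0):
    """Fraction of samples falling in the lowest or highest tenth of the range."""
    rng = random.Random(seed)
    samples = [sample_feature(alpha, rng) for _ in range(n)]
    return sum(1 for s in samples if s < 0.1 or s > 0.9) / n

nearly_discrete = outer_fraction(0.01)  # almost all mass at the two ends
uniform_like = outer_fraction(1.0)      # about a fifth of the mass at the ends
```

For a small alpha nearly every sample sits at one of the two extremes, which corresponds to the 2-parent case; at alpha = 1 the outer two tenths hold about a fifth of the samples, as expected for a uniform distribution.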

I’ve used this for a while and have run comparisons with and without this ‘incremental’ approach. I have not made a thorough comparison (yet), nor have I performed sufficiently big tests (basically compared 10 runs with the dynamic Beta dist and 10 with the simple uniform dist); nevertheless (and interestingly enough indeed), it does not seem to improve evolvability in any way. It actually made things worse as far as I can tell. Again, I have not studied this in depth but hope to do so at some point.

[2] Gradual increment from noise-less to noise-full: For the behaviour to be interesting it has to be able to cope with several forms of noise: [a] random initialisation of the activation of the nodes, [b] random delays at the beginning of the run and between presentation of individuals and [c] inter-trial variability. There are also other types of noise that I am not considering at the moment (e.g. [d] inter-node, sensory and motor node noise).

[3] Gaussian-weighted Evaluation of agent’s output: This idea is taken directly from (Phattanasri et al., 2002, 2006). The agent is evaluated for 10 units of time after the presentation of a test individual. Instead of simply taking the area under the output node’s activation, a Gaussian-weighted area is taken. This takes importance away from the beginning and end of the evaluation period and concentrates it on the middle region. It helped them; since mine is not evolving, I figured I’d use it until it works, and then run tests to see whether this in particular helped or not.
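A minimal sketch of a Gaussian-weighted score follows. It is only an illustration of the idea, not the formulation used by Phattanasri et al.: the width of the Gaussian (`sigma_fraction`), the normalisation of the weights and the use of 1 - |target - output| as the per-step score are all assumptions, and the function names are hypothetical.

```python
import math

def gaussian_weights(n_steps, sigma_fraction=0.2):
    """Normalised Gaussian weights centred on the middle of the evaluation window."""
    centre = (n_steps - 1) / 2.0
    sigma = sigma_fraction * n_steps  # assumed width, not taken from the source
    w = [math.exp(-0.5 * ((k - centre) / sigma) ** 2) for k in range(n_steps)]
    total = sum(w)
    return [x / total for x in w]

def weighted_score(outputs, target, sigma_fraction=0.2):
    """Gaussian-weighted agreement between the output trace and the target, in [0, 1]."""
    w = gaussian_weights(len(outputs), sigma_fraction)
    return sum(wk * (1.0 - abs(target - o)) for wk, o in zip(w, outputs))

# 10 units of time at an integration step of 0.1 gives 100 samples.
wrong_at_edges = [0.0] * 10 + [1.0] * 80 + [0.0] * 10
wrong_in_middle = [1.0] * 40 + [0.0] * 20 + [1.0] * 40
edge_score = weighted_score(wrong_at_edges, 1.0)
middle_score = weighted_score(wrong_in_middle, 1.0)
```

Both traces are wrong for the same number of steps, but the one that is wrong only near the edges of the window scores higher, which is the intended effect of the weighting.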

Finally, there are several parameters we know (from common sense or experience) to be crucial:

[4] Genotype-phenotype mapping:
[i] Weights: mapped linearly from [0,1] to something like [-6, 6]. I’m quite happy with this.
[ii] Time-parameters: mapped exponentially from [0,1] to [e^0, e^3]. The important thing here is that the smallest possible value is around 10 times bigger than the time-step of integration (usually 0.1), and the largest will depend on how long a trial is; it should definitely not be longer than that. The exponential mapping simply provides more precision for the small time-scales (where more precision might be needed), and less as the time-parameter becomes bigger. I’m quite happy with this as well.
[iii] Biases: generally I map them linearly from [0,1] to something like [-10,10] but I’m not happy with this one at all. One idea (which could be very fruitful to CTRNN evolution in general) is to map the genotype value always relative to the centre of activation of that node (which would depend on its incoming weights). Furthermore, this could be extended to include an exponential mapping so that there’s more precision around the centre-crossing region and less towards the outer parts.
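The three mappings above could be sketched as follows. The linear and exponential maps follow the ranges given in the list. The centre-relative bias is only one possible reading of the idea: it assumes the convention in which a node's output is sigma(y + bias), so that the centre-crossing bias is minus half the sum of the incoming weights. All function names are illustrative.

```python
import math

def linear_map(gene, lo, hi):
    """Map a gene in [0, 1] linearly to [lo, hi]."""
    return lo + gene * (hi - lo)

def exp_map(gene, lo_exp, hi_exp):
    """Map a gene in [0, 1] exponentially to [e**lo_exp, e**hi_exp]."""
    return math.exp(lo_exp + gene * (hi_exp - lo_exp))

def centred_bias(gene, incoming_weights, half_range=10.0):
    """Bias expressed relative to the node's centre-crossing value (assumed convention)."""
    centre = -sum(incoming_weights) / 2.0
    return centre + linear_map(gene, -half_range, half_range)

weight = linear_map(0.5, -6.0, 6.0)     # middle of the weight range
tau_min = exp_map(0.0, 0.0, 3.0)        # smallest time parameter, e**0
tau_max = exp_map(1.0, 0.0, 3.0)        # largest time parameter, e**3
bias = centred_bias(0.5, [2.0, 4.0])    # a gene of 0.5 lands on the centre
```

With this reading, a bias gene of 0.5 always places the node at its centre-crossing point whatever its incoming weights are, and the gene only encodes the offset from that point.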

[5] Number of trials per fitness (this is related to the inter-trial variability): Each agent is tested 200 times every time it is chosen during selection. I’ve done 100 until now so this is an experiment to see whether there’s any difference.

[6] Evolutionary operators: mutation and recombination: For mutation I use the common Gaussian vector mutation, where each value in the genotype is perturbed with a random number drawn from a Normal distribution centred on 0 with a very small standard deviation. I’m happy with this. For recombination, a fraction x/N of the genes is taken from the winner of the tournament and the remaining (N-x)/N from the loser. I’m less happy about this method. It makes a lot of sense for discrete (e.g. binary) genotypes but less so for continuous ones. Ideally the new individual should be made from a combination of winner and loser that is not constrained to the gene dimensions. One way would be to take a new point in genotype space, using regular Euclidean n-space, that is an x proportion away from the winner and a (1-x) proportion away from the loser. Not sure though.
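The operators discussed in [6] can be sketched as below. Several details are assumptions rather than the actual implementation: genes are taken to live in [0,1] and are clipped after mutation, the genes copied from the winner are chosen at random, and the Euclidean alternative is read as a point on the straight line between winner and loser. Function names are hypothetical.

```python
import random

def gaussian_mutation(genotype, sigma, rng):
    """Perturb every gene with N(0, sigma) noise, clipping to [0, 1] (assumed)."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genotype]

def genewise_crossover(winner, loser, x, rng):
    """Copy x randomly chosen genes from the winner and the rest from the loser."""
    n = len(winner)
    from_winner = set(rng.sample(range(n), x))
    return [winner[i] if i in from_winner else loser[i] for i in range(n)]

def blend(winner, loser, x):
    """Point on the segment winner-loser, a proportion x of the way from the winner."""
    return [(1.0 - x) * w + x * l for w, l in zip(winner, loser)]

rng = random.Random(0)
winner, loser = [1.0] * 10, [0.0] * 10
child_genes = genewise_crossover(winner, loser, 7, rng)
child_blend = blend([0.0, 0.0], [1.0, 1.0], 0.25)
mutant = gaussian_mutation([0.5] * 10, 0.01, rng)
```

The gene-wise child always sits on a corner of the box spanned by its parents, whereas the blended child can land anywhere on the line between them, which is the sense in which it is not constrained to the gene dimensions.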

[7] Finally, the very obvious, number of nodes in the CTRNN: I have played around with circuits of between 3 and 10 nodes. As there are 5 node CTRNNs whose nodes are all active but which are not doing the full* task, I will be running experiments with 10 node CTRNNs, and when it works I’ll see how many nodes (if any) are saturated on/off and take it from there.

Saturday, March 11, 2006

project finally begun, first update.

In this post I will give an overview of the broad plan (more or less again), followed by the prioritization that I have planned for my experiments. I will then provide the details of the first experimental set-up, followed by the first update on the experiments that began 4 days ago.

The long-term plan to answer the questions posted previously is to evolve and thoroughly analyze dynamical system agents in several different experimental set-ups: two different tasks, (i) reversible and (ii) irreversible learning on a continuum; and two experimental set-ups, (a) a disembodied/non-situated one and (b) an embodied/situated one. The reversible learning further subdivides into classical conditioning and operant conditioning.

This could add up to 6 different experimental scenarios altogether. However, they will not be tackled all at once. I am prioritising the set-ups in terms of relative simplicity and in terms of the main goal of my research. Although eventually I would like to understand all of these in terms of dynamical systems, I will give priority to the abstract scenarios over the embodied/situated ones because my main interest is in the analysis of the dynamics underlying learning and memory.

Furthermore, I will prioritize these abstract set-ups in terms of relative simplicity. The first set-up to be tackled will concern irreversible learning. Second, I will tackle the reversible learning scenarios.

The first experimental set-up has to do with evolving agents that can learn once during their lifetime. It requires that the agent ‘memorise’ the presentation of a feature in the environment and be able to make a decision concerning this feature after a time delay. There are a number of examples of irreversible forms of learning in animals; one which is particularly interesting is parental imprinting in birds as studied by Konrad Lorenz.

The tasks are similar to the imprinting scenarios that I have played with until now but they extend in very important directions: [1] So far the learning on a continuum has only been evolved and tested on few successive presentations of test individuals (i.e. one or two). One important direction is to extend this imprinting-like-learning over many successive presentations of test individuals. This will address questions regarding the persistence of memory.

The agent is a fully connected CTRNN. There’s one sensory signal that is fed to all nodes in the CTRNN via a set of weights that are evolved along with the rest of the CTRNN parameters. The feature that the agent has to remember is a signal in [1, 2] provided for a fixed length of time (10 units of time). At the beginning of a trial, a random delay is introduced ([10, 20] ut). The first individual is then presented and this should be interpreted as the ‘parent’ individual in the imprinting metaphor of the task. A random delay is introduced again. A second signal is then presented. This can be of the same value as the first or a different one. This is interpreted as a ‘test’ individual. The agent has one output node. The output of this node is interpreted as ‘this is my parent’ when it is 1 and ‘this is not my parent’ when it is 0. The agent is evaluated after the test individual is presented, and a successful agent must produce the correct output for any number of individuals presented after the ‘parent’.
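The input for a single trial with one test individual can be sketched as a list of sensory values, one per integration step. The sketch assumes an integration step of 0.1, a sensory input of 0 during the delays, and exact equality as the criterion for 'same individual'; none of these are stated for this set-up, and `build_trial` and `target_output` are illustrative names. The evaluation period that follows the test presentation is left out.

```python
import random

def build_trial(parent, test, rng, dt=0.1, present=10.0, delay_range=(10.0, 20.0)):
    """Sensory input over one trial: delay, parent signal, delay, test signal."""
    def steps(duration):
        return round(duration / dt)

    signal = []
    for value in (parent, test):
        signal += [0.0] * steps(rng.uniform(*delay_range))  # random delay, input assumed 0
        signal += [value] * steps(present)                  # fixed-length presentation
    return signal

def target_output(parent, test):
    """1.0 means 'this is my parent', 0.0 means 'this is not my parent'."""
    return 1.0 if parent == test else 0.0

rng = random.Random(0)
trial = build_trial(1.2, 1.8, rng)
```

For successive evaluations, further (delay, test signal) pairs would be appended in the same way, with the parent value held fixed.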

The learning is irreversible in this case because the agent cannot relearn a new parent at any particular point during ‘its life’, only at the beginning. At the same time, this learning is interesting because the agent has to hold on to its ‘memory’ of the first individual for the longest time possible.

So, I finally started on Tuesday (07.03.06). I have further subdivided the first set-up into 4 stages: [i] 2-possible-parents irreversible learning with only one evaluation, [ii] 2-possible-parents irreversible learning with several successive evaluations, [iii] possible parents on a continuum irreversible learning with only one evaluation, and [iv] possible parents on a continuum with several successive evaluations.

On the same day I started, I was able to evolve 2 and 3 node CTRNNs for stage [i]. I have not had the chance to analyse these circuits or their evolutionary dynamics, for two reasons: first, because all of it has evolved so quickly, but also because the plan is to wait until the full task (i.e. [iv]) has been successfully evolved before stopping to analyse.

Using an incremental approach I was also able to evolve 3 node CTRNNs for stage [ii]. The incremental approach was very simple (and loosely inspired by Phattanasri's work): the parent individual is always presented first, and after a random delay the first test individual is presented. Once one agent in the population achieves a 95% fitness score, the fitness trial changes to include an extra test individual after another random delay. And so on, until an agent can discriminate up to 5 test individuals one after another.
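The stage-advancing rule described above is small enough to sketch directly. The threshold of 95% and the limit of 5 test individuals come from the description; the function name and the choice to check the best fitness once per generation are assumptions.

```python
def update_stage(n_tests, best_fitness, threshold=0.95, max_tests=5):
    """Add one test individual to the trial once the best agent reaches the threshold."""
    if best_fitness >= threshold and n_tests < max_tests:
        return n_tests + 1
    return n_tests

# Hypothetical best-fitness values over successive generations.
n_tests = 1
history = []
for best in [0.50, 0.96, 0.97, 0.20, 0.99, 0.95, 1.00, 1.00]:
    n_tests = update_stage(n_tests, best)
    history.append(n_tests)
```

Note that the rule never removes a test individual, so a drop in fitness after a stage change (as in the fourth value above) simply holds the task at its current difficulty.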

In the time that I get between evolving, I am building up some tools (in C and Matlab) to help me visualise the performance of the agents and analyse both the evolutionary dynamics and the CTRNN dynamics.

Now I've set up the evolutionary runs for the continuum and successive case [iv] at once. There are two parallel incremental approaches at work in these runs. The first is the one already described, going from discriminating the first test individual correctly to identifying successive individuals, until the agent generalises. The second concerns the shift from 2-possible-parents to a continuum of possibilities. This approach is a bit more subtle and I will describe it later on.

There are a rather large number of questions to be answered in this first scenario (including questions in i, ii, iii and iv). These have begun to come up as I evolve the simpler milestones. I will write about the questions in the next update.

Wednesday, March 08, 2006

current directions of research

The broad motivation of my research is to understand the mechanisms underlying learning and memory of features on a continuum in dynamical systems.

In particular, I'm interested in understanding how dynamical systems can ‘record’ a feature from the environment within a continuum and later ‘make-use’ of this ‘stored-information’ to make a decision. For how long can such a memory persist? And what is required to make it persist for longer or indefinitely? I'm interested in these issues from an evolutionary perspective as well, which raises questions like: what sort of evolutionary pressure is needed to evolve agents with the capacity to retain a particular memory throughout their lifetime? This corresponds to certain aspects of learning irreversibility and the evolution of critical periods. But this work will also be concerned with understanding how this learning can be made reversible. So, what sort of dynamical mechanisms does a system 'need' to be able to re-learn a feature in a continuum from the environment over and over?

Currently there are (at least) three different directions my research could take. This has become clearer from discussions. Which direction to emphasize depends a great deal on what my main goal is:
[a] If the interest is analyzing the dynamics underlying learning and memory, then I could initially focus on abstract tasks.
[b] If the interest is exploring the role of embodiment in learning, then I should focus on embodied/situated tasks.
[c] If the interest is demonstrating that a CTRNN/evolutionary approach can produce a wide variety of learning phenomena, then I might want to focus more on designing a set of evolutionary experiments that illustrate a wide range of learning behavior.

Thursday, March 02, 2006

still planning

The stepping-back-&-planning-ahead phase is almost over. For the last few weeks I have resorted to writing in my (physical) research notebook, making concrete the cognitively interesting tasks that I will evolve and analyse, figuring out how they relate to each other and what big question they are answering. This is, obviously, a crucial step in the making of my PhD thesis. But before I leap into the experiments themselves I am running these ideas by my advisor and a couple of other people. I hope to publish the overall plan and motivation here once it's all there.

Tuesday, February 28, 2006

in their shoes

I have been terribly busy with the Journal of Adaptive Behavior special issue that Fernando and I are editing. This issue is based on work from our ECAL2005 workshop. I must say that the issue is looking very good and, most importantly, we should be wrapping up this work very soon. The complete process has been incredibly draining and time-consuming but the experience has been undeniably fruitful. Putting myself in the place of the reviewers has proven to be an excellent academic exercise because it gives me a very good notion of how to improve the writing of my own papers (which things to do and which things not to), particularly when comparing my reviews of somebody’s work with the reviews made by other (more experienced) researchers in the field.