20 December 2012

Hierarchies of temporal scales in perception


The brain seems to do a lot of stuff. We know this because cognitive neuroscience continues to probe ever more finely into this stuff, and to report back voluminously on the details. These details are of course essential, but it's helpful to take a step back every once in a while and think about the big picture.

This is what appeals to me about the recent work of Stefan Kiebel and colleagues (e.g. Kiebel et al. 2008, 2009). These authors offer an interesting and compelling perspective on perception, learning, brain structure, brain dynamics, and the relationship between agents and the world in which they live.

"We...motivate the hypothesis that the environment exhibits temporal structure, which is exploited by the brain to optimise its predictions. This optimisation transcribes temporal structure in the environment into anatomical structure, lending the brain a generic form of structure-function mapping."

Bullet point summary of the central ideas:
  • brains / organisms / agents make predictions about the world
  • ...and have an inherent tendency to minimize 'surprise' by trying to make correct predictions*
  • structure exists in the world at multiple temporal scales (milliseconds, seconds, minutes, hours, days, etc.)
  • this fact is 'exploited by the brain to optimise its predictions'
  • ...resulting in a 'transcription' of temporal structure in the environment into anatomical structure
  • this transcription takes the form of a 'hierarchy' or 'gradient', where lower-level brain regions (closer to sensory systems) represent structure at faster temporal scales, and higher-level regions represent structure at slower temporal scales
  • a natural mathematical description of such a system is a hierarchy of dynamical systems at different time scales, with the higher (slower) systems specifying the manifold on which the faster systems unfold (see the sketch after this list)
  • (the primary 'result' of the paper is a proof-of-principle mathematical model showing how this could work, including a demonstration that the model is able to learn certain structures in birdsong)
* (this is the so-called 'free energy principle', Karl Friston's much-discussed 'theory of everything')
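
To make the dynamical-systems bullet concrete, here is a minimal sketch (in Python) of the kind of system this describes - not the authors' actual generative model, and the coupling constants below are invented purely for illustration. A slow Lorenz attractor sets a control parameter of a fast Lorenz attractor, so the slow (higher) level specifies the manifold on which the fast (lower) level unfolds; this echoes the stacked-attractor construction in the papers.

    import numpy as np

    def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0/3.0):
        """Standard Lorenz equations; rho is the control (Rayleigh) parameter."""
        x, y, z = state
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    dt, n_steps, tau_slow = 0.01, 20000, 8.0   # slow level runs 8x slower
    slow = np.array([1.0, 1.0, 25.0])
    fast = np.array([1.0, 1.0, 25.0])
    trajectory = np.zeros((n_steps, 3))

    for t in range(n_steps):
        # Slow (higher) level: an autonomous attractor on a long time scale.
        slow = slow + (dt / tau_slow) * lorenz_deriv(slow)
        # Fast (lower) level: its rho parameter - and hence the manifold it
        # unfolds on - is set by the current state of the slow level.
        rho_fast = 24.0 + 0.2 * slow[2]        # arbitrary illustrative coupling
        fast = fast + dt * lorenz_deriv(fast, rho=rho_fast)
        trajectory[t] = fast

The point of the construction is simply that the fast trajectory is never stationary in its statistics: its attractor is continuously reshaped by a variable that itself changes an order of magnitude more slowly.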



[Figure 1 of Kiebel et al. 2009]

[Figure 4 of Kiebel et al. 2009]


Initially I wasn't sure how the separation-of-temporal-scales idea fits with the conventional conception of the visual processing hierarchy, in which progressively more abstract types of structure are processed at progressively higher stages of the occipito-temporal processing stream - all of which operates (at least in the majority of experimental paradigms!) on static retinal images, with no temporal structure at all. This was cleared up for me by the supporting materials, which contain (somewhat unusually) a lengthy literature review of evidence from different domains showing that the cortex does indeed exhibit the hypothesised rostro-caudal gradient of representational time scales. In short: the properties of higher-order visual representations - orientation and motion invariance, non-retinotopic receptive fields, and so on - mean that these representations are largely insensitive to the rapidly changing edges and contrasts of the immediate sensory input, even if that rapid change is not typically present in experimental paradigms.

I think these ideas make the most sense in the auditory domain, particularly language processing, which is well understood to contain structure at multiple temporal scales. The authors clearly recognize this: birdsong is in many ways a simplified model of spoken language. In a more recent Frontiers in Neuroinformatics paper, classed as a review but also describing some new results, the authors develop the model and the general argument in the direction of artificial speech recognition, making firmer contact with the existing literature on hierarchical models in that area.
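
To make the 'structure at multiple temporal scales' point concrete, here is a toy illustration (plain numpy; all the numbers are invented for the example, not taken from the papers). A speech-like signal carries pitch-rate, syllable-rate and phrase-rate structure simultaneously, and the integration window of a simple envelope detector determines which of those scales it can see - a crude analogue of neural populations with short versus long temporal receptive windows.

    import numpy as np

    fs = 16000                                # sample rate (Hz)
    t = np.arange(0, 2.0, 1.0 / fs)

    # Nested temporal scales: ~200 Hz "pitch" carrier, ~4 Hz syllable
    # envelope, ~0.5 Hz phrase envelope.
    phrase   = 0.5 * (1 + np.sin(2 * np.pi * 0.5 * t))
    syllable = 0.5 * (1 + np.sin(2 * np.pi * 4.0 * t))
    signal   = phrase * syllable * np.sin(2 * np.pi * 200.0 * t)

    def envelope(x, win_ms, fs):
        """Moving-average envelope of the rectified signal; the window
        length fixes the temporal scale the detector is sensitive to."""
        win = int(fs * win_ms / 1000)
        return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

    fast_env = envelope(signal, 10, fs)       # follows syllable-scale structure
    slow_env = envelope(signal, 500, fs)      # follows only phrase-scale structure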

I think language researchers will be quite comfortable and familiar with the idea that higher order language regions track temporal dependencies at longer time scales; it's almost a truism. This is certainly how I understand the computational role of LIFG / Broca's region in sentence-level syntax, one of the main research foci at the CSLB. I'm not sure how the Zatorreian idea of right hemisphere regions being more specialized for longer-duration auditory features such as prosody and music fits in with the rostro-caudal gradient idea.

A final thought: there may be some relevance here to 'PASA', the 'posterior-anterior shift in ageing' (Davis, Cabeza, and colleagues). This refers to the observation that older adults generally engage more anterior and less posterior areas across a range of cognitive functions. It is usually interpreted in terms of 'compensation', the recruitment of extra 'cognitive resources', or the use of different 'cognitive strategies' - all of which I personally find somewhat unsatisfactory as explanatory constructs. One possibility that could be explored, then, is that ageing brings with it a shift towards relying on slower-changing aspects of the environment to inform perception, decision making, and action.

One thing is certain: whether or not the Kiebel et al. slow+fast dynamic hierarchies idea holds sway in the long term, predictive coding (the less jazzy and more generic uncle of Fristonian free energy) is very en vogue at the moment, and I suspect it will work itself into core neuroscience dogma before too long.
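
For anyone who hasn't met predictive coding in computational form, here is a minimal sketch (a generic linear scheme in the spirit of Rao & Ballard, not anything specific to Kiebel et al.; the matrices and numbers are invented). Inference runs by sending a top-down prediction, computing the bottom-up prediction error, and nudging the estimate of the hidden cause to explain that error away.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linear generative model: sensory data x arise from a hidden
    # cause v via a mapping W (which the brain would have to learn).
    W = rng.normal(size=(10, 2))
    v_true = np.array([1.5, -0.5])
    x = W @ v_true + 0.05 * rng.normal(size=10)

    # Inference as gradient descent on the squared prediction error.
    v_hat = np.zeros(2)
    lr = 0.05
    for _ in range(200):
        prediction = W @ v_hat        # top-down prediction of the input
        error = x - prediction        # bottom-up prediction error signal
        v_hat += lr * W.T @ error     # update the belief to reduce the error

    print(v_hat)                      # converges towards v_true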


Main refs:

Kiebel, S.J., Daunizeau, J. & Friston, K.J. A Hierarchy of Time-Scales and the Brain. PLoS Computational Biology 4, 12 (2008).

Kiebel, S.J., Daunizeau, J. & Friston, K.J. Perception and Hierarchical Dynamics. Frontiers in Neuroinformatics 3, 9 (2009).
 
Davis, S.W., Dennis, N.A., Daselaar, S.M., Fleck, M.S. & Cabeza, R. Que PASA? The posterior-anterior shift in aging. Cerebral Cortex 18, 1201-1209 (2008).