Tuesday 23 April 2013

In the news: Decoding dreams with fMRI

Recently Horikawa and colleagues from ATR Computational Neuroscience Laboratories, in Kyoto (Japan), caused a media sensation with the publication of a study in Science that provides the first proof of principle that non-invasive brain scanning (fMRI) can be used to decode dreams. Rumblings were already heard in various media circles after Yuki Kamitani presented their initial findings at the annual meeting of the Society for Neuroscience in New Orleans last year [see Mo Costandi's report]. But now that the peer-reviewed paper is officially published, the press releases have gone out and the journal embargo has been lifted, there has been a media frenzy [e.g., here, here and here]. The idea of reading people's dreams was always bound to attract a lot of media attention.

OK, so this study is cool. OK, very cool - what could be cooler than reading people's dreams while they sleep!? But is this just a clever parlour trick, using expensive brain imaging equipment? What does it tell us about the brain, and how it works?

First, to get beyond the hype, we need to understand exactly what they have, and have not, achieved in this study. Research participants were put into the narrow bore of an fMRI scanner for a series of mid-afternoon naps (up to 10 sessions in total). With the aid of simultaneous EEG recordings, the researchers were able to detect when their volunteers had slipped off into the earliest stages of sleep (stage 1 or 2). At this point, they were woken and questioned about any dream that they could remember, before being allowed to go back to sleep again. That is, until the EEG next registered evidence of early-stage sleep, and then again they were awoken, questioned, and allowed back to sleep. So on and so forth, until they had recorded at least 200 distinct awakenings.

After all the sleep data were collected, the experimenters then analysed the verbal dream reports using a semantic network analysis (WordNet) to help organise the contents of the dreams their participants had experienced during the brain scans. The results of this analysis could then be used to systematically label dream content associated with the sleep-related brain activity they had recorded earlier.

Having identified the kind of things their participants had been dreaming about in the scanner, the researchers then searched for actual visual images that best matched the reported content of dreams. Scouring the internet, the researchers built up a vast database of images that more or less corresponded to the contents of the reported dreams. In a second phase of the experiment, the same participants were scanned again, but this time they were fully awake and asked to view the collection of images that were chosen to match their previous dream content. These scans provided the research team with individualised measures of brain activity associated with specific visual scenes. Once these patterns had been mapped, the experimenters returned to the sleep data, using the normal waking perception data as a reference map.

If it looks like a duck...

In the simplest possible terms, if the pattern of activity measured during one dream looks more like activity associated with viewing a person, compared to activity associated with seeing an empty street scene, then you should say that the dream probably contains a person, if you were forced to guess. This is the essence of their decoding algorithm. They use sophisticated ways to characterise patterns in fMRI activity (support vector machine), but essentially the idea is simply to match up, as best they can, the brain patterns observed during sleep with those measured during wakeful viewing of corresponding images. Their published result is shown on the right for different areas of the brain's visual system. Lower visual cortex (LVC) includes primary visual cortex (V1), and areas V2 and V3; whereas higher visual cortex (HVC) includes lateral occipital complex (LOC), fusiform face area (FFA) and parahippocampal place area (PPA).
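To make the "looks more like" idea concrete, here is a minimal sketch in Python. It is not the method used in the paper (which relies on support vector machines); it swaps in a much simpler correlation-based template match, and the voxel values, labels and function names are all made up for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length activity patterns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def decode(sleep_pattern, waking_templates):
    """Return the label whose waking template best matches the sleep pattern."""
    return max(
        waking_templates,
        key=lambda label: pearson(sleep_pattern, waking_templates[label]),
    )


# Hypothetical voxel patterns (invented numbers, four voxels each),
# standing in for activity measured while viewing images when awake.
waking_templates = {
    "person": [1.0, 0.2, 0.9, 0.1],
    "street": [0.1, 0.8, 0.2, 1.0],
}

# Hypothetical noisy pattern recorded just before an awakening.
sleep_pattern = [0.8, 0.3, 0.7, 0.2]

print(decode(sleep_pattern, waking_templates))  # prints "person"
```

The forced guess is simply the waking template with the highest correlation. A real decoder would work with thousands of voxels and many trials per category, but the matching logic is the same in spirit.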

Below is a more creative reconstruction of this result. The researchers have put together a movie based on one set of sleep data taken before waking. Each frame represents the visual image from their database that best matches the current pattern of brain activity. Note, the reason why the image gets clearer towards the end of the movie is that the brain activity is nearer to the time point at which the participants were woken, and the corresponding dream content was therefore more likely to be described at waking. If the content at other times did not make it into the verbal report, then the dream activity would be difficult to classify because the corresponding waking data would not have been entered into the image database. This highlights how this approach only really works for content that has been characterised using the waking visual perception data.

OK, so these scientists have decoded dreams. The accuracy is hardly perfect, but still, the results are significantly above chance, and that's no mean feat. In fact, it has never been done before. But some might still say, so what? Have we learned anything very new about the brain? Or is this just a lot of neurohype?

Well, beyond the tour de force technical achievement of actually collecting this kind of multi-session simultaneous fMRI/EEG sleep data, these results also provide valuable insights into how dreams are represented in the brain. As in many neural decoding studies, the true purpose of the classifier is not really to make perfectly accurate predictions, but rather to work out how the brain represents information by studying how patterns of brain activity differ between conditions [see previous post]. For example, are there different patterns of visual activity during different types of dreams? Technically, this could be tested by just looking for any difference in activity patterns associated with different dream content. In machine-learning language, this could be done using a cross-validated classification algorithm. If a classifier trained to discriminate activity patterns associated with known dream states can then make accurate predictions of new dreams, then it is safe to assume that there are reliable differences in activity patterns between the two conditions. However, this only tells you that activity in a specific brain area is different between conditions. In this study, they go one step further.
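As a rough illustration of cross-validated classification, the sketch below runs leave-one-out cross-validation with a nearest-centroid classifier on simulated data. Everything here is an assumption made for the example: the classifier is a simple stand-in (not the paper's), the data are random numbers with a built-in difference between conditions, and the function names are hypothetical.

```python
import random


def centroid(patterns):
    """Average pattern across trials (one value per voxel)."""
    return [sum(values) / len(values) for values in zip(*patterns)]


def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(patterns, labels):
    """Leave-one-out cross-validation with a nearest-centroid classifier."""
    correct = 0
    for i in range(len(patterns)):
        # Train on every trial except the held-out one.
        train = [(p, lab) for j, (p, lab) in enumerate(zip(patterns, labels)) if j != i]
        centroids = {
            name: centroid([p for p, lab in train if lab == name])
            for name in set(labels)
        }
        # Predict the held-out trial from the nearest class centroid.
        guess = min(centroids, key=lambda name: sq_dist(patterns[i], centroids[name]))
        correct += guess == labels[i]
    return correct / len(patterns)


# Simulated data: 10 trials per condition, 5 voxels, conditions differ in mean.
rng = random.Random(0)
patterns, labels = [], []
for label, offset in (("person", 0.0), ("street", 2.0)):
    for _ in range(10):
        patterns.append([offset + rng.gauss(0, 0.5) for _ in range(5)])
        labels.append(label)

accuracy = loo_accuracy(patterns, labels)
print(f"cross-validated accuracy: {accuracy:.2f} (chance = 0.50)")
```

Because each trial is classified by a model that never saw it, accuracy reliably above chance implies a reliable difference between the activity patterns, which is exactly the inference described above.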

By training the dream decoder using only patterns of activity associated with the visual perception of actual images, they can also test whether there is a systematic relationship between the way dreams are represented, and how actual everyday perception is represented in the brain. This cross-generalisation approach helps isolate the shared features between the two phenomenological states. In my own research, we have used this approach to show that visual imagery during normal waking selectively activates patterns in high-level visual areas (lateral occipital complex: LOC) that are very similar to the patterns associated with directly viewing the same stimulus (Stokes et al., 2009, J Neurosci). The same approach can be used to test for other coding principles, including high-order properties such as position-invariance (Stokes et al., 2011, NeuroImage), or the pictorial nature of dreams, as studied here. As in our previous findings during waking imagery, Horikawa et al. show that the visual content of dreams shares similar coding principles to direct perception in higher visual brain areas. Further research, using a broader base of comparisons, will provide deeper insights into the representational structure of these inherently subjective and private experiences.

Many barriers remain for an all-purpose dream decoder

When the media first picked up this story, the main question I was asked went something like: are scientists going to be able to build dream decoders? In principle, yes, this result shows that a well-trained algorithm, given good brain data, is able to decode some of the content of dreams. But as always, there are plenty of caveats and qualifiers.

Firstly, the idea of downloading people's dreams while they sleep is still a very long way off. This study shows that, in principle, it is possible to use patterns of brain activity to infer the contents of people's dreams, but only at a relatively coarse resolution. For example, it might be possible to distinguish between patterns of activity associated with a dream containing people or an empty street, but it is another thing entirely to decode which person, or which street, not to mention all the other nuances that make dreams so interesting.

To boost the 'dream resolution' of any viable decoding machine, the engineer would need to scan participants for much MUCH longer, using many more visual exemplars to build up an enormous database of brain scans to use as a reference for interpreting more subtle dream patterns. In this study, the researchers took advantage of prior knowledge of specific dream content to limit their database to a manageable size. By verbally assessing the content of dreams first, they were able to focus on just a relatively small subset of all the possible dream content one could imagine. If you wanted to build an all-purpose dream decoder, you would need an effectively infinite database, unless you could discover a clever way to generalise from a finite set of exemplars to reconstruct infinitely novel content. This is an exciting area of active research (e.g., see here).

Another major barrier to a commercially available model is that you would also need to characterise this data for each individual person. Everyone's brain is different, unique at birth and further shaped by individual experiences. There is no reason to believe that we could build a reliable machine to read dreams without taking this kind of individual variability into account. Each dream machine would have to be tuned to each person's brain.

Finally, it is also worth noting that the method that was used in this experiment requires some pretty expensive and unwieldy machinery. Even if all the challenges set out above were solved, it is unlikely that dream readers for the home will be hitting the shelves any time soon. Other cheaper and more portable methods for measuring brain activity, such as EEG, can only really be used to identify different sleep stages, not what goes on inside them. Electrodes placed directly into the brain could be more effective, but at the cost of invasive brain surgery.

For the moment, it is probably better just to keep a dream journal.


Horikawa, Tamaki, Miyawaki & Kamitani (2013) Neural Decoding of Visual Imagery During Sleep, Science [here]

Tuesday 16 April 2013

Statistical power is truth power

This week, Nature Reviews Neuroscience published an important article by Kate Button and colleagues quantifying the extent to which experiments in neuroscience may be statistically underpowered. For a number of excellent and accessible summaries of the research, see here, here, here and this one in the Guardian from the lead author of the research.

The basic message is clear - collect more data! Data collection is expensive, and time consuming, but underpowered experiments are a waste of both time and money. Noisy data will decrease the likelihood of detecting important effects (false negative), which is obviously disappointing for all concerned. But noisy datasets are also more likely to be over-interpreted, as the disheartened experimenter attempts to find something interesting to report. With enough time, and effort, trying lots of different analyses, something 'worth reporting' will inevitably emerge, even by chance (false positive). Put a thousand monkeys at a thousand typewriters, or leave an enthusiastic researcher alone long enough with a noisy data set, and eventually something that reads like a coherent story will emerge. If you are really lucky (and/or determined), it might even sound like a pretty good story, and end up published in a high-impact journal.
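A back-of-the-envelope calculation shows how quickly 'trying lots of different analyses' produces something significant by chance alone. The sketch below assumes each analysis is an independent test at the conventional 0.05 threshold, which is a simplification (analyses of the same data set are usually correlated), and the function name is just for illustration.

```python
def chance_of_false_positive(n_analyses, alpha=0.05):
    """Probability that at least one of n independent null tests comes out 'significant'."""
    return 1 - (1 - alpha) ** n_analyses


for n in (1, 5, 20, 50):
    print(f"{n:>2} analyses: {chance_of_false_positive(n):.2f}")
```

Under these assumptions, the chance of at least one spurious result climbs from 0.05 for a single test to roughly 0.23 after five analyses, 0.64 after twenty and 0.92 after fifty.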

This is the classic Type 1 error, the bogeyman of undergraduate Statistics 101. But the problem of false positives is very real, and continues to plague empirical research, from biological oncology to social psychology. Failure to replicate published results is the diagnostic marker of a systematic failure to separate signal from noise.

There are many bad scientific practices that increase the likelihood of false positives entering the literature, such as peeking, parameter tweaking, and publication bias, and there are some excellent initiatives out there to clean up these common forms of bad research practice. For example, Cortex has introduced a Registered Report format that should bring some rigour back to hypothesis testing, Psychological Science is now hoping to encourage replications and Nature Neuroscience has drawn up clearer guidelines to improve statistical practices.

These are all excellent initiatives, but I think we also need to consider simply increasing the safety margin. In a previous post, I argued that the accepted statistical threshold is far too lax. A 1-in-20 false positive rate already seems absurdly permissive, but if we also consider all the other factors that invalidate basic statistical assumptions, then the true rate of false positives must be extremely high (perhaps 'Why Most Published Research Findings are False'). To increase the safety margin seems like an obvious first step to improving the reliability of published findings.

The downside, of course, to a more stringent threshold for separating signal from noise is that it demands a lot more data. Obviously, this will reduce the total number of experiments that can be conducted for the same amount of money. But as I recently argued in the Guardian, science on a shoestring budget can lead to more harm than good. If the research is important enough to fund, then it is even more important that it is funded properly. Spreading resources too thinly will only add noise and confusion to the process, leading further research down expensive and time-consuming blind alleys opened up by false positives.

So, the take home message is simple - collect more data! But how much more?

Matt Wall recently posted his thoughts on power analyses. These are standardised procedures for estimating the probability that you will be able to detect a significant effect, given a certain effect size and variance, for a given number of subjects. This approach is used widely for planning clinical studies, and is essentially the metric that Kate and colleagues use to demonstrate the systematic lack of statistical power in the neuroscience literature. But there's an obvious catch-22, as Matt points out. How are you supposed to know the effect size (and variance) if you haven't done the experiment? Indeed, isn't that exactly why you have proposed to conduct the experiment? To sample the distribution for an estimate of effect size (and variance)? Also, in a typical experiment, you might be interested in a number of possible effects, so which one do you base your power analysis on?
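To see why the catch-22 bites, here is a minimal power-analysis sketch. It uses the standard normal approximation for a two-sided comparison of two independent groups, with the effect size expressed as Cohen's d (difference in means divided by the standard deviation). The function name and the default values (alpha = 0.05, 80% power) are assumptions for illustration, and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate participants needed per group (two-sided, two-sample test).

    Normal approximation; effect_size is Cohen's d.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the threshold
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

With these defaults the answer swings from about 25 per group for a large effect (d = 0.8) to 63 for a medium effect (d = 0.5) and 393 for a small one (d = 0.2). The required sample size depends almost entirely on the one number you do not know in advance.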

I tend to think that power analysis is best suited to clinical studies, in which there is already a clear idea of the effect size you should be looking for (as it is bounded by practical concerns of clinical relevance). In contrast, basic science is often interested in whether there is an effect, in principle. Even if very small, it could be of major theoretical interest. In this case, there may be no lower bound effect size to impose, so without precognition, it seems difficult to see how to establish the necessary sample size. Power calculations would clearly benefit replication studies, but it is difficult to see how they could be applied for planning new experiments. Researchers can make a show of power calculations, by basing effect size estimations on some randomly selected previous study, but this is clearly a pointless exercise.

Instead, researchers often adopt rules of thumb, but I think the new rule of thumb should be: double your old rule of thumb! If you were previously content with 20 participants for fMRI, then perhaps you should recruit 40. If you have always relied on 100 cells, then perhaps you should collect data from 200 cells instead. Yes, these are essentially still just arbitrary numbers, but there is nothing arbitrary about improving statistical power. And you can be absolutely sure that the extra time and effort (and cost) will pay dividends in the long run. You will spend less time analysing your data trying to find something interesting to report, and you will be less likely to send some other researcher down the miserable path of persistent failures to replicate your published false positive.
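For a feel of what doubling buys you, the sketch below estimates power for a within-subject (one-sample) comparison using a normal approximation. The effect size of d = 0.5 is purely hypothetical, chosen only to make the comparison concrete, and the function name is illustrative.

```python
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test (normal approximation).

    Ignores the negligible chance of a significant result in the wrong direction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * n ** 0.5 - z_alpha)


for n in (20, 40):
    print(f"n = {n}: power is roughly {approx_power(0.5, n):.2f}")
```

Under these assumptions, going from 20 to 40 participants lifts power from roughly 0.61 to roughly 0.89. The exact figures depend entirely on the assumed effect size, but the direction of the benefit does not.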