Tuesday, 23 April 2013

In the news: Decoding dreams with fMRI

Recently, Horikawa and colleagues from ATR Computational Neuroscience Laboratories in Kyoto (Japan) caused a media sensation with a study published in Science that provides the first proof-of-principle that non-invasive brain scanning (fMRI) can be used to decode dreams. Rumblings were already heard in various media circles after Yuki Kamitani presented their initial findings at the annual meeting of the Society for Neuroscience in New Orleans last year [see Mo Costandi's report]. But now that the peer-reviewed paper has been published, the press releases have gone out and the journal embargo has been lifted, there has been a media frenzy [e.g., here, here and here]. The idea of reading people's dreams was always bound to attract a lot of media attention.

OK, so this study is cool. OK, very cool - what could be cooler than reading people's dreams while they sleep!? But is this just a clever parlour trick, using expensive brain imaging equipment? What does it tell us about the brain, and how it works?

First, to get beyond the hype, we need to understand exactly what they have, and have not, achieved in this study. Research participants were put into the narrow bore of an fMRI scanner for a series of mid-afternoon naps (up to 10 sessions in total). With the aid of simultaneous EEG recordings, the researchers were able to detect when their volunteers had slipped into the early stages of sleep (stage 1 or 2). At this point, they were woken and questioned about any dream that they could remember, before being allowed to go back to sleep. When the EEG next registered early-stage sleep, they were woken and questioned again, and so on, until at least 200 distinct awakenings had been recorded.

After all the sleep data were collected, the experimenters analysed the verbal dream reports using a semantic network analysis (WordNet) to help organise the contents of the dreams their participants had experienced during the brain scans. The results of this analysis could then be used to systematically label the dream content associated with the sleep-related brain activity they had recorded earlier.

Having identified the kind of things their participants had been dreaming about in the scanner, the researchers then searched for actual visual images that best matched the reported content of dreams. Scouring the internet, the researchers built up a vast database of images that more or less corresponded to the contents of the reported dreams. In a second phase of the experiment, the same participants were scanned again, but this time they were fully awake and asked to view the collection of images that were chosen to match their previous dream content. These scans provided the research team with individualised measures of brain activity associated with specific visual scenes. Once these patterns had been mapped, the experimenters returned to the sleep data, using the normal waking perception data as a reference map.

If it looks like a duck...

In the simplest possible terms, if the pattern of activity measured during one dream looks more like the activity associated with viewing a person than the activity associated with seeing an empty street scene, then, if you were forced to guess, you should say that the dream probably contains a person. This is the essence of their decoding algorithm. They use sophisticated ways to characterise patterns in fMRI activity (support vector machines), but essentially the idea is simply to match up, as best they can, the brain patterns observed during sleep with those measured during wakeful viewing of corresponding images. Their published result is shown on the right for different areas of the brain's visual system. Lower visual cortex (LVC) includes primary visual cortex (V1) and areas V2 and V3, whereas higher visual cortex (HVC) includes the lateral occipital complex (LOC), fusiform face area (FFA) and parahippocampal place area (PPA).
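The "looks more like" logic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the paper used support vector machines, whereas this swaps in a simpler correlation-based nearest-template classifier, and the five-voxel patterns are made-up numbers for illustration. Templates are built from hypothetical waking-perception scans and then applied to a hypothetical sleep pattern.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def train_templates(waking_trials):
    """Average the waking-perception patterns for each content label."""
    sums, counts = {}, {}
    for label, pattern in waking_trials:
        acc = sums.setdefault(label, [0.0] * len(pattern))
        for i, v in enumerate(pattern):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def decode(pattern, templates):
    """Guess the label whose waking template best matches this pattern."""
    return max(templates, key=lambda lab: pearson(pattern, templates[lab]))

# Hypothetical patterns recorded while participants viewed images (awake)
waking = [
    ("person", [1.0, 0.8, 0.1, 0.0, 0.2]),
    ("person", [0.9, 1.0, 0.0, 0.1, 0.1]),
    ("street", [0.1, 0.0, 0.9, 1.0, 0.7]),
    ("street", [0.0, 0.2, 1.0, 0.8, 0.9]),
]
templates = train_templates(waking)

# A hypothetical pattern recorded shortly before an awakening
sleep_pattern = [0.7, 0.6, 0.2, 0.1, 0.3]
print(decode(sleep_pattern, templates))  # person
```

Note that the decoder can only ever answer with a label it has a waking template for, which is the same limitation on database coverage discussed in the next paragraph.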

Below is a more creative reconstruction of this result. The researchers have put together a movie based on one set of sleep data taken before waking. Each frame represents the visual image from their database that best matches the current pattern of brain activity. Note that the image gets clearer towards the end of the movie because that brain activity is nearer to the time point at which the participant was woken, so its content was more likely to be described in the verbal report. If the content at other times did not make it into the verbal report, then the dream activity would be difficult to classify, because the corresponding waking data would not have been entered into the image database. This highlights how this approach only really works for content that has been characterised using the waking visual perception data.


OK, so these scientists have decoded dreams. The accuracy is hardly perfect, but the results are significantly above chance, and that's no mean feat. In fact, it has never been done before. But some might still say, so what? Have we learned anything new about the brain? Or is this just a lot of neurohype?

Well, beyond the tour de force technical achievement of actually collecting this kind of multi-session simultaneous fMRI/EEG sleep data, these results also provide valuable insights into how dreams are represented in the brain. As in many neural decoding studies, the true purpose of the classifier is not really to make perfectly accurate predictions, but rather to work out how the brain represents information by studying how patterns of brain activity differ between conditions [see previous post]. For example, are there different patterns of visual activity during different types of dreams? Technically, this could be tested by just looking for any difference in activity patterns associated with different dream content. In machine-learning language, this could be done using a cross-validated classification algorithm. If a classifier trained to discriminate activity patterns associated with known dream states can then make accurate predictions about new dreams, then it is safe to assume that there are reliable differences in activity patterns between the two conditions. However, this only tells you that activity in a specific brain area is different between conditions. In this study, they go one step further.
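Cross-validation means the classifier is always tested on data it was not trained on. A minimal leave-one-out sketch, assuming a simple nearest-centroid classifier (Euclidean distance) rather than the support vector machine used in the study, with made-up three-voxel sleep patterns labelled by reported dream content:

```python
import math

def centroid(patterns):
    """Element-wise mean of a list of equal-length patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def nearest_centroid(pattern, centroids):
    """Label of the centroid closest (Euclidean distance) to the pattern."""
    return min(centroids, key=lambda lab: math.dist(pattern, centroids[lab]))

def leave_one_out_accuracy(trials):
    """Hold out each trial in turn, train on the rest, test on the held-out one."""
    correct = 0
    for i, (label, pattern) in enumerate(trials):
        training = trials[:i] + trials[i + 1:]
        by_label = {}
        for lab, pat in training:
            by_label.setdefault(lab, []).append(pat)
        centroids = {lab: centroid(pats) for lab, pats in by_label.items()}
        correct += nearest_centroid(pattern, centroids) == label
    return correct / len(trials)

# Hypothetical sleep patterns, labelled by the content of the dream report
trials = [
    ("person", [1.0, 0.9, 0.2]),
    ("person", [0.8, 1.1, 0.1]),
    ("person", [0.9, 0.7, 0.3]),
    ("street", [0.2, 0.1, 1.0]),
    ("street", [0.1, 0.3, 0.9]),
    ("street", [0.3, 0.2, 1.2]),
]
print(leave_one_out_accuracy(trials))  # 1.0
```

These toy patterns are perfectly separable; with real fMRI data the question is whether accuracy reliably exceeds chance (0.5 for two equally frequent categories), not whether it is perfect.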

By training the dream decoder using only patterns of activity associated with the visual perception of actual images, they can also test whether there is a systematic relationship between the way dreams are represented and the way actual everyday perception is represented in the brain. This cross-generalisation approach helps isolate the shared features between the two phenomenological states. In my own research, we have used this approach to show that visual imagery during normal waking selectively activates patterns in high-level visual areas (lateral occipital complex: LOC) that are very similar to the patterns associated with directly viewing the same stimulus (Stokes et al., 2009, J Neurosci). The same approach can be used to test for other coding principles, including high-order properties such as position-invariance (Stokes et al., 2011, NeuroImage), or the pictorial nature of dreams, as studied here. As in our previous findings during waking imagery, Horikawa et al. show that the visual content of dreams shares similar coding principles with direct perception in higher visual brain areas. Further research, using a broader base of comparisons, will provide deeper insights into the representational structure of these inherently subjective and private experiences.

Many barriers remain for an all-purpose dream decoder

When the media first picked up this story, the main question I was asked went something like: are scientists going to be able to build dream decoders? In principle, yes: this result shows that a well-trained algorithm, given good brain data, is able to decode some of the content of dreams. But as always, there are plenty of caveats and qualifiers.

Firstly, the idea of downloading people's dreams while they sleep is still a very long way off. This study shows that, in principle, it is possible to use patterns of brain activity to infer the contents of people's dreams, but only at a relatively coarse resolution. For example, it might be possible to distinguish between patterns of activity associated with a dream containing people or an empty street, but it is another thing entirely to decode which person, or which street, not to mention all the other nuances that make dreams so interesting.

To boost the 'dream resolution' of any viable decoding machine, the engineer would need to scan participants for much MUCH longer, using many more visual exemplars to build up an enormous database of brain scans to use as a reference for interpreting more subtle dream patterns. In this study, the researchers took advantage of prior knowledge of specific dream content to limit their database to a manageable size. By verbally assessing the content of dreams first, they were able to focus on just a relatively small subset of all the possible dream content one could imagine. If you wanted to build an all-purpose dream decoder, you would need an effectively infinite database, unless you could discover a clever way to generalise from a finite set of exemplars to reconstruct infinitely novel content. This is an exciting area of active research (e.g., see here).

Another major barrier to a commercially available model is that these reference data would need to be collected separately for each individual person. Everyone's brain is different, unique at birth and further shaped by individual experiences. There is no reason to believe that we could build a reliable machine to read dreams without taking this kind of individual variability into account. Each dream machine would have to be tuned to each person's brain.


Finally, it is also worth noting that the method used in this experiment requires some pretty expensive and unwieldy machinery. Even if all the challenges set out above were solved, it is unlikely that dream readers for the home will be hitting the shelves any time soon. Other cheaper and more portable methods for measuring brain activity, such as EEG, can only really be used to identify different sleep stages, not what goes on inside them. Electrodes placed directly into the brain could be more effective, but at the cost of invasive brain surgery.


For the moment, it is probably better just to keep a dream journal.

Reference:


Horikawa, Tamaki, Miyawaki & Kamitani (2013) Neural Decoding of Visual Imagery During Sleep, Science [here]

Tuesday, 16 April 2013

Statistical power is truth power

This week, Nature Reviews Neuroscience published an important article by Kate Button and colleagues quantifying the extent to which experiments in neuroscience may be statistically underpowered. For a number of excellent and accessible summaries of the research, see here, here, here and this one in the Guardian from the lead author of the research.

The basic message is clear - collect more data! Data collection is expensive and time-consuming, but underpowered experiments are a waste of both time and money. Noisy data will decrease the likelihood of detecting important effects (false negatives), which is obviously disappointing for all concerned. But noisy datasets are also more likely to be over-interpreted, as the disheartened experimenter attempts to find something interesting to report. With enough time and effort, trying lots of different analyses, something 'worth reporting' will inevitably emerge, even by chance (a false positive). Put a thousand monkeys to a thousand typewriters, or leave an enthusiastic researcher alone long enough with a noisy dataset, and eventually something that reads like a coherent story will emerge. If you are really lucky (and/or determined), it might even sound like a pretty good story, and end up published in a high-impact journal.
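How quickly "something worth reporting" emerges from pure noise can be made concrete. A minimal sketch, assuming the analyses are independent (real analyses of the same dataset are correlated, so this is an idealisation): under the null hypothesis each p-value is uniform on [0, 1], so the chance that at least one of k analyses crosses α is 1 - (1 - α)^k, which is about 64% for 20 analyses at α = 0.05. The simulation checks the formula.

```python
import random

def familywise_error(alpha, k):
    """Chance of at least one 'significant' result among k independent
    tests when every null hypothesis is true: 1 - (1 - alpha)**k."""
    return 1 - (1 - alpha) ** k

def simulate_fishing(k, alpha=0.05, n_datasets=10_000, seed=0):
    """Fraction of pure-noise datasets in which at least one of k independent
    analyses is 'significant'. Under the null, p-values are uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(n_datasets)
    )
    return hits / n_datasets

for k in (1, 5, 20):
    print(k, round(familywise_error(0.05, k), 3), simulate_fishing(k))
```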

This is the classic Type I error, the bogeyman of undergraduate Statistics 101. But the problem of false positives is very real, and continues to plague empirical research, from biological oncology to social psychology. Failure to replicate published results is the diagnostic marker of a systematic failure to separate signal from noise.

There are many bad scientific practices that increase the likelihood of false positives entering the literature, such as peeking, parameter tweaking and publication bias, and there are some excellent initiatives out there to clean up these common forms of bad research practice. For example, Cortex has introduced a Registered Report format that should bring some rigour back to hypothesis testing, Psychological Science is now hoping to encourage replications, and Nature Neuroscience has drawn up clearer guidelines to improve statistical practices.

These are all excellent initiatives, but I think we also need to consider simply increasing the safety margin. In a previous post, I argued that the accepted statistical threshold is far too lax. A 1-in-20 false positive rate already seems absurdly permissive, but if we take into account all the other factors that invalidate basic statistical assumptions, then the true rate of false positives must be extremely high (perhaps 'Why Most Published Research Findings are False'). Increasing the safety margin seems like an obvious first step to improving the reliability of published findings.
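The link between the threshold, statistical power and the reliability of published findings can be expressed through the positive predictive value (PPV): the probability that a 'significant' finding reflects a real effect. A minimal sketch using the simplified form from Ioannidis's paper (without its bias term), PPV = (1 - β)R / ((1 - β)R + α), where 1 - β is power, α the threshold and R the pre-study odds that a tested effect is real. The value R = 0.25 below is an illustrative assumption, not an estimate for any field.

```python
def ppv(alpha, power, prior_odds):
    """Probability that a 'significant' result reflects a true effect,
    given the threshold alpha, statistical power and pre-study odds R."""
    true_pos = power * prior_odds
    return true_pos / (true_pos + alpha)

# Illustrative pre-study odds: one real effect for every four null effects
for alpha in (0.05, 0.005):
    for power in (0.8, 0.2):
        print(alpha, power, round(ppv(alpha, power, prior_odds=0.25), 2))
```

Under these assumptions, a well-powered study at α = 0.05 gives a PPV of 0.8, while an underpowered one (power 0.2) drops to 0.5. Tightening α to 0.005 lifts the underpowered case to about 0.91, but a stricter α also reduces power for a fixed sample size, which is the trade-off discussed next.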

The downside, of course, to a more stringent threshold for separating signal from noise is that it demands a lot more data. Obviously, this will reduce the total number of experiments that can be conducted for the same amount of money. But as I recently argued in the Guardian, science on a shoestring budget can do more harm than good. If the research is important enough to fund, then it is even more important that it is funded properly. Spreading resources too thinly will only add noise and confusion to the process, leading further research down expensive and time-consuming blind alleys opened up by false positives.

So, the take home message is simple - collect more data! But how much more?

Matt Wall recently posted his thoughts on power analyses. These are standardised procedures for estimating the probability that you will be able to detect a significant effect, given a certain effect size and variance, for a given number of subjects. This approach is used widely for planning clinical studies, and is essentially the metric that Kate and colleagues use to demonstrate the systematic lack of statistical power in the neuroscience literature. But there's an obvious catch-22, as Matt points out. How are you supposed to know the effect size (and variance) if you haven't done the experiment? Indeed, isn't that exactly why you have proposed to conduct the experiment: to sample the distribution for an estimate of effect size (and variance)? Also, in a typical experiment, you might be interested in a number of possible effects, so which one do you base your power analysis on?
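For concreteness, a minimal power calculation for comparing two independent groups. It assumes a standardised effect size (Cohen's d) is already known, which is exactly the catch-22 described above, and it uses the normal approximation; tools based on the noncentral t distribution (such as G*Power) give slightly lower power and slightly larger sample sizes.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for a standardised
    effect size d, using the normal approximation."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)
    # Probability the test statistic falls in either rejection region
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Approximate number of participants per group needed for the target power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)

print(round(power_two_sample(0.5, 20), 2))  # 0.35
print(round(power_two_sample(0.5, 40), 2))  # 0.61
print(n_per_group_for_power(0.5))           # 63
```

Under this approximation, for a medium effect (d = 0.5), doubling from 20 to 40 participants per group raises power from about 0.35 to about 0.61, and reaching the conventional 0.8 requires roughly 63 per group.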

I tend to think that power analysis is best suited to clinical studies, in which there is already a clear idea of the effect size you should be looking for (as it is bounded by practical concerns of clinical relevance). In contrast, basic science is often interested in whether there is an effect in principle. Even if very small, it could be of major theoretical interest. In this case, there may be no lower-bound effect size to impose, so without precognition, it is difficult to see how to establish the necessary sample size. Power calculations would clearly benefit replication studies, but it is difficult to see how they could be applied to planning new experiments. Researchers can make a show of power calculations by basing effect size estimates on some randomly selected previous study, but this is clearly a pointless exercise.

Instead, researchers often adopt rules of thumb, but I think the new rule of thumb should be: double your old rule of thumb! If you were previously content with 20 participants for fMRI, then perhaps you should recruit 40. If you have always relied on 100 cells, then perhaps you should collect data from 200 cells instead. Yes, these are essentially still just numbers, but there is nothing arbitrary about improving statistical power. And you can be absolutely sure that the extra time and effort (and cost) will pay dividends in the long run. You will spend less time analysing your data trying to find something interesting to report, and you will be less likely to send other researchers down the miserable path of persistent failures to replicate your published false positive.


Tuesday, 12 March 2013

Book review: Hallucinations by Oliver Sacks

I read Hallucinations over the Christmas break, and have been meaning to post a review ever since. Oliver Sacks will be discussing his book tomorrow at Warwick University, where he is currently a visiting professor. I have booked my seat and am looking forward to it. I will soon post a review of his talk, along with anything new I learn at the discussion.

Sunday, 24 February 2013

Research Briefing: Attention restores forgotten items to visual short-term memory

Our paper, just out in Psychological Science, describes the final series of experiments conducted by Alexandra Murray during her PhD with Kia Nobre and myself at the Department of Experimental Psychology, Oxford University. Building on previous research by Kia and others in the Brain and Cognition Lab, these studies were designed to test how selective attention modulates information being held in mind, in a format known as visual short-term memory (VSTM).

Typically, VSTM is thought of as a temporary buffer for storing a select subset of information extracted during perceptual processing. This buffer is usually assumed to be insulated from the constant flux of sensory input streaming into the brain, allowing the most important information to be held in mind beyond the duration of sensory stimulation. In this way, VSTM enables us to use visual information to achieve longer-term goals, helping to free us from direct stimulus-response contingencies (right).

Previous studies have shown that attention is important for keeping visual information in mind. For example, Ed Awh and colleagues have suggested that selective attention is crucial for rehearsing spatial information in VSTM, just as inner speech helps us keep a telephone number in mind. The results described in this paper further suggest that attention is not simply a mechanism for maintenance, but is also important for converting information into a retrievable format.

In long-term memory research, retrieval mechanisms are often considered as important to memory performance as the storage format. It is all well and good if the information is stored, but to what end if it cannot be retrieved? We think that retrieval is also important in VSTM: valuable information could be stored in short-term traces that are not directly available for memory retrieval. In this study, we show that attention can be directed to such memory traces to convert them into a format that is easier to use (i.e., retrieve). In this respect, attention can be used to restore information to VSTM for accurate recall.

We combined behavioural and psychophysical approaches to show that attention, directed to memory items about one second after they had been presented, increases the discrete probability of recall, rather than producing a more perceptual improvement in the precision of recall judgements (for relevant methods, see also here). This combination of approaches was necessary to infer a discrete state transition between retrievable and non-retrievable formats.
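The distinction between the probability of recall and the precision of recall is commonly drawn with a two-component mixture model of recall errors on a circular feature space such as a colour wheel (in the style of Zhang & Luck, 2008): a von Mises distribution centred on the true value for remembered items, plus a uniform distribution for random guesses. The following is an illustrative sketch, not the analysis code from the paper. It fits the model by brute-force grid search to simulated, hypothetical data. A benefit in the probability of recall shows up as a lower guess rate g, whereas a perceptual benefit would show up as a higher concentration kappa.

```python
import math
import random

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
        if term < 1e-15 * total:
            return total

def fit_mixture(errors, guess_grid, kappa_grid):
    """Grid-search maximum-likelihood fit of the two-component model
        p(error) = (1 - g) * VonMises(error; 0, kappa) + g / (2 * pi)
    g is the guess rate (so 1 - g is the probability of recall) and
    kappa is the concentration, i.e. the precision of recalled items."""
    uniform = 1.0 / (2.0 * math.pi)
    best_g, best_kappa, best_ll = None, None, -math.inf
    for kappa in kappa_grid:
        norm = 2.0 * math.pi * bessel_i0(kappa)
        vm = [math.exp(kappa * math.cos(e)) / norm for e in errors]
        for g in guess_grid:
            ll = sum(math.log((1.0 - g) * v + g * uniform) for v in vm)
            if ll > best_ll:
                best_g, best_kappa, best_ll = g, kappa, ll
    return best_g, best_kappa

def simulate_errors(n_trials, guess_rate, kappa, seed=0):
    """Hypothetical recall errors (radians) generated from the same model."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        if rng.random() < guess_rate:
            errors.append(rng.uniform(-math.pi, math.pi))  # pure guess
        else:
            e = rng.vonmisesvariate(0.0, kappa)  # noisy memory, in [0, 2*pi)
            errors.append((e + math.pi) % (2.0 * math.pi) - math.pi)
    return errors

errors = simulate_errors(1000, guess_rate=0.3, kappa=8.0, seed=1)
g, kappa = fit_mixture(errors,
                       guess_grid=[i / 50 for i in range(50)],
                       kappa_grid=[float(k) for k in range(1, 31)])
print(f"P(recall) ~ {1 - g:.2f}, precision (kappa) ~ {kappa:.0f}")
```

A grid search keeps the sketch dependency-free. In practice, a numerical optimiser and per-condition fits would be used, comparing g and kappa between attended and unattended items.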

Next step? Tom Hartley asked on Twitter: what happens to the unattended items in memory? We did not address this question in this study, and the current literature presents a mixed picture: some studies suggest that attention during maintenance impairs memory for unattended items (see), whereas others find no such suppression effect (see). It is possible that differences in strategy could account for some of the confusion.

To test the effect on unattended items in behavioural studies, researchers typically probe memory for unattended items every so often. This presents a contradiction to the participant: sometimes uncued items will be relevant for task performance, so individuals need to decide on an optimal strategy (i.e., how much attention to allocate to uncued items, just in case...). A cleaner approach is to use brain imaging to measure the neural consequences for unattended items. The principal advantage is that you don't need to confuse your participants with a mixed message: attend to the cued item, even though we might ask you about one of the other ones!!

References:

Awh & Jonides (2001) Overlapping mechanisms of attention and spatial working memory. TICS (pdf)

Bays & Husain (2008) Dynamic shifts of limited working memory resources in human vision. Science (pdf)

Landman, Spekreijse & Lamme (2003) Large capacity storage of integrated objects before change blindness. Vision Research (link)

Matsukura, Luck & Vecera (2007) Attention effects during visual short-term memory maintenance: Protection or prioritization? Perception & Psychophysics (link)

Murray, Nobre, Clark, Cravo & Stokes (2013) Attention Restores Discrete Items to Visual Short-Term Memory. Psychological Science (pdf)




Saturday, 23 February 2013

Biased Debugging


We all make mistakes - Russ Poldrack's recent blog post is an excellent example of how even the most experienced scientists are liable to miss a pernicious bug in complex code. It could be the mental equivalent of missing a single double negative in a 10,000-word essay, a split infinitive that Microsoft Word fails to detect, or even a bald-faced typo underlined in red that remains unnoticed by the over-familiar eyes of the author.

In the case reported by Russ last week, although there was an error in the analysis, the actual result fit their experimental hypothesis and slipped through undetected. It was only when someone else independently analysed the same data, but failed to reproduce the exact result, that alarm bells sounded. Luckily, in this case the error was detected before anything was committed to print, but the warning is clear. Obviously, we need to be more vigilant and cross-check our results more carefully.

Here, I argue that we also need to think a bit more carefully about bias in the debugging process. Almost certainly, it was no coincidence that Russ's undetected error also yielded a result that was consistent with the experimental hypothesis. I argue that the debugging process is inherently biased, and will tend to seek out false positive findings that conform to our prior hopes and expectations.

Data analysis is noisy


Writing complex customised analysis routines is crucial in leading-edge scientific research, but it is also error-prone. Perfect coding is as unrealistic as perfect prose - errors are simply part of the creative process. When composing a manuscript, we may have multiple co-authors to help proofread numerous versions of the paper, and yet even then a few persistent grammatical errors, split infinitives and double negatives often slip through the net. Analysis scripts, however, are less often scrutinised so thoroughly, line by line, variable by variable.

If we are lucky, coding errors just cause our analyses to crash, or throw up a clearly outrageous result. Either way, we will know that we have made a mistake, and roughly where we erred, and we can then switch directly to debugging mode. But what if the erroneous result looks sensible? Just by chance, what if the spurious result supports your experimental hypothesis? What are the chances that you will continue to search for errors in your code when the results make perfect sense?

Your analysis script might contain hundreds of lines of code, and even if you do go through each one, you may well miss the error: we are notoriously bad at detecting mistakes in familiar text. Just think of the last time you asked someone else to read a draft because you had become blind to typos in text you had read a million times before. By that stage, you know exactly what the text should say, and that is the only thing you can read any more. Unless you recruit fresh eyes from a willing proofreader, or your attention is directed to specific candidate errors, you will be pretty bad at seeing even blatant mistakes right in front of you.

Debugging is non-random


OK, analysis is noisy - so what? Data are noisy too, isn't it all just part of the messy business of empirical science? Perhaps, but the real problem is that the noise is not random. On the contrary, debugging is systematically biased to favour results that conform to our prior hopes and expectations, that is, our theoretical hypotheses.

If an error yields a plausible result by chance, it is far less likely to be detected and corrected than if the error throws up a crazy result. Worse, if the result is not even crazy, but just non-significant or otherwise 'uninteresting', then the dejected researcher will presumably spend longer looking for potential mistakes that could 'explain' the 'failed analysis'. In contrast, if the results look just fine, why rock the boat? This is like a drunkard's walk that veers systematically toward wine bottles to the left, and away from police to the right.
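The asymmetry can be put into numbers with a toy model. Every parameter value below is hypothetical and chosen only for illustration. The scenario is an analysis of pure noise with no true effect. A correct pipeline comes out 'significant' with probability alpha. A buggy pipeline is 'significant' with some other probability. A bug is caught and fixed with a probability that depends on whether its result looked significant.

```python
def reported_significance_rate(alpha, bug_rate, p_sig_if_bug,
                               catch_if_sig, catch_if_ns):
    """Toy model: chance that an analysis of pure noise (no true effect)
    ends up reported as 'significant'.

    A correct pipeline is significant with probability alpha. A buggy one is
    significant with probability p_sig_if_bug. A bug is caught (then fixed,
    so the correct pipeline is run) with probability catch_if_sig if its
    result looked significant, or catch_if_ns if it did not."""
    correct = (1 - bug_rate) * alpha
    # Buggy and significant: reported as significant unless the bug is caught
    buggy_sig = p_sig_if_bug * ((1 - catch_if_sig) + catch_if_sig * alpha)
    # Buggy and non-significant: only becomes significant if caught and fixed
    buggy_ns = (1 - p_sig_if_bug) * catch_if_ns * alpha
    return correct + bug_rate * (buggy_sig + buggy_ns)

params = dict(alpha=0.05, bug_rate=0.2, p_sig_if_bug=0.5)
biased = reported_significance_rate(**params, catch_if_sig=0.1, catch_if_ns=0.6)
even = reported_significance_rate(**params, catch_if_sig=0.6, catch_if_ns=0.6)
print(round(biased, 2), round(even, 2))  # 0.13 0.09
```

Under these made-up numbers, bugs inflate the nominal 5% false positive rate either way. Debugging mainly when results disappoint pushes it to about 13%, compared with about 9% when all results are checked with equal care.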

More degrees of freedom for generating false positives


With recent interest in the myriad bad practices that boost false positive rates far beyond the assumed statistical probabilities (e.g., see Alok Jha's piece in the Guardian), I suggest that biased debugging could also contribute to the proliferation of false positives in the literature, especially in the neuroimaging literature. Biased debugging is perhaps also more insidious, because the pull towards false positives is not as obvious in debugging as it is with cherry-picking, data peeking, etc. Moreover, it is perhaps less obvious how to avoid the bias in debugging practices. As Russ notes in his post, code sharing is a good start, but it is not sufficient: errors can remain undetected even in shared code, especially if it is not widely used. The best possible safeguard is independent reanalysis, that is, reproducing identical results using independently written analysis scripts. In this respect, it is more important to share the data than the analysis scripts, which should not be re-run with blind faith!


See also: http://www.russpoldrack.org/2013/02/anatomy-of-coding-error.html

Thursday, 17 January 2013

Research Briefing: Targeting "silent" brain areas with TMS


A major challenge in neuroscience is how to study brain processes that are securely encased within the skull. Over the last twenty years, there has been enormous progress in non-invasive brain imaging methods. In particular, functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) allow researchers to measure brain activity from outside the head.

Although brain imaging methods allow us to peer inside the head and watch the brain in action, we also need to be able to perturb brain function to understand more fully what observed brain activity is actually doing. We will never understand the brain by just watching it - we also need to be able to poke around to see what happens when certain processes are disrupted. In formal terms, we can only verify causality by disrupting brain activity and observing the consequences.

The most effective method for non-invasive brain disruption is transcranial magnetic stimulation (TMS). TMS is able to disrupt brain activity by delivering a focal magnetic pulse through a coil held against the scalp. The rapidly changing magnetic field passes through the scalp and skull and induces electrical currents in the underlying tissue, stimulating brain cells and thereby disrupting brain function.

TMS is the only method currently available in human neuroscience to disrupt specific brain areas and measure the consequence on brain function. TMS has been in use in labs across the world for more than 25 years, and sophisticated methods have been developed for targeting specific brain areas (see neuronavigation, pictured right). Nevertheless, it remains relatively unclear exactly how best to set the stimulation intensity.

Setting the right stimulation level is essential for safe and effective use of TMS. Over-stimulation can cause adverse effects, such as seizure. From an experimental point of view, over-stimulation also reduces the focality of disruption, therefore complicating the interpretation of any effects. On the other hand, under-stimulation could compromise treatment in clinical settings, and lead to false negative results in research. Poor control over the stimulation intensity also compromises experimental comparisons between treatment conditions.

In a series of methodological studies performed with Chris Chambers and others, we previously explored the effect of skull thickness on brain stimulation. It is well known that the flux density of a magnetic field declines as a function of distance. As a direct consequence, people with thicker skulls will require a higher-intensity field at the scalp surface to activate underlying brain areas. To quantify this dependency, we varied the distance between the TMS coil and motor cortex.


When TMS is applied to primary motor cortex, stimulation triggers a twitch in the muscle associated with the stimulated portion of the motor map (pictured left). This is an extremely reliable and repeatable effect, and therefore provides a very useful tool for assessing the effect of TMS. We simply varied the distance between the stimulation coil and the target brain region to characterise the relationship between distance and TMS effect (pictured right). From these initial studies, we suggested that TMS protocols could be usefully calibrated at motor cortex and corrected for distance to derive a distance-independent estimate of cortical excitability. Distance-corrected levels could then be used to determine the appropriate stimulation intensity for 'silent' brain areas, such as non-motor brain areas for which there is no simple index of effective stimulation.
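The calibration logic can be sketched as a linear correction, assuming the threshold rises approximately linearly with coil-to-cortex distance over the tested range. All numbers below are hypothetical. In particular, the slope is estimated from made-up measurements rather than taken from the papers, which report empirically derived values. The function names and parameters are illustrative only.

```python
def fit_slope(distances_mm, thresholds_pct):
    """Ordinary least-squares slope and intercept of threshold vs distance."""
    n = len(distances_mm)
    mx = sum(distances_mm) / n
    my = sum(thresholds_pct) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_mm, thresholds_pct))
    sxx = sum((x - mx) ** 2 for x in distances_mm)
    slope = sxy / sxx
    return slope, my - slope * mx

def distance_adjusted_intensity(motor_threshold_pct, d_motor_mm, d_target_mm,
                                slope_pct_per_mm, max_output_pct=100.0):
    """Shift the motor threshold by the extra (or reduced) scalp-to-cortex
    distance at the target site, capped at the stimulator's maximum output."""
    adjusted = motor_threshold_pct + slope_pct_per_mm * (d_target_mm - d_motor_mm)
    return min(adjusted, max_output_pct)

# Hypothetical motor thresholds (% stimulator output) measured with the
# coil raised 0, 5, 10 and 15 mm above the scalp over motor cortex
slope, _ = fit_slope([0, 5, 10, 15], [52, 67, 82, 97])
print(slope)  # 3.0
# Hypothetical scalp-to-cortex distances: 14 mm at motor cortex, 18 mm at target
print(distance_adjusted_intensity(52, d_motor_mm=14, d_target_mm=18,
                                  slope_pct_per_mm=slope))  # 64.0
```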

However, distance-adjusted TMS still relies on the assumption that individual differences in response to TMS are due to variations in a general factor of cortical excitability. In this new study we tested this key assumption. We compared people's sensitivity to stimulation of motor cortex with their sensitivity to stimulation of visual cortex (indexed by a visual percept known as a phosphene). We found a systematic relationship between individual differences in sensitivity across stimulation sites, consistent with the idea that a common factor of cortical excitability might account for individual differences in the response to TMS.
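A relationship between individual differences across sites is typically quantified as a correlation across participants. A minimal sketch with made-up thresholds for six hypothetical participants (not data from the study): if a common excitability factor is at work, participants with high motor thresholds should also tend to have high phosphene thresholds, giving a positive Pearson correlation. A full analysis would also need to account for differences in scalp-to-cortex distance between sites.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical thresholds (% stimulator output), one pair per participant
motor = [45, 52, 58, 61, 66, 72]
phosphene = [55, 60, 70, 68, 78, 85]
print(round(pearson_r(motor, phosphene), 2))
```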

In conclusion, this research suggests that TMS intensity can be calibrated to the distance-adjusted motor threshold and applied to other brain areas. For further information, please see our paper here, or contact me directly.


References

Stokes, Barker, Dervinis, Verbruggen, Maizey, Adams & Chambers (2013) Biophysical Determinants of Transcranial Magnetic Stimulation: Effects of Excitability and Depth of Targeted Area. Journal of Neurophysiology, 109: 437–444 [pdf]

Stokes, Chambers, Gould, English, McNaught, McDonald & Mattingley (2007) Distance-adjusted motor threshold for transcranial magnetic stimulation. Clinical Neurophysiology, 118(7): 1617-1625 [pdf]

Stokes, Chambers, Gould, Henderson, Janko, Allen & Mattingley (2005) A simple metric for scaling motor threshold based on scalp-cortex distance: application to studies using transcranial magnetic stimulation. Journal of Neurophysiology, 94(6): 4520-4527 [pdf]

Saturday, 5 January 2013

Helium and Neuroscience

Modern cognitive neuroscience critically depends on helium. The most advanced methods for non-invasive brain imaging, functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG), rely on superconducting components that operate near absolute zero (~4 K). This operating temperature can only be maintained with liquid helium. Although helium is the second most abundant element in the universe, helium supplies are strictly limited on Earth.

Recently, global helium shortages have forced many MEG centres into temporary shutdown. MRI facilities have so far been less affected, because they require less frequent helium refills. But if the situation were to get much worse, even MRI centres would be forced to shut down. Letting the superconducting magnet at the heart of an MRI scanner warm up can cause major structural damage, potentially requiring a complete refit.

Writing for The Independent, science editor Steve Connor explains some of the key factors at play [here]. In an accompanying piece, I provide some more specific details of how recent shortages have affected our research at the Oxford Centre for Human Brain Activity [here]. It is impossible to predict how neuroscience methods will have advanced by the time the world's supply has been depleted in the next 30 years or so, but let's hope we have found new methods for non-invasive brain imaging that don't depend on an unavailable element.

References:
The Independent: A ballooning problem: the great helium shortage
The Independent: Our research is on ice due to shortage of helium