A history of psi in the Ganzfeld

by Andrew Endersby

Introduction to Part 1

The ganzfeld is a state of mild sensory deprivation, first devised by the German psychologist Metzger in the late 1920s. In it, the subject's hearing and vision are made uniform. Originally the subject sat in front of a blank, uniformly lit wall that filled their field of vision. More recently (and in all the experiments into psi) halved ping-pong balls are placed over the subject's eyes and light is shone on the face to create a diffuse, homogeneous visual stimulus. The audio input in the 1920s was silence, but more recently white noise (static) has been used.

In psi experiments, once in this state the subject is encouraged to describe what images he can see. Afterwards he must choose which of several potential targets was the one being “sent” psychically. Most often the subject has four choices, which gives a mean chance expectation of 25%.

The exploration of psi in the ganzfeld has been one of the most successful aspects of parapsychology in the last thirty years. Since its beginnings in 1974 it has continued to furnish significant results far beyond what you’d expect by chance. Meta-analyses have collected the results from different laboratories and combined them to show that in the ganzfeld a 33% hit rate has been achieved, as opposed to the 25% expected by chance.

Non-parapsychologists such as Rosenthal and Bem, both psychologists, and Utts, a statistician (amongst others), have been strongly swayed by the findings of these experiments. As one of the highest-profile protocols in parapsychology, it is also one of the most examined. Skeptics such as C.E.M. Hansel, Hyman and Wiseman have all made attempts at undermining the case for psi in the ganzfeld, all to little avail.

There are three main elements of the argument in favour of the ganzfeld.

The first is Honorton’s meta-analysis of early ganzfeld experiments, covering 1974-1982. He restricted himself to 28 of the 42 then-available experiments because those 28 used the same scoring system. The results gave a 38% hit rate. He estimated that 423 further studies scoring at chance would be needed to reduce this result to non-significance. In other words: “one psi ganzfeld session per hour for over 6 years, assuming 40-hour weeks and no vacations!” (Honorton, “Meta-analysis of psi ganzfeld research: a response to Hyman”, Journal of Parapsychology 49, 1985). Blackmore investigated the extent of unpublished studies; the 31 unpublished studies her survey turned up were no less successful than the published ones, and Hyman agreed that this “file-drawer” problem could not account for the results.
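Honorton’s 423 figure is an instance of what statisticians call a fail-safe N. As a rough illustration of how such an estimate works (my own reconstruction using Rosenthal’s formula, not Honorton’s actual calculation, and with hypothetical input figures):

```python
def failsafe_n(sum_z: float, k: int, z_crit: float = 1.645) -> float:
    """Rosenthal's fail-safe N: how many unpublished chance-level studies
    would be needed to drag k combined studies (whose z-scores sum to
    sum_z) below the p = 0.05 significance threshold."""
    return (sum_z ** 2) / (z_crit ** 2) - k

# Hypothetical: 28 studies whose z-scores sum to 35.0
print(failsafe_n(35.0, 28))  # roughly 425 chance-level studies needed
```

The stronger the combined result, the faster the fail-safe N grows, which is why Honorton’s estimate ran into the hundreds.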

The second element is the PRL autoganzfeld trials carried out from 1982 to 1989. These experiments were largely automated to remove the flaws in randomisation and security that had dogged the earlier ganzfeld work. These 11 experiments and 354 trials scored 35%.

The third is the meta-analysis of Bem, Broughton and Palmer, published in 2001. This work was based on an earlier meta-analysis by Milton and Wiseman that had failed to find an effect. Bem et al. reordered the experiments in that unsuccessful meta-analysis according to how closely they adhered to the typical ganzfeld procedure, and demonstrated that the most standard experiments scored at a level significantly above chance. The hit rate for all experiments was 30%, and standard experiments scored 31%.

Sheldrake has reported the results of Honorton’s first meta-analysis as being “one thousand million to one against” (Lecture at the Royal Society of Arts, 2004), and similarly Radin has attempted to do a meta-analysis for all studies, claiming odds “… beyond a million billion to one” (The Conscious Universe, 1997).

So, the figures are impressive, the “file drawer” cannot explain the findings, and flaws were identified quickly enough that they can’t inflate the hit rate. Some of the finest skeptic minds have gone to work on it and come away empty-handed. So what chance do I have: a non-scientist, writing a non-peer-reviewed article?

The purpose of these articles is to furnish the reader with more detailed information than has previously been available on-line, and also to point out (where possible) various things which puzzle me. Not being a scientist, it may be that my worries are unimportant, but I’ll mention them nevertheless. To be honest, I don’t expect any parapsychologist or scientist to reply to these articles. It’s just that, after five years researching the subject, I decided it was time to make some kind of conclusion.

Part one is a pretty straightforward guide to the earliest experiments, excluding (for the most part) those published only in the Journal of the American Society for Psychical Research, to which I do not have access. Many of these experiments are included in the first meta-analyses of Hyman and Honorton, which I’ll cover in more detail in Part two (next issue).

Introduction to Part 2

The Ganzfeld Debate, 1985-86.

So if I am to be hoist by my own petard, it’ll be here. After keeping a distance in Part 1, I now enter a little more into the reporting of this early discussion of the ganzfeld database as a whole. Readers are forewarned that my scepticism coloured my view of the debate, even if that includes my thinking that most of Hyman’s arguments against psi in the ganzfeld are dead ends.

The statistics involved here are so dense that it is often tempting to take the softer option of giving up and moving on. From this point onwards, people rely on statistical means of interpreting the data more than before. In the past a simple head-count of significant experiments seemed enough, but now factor analyses, Stouffer zs and the effect sizes d and h start to be used. In an attempt to keep things simple, I’ve edited out the less intuitive statistical measures, such as t numbers and effect sizes, leaving instead p numbers (where the decimal value equals a percentage probability, i.e., p=0.05 is the same as 5%, which is the same as 1 in 20) and z-scores (less intuitive, since they are based on the standard deviation, but a rule of thumb is that 1.65 is significant, i.e., equal to 1 in 20).
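For readers who want to check that rule of thumb for themselves, converting a z-score into a one-tailed p number needs nothing more than the normal distribution’s survival function. A minimal sketch (my own, purely for illustration):

```python
import math

def one_tailed_p(z: float) -> float:
    """One-tailed p-value for a z-score, via the normal survival function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(one_tailed_p(1.65))  # close to 0.05, i.e. 1 in 20
```
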

I try to shy away from any statistical analysis of my own. What I have done on occasion, however, is calculate an “equivalent hit rate” for those experiments that do not use the typical 25% mean chance expectation. I did this by taking the p number or z-score and calculating how many hits, over the same number of trials, would produce that same p number or z-score.
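Concretely, using the normal approximation to the binomial, that back-calculation looks something like this (a sketch of the idea rather than my exact spreadsheet formula):

```python
import math

def equivalent_hits(z: float, n: int, mce: float = 0.25) -> int:
    """How many hits out of n trials, at the given mean chance
    expectation, would produce this z-score?"""
    return round(mce * n + z * math.sqrt(n * mce * (1 - mce)))

# e.g. a z-score of 1.645 over 100 trials at 25% MCE
print(equivalent_hits(1.645, 100))  # 32 hits, a 32% equivalent hit rate
```
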

Readers may ask themselves why I’ve gone to all this trouble to detail the comments of other parapsychologists, since none of them add any data of their own and mostly stick to observations on the state of play. It was a desire for completeness that made me want to give each contributor his or her due. Even if it sometimes loses narrative impetus, I decided that since I was writing a history of a science and not a Hollywood blockbuster, I could sacrifice readability for a more thorough summary.

In trying to keep some kind of flow, I have taken the liberty of excising asides and the like from various quotes, hence the prevalence of […] in many places. For a fuller quote in context, please feel free to email me, although I believe I have done no damage to any of the writers’ arguments by my editing.

Introduction to Part 3

In Part 3 I talk about the PRL trials, an enormously important contribution to the ganzfeld work in parapsychology: over 300 trials carried out in impeccable conditions, sanctioned by sceptics as being fraud-proof, but still showing an above-chance result of 35%, which was close enough to the 38% score of Honorton’s previous meta-analysis (see Part 2) for it to be considered a successful replication. The results were so good and the method so solid that they were published in a mainstream science journal, Psychological Bulletin, as well as sparking a series of lengthy debates on usenet which were notable for (a) the participation of high-profile parapsychologists and (b) the uninformed state of the sceptics involved.

Some of the more academic criticisms aimed at the PRL trials seem unlikely. Wiseman, Smith and Kornbrot’s 1996 theory about sound leakage never really gets off the ground, as they admit. They blame their lack of conclusion on the “dearth of accurate information” from the PRL technicians, which seems pretty unfair given that the main experimenter (Honorton) had passed away and the laboratory had closed down by this time, leaving only assistants who could not be expected to know the construction and materials involved in the walls of the building where they once worked.

Personally, I find the video degradation theory most plausible. As the video clip is shown many times to the sender during the ganzfeld period, you would expect some degradation of the image to take place. And although most subjects and experimenters didn’t see the same target set twice, there is a pattern of hitting which is exactly what you would expect if people (subjects or experimenters) were somehow basing their conclusions on how the image or sound had degraded. If you define “psi” as “communication of information in the absence of ordinary sensory channels”, then the PRL results certainly fall outside this definition: video degradation certainly happens on VHS tapes with repeated plays, and so it would qualify as an “ordinary sensory channel”.

Introduction to Part 4

Lastly, Part 4 covers the years 1983-1990, which seem to have “fallen off the back of the cart” as far as meta-analyses are concerned. These results are strongly in favour of the null hypothesis, but have never found themselves fully represented in the parapsychological literature. Storm & Ertel’s meta-analysis, which appeared in volume 127 of Psychological Bulletin, collected 11 experiments from this period. But this analysis contained only 76% of the available number of studies (pre-1998) and did not include the sizeable effect of the 14 experiments left out of Honorton’s 1985 meta-analysis (see Part 2).

Introduction to Parts 5-7

This is the last entry in my history of psi in the ganzfeld. It is a work in progress, and updates will be posted when I get new data, perhaps every year or so. You can tell when a new revision of an article is on-line by the letter at the end of its filename changing. For example, parts 1, 2, 3 and 4 now have “b” as a suffix, indicating that this is their second version. In this case it was mostly grammatical and narrative concerns that prompted the alterations, but I’d like to draw your attention to the new part 3, which has been extensively rewritten regarding the video degradation hypothesis in the light of new information.

Meanwhile, with parts 5 to 7 the story is brought up to date. I suppose it’s time for a conclusion or two.

I’ve been researching the ganzfeld for the past five years and, to be honest, for the first four years I fell into the same trap as a lot of other skeptics and parapsychologists: taking the data at face value and building a case based on that. The most recent overview of the subject was Palmer’s “ESP in the Ganzfeld: analysis of a debate” (Journal of Consciousness Studies, 2003); while it stands as a fine and clearly written example of the commonly accepted state of play regarding the ganzfeld research, and expertly covers the issues over the PRL data, it still allows the large amount of un-meta-analysed data to go unmentioned.

This seems to be a kind of file-drawer effect, but not in its classic sense of unsuccessful experiments being rejected from publication or never submitted at all. The field of parapsychology has been carefully even-handed in publishing contrary evidence since 1975, when Honorton, as president of the Parapsychological Association, put into practice a policy of publishing all results. Additionally, the field is so small that it is difficult for any large (or even medium-sized) experiment to be carried out at an academic institution without people being aware of it.

In this case, it wasn’t unpublished results but rather unanalysed results that were the key to understanding what was going on. As I started to research properly, I soon found that the amount of unanalysed data was pretty vast. While there are several meta-analyses on the ganzfeld, none is exhaustive. It is perhaps because the periods they cover dovetail so neatly together (Honorton’s hit-rate studies: 1974-1982; PRL trials: 1982-1989; Bem et al’s analysis: 1991-1999) that researchers have the impression that there’s nothing else to examine.

Perhaps the most recent meta-analysis is by Schlitz & Radin in their paper “Telepathy in the ganzfeld: State of the evidence” (Healing, Intention and Energy Medicine, 2002) which talks about collecting 929 hits out of 2,878 sessions up to late 2001, whereas my database contains about 7,000 sessions [1]. And while I researched this field, I also kept an eye on how mainstream statistics viewed meta-analytical methods.

The first and most common issue I found was the GIGO principle: Garbage In, Garbage Out. In other words, data from poor experiments can contaminate and skew the overall results of a meta-analysis. Also, the question of which experiments to include and which to exclude is a difficult one, and it should be explained in any analysis. That the pro-psi argument leaves out about 60% of the data without saying why should be cause for concern.

The idea of meta-analysis being proof in and of itself is hotly contested. In the Spring 2004 issue of the Journal of Parapsychology, Kennedy quotes Bailar as saying:

“It is not uncommon to find that two or more meta-analyses done at about the same time by investigators with the same access to the literature reach incompatible or even contradictory conclusions. Such disagreement argues powerfully against any notion that meta-analysis offers an assured way to distill the ‘truth’ from a collection of research reports.” (Bailar, “The promise and problems of meta-analysis”, New England Journal of Medicine, 1997, p. 560; cited in Kennedy, JoP, 2004)

Also of interest are recent theories that use funnel graphs to detect the file-drawer problem. The theory states that a genuine effect will give rise to a funnel graph with its axis around a particular level, as larger experiments give a more accurate idea of the effect being measured, like this:

[Figure: example funnel graph, taken from Richard Palmer, “Detecting Publication Bias in Meta-analyses: A Case Study of Fluctuating Asymmetry and Sexual Selection”, The American Naturalist, 1999]

But if the funnel graph has gaps, especially in certain areas, that would indicate that certain results are not being published. Unfortunately the ganzfeld results, plotted this way, don’t really approximate a funnel shape.

While they cluster above zero (mean effect size r=0.22) [2], there’s no indication that the larger an experiment is, the closer it comes to an accurate measurement of any effect. The furthest outlier (400 trials with an effect size r of 0.77) was Smith, Tremmel & Honorton’s 1976 experiment, which based its scoring on an equal distribution of 10 elements per target over 1,024 targets but was carried out before the target set was complete (see Part 1). Then again, you have to bear in mind that the second-largest experiment (Terry, 1976) was conducted by the same laboratory under similar conditions, yet found a marginally negative score.
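For anyone checking my Excel figures: the effect size used here follows the common r = z/√n convention (my assumption; other conventions exist), under which, for example, that 400-trial outlier’s r of 0.77 corresponds to a z-score of about 15.4:

```python
import math

def effect_size_r(z: float, n: int) -> float:
    """Effect size r under the common z / sqrt(n) convention."""
    return z / math.sqrt(n)

print(effect_size_r(15.4, 400))  # recovers the outlier's r of 0.77
```
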

At the end of my research I find a hit rate of between 28.6% and 28.9%, depending on certain choices about which scoring methods to use on particular experiments. This doesn’t have quite the headline-grabbing appeal of 1 in 3 instead of 1 in 4, but the hit rate is still highly significant for 6,700 sessions. However, this figure contains all experiments, flawed or not, standard or not. There’s no doubt that it can be tweaked up or down by ruling certain experiments in or out.
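To show what “highly significant” means here, a back-of-the-envelope check (assuming roughly 1,916 hits, which is my own back-calculation of 28.6% of 6,700; the exact count depends on the scoring choices mentioned above):

```python
import math

def binomial_z(hits: int, n: int, mce: float = 0.25) -> float:
    """Normal-approximation z-score for hits out of n trials against chance."""
    return (hits - mce * n) / math.sqrt(n * mce * (1 - mce))

z = binomial_z(1916, 6700)  # roughly 28.6% of 6,700 sessions
print(f"z = {z:.2f}")       # a z-score well above the 1.65 rule of thumb
```
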

So my conclusion? Well, anyone who asserts that the ganzfeld has been proven by scientific standards is wrong. The field’s reliance on incomplete meta-analyses seems to be cause for concern. Why Honorton chose certain experiments for his 1985 ganzfeld analysis hasn’t really been addressed (see Part 2), nor has the lack of non-PRL data from 1983-1990 in meta-analyses (see Part 4), nor the failure of the pro-psi camp to apply Bem et al’s “standardness” criteria of 2000 to pre-1990 data (see Part 6), nor the poor results of 2000 onwards (see Part 7). In fact, neither skeptics nor non-skeptics have looked at these issues, which makes me wonder if I’m the only person worried by them.

Are these genuine concerns, or mere nitpicking in the face of overall statistical significance? Perhaps. As I’ve tried to make clear, I’m no authority on statistical issues. That the hit rate I’ve found falls halfway between what the pro-psi camp claims (33%) and what skeptics expect (25%) seems like poetic justice.


[1] The reason I give approximate numbers is because I’m still waiting on details of certain experiments conducted early in the ganzfeld’s career, and also because by choosing differing scoring methods for certain experiments it is possible to alter the number of trials.

[2] I reiterate that I have no formal training in statistics, and my figures are solely due to feeding the right formula into Excel and hoping for the best. Readers are encouraged to do their own statistical analysis of the data.

Author’s Note

I’d like to thank (and recommend) the Lexscien web site, without which I would never have been able to research this subject properly.

