The first time I heard about Dean Radin’s book “The Conscious Universe” was late last year, in a discussion on an internet forum. One of the participants used the book as evidence of paranormal phenomena. I was a bit skeptical of the claim but thought it might be interesting to take a look at the book, so I ordered it from my favourite online bookstore and waited for it to arrive.
The subject of the book is psi research, that is, research concerning telepathy, clairvoyance and precognition. Radin claims that these phenomena are real, and in the book he presents the evidence that he thinks proves it.
The first thing to note about this book is that it isn’t a scientific report. It is readable by people without scientific training, though there are some technical terms, so if expressions such as “confidence interval” make your head ache, you might want to keep a bottle of aspirin within reach before opening this book.
The accessibility of the book has the consequence that the reader isn’t given details, and therefore can’t actually judge for themselves whether the results Radin claims to have obtained are real or not. This means that it’s of paramount importance that the reader can trust Radin to report everything truthfully and completely.
This led me to consider how trustworthy a reading of the book makes Radin seem. I’m just a layman in this connection: I probably know a bit more about statistics than the average working Joe, but I know far less than Radin. I’m not especially knowledgeable about psi research either, but a couple of the books on my bookshelf mention the subject, and I’ve read a bit about it on the web.
So what can a layman like me find out about the trustworthiness of Radin’s presentation in “The Conscious Universe”, using just the books on his bookshelf and a bit of time searching with Google? This is what I set out to discover, and in this article I will take you along on a journey through the impressions I got from reading the book.
The journey starts with the impressions I got from the first few pages. These are actually minor points, but they caught my attention and made me form a rough sketch of Radin. If you just want the main content of this article, simply skip the next section.
One of the first things I noticed was that in the acknowledgement section of the book, Hal Puthoff is mentioned as a role model for Radin. That name rang a bell – wasn’t Puthoff the scientology guy whose research the skeptic James Randi showed to be seriously flawed? This led me to use Google to look for information on Puthoff, which confirmed my suspicion that Puthoff was a scientologist. I personally would be pretty skeptical of the critical judgement of anyone who accepts the tales of scientology founder L. Ron Hubbard, and let’s face it, critical judgement should be essential for someone in psi research.
Even more important: Scientology represents its teachings as science even though scientists strongly disagree (see for example The Skeptic’s Dictionary: Dianetics or Chris Owen: Hysterical Radiation and Bogus Science), so by accepting scientology it seems you throw any scientific credibility down the drain. I also brought out my copy of James Randi’s book “Flim-Flam!”, and sure enough: Randi delivered a pretty harsh critique of Puthoff’s research (among other things Randi concludes on page 135 that “The record is clear: Targ and Puthoff just cannot be trusted to produce a factual report”).
I’m not in a position to decide whether Randi is right and whether scientology really is nonsense, but if just one of these two things is correct, I wouldn’t pick Puthoff as a role model for researchers.
Finishing the acknowledgements I turned to the preface, where Radin tells a little story of an encounter he had with two persons on a train. These two persons are stereotypes of the credulous New Ager and the hard-nosed and uninformed skeptic. What really struck me about this story was the way the two persons are described. To convey this, let me quote some words and phrases that I found within the first half page:
- “Barked the man”
- “luminous eyes”
- “immense halo of hair”
- “Harry was an advertisement for Brooks Brothers”
- “she said pouting”
- “rolled his eyes”
- “voice dripping with sarcasm”
- “Harry’s smirk at life’s stupidity had permanently creased his forehead with an angry gash”.
This made me wonder why Radin felt it necessary to present such caricatures of people who hold different opinions from himself.
After the discussion between the two, Radin joins in and after just two statements, the New Ager’s face “wavered between awe and bewilderment”.
So all it takes is a few words from Radin and his ‘opponent’ looks upon him with an expression that wavers between awe and bewilderment – impressive indeed.
Going past the preface I got to the ‘real’ start of the book, where Radin tells the reader that “the acceptance of new ideas [in science] follows a predictable, four-stage sequence”. During the description of the four stages Radin manages to ridicule skeptics (by saying that in the fourth stage the critics of an idea end up saying that they thought of the idea first) – what he doesn’t manage, however, is to provide any support for the four-stage model. No attempt is made to relate it to the existing theories of science, and no examples from the history of science are given to illustrate it.
When I consider radical new ideas in science, the first two theories I always think of are the theories of relativity and quantum mechanics, and I don’t recall any critic of either of these theories ending up saying that they thought of them first (which is what Radin claims happens). This might of course be due to me not being knowledgeable enough about these issues.
It seems that to be taken seriously, Radin should at least try to give the reader some sort of documentation or argument for his model of scientific development, but he fails to do this. What he doesn’t fail to do, however, is use this unsupported model to argue that mainstream acceptance of psi is on its way (I assume that he means mainstream scientific acceptance).
In all fairness I have to mention that on pages 233 and 234 Radin does give some examples that might support his model, but these are hard to evaluate, since he doesn’t compare the examples to the model.
The right quote in the right place can add a lot of weight to an argument, and Radin uses a lot of quotes in the book. I have looked a bit closer at the first few of them to see whether Radin’s use of quotes is correct, that is, whether they reflect the intention of the person being quoted.
On page 3 Radin talks about “an astonishing admission” from astronomer and skeptic Carl Sagan in the book “The Demon-Haunted World”, and quotes from the book, where Sagan states that he thinks that three named claims in the ESP field deserve serious study. What Radin doesn’t mention is that immediately following the quoted remark Sagan writes the following (page 302):
“I pick these claims not because I think they’re likely to be valid (I don’t), but as examples of contentions that might be true. The last three have at least some, although still dubious, experimental support. Of course I could be wrong.”
Please note the words “I don’t”, “might” and “dubious”, which make the “admission” from Sagan take on a less “astonishing” appearance than Radin leads us to believe.
Going on to pages 4 and 5, I reached two quotes from professor of statistics Jessica Utts and professor of psychology Ray Hyman, who have evaluated some psi research. Utts is quoted as follows:
“The statistical results of the studies we examined are far beyond what is expected by chance. Arguments that these results could be due to methodological flaws in the experiments are soundly refuted. Effects of similar magnitudes to those found in government-sponsored research … have been replicated at a number of laboratories across the world. Such consistency cannot be readily explained by claims of flaws or fraud…. It is recommended that future experiments focus on how to make it as useful as possible. There is little benefit to continuing experiments designed to offer proof”
Radin then says that “Surprisingly, the other principal reviewer, skeptic Ray Hyman, agreed”, and as documentation for this he quotes Hyman:
“The statistical departures from chance appear to be too large and consistent to attribute to statistical flukes of any sort…. I tend to agree with Professor Utts that real effects are occurring in these experiments. Something other than chance departures from the null hypothesis has occurred in these experiments.”
Please note the three dots in the Hyman quote. I thought it might be interesting to see what Radin had left out, so I saddled my trusty steed “Google” and set out to search for the missing words. I found the report (from which the quote is taken) reproduced in several locations; one of them was the American Institutes for Research. It turned out that the following words are missing:
“Although I cannot dismiss the possibility that these rejections of the null hypothesis might reflect limitations in the statistical model as an approximation of the experimental situation”
One has to wonder why Radin chose to leave this out. It clearly seems to sow some seeds of doubt about the results. Hyman’s report also contains the following statement (thanks to Claus Larsen for bringing it to my attention):
“We disagree on key questions such as:
- Do these apparently non-chance effects justify concluding that the existence of anomalous cognition has been established?
- Has the possibility of methodological flaws been completely eliminated?
- Are the SAIC results consistent with the contemporary findings in other parapsychological laboratories on remote viewing and the ganzfeld phenomenon?
The remainder of this report will try to justify why I believe the answer to these three questions is ‘no.’”
Now compare this quote with the Utts quote. When I compare them, ‘agreement’ isn’t the first word that pops into my mind. So it seems that Radin’s basis for the agreement is that Hyman tends to agree with Utts that something is going on, while he seems to disagree with her about everything else. This is further confirmed by the following quote taken from “The Evidence for Psychic Functioning”, by Ray Hyman:
“Indeed, I do not believe that ‘the current collection of data’ justifies that an anomaly of any sort has been demonstrated, let alone a paranormal anomaly. Although Utts and I — in our capacities as coevaluators of the Stargate project — evaluated the same set of data, we came to very different conclusions.”
Based on these examples (though of course it would be preferable to check all of Radin’s quotations), it seems to me that Radin’s way of choosing and interpreting quotes is a tad too creative, and this doesn’t inspire much confidence in the other quotes later in the book.
The first part of this article has been about the first few pages of the book. In the following sections I will look at some parts that I found interesting. The first thing I will turn my attention to is Radin’s description of an experiment performed during the TV broadcast where the verdict in the O.J. Simpson case was announced (described on pages 166 and 167).
Before, during and after this broadcast, five Random Number Generators (RNGs) were producing strings of random bits. The claim behind this is that when many people are focused on one thing (in this case the broadcast), they influence stochastic processes such as an RNG (this phenomenon is called field consciousness).
Radin includes a graph of the odds of the RNG output occurring by chance as a function of time – this can be considered a measure of how much the output deviates from what is expected. This graph is reproduced in a modified form below (I have added some vertical lines to make it easier to assess the timing – it is taken from Dean Radin’s article “Where, when, and who is the self?”, but for our purposes it’s identical to the one in the book). It can be seen that it has two large spikes, where something rather improbable happened. It is furthermore seen that the graph contains a lot of smaller fluctuations superimposed on its main shape. I assume that these fluctuations are ‘statistical noise’, which is to be expected from an RNG.
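Radin doesn’t give the formula behind his ‘odds against chance’ curve, but such curves are typically computed by tracking the cumulative deviation of the bit stream from a 50/50 split, converting it to a z-score and then to odds. The following is only a minimal sketch of that idea (the function name and the normal approximation are my assumptions, not Radin’s exact procedure):

```python
import math
import random

def odds_against_chance(bits):
    """Running 'odds against chance' for a stream of 0/1 bits:
    cumulative deviation from 50/50 -> z-score -> two-tailed
    p-value (normal approximation) -> odds = 1/p."""
    curve = []
    ones = 0
    for n, bit in enumerate(bits, start=1):
        ones += bit
        z = (ones - n / 2) / math.sqrt(n / 4)   # binomial mean n/2, sd sqrt(n/4)
        p = math.erfc(abs(z) / math.sqrt(2))    # two-tailed tail probability
        curve.append(1 / p if p > 0 else float("inf"))
    return curve

random.seed(1)
curve = odds_against_chance([random.randint(0, 1) for _ in range(10_000)])
```

A perfectly balanced stream gives odds of exactly 1, and noise alone produces plenty of small wiggles in such a curve – which is why reading significance into the small fluctuations on Radin’s graph seems dubious.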
Radin mentions three events in the broadcast that he thinks are important (the beginning of the preshow, the beginning of the ‘main’ broadcast and the verdict announcement), and he relates these three events to the graph (and specifically marks them on the graph). Radin’s hypothesis is that a spike should be produced in the graph when something very interesting happens in the broadcast, and this seems to be the case. However, I noticed a few odd things about Radin’s interpretation of the graph:
The event markings on the graph are placed at the tops of the smaller fluctuations superimposed on the main shape of the graph. This makes it seem as if Radin sees something significant in the statistical noise, which strikes me as wrong, but then again, it is Radin who is knowledgeable about statistics, not me, so I might be barking up the wrong tree.
The event markings that Radin has made on the graph are supposed to correspond to the events he describes in the text. However, when I looked closer at the graph, I saw that the times of the events marked on the graph didn’t match the times that Radin gave in the text!
The preshow is supposed to have started at 9:00, but on the graph it seems to be marked somewhere between 8:56 and 8:57. According to the graph the spike is starting to decline at 9:00, which doesn’t really jibe with Radin’s claim.
The second event is the start of the ‘main’ broadcast, and it is supposed to take place at 10:00, but on the graph the event seems to be marked somewhere between 9:58 and 9:59.
Radin gives the following description in connection with the last event (the announcement of the verdict):
“a few minutes later [that is later than 10:00] the order in all five RNGs suddenly peaked to its highest point in the two hours of recorded data precisely when the court clerk read the verdict.”
A few minutes later than 10:00 can be no sooner than 10:02, but on the graph Radin has marked the event as occurring around 10:00, and the spike on the graph seems to start its rapid disappearance at 10:02.
These things seem to indicate that Radin has altered the timing on the graph to make the data fit his theory.
One might think of several different explanations for the discrepancy. For instance, the graph may simply have been offset a bit from the time axis, but this doesn’t seem likely because the discrepancies are not of equal size. One might also entertain the possibility that we are seeing a case of precognition, but this possibility would invalidate some of Radin’s other experiments. An example of this is an experiment where RNG output was recorded during the 1995 Academy Awards broadcast. In this experiment the broadcast was divided into high and low interest periods, and the ‘unlikeliness’ of the RNG output during these periods was compared. If precognition really were important, then the most unlikely output should occur before the high interest periods – that is, in the low interest periods – but this isn’t the case.
A further point of interest is the fact that a third spike on the graph is much smaller than the two major spikes, but it represents output that is more unlikely than the outputs that Radin considers significant in a similar experiment (e.g. an experiment where RNG output was recorded during a Super Bowl broadcast). Radin doesn’t mention this spike, and according to Claus Larsen (see An Evening with Dean Radin, by Claus Larsen) Radin doesn’t have an explanation for it. It seems a bit strange to me to simply ignore unexplained data of a magnitude that you consider significant in other studies. If I were in a bad mood I might have described this as selecting the data that fit your theory and discarding the data that don’t. However, I’m not in a bad mood, so I won’t say that.
The Quality of Psi Research
Psi research has received a lot of harsh criticism for a long time. In my opinion Radin understates the merit of some parts of this critique, in that he doesn’t seem to acknowledge that mistakes have been found in many experiments where a skeptical look was allowed. This is a problem with a lot of the analyses that Radin performs. He collects data from other people’s reports and applies a statistical method called meta-analysis to this data, but since he wasn’t present at the experiments, he can’t know whether they were as tight as the reports might suggest. To illustrate this I will quote Susan Blackmore’s remark on ganzfeld research:
“These experiments, which looked so beautifully designed in print, were in fact open to fraud or error in several ways, and indeed I detected several errors and failures to follow the protocol while I was there. I concluded that the published papers gave an unfair impression of the experiments and that the results could not be relied upon as evidence for psi. Eventually the experimenters and I all published our different views of the affair”
This quote is taken from What Can the Paranormal Teach Us About Consciousness?, by Susan Blackmore. Another quote shedding some light on this issue, in connection with autoganzfeld experiments, is the following, taken from “The Evidence for Psychic Functioning”, by Ray Hyman:
“The experimenter, who was not so well shielded from the sender as the subject, interacted with the subject during the judging process. Indeed, during half of the trials the experimenter deliberately prompted the subject during the judging procedure.”
In connection with remote viewing (clairvoyance) James Randi gives us the following description in his book “Flim-Flam!” (page 147):
“The judging procedure had been well designed–on paper, that is. Judges were given a list of nine locations and a package of transcripts. Their job was to match the locations with the correct transcripts. It was done with great accuracy, and the case seemed proved. But when we find that three judges appointed by other officials at SRI failed to get good results with the matching procedure, we begin to get suspicious. Targ and Puthoff [Radin’s role model], however, found two who were sympathetic, and these two did just fine.”
Randi then goes on to describe how the list of locations was given in chronological order (the judges knew this), and information was available in the transcripts, so that they too could be put into chronological order. For the experiment to have any value it was absolutely essential that such information was not available to the judges, but according to Randi it was.
These quotes (if they can be trusted) illustrate that experiments may contain flaws even when they look good in the reports, and it seems to me that Radin doesn’t acknowledge this.
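This matters because meta-analysis, as Radin applies it, pools the reported statistics of many studies into one combined figure. One standard pooling rule (my choice for illustration; the book doesn’t spell out Radin’s exact formula) is Stouffer’s z, and it shows why the quality of the underlying reports is so important: the pooled value trusts every input equally.

```python
import math

def stouffer_z(z_scores):
    """Pool per-study z-scores into one combined z (Stouffer's method).
    A flawed study inflates the pooled value just as much as a sound one."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Ten individually unimpressive studies (z = 1.0 each, none significant
# on its own) pool to z of about 3.16 - 'highly significant' on paper.
pooled = stouffer_z([1.0] * 10)
```

So if the experiments behind the reports were looser than the papers suggest, as the quotes above claim, an impressive pooled statistic inherits every one of those flaws.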
It is also interesting to note that Radin mentions the parapsychologist J.B. Rhine and his experiments several times without telling the reader that it is now known that Rhine discarded negative data from his experiments. The physicist Robert Park describes this in “Voodoo Science” with the following words (on page 42):
“Rhine believed that persons who disliked him guessed wrong to spite him. Therefore, he felt it would be misleading to include their scores.”
I wonder why Radin has chosen to write about minor things like the possibility of transferring information by marking envelopes (used to contain cards for the ESP test) with fingernails, instead of major issues like this case of fraud (which Rhine seems to have performed in good faith).
On page 218 Radin states that:
“if we were forced to dismiss scientific claims in all fields where there have been a few cases of experimenter fraud, we would have to throw out virtually every realm of science–since fraud exists in all human endeavors.”
Here I think Radin misses the point of the critique, because the interesting question is not whether there is fraud and error in psi research, but whether it is much more common in psi research than in other disciplines, and whether it accounts for the results that are obtained.
A related critique that Radin also seems to get wrong is illustrated on page 221. Radin tells us that in 1985 Charles Honorton analysed 28 studies. Nine of these had been “scrutinized” by Susan Blackmore, and she found them to be “clearly marred” by “accidental errors” (though according to Radin she hasn’t demonstrated that these errors exist, but that is beside the point I’m trying to make). Radin’s comment on this is:
“after Blackmore’s allegedly ‘marred’ studies were eliminated from the meta-analysis, the overall hit rate in the remaining studies remained exactly the same as before. In other words, Blackmore’s criticism was tested and it did not explain away the ganzfeld results.”
I contend that Radin misses the point again, for if the 9 studies that were actually critically checked were found to be marred, how do we know that the other, unchecked studies weren’t? Radin’s argument makes me think of a factory where samples of the product are taken and checked for defects, and if a sample is found to be defective, the sample is thrown out and everything else is shipped: since the faulty products have been removed, everything else must be OK.
There are other cases where I’m not convinced by Radin’s defence against the critique. One example of this can be found on page 136, where Radin mentions two things that psi researchers are criticized for:
- They don’t learn from the mistakes of their predecessors.
- Effect sizes tend to decrease in experiments of higher quality.
To deal with the first part of this critique, Radin displays a graph on page 136 where the quality of dice tossing experiments (that is, experiments where a test subject attempts to influence or predict the outcomes of a series of die tosses) is plotted as a function of time (a similar graph for RNG experiments is given on page 142). This graph actually shows an increase in quality over time, and could at first seem to disarm the critique of not learning from past mistakes. Things are not that simple, however. One of the first questions that occurred to me when I saw the graph was: How is the experimental quality measured? Ray Hyman makes the importance of this issue clear (“The Evidence for Psychic Functioning”, by Ray Hyman):
“As far as I can tell, I was the first person to do a meta-analysis on parapsychological data. I did a meta-analysis of the original ganzfeld experiments as part of my critique of those experiments. My analysis demonstrated that certain flaws, especially quality of randomisation, did correlate with outcome. Successful outcomes correlated with inadequate methodology. In his reply to my critique, Charles Honorton did his own meta-analysis of the same data. He too scored for flaws, but he devised scoring schemes different from mine. In his analysis, his quality ratings did not correlate with outcome. This came about because, in part, Honorton found more flaws in unsuccessful experiments than I did. On the other hand, I found more flaws in successful experiments than Honorton did. Presumably, both Honorton and I believed we were rating quality in an objective and unbiased way. Yet, both of us ended up with results that matched our preconceptions.”
Radin gives a description of thirteen criteria that are used to compute a single quality score. The description of each criterion is brief, however, and alternative criteria or evaluation procedures aren’t considered, so the reader can’t really make an informed judgement concerning the graph. Furthermore, it seems to me that any such evaluation must contain a subjective judgement (as Hyman indicates), which should make us take Radin’s conclusion with a grain of salt. And even if we ignore this problem, we are still left with the problem pointed out in the Blackmore quote at the beginning of this section: experiments not being performed according to the descriptions in the reports.
Radin addresses the second criticism with the following words:
“We tested this argument by looking at the relationship between hit rates (in this case, averaged by year) and the study quality averaged per year. We found that the relationship was essentially flat, so the critique is not valid.”
I have a couple of objections to this argument. First of all, why aren’t we allowed to see a graph of this? (There are tons of graphs in the book, and this seems to be an important issue.) And why aren’t we told the level of statistical significance? (Most results in the book are accompanied by a remark about how ‘unlikely’ they are, and how much faith we therefore should put in them.)
Furthermore, if we look at the two quality graphs on pages 136 and 142, we see that the quality is nowhere near optimal levels. The graph on page 136 starts at a score of about 4 and ends up around 7, and as far as I can tell this is on a scale from 0 to 13, so it seems that even at the end, the average study fails 6 of the 13 quality criteria. This leaves quite a bit of room for errors. The graph on page 142 starts at around 3.5 and ends up around 5, and the scale for this graph seems to be 0 to 16, which indicates that the average experiment in the end fails 11 out of 16 quality criteria. This doesn’t inspire much confidence. Furthermore, the quality change from a bit below 3.5/16 to 5/16 has taken around 28 years, so the improvement has been very slow.
It is also interesting to compare the two graphs. If we do this, we see that the graph on page 142 shows experiments with substantially lower quality scores (and according to Radin the criteria used for them were similar). Now please note that the graph on page 136 is for dice tossing experiments, and the graph on page 142 is for RNG experiments. Furthermore note that Radin states that RNG experiments are the successors of the dice tossing experiments (see pages 138 and 212). Combining these observations we see that the newer type of experiments seems to be of lower quality than the old type (5/16 versus 7/13), which they are supposed to be an improvement of. This seems to indicate that the critique is at least partly correct.
Another point of interest regarding the criticism that psi effects diminish with increasing experiment quality is the fact that in a graph on page 106, Radin shows the effect sizes for several different types of experiments. Two of these are ESP card tests and high security ESP card tests. The ‘normal’ card tests have an effect size of roughly 65%, while the high security card tests have an effect size of roughly 55% (compared to the 50% chance expectation), and the 95% confidence intervals for these two figures don’t overlap. This shows that when the security in the card tests was increased, the effect size decreased dramatically.
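A non-overlap claim of this kind can be checked with a standard binomial confidence interval. The trial counts below are invented for illustration (the book gives the rates only as points on a graph), and the simple Wald interval is my choice of approximation:

```python
import math

def wald_ci(hits, trials, z=1.96):
    """Approximate 95% confidence interval for a binomial hit rate
    (normal/Wald approximation)."""
    p = hits / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return (p - half, p + half)

# Illustrative numbers only: ~65% vs ~55% read off Radin's graph,
# trial counts made up.
normal_lo, normal_hi = wald_ci(hits=650, trials=1000)   # 'normal' card tests
secure_lo, secure_hi = wald_ci(hits=550, trials=1000)   # high security tests
intervals_overlap = normal_lo <= secure_hi
```

With a thousand trials apiece the intervals (roughly 62–68% versus 52–58%) are clearly separate, matching the pattern on Radin’s graph: tighter security, smaller effect.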
On page 102 (in connection with remote viewing) Radin presents another defence against the critics that strikes me as weird:
“in test after test, psi performance among a small group of selected individuals far exceeded performance among unselected volunteers. This was an important observation, because if design problems accounted for successful experiments–as critics often assumed–then the selected group would not have been able to perform consistently better than unselected volunteers.”
Why not? It doesn’t seem unlikely to me that a few participants (selected by the experimenters) exploited (consciously or unconsciously) some flaw in the experiments, while the others didn’t. In the case of fraud it seems even more likely that a few participants selected by the experimenters would yield the best results. If I were a magician trying to perform some magic trick that needed the help of other persons, then I would very much prefer to select these persons myself. It might also be the case that these test subjects had been cheating from the very beginning and were selected by scientists who were impressed by their rate of success.
Page 102 contains more interesting information. Among other things, Radin tells the reader that results become better if the test subjects are allowed to describe what they perceive freely, instead of being forced to select between a few discrete possibilities. This means that better results are obtained when subjective judging is needed than when a simple objective test is used. When I relate this to James Randi’s comments about Targ and Puthoff’s judging procedures above, I start getting suspicious, and again it seems that the high security experiment type gives the lowest effects, in accordance with the claims of the critics.
Continuing on page 102, we find that results also get better if feedback is given during the experiments. This also sets off my alarms, since by giving feedback you risk giving the test person information that he shouldn’t have access to (either verbally or through body language), and again it seems that the tightness of the experimental security is negatively related to the measured effect sizes.
Going on to page 240, we see that Radin has another interesting way of dealing with his critics:
“nothing about the work of either Geller or Randi is described in this book. They are actually so irrelevant to the scientific evaluation of psi that not a single experiment involving either person is included among the thousand studies reviewed in the meta-analyses.”
So all Radin has to do is to say that the critics are irrelevant, and then the critique can be ignored. Well, I have to admit that this is so much more time-saving than having to argue that something is wrong with the critique that Randi delivers.
In connection with this quote it might be interesting to note that the other irrelevant person (the claimed psychic Uri Geller) was actually the central subject of some tests and a scientific paper by Radin’s role model Hal Puthoff.
On a related note, I want to discuss the design of a few of the experiments that Radin was involved in. First I want to focus on field consciousness experiments, that is, experiments of the same kind as the O.J. Simpson broadcast experiment described above, where output from RNGs is monitored during periods when a lot of people are focused on the same thing.
On page 165 Radin describes an experiment based on a comedy show, where two members of Radin’s research staff brought a notebook computer and an RNG. During the show, one of the researchers divided the show into high and low interest periods, and it was checked whether the RNG output was more ’unlikely’ during the high interest periods than during the low interest periods. This turned out to be the case, but what worries me is that the researcher who made the division into the two categories might have had access to the RNG output. Radin doesn’t state that she didn’t, which seems strange, since this is very important to the credibility of the experiment: experimenter bias could invalidate it if she had such access. This also seems to be a problem in a similar experiment performed during the 1995 Academy Awards broadcast, because there it appears that Radin kept one of the two RNGs used for the experiment with him in his home, where he was alone.
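The comparison itself is straightforward to compute, which is exactly why the blinding question is the crux. Below is a sketch of a high/low interest comparison under my own assumptions (a per-period z-score and the mean absolute deviation as the summary; none of this is Radin’s published procedure). The point of the `labels` argument is that it must be fixed before anyone looks at the RNG data:

```python
import math
import random

def period_z(bits):
    """Deviation of one period's bits from the 50/50 expectation."""
    n = len(bits)
    return (sum(bits) - n / 2) / math.sqrt(n / 4)

def compare_periods(periods, labels):
    """Mean |z| over high- vs low-interest periods.  'labels' must be
    decided before the RNG output is seen; otherwise the experimenter
    can (even unconsciously) tune the split to fit the data."""
    scores = {"high": [], "low": []}
    for bits, label in zip(periods, labels):
        scores[label].append(abs(period_z(bits)))
    return {k: sum(v) / len(v) for k, v in scores.items()}

random.seed(0)
periods = [[random.randint(0, 1) for _ in range(200)] for _ in range(20)]
labels = ["high" if i % 2 == 0 else "low" for i in range(20)]
result = compare_periods(periods, labels)
```

If the person assigning the labels can see the output stream, the protocol offers no protection at all against an after-the-fact split.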
Experimenter bias could also be a problem with another aspect of the 1995 Academy Awards experiment. Here Radin and an assistant divided the broadcast into high and low interest periods. I’m very puzzled as to why Radin didn’t choose to let someone else do this, to make sure that criticisms about data manipulation on this count could be avoided.
When I look at all the experiments Radin has conducted concerning RNG measurements of field consciousness (at least those presented in the book), I note that Radin seems to change the experiment design between each experiment: sometimes an experiment is divided into high interest and low interest periods, sometimes it isn’t, and at other times a more complex interest index is used. It also seems that sometimes the accumulated odds are considered, and sometimes they aren’t. I wonder why the experiment design was changed in these ways, since it makes the experiments harder to compare, and thereby makes claims of reproducibility less convincing. It also makes them more vulnerable to critique, such as speculation that the experiment design was selected after the measurements in order to obtain the most favourable results.
I’ll leave the field consciousness experiments behind for a while and go on to pages 175 to 189, where Radin deals with psi and gambling. In this chapter Radin presents lots of graphs showing correlations between the lunar cycle and payouts from casinos and lotteries. That might be interesting, but the first question I would have attempted to answer in this connection would not have been “Is payout related to the moon?” but instead “Is the payout on slot machines higher than expected (because it is influenced by psi)?” Why is Radin suddenly interested in correlations with the lunar cycle instead of seeing whether psi is at work? How can he know that he isn’t testing astrology instead of psi? Why wasn’t he interested in the lunar cycle in all the other types of experiments? If I were of a more suspicious nature I might have proposed that Radin focused on the lunar cycle because there were no effects like the ones claimed in the other types of experiment, but I ain’t, so I won’t.
It seems that Radin doesn’t really think that the Moon directly influences payouts, but is instead correlated with factors that do influence the payouts. Two factors are proposed:
- The geomagnetic field (GMF).
- Gravity, in the form of the tidal forces of the Sun-Moon system.
If Radin believes that these factors actually are decisive, then I can’t understand why all the graphs (except one) relating to them are expressed in terms of the lunar cycle instead of GMF and gravity. This problem gets even worse since sometimes the GMF is positively correlated with the lunar cycle (page 187), and sometimes it is negatively correlated (e.g. page 182). This seems a very roundabout way of tackling the question of a connection between GMF and psi.
The connection to gravity is (as mentioned above) tested via the lunar cycle; the reasoning behind this is the following (page 180):
“One way to do this was to conduct an experiment every day over the lunar cycle, because the sun-moon system predictably changes the gravitational forces (i.e., tidal forces) felt on earth.”
This seems to me to be an inefficient way of measuring a gravitational effect. I would suggest performing the experiments at different times of the day, because the gravitational influence of the Sun-Moon system varies much more over a day than it does over a lunar cycle. As we all know, the changing of the tides is a daily phenomenon, not a monthly one. It should therefore have been possible to measure much greater effects if the experiment had been designed in this way.
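To put rough numbers on this, here is a back-of-the-envelope sketch of my own (not from the book), comparing the lunar and solar tidal accelerations with standard physical constants; 2GMR/d³ is the textbook approximation for the peak differential acceleration across the Earth:

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
R_EARTH = 6.371e6      # Earth's radius, m

def tidal_acceleration(mass_kg: float, distance_m: float) -> float:
    """Peak differential (tidal) acceleration across Earth: ~2*G*M*R/d^3."""
    return 2 * G * mass_kg * R_EARTH / distance_m ** 3

moon = tidal_acceleration(7.35e22, 3.84e8)    # Moon: mass, mean distance
sun = tidal_acceleration(1.989e30, 1.496e11)  # Sun: mass, mean distance

# Earth's rotation sweeps a fixed site through the full tidal bulge twice
# a day, so the daily swing is on the order of the total amplitude. Over a
# lunar month (spring vs. neap tides) only the solar contribution shifts
# in and out of alignment with the lunar one.
daily_swing = moon + sun        # rough order of the twice-daily variation
monthly_modulation = 2 * sun    # spring-tide minus neap-tide amplitude

print(f"lunar tidal acceleration: {moon:.2e} m/s^2")
print(f"solar tidal acceleration: {sun:.2e} m/s^2")
print(f"daily swing ~{daily_swing:.2e}, lunar-cycle modulation ~{monthly_modulation:.2e} m/s^2")
```

The point is only the order of magnitude: the twice-daily variation a fixed observer experiences is at least as large as the variation over the lunar month, so sampling by time of day would probe gravity more directly than sampling by moon phase.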
The tests for correlation with the GMF are designed to test for correlation between the lunar cycle and casino payouts. Radin’s explanation of this is that the lunar cycle is correlated with the GMF. However, this is problematic, since sometimes the correlation is positive and sometimes it is negative. I must admit to being baffled by Radin’s decision to use the lunar cycle to test for payout-GMF correlations. If I were to test for a correlation between phenomena X and Y, I would test for a correlation between them directly. Instead Radin chooses to find a phenomenon Z that seems to correlate with X in a very complex manner, and then tests for correlation between Y and Z. This complicates matters a lot, and it makes it harder to draw reasonable conclusions.
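A toy simulation (entirely invented numbers, not Radin’s data) shows why the proxy approach is so fragile: if the outcome Y is driven by X, but we only ever measure Y against a proxy Z whose link to X changes sign, the measured correlation flips even though the underlying X-Y relationship never changes:

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(sign, n=3000, seed=1):
    """Y is driven by X; Z is only a proxy whose link to X has the given sign."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]                    # proxy (e.g. lunar phase)
    xs = [sign * 0.7 * z + 0.7 * rng.gauss(0, 1) for z in zs]   # real factor (e.g. GMF)
    ys = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in xs]          # outcome (e.g. payout)
    return pearson(xs, ys), pearson(zs, ys)

r_xy_pos, r_zy_pos = simulate(+1)  # era where the proxy tracks the factor
r_xy_neg, r_zy_neg = simulate(-1)  # era where the proxy link is inverted

# The direct X-Y correlation is stable across both runs, but the
# proxy-based Z-Y correlation flips sign with the X-Z link.
print(f"direct X-Y correlation:  {r_xy_pos:+.2f} and {r_xy_neg:+.2f}")
print(f"proxy Z-Y correlation:   {r_zy_pos:+.2f} and {r_zy_neg:+.2f}")
```

All the coefficients here are made up for illustration; the design point is that testing through Z tells you almost nothing stable about X and Y when the X-Z link itself varies.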
It also seems that if the GMF is really important for psi, then the location of laboratories should be important for psi experiments, since as far as I know the GMF varies around the Earth. Solar eruptions should also be taken into account since they seem to be the strongest influence on the GMF (see Committee on Solar and Space Physics: Space Weather: A Research Perspective – thanks to Mogens Winther for this link).
Furthermore it seems to me that the chapter which Radin calls “Psi in the Casino” should have been called “The Moon in the Casino”, since it mostly contains tests for correlations between payouts and the lunar cycle. I fail to see why this is evidence for psi. Why isn’t it evidence for astrology or the hypothesis that the geomagnetic field influences slot machines?
I also wonder why in one case a correlation between a phenomenon and the lunar cycle indicates a connection between the phenomenon and gravity, and in other experiments a correlation between a phenomenon and the lunar cycle indicates a connection between the phenomenon and GMF. This seems very inconsistent to me.
As I mentioned in the beginning, I’m sure that Radin knows quite a lot more about statistics than I do. Nevertheless there are some statistical issues that I want to address. The first thing to note is that most of Radin’s results are based on a statistical method called meta-analysis, in which a lot of experiments are analysed together. The use Radin makes of this technique has been severely criticized, for example by Ray Hyman (“The Evidence for Psychic Functioning”), who states that:
“The major point I would make, however, is that drawing conclusions from meta-analytic studies is like having your cake and eating it too.”
And statistician and astrophysicist Jeffrey Scargle addresses the use of meta-analysis in “The Conscious Universe” in this way:
“The possibility that these results are spurious and due to publication bias is considered, and then rejected [by Radin] because the FSFD calculations yield huge values for the putative file drawer. It is my opinion that, for the reasons described here, these values are meaningless and that publication bias may well be responsible for the positive results derived from combined studies.”
Here Scargle uses the phrase “combined studies” instead of meta-analysis, and by “publication bias” he means that often only studies giving a positive result are reported, while studies giving negative results stay in the file drawer. This is illustrated by the following quote taken from James Randi’s book “Flim-Flam!” (page 143):
“Hundreds of experiments that were done by SRI in testing Price, Geller and Swann were never reported. Instead, tests with favorable results were selected, in spite of their poor control and heavily biased ambiguity, to be published as genuine scientific results despite strenuous objections from more serious and careful scientists.”
I’m in no position to judge whether Radin is right or not, but Scargle’s arguments sound reasonable to me.
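Scargle’s worry about the file drawer can be made concrete with a small simulation. Assume, purely for illustration, a thousand “experiments” of 100 fair coin flips each, of which only the nominally significant ones ever get published; combining just the published studies (here with Stouffer’s method, one standard way of combining z-scores) then yields an impressive result out of pure noise:

```python
import math
import random

random.seed(42)

def study_z(n_flips=100):
    """z-score of one 'experiment': n fair coin flips, tested against 50%."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return (heads - n_flips / 2) / math.sqrt(n_flips / 4)

all_z = [study_z() for _ in range(1000)]     # every study run; the true effect is zero
published = [z for z in all_z if z > 1.645]  # only 'significant' positive results
                                             # escape the file drawer (p < .05, one-sided)

# Stouffer's method: combined z = sum of z-scores / sqrt(number of studies)
combined_z = sum(published) / math.sqrt(len(published))

print(f"{len(published)} of {len(all_z)} studies published")
print(f"combined z of published studies: {combined_z:.1f}")
```

Every study here is a fair coin, yet the combined z-score of the published subset is enormous. This doesn’t prove publication bias explains Radin’s results, only that selective publication alone is capable of producing exactly this kind of “highly significant” combined outcome.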
Leaving the meta-analysis issue unresolved, I want to go on to another statistical concept, called a confidence interval. If you have some measurements of some property, you can compute the average value, but you can’t be sure that the average you get is actually equal to the real value. Let’s take an example:
Imagine that you want to find out whether Americans are on average taller than Germans. To do this you measure the heights of 1000 Americans and compute the average of their heights, and then do the same for 1000 Germans. Now you can’t be sure that the computed averages are equal to the true averages of the populations, so you use a statistical method to construct an interval of heights around each average, such that you can be 95% sure that the population average lies within the interval. These two intervals are called 95% confidence intervals. If the intervals overlap, we can’t be confident that Americans are taller than Germans; but if the American average is higher than the German one and the intervals don’t overlap, we can be reasonably confident that on average Americans are taller than Germans.
Radin uses 95%-confidence intervals on most of his graphs (and this is the standard practice), but for some reason he switches to 65%-confidence intervals in some graphs (e.g. pages 120-124 and 182-185), which means that we can be less confident of the results. It seems odd that Radin chooses to be inconsistent in this way, and it is interesting to note that he uses 65%-confidence intervals on precisely the graphs where the differences between the numbers being compared aren’t very big. By using only 65%-confidence intervals Radin makes the results look more convincing than they actually are. For many of these graphs we simply can’t be confident that there is anything but chance differences if we use the overlapping 95%-confidence interval criterion described above, a criterion Radin himself endorses on page 35. So quite a few of the graphs that Radin uses fail to give us any result we can be confident about, according to a criterion that Radin himself accepts, and furthermore it seems that he tries to make the results look more convincing by using 65% instead of 95%.
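The height example and the 65%-versus-95% issue can both be sketched in a few lines. The sample sizes and heights below are invented for illustration; the multipliers 1.96 (for 95%) and about 0.93 (for 65%) are the standard normal quantiles for those confidence levels:

```python
import math
import random
import statistics

def confidence_interval(data, z):
    """mean +/- z * (standard error); z = 1.96 for 95%, about 0.93 for 65%."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return (m - z * se, m + z * se)

def overlap(a, b):
    """True if two intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

rng = random.Random(7)
# Invented height samples (cm); the true means differ by less than 1 cm.
americans = [rng.gauss(177.0, 7.0) for _ in range(1000)]
germans = [rng.gauss(176.3, 7.0) for _ in range(1000)]

for z, label in [(1.96, "95%"), (0.93, "65%")]:
    a = confidence_interval(americans, z)
    g = confidence_interval(germans, z)
    print(f"{label} intervals overlap: {overlap(a, g)}")
```

A 65% interval is less than half as wide as a 95% interval on the same data, so two averages whose 95% bars overlap can easily show non-overlapping 65% bars: the same data made to look more decisive purely by shrinking the error bars.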
Even worse than the fact that 65%-confidence intervals are used, is the fact that some of the graphs don’t have any confidence intervals at all (e.g. pages 181 and 187), and it is therefore impossible to tell whether anything interesting is to be gleaned from them.
While we’re talking about the graph on page 181 (which shows a connection between the lunar cycle and ESP ability) I would like to mention that there are simply no data for one or two days around full moon, and this seems very strange since these days should be the most important ones in the experiment. So why are these days left out? And doesn’t it significantly reduce the value of the graph?
I would also like to point out that the graph showing a correlation between the lunar cycle and casino payouts shows a positive correlation. The casino data (on which the graph is based) are from 1991 to 1994. A couple of pages after this graph Radin presents a graph for payouts from a lottery. On this graph the correlation between the payout and the lunar cycle is negative, that is the opposite of the result from the casino graph. This doesn’t bother Radin, however, and he tells us that this is because during 1993 when the lottery data were obtained, the lunar cycle was positively correlated with the GMF, which is the opposite of the normal situation. This information struck me as pretty odd, since 1993 is one of the years included in the casino study, so why are the correlations of opposite sign if the two studies are from overlapping time frames? I have to wonder why Radin doesn’t notice this problem.
The field consciousness experiments, which I have talked about in connection with experiment design and the O.J. Simpson experiment, raise further issues. First of all I would like to mention the graph on page 166, where Radin plots accumulated RNG odds as a function of time for both high and low interest periods of RNG recordings from a comedy show (that is one curve for the high interest periods and one for the low interest periods). The high interest curve slowly rises from low odds to rather unlikely results. Radin states that:
“Large fluctuations like this may occur by chance in shorter random sequences, but progressive accumulation of such odds over longer sequences is not as likely. This is why the long-term trend of the data as they are accumulated within each condition is of interest, rather than momentary fluctuations”
The low interest curve stays at low odds all the time, but it is much shorter than the high interest curve (apparently there were more interesting than uninteresting parts), and the high interest curve stays nearly as low as the low interest curve for the entire duration of the latter. So how can Radin really conclude that there is a difference between the odds obtained on the two curves?
The above quote is also interesting in another way, because Radin states that momentary fluctuations may occur by chance, but in the case of the O.J. Simpson broadcast experiment, momentary fluctuations are all we get, and Radin sees these as very important. On the comedy show graph Radin disregards a spike of around 10 minutes duration that reaches odds 1 to 1000, but on the O.J. graph Radin uses a spike with a duration of about 8 minutes reaching odds around 1 to 300 (note that the scale on the graph is logarithmic) as evidence. But according to his own argument this spike should be ignored, and the O.J. graph therefore loses its significance.
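The point about momentary fluctuations is easy to demonstrate: in a long, perfectly fair random sequence, the running cumulative z-score wanders, and will typically touch impressive-looking momentary odds at some point even when the final accumulated result is unremarkable. A minimal sketch (my own, not Radin’s procedure):

```python
import math
import random

def tail_odds(z):
    """One-sided p-value for a z-score; odds against chance are roughly 1/p."""
    return 0.5 * math.erfc(z / math.sqrt(2))

rng = random.Random(0)
n = 10_000
heads = 0
max_z = 0.0
for i in range(1, n + 1):
    heads += rng.random() < 0.5
    z = (heads - i / 2) / math.sqrt(i / 4)  # running z-score after i flips
    max_z = max(max_z, abs(z))

final_z = (heads - n / 2) / math.sqrt(n / 4)

# The running z-score's most extreme excursion is always at least as
# impressive as the final result, and in long fair sequences it usually
# reaches odds that look striking if you cherry-pick the spike.
print(f"most extreme momentary odds: about 1 in {1 / tail_odds(max_z):.0f}")
print(f"final accumulated odds: about 1 in {1 / tail_odds(abs(final_z)):.0f}")
```

This is exactly why Radin’s own rule (look at long-term accumulation, not momentary fluctuations) is sensible, and why an eight-minute spike in the O.J. graph can’t simultaneously count as evidence.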
Is Psi a Consistent Phenomenon?
I think that it is very interesting to examine whether psi seems to be a consistent phenomenon, and Radin makes some remarks about this. For example, in connection with experiments where a person is trying to influence the tossing of a die or a RNG, Radin notes that (page 141):
“We see that the dice study results and the RNG study results are remarkably similar, suggesting that the same mind-matter interaction effects have been repeatedly observed”
However I fail to see why we should expect the same effect sizes in these two types of experiment. A RNG is, after all, radically different from a die. But let’s humour Radin and accept that we should expect similar effect sizes in different experiments. If we do this, it seems extremely odd that the Superbowl field consciousness experiment gets results with odds nearly 100 times lower than the 1995 Academy Awards field consciousness experiment. And why are the results of the 1996 Academy Awards field consciousness experiment also much lower than those from 1995?
In connection with an experiment where RNG data were recorded during an Olympics opening ceremony, Radin states that all of the ceremony was of high interest, but nevertheless the odds were very low in the first half of the broadcast, while in the O.J. Simpson experiment significant results are claimed to have been achieved at once. Why does it take two hours to obtain really unlikely results in the Olympics experiment, when they were obtained instantaneously in the O.J. Simpson experiment?
Similarly the fact that the odds change between high and low interest periods in the Academy Awards experiment seems to indicate that results should be immediate, but as mentioned this isn’t the case for the Olympics experiment. And why don’t the odds drop to chance levels at once after the ceremony ends, when this seems to happen in the other experiments?
We can also compare the results from the Superbowl experiments with the Olympics experiment. In the Superbowl experiment very low odds are obtained. Radin states that this is because there really aren’t very big differences between high and low interest periods during a Superbowl broadcast, because it is fast moving all the time and even the commercials are supposedly interesting. So we can conclude that if everything is interesting then large odds won’t be obtained. In the Olympics broadcast, however, all of the broadcast is deemed to be interesting, but very large odds are obtained nevertheless.
Why doesn’t Radin point out that these results seem to be inconsistent?
There are other inconsistencies in Radin’s results and arguments. In connection with the field consciousness experiments, Radin states (on page 161) that attempts to focus on influencing the RNGs can destroy the effect on the RNG. But then I would like to know how the experiments in which a participant intentionally tries to alter the output of a die throw or a RNG are possible, because in these experiments the participants do in fact focus. These two claims seem to contradict each other, but Radin doesn’t seem to notice.
On page 173 Radin writes that he has assumed that the global consciousness effect is “non-local, meaning that it would not drop off with distance.” Combine this with the fact that 40 people can produce significant effects by attending a comedy show, and we reach the conclusion that RNGs should always give significant effects, because it is hard to imagine that there isn’t some interesting event going on with at least 40 people somewhere in the world at any given time. So the RNGs should be outputting significant results all the time, but this is clearly not the case, so Radin’s assumption seems to be contradicted.
Furthermore, if field consciousness is non-local then why doesn’t it seem to influence the other psi experiments where a test subject attempts to influence a die or a RNG? If non-local field consciousness is to be taken seriously, then it seems to me that it should be taken into account in these other experiments.
It is also interesting to consider the consequences of the experiment that Radin seems to believe establishes a connection between gravity and ESP potential. The gravity fluctuations considered in this experiment are the changes due to the lunar cycle. But, as argued before, gravity changes more during the Earth’s rotation than it does if one looks at the same time of day during a lunar cycle. So if the lunar cycle has any influence, then the time of day should have a much larger influence on psi ability than the lunar cycle. This should then be considered a very important factor in every psi experiment, but it seems that this is never considered. The geographical location of a laboratory should also be significant, since the effective gravity at the Earth’s surface varies with latitude (it is slightly weaker at the Equator than at the poles), but this factor also seems to be neglected. Similar issues arise in connection with the lunar cycle and GMF.
As a final remark about consistency I would like to point out that on page 102 Radin says that in connection with remote viewing a few test participants seem to be doing much better than the rest, and this is also the case for precognition (see page 115), but in connection with mind-matter interactions Radin writes that there are “no ‘star’ performers”. Of course this might have some perfectly natural explanation, but it doesn’t strike me as a feature of a consistent phenomenon.
I don’t suffer from the delusion that my scribblings here will make anyone reject the research presented in “The Conscious Universe” – this has never been my intention and in my opinion “The Conscious Universe” is an interesting book, whose subject area deserves closer inspection. What I have tried to do in this article is to show that there seem to be some discrepancies in the book, which I think are worth keeping in mind when evaluating it. I also make no claim to have found the absolute truth about anything, but what I have written is the result of my simple attempts at careful reasoning. Since human reasoning has never been perfect (and my non-perfection has been demonstrated often enough) I would really appreciate it if anyone who finds a fault in this article, would contact me and correct my mistake – I would also appreciate any other comment you might have.
Thanks to Claus Larsen for encouraging me to write this article, and to Dann Simonsen for giving me a thorough list of corrections for the first version of it.