The first time I heard about Dean Radin’s book "The Conscious Universe" was late last year on an internet forum, where a discussion was taking place. One of the participants in the discussion used the book as evidence of paranormal phenomena. I was a bit skeptical of the claim, but thought that it might be interesting to take a look at the book. So I ordered it from my favourite online bookstore and waited for it to arrive.
The subject of the book is psi research, that is, research concerning telepathy, clairvoyance and precognition. Radin claims that these phenomena are real, and in the book he presents the evidence that he thinks proves this.
The first thing to note about this book is that it isn’t a scientific report. It is readable by people without scientific training, though there are some technical words, so if expressions such as "confidence interval" make your head ache, then you might want to have a bottle of aspirin within reach before opening this book.
The accessibility of the book has the consequence that the reader isn’t given details, and therefore can’t actually judge for themselves whether the results that Radin claims have been obtained are for real or not. This means that it’s of paramount importance that the reader can trust that Radin reports everything truthfully and completely.
This led me to consider how trustworthy a reading of the book makes Radin seem. I’m just a layman in this connection: I probably know a bit more about statistics than the average working Joe, but I know way less than Radin. I’m not especially knowledgeable about psi research either, but a couple of the books on my bookshelf mention the subject, and I’ve read a bit about it on the web.
So what can a layman like me find out about the trustworthiness of Radin’s presentation in "The Conscious Universe", using just the books on his bookshelf and a bit of time searching with Google? This is what I set out to discover, and in this article I will take you with me on a journey through the impressions I got from reading the book.
The journey starts with the impressions I got from the first few pages. These are actually minor points, but they caught my attention and they made me form a rough sketch of Radin. If you want to get to the main content of this article, simply skip the next section.
First Impressions
One of the first things I noticed was that in the acknowledgement section of the book Hal Puthoff is mentioned as a role model for Radin. That name rang a bell – wasn’t Puthoff the scientology guy, whom the skeptic James Randi showed to conduct seriously flawed research? This led me to use Google to look for info on Puthoff, which confirmed my suspicion that Puthoff was a scientologist. I personally would be pretty skeptical of the capacity for critical judgement of anyone who accepts the tales of scientology founder L. Ron Hubbard, and let’s face it, critical judgement should be essential for someone in psi research. Even more important: scientology presents its teachings as science even though scientists strongly disagree (see for example The Skeptic's Dictionary: Dianetics or Chris Owen: Hysterical Radiation and Bogus Science), so by accepting scientology it seems you throw any scientific credibility down the drain. I also brought out my copy of James Randi’s book "Flim-Flam!", and sure enough: Randi delivered a pretty harsh critique of Puthoff’s research (among other things Randi concludes on page 135 that "The record is clear: Targ and Puthoff just cannot be trusted to produce a factual report").
I’m not in a position to decide if Randi is right and whether scientology really is nonsense, but if just one of these two things is correct, then I wouldn’t pick Puthoff as a role model for researchers.
Finishing the acknowledgements I turned to the preface, where Radin tells a little story of an encounter he had with two persons on a train. These two persons are stereotypes of the credulous New Ager and the hard-nosed and uninformed skeptic. What really struck me about this story was the way the two persons are described. To convey this let me quote some words and phrases that I found within the first half page:
This made me wonder why Radin felt it necessary to present such caricatures of people who hold different opinions from himself.
After the discussion between the two, Radin joins in and after just two statements, the New Ager's face "wavered between awe and bewilderment". So all it takes is a few words from Radin and his ‘opponent’ looks upon him with an expression that wavers between awe and bewilderment – impressive indeed.
Going past the preface I got to the ‘real’ start of the book, where Radin tells the reader that "the acceptance of new ideas [in science] follows a predictable, four-stage sequence". During the description of the four stages Radin manages to ridicule skeptics (by saying that in the fourth stage the critics of an idea end up saying that they thought of the idea first) – what he doesn’t manage, however, is to provide any support for the four-stage model. No attempt is made to relate it to the existing theories of science, and no examples from the history of science are given to illustrate it.
When I consider radical new ideas in science, the first two theories I always think of are the theories of relativity and quantum mechanics, and I don’t recall any critic of either of these theories ending up saying that they thought of them first (which is what Radin claims happens). This might of course be due to me not being knowledgeable enough about these issues.
It seems that to be taken seriously Radin should at least try to give the reader some sort of documentation or argument for his model of scientific development, but Radin fails to do this. What he doesn’t fail to do, however, is to use this unsupported model to argue that mainstream acceptance of psi is on its way (I assume that he means mainstream scientific acceptance).
In all fairness I have to mention that on pages 233 and 234 Radin does give some examples that might support his model, but this is hard to evaluate, since he doesn’t compare the examples to the model.
Creative Quoting?
The
right quote in the right place can be important to the weight given to an
argument, and Radin uses a lot of quotes in the book. I have looked a bit
closer at the first few of them to see if Radin’s use of quotes seems to be correct,
that is, in reflecting the intention of the person who is being quoted.
On page 3 Radin talks about "an astonishing admission" from
astronomer and skeptic Carl Sagan in the book "The Demon-Haunted World", and
quotes from the book, where Sagan states that he thinks that three named
claims in the ESP field deserve serious study. What Radin doesn’t mention is
that immediately following the quoted remark Sagan writes the following (page
302):
Please note the words "I don’t", "might" and "dubious",
which make the "admission" from Sagan take on a less "astonishing"
appearance than Radin leads us to believe.
Going on to pages 4 and 5 I reached two quotes from professor of
statistics Jessica Utts and professor of psychology Ray Hyman, who have
evaluated some psi research. Utts is quoted as follows:
Radin then says that "Surprisingly, the other principal reviewer,
skeptic Ray Hyman, agreed", and as documentation for this he quotes Hyman:
Please note the three dots in the Hyman quote. I thought that it might be interesting to see what Radin had left out, so I saddled my trusty steed "Google" and set out to search for the missing words. I found the report (where the quote is from) reproduced in several locations; one of them was the American Institutes for Research.
One has to wonder why Radin chose to leave this out. It clearly seems to sow some seeds of doubt about the results. Hyman's report also contains the following statement (thanks to Claus Larsen for bringing it to my attention):
Now compare this quote with the Utts quote. When I compare them, ‘agreement’ isn’t the first word that pops into my mind. So it seems that Radin’s basis for the agreement is that Hyman tends to agree with Utts that something is going on, but he seems to disagree with her regarding everything else. This is further supported by the following quote taken from "The Evidence for Psychic Functioning", by Ray Hyman:
Based on these examples (though of course it would be preferable to check all of Radin’s quotations) it seems to me that Radin’s way of choosing and interpreting quotes is a tad too creative, and it doesn’t inspire much confidence in the other quotes later in the book.
Judgement Day
The first part of this article has been about the first few pages of
the book. In the following sections I will look at some parts that I found interesting.
The first thing I will turn my attention to is Radin’s description of an
experiment performed during the TV broadcast, where the ruling in the O.J.
Simpson case was announced (described on pages 166 and 167).
Before, during and after this broadcast five Random Number Generators (RNGs) were producing strings of random bits. The claim behind this is that when many people are focused on one thing (in this case the broadcast), this influences stochastic processes such as an RNG (this phenomenon is called field consciousness).
Radin includes a graph of the odds of the RNG output occurring by chance as a function of time – this can be considered a measure of how much the output deviates from what is expected. This graph is reproduced in a modified form below (I have added some vertical lines to make it easier to assess the timing – it is taken from Dean Radin's article "Where, when, and who is the self?", but for our purposes it’s identical to the one in the book). It can be seen that it has two large spikes, where something rather improbable happened. It can furthermore be seen that the graph seems to contain a lot of smaller fluctuations superimposed on the main shape of the graph. I assume that these fluctuations are ‘statistical noise’, which is to be expected from an RNG.
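To make the idea of "odds against chance" a bit more concrete, here is a minimal sketch of how such a figure could be computed for a block of random bits. This is not Radin's actual analysis, which the book doesn't spell out in this detail; the function name, the fair-coin model and the normal approximation are all my own assumptions, chosen only for illustration.

```python
from statistics import NormalDist


def odds_against_chance(ones: int, total_bits: int) -> float:
    """Return the odds ('X to 1') against a deviation at least this large.

    Assumes every bit is an independent fair coin flip and uses the normal
    approximation to the binomial distribution (illustrative assumptions).
    """
    expected = total_bits / 2
    std_dev = (total_bits ** 0.5) / 2  # sqrt(n * 0.5 * 0.5)
    z = abs(ones - expected) / std_dev
    # Two-sided probability of a deviation this large or larger.
    p_value = 2 * (1 - NormalDist().cdf(z))
    if p_value == 0:
        # The deviation is too extreme for floating point to resolve.
        return float("inf")
    return (1 - p_value) / p_value


# 5,100 ones in 10,000 bits lies two standard deviations from the mean.
print(round(odds_against_chance(5100, 10_000), 1))
```

Even a perfectly ordinary RNG will wander up and down on such a measure from minute to minute, which is why small bumps on the curve shouldn't be given much weight.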
Radin mentions three events in the broadcast, which he thinks are
important (the beginning of the preshow, the beginning of the ‘main’ broadcast
and the verdict announcement) and he relates these three events to the graph
(and specifically marks them on the graph). Radin's hypothesis is that a spike should be produced in the graph when something very
interesting happens in the broadcast, and this seems to be the case. However I
noticed a few odd things about Radin’s interpretation of the graph:
The event markings on the graph are placed at the tops of the
smaller fluctuations superimposed on the main shape of the graph. This makes it
seem as if Radin sees something significant in the statistical noise, which
strikes me as being wrong, but then again, it is Radin who is knowledgeable
about statistics, not me, so I might be barking up the wrong tree.
The
event markings that Radin has made on the graph are supposed to correspond to
the events he describes in the text. However, when I looked closer at the graph
I saw that the time of the events marked on the graph didn’t match the times
that Radin gave in the text!
The preshow is supposed to have started at 9:00, but on the graph it seems to be marked somewhere between 8:56 and 8:57. According to the graph the spike is starting to decline at 9:00, which doesn’t really jibe with Radin’s claim.
The second event is the start of the ‘main’ broadcast, and it is
supposed to take place at 10:00, but on the graph the
event seems to be marked somewhere between 9:58 and 9:59.
Radin gives the following description in connection with the last
event (the announcement of the verdict):
A few minutes later than 10:00 can be no sooner than 10:02, but on the graph Radin has marked the event as occurring around 10:00 and the spike on the graph seems to start its rapid disappearance at 10:02.
These things seem to indicate that Radin has altered the timing on
the graph to make the data fit his theory.
One might think of several different explanations for the
discrepancy. For instance the graph may simply have been
offset a bit from the time-axis, but this doesn’t seem likely because the
discrepancies are not of equal size. One might also entertain the possibility
that we are seeing a case of precognition, but this possibility will invalidate
some of Radin’s other experiments. An example of this is an experiment, where
RNG output was recorded during the 1995 Academy Awards broadcast. In this
experiment the broadcast was divided into high and low interest periods and the
‘unlikeliness’ of the RNG output during these periods was compared. If
precognition really were important then the most unlikely output should occur
before the high interest periods – that is in the low interest periods, but
this isn’t the case.
A further point of interest is the fact that a third spike on the graph is much smaller than the two major spikes, but it represents output that is more unlikely than the outputs that Radin considers significant in a similar experiment (e.g. in an experiment where RNG output was recorded during a Super Bowl broadcast). Radin doesn’t mention this spike, and according to Claus Larsen (see An Evening with Dean Radin, by Claus Larsen
The Quality of Psi Research
Psi research has received a lot of harsh criticism for a long time.
In my opinion Radin seems to understate the merit of some parts of the critique,
in that he doesn’t seem to acknowledge the fact that mistakes have been found in
many experiments, where a skeptical look was allowed. This is a problem
with a lot of the analyses that Radin performs. He collects data from other
people’s reports and uses a statistical method called
meta-analysis on this data, but since he wasn’t present at the
experiments, he can’t know whether they were as tight as the
reports might suggest. To illustrate this I will quote Susan Blackmore’s remark
on ganzfeld research:
This quote is taken from What Can the Paranormal Teach Us About Consciousness?, by Susan Blackmore.
In connection with remote viewing (clairvoyance) James Randi gives us the following description in his book "Flim-Flam!" (page 147):
Randi then goes on to describe how the list of locations was given
in chronological order (the judges knew this), and information was available in
the transcript, so that they could also be put into chronological order. For the
experiment to have any value it was absolutely essential that such information
was not available to the judges, but according to Randi it was.
These quotes (if they can be trusted) illustrate the fact that
experiments might contain flaws even though they look good in the reports, and
it seems to me, that Radin doesn’t acknowledge this.
It is also interesting to note that Radin mentions the
parapsychologist J.B. Rhine and his experiments several times, without telling the reader
that it is now known that Rhine discarded negative data from his experiments. The physicist Robert Park describes
this in "Voodoo Science" with the following words (on page 42):
I wonder why Radin has chosen to write about minor things like the possibility
of transferring information by marking envelopes (used to contain cards for the
ESP test) with fingernails, instead of major issues like this case of fraud
(which Rhine seems to have performed in good faith).
On page 218 Radin states that:
Here I think that Radin misses the point of the critique, because what is interesting is not whether there is fraud and error in psi research, but whether these are much more common in psi research than in other disciplines, and whether they account for the results that are obtained.
A related critique that Radin also seems to get wrong is illustrated
on page 221. Radin tells us that in 1985 Charles Honorton analysed 28 studies.
Nine of these had been "scrutinized" by Susan Blackmore and she found them
to be "clearly marred" by "accidental errors" (though according
to Radin she hasn’t demonstrated that these errors exist, but that is beside
the point I’m trying to make). Radin’s comment on this is:
I contend that Radin misses the point again, for if the 9 studies that were actually critically checked out were found to be marred, how do we then know that the other unchecked studies weren’t? Radin’s argument makes me think of a factory where samples of the product are taken and checked for defects, and if a sample is found to be defective, the sample is thrown out and everything else is shipped: since the faulty product has been removed, everything else must be ok.
There are other cases where I’m not convinced by Radin’s defence
against the critique. One example of this can be found on page 136 where Radin
mentions two things that psi researchers are criticized for:
To deal with the first part of this critique Radin displays a graph on page 136, where the quality of dice tossing experiments (that is, experiments where a test subject attempts to influence or predict the outcomes of a series of die tosses) is plotted as a function of time (and a similar graph for RNG experiments is given on page 142). This graph actually shows an increase in quality over time, and could at first seem to disarm the critique of not learning from past mistakes. Things are not that simple, however. One of the first questions that occurred to me when I saw the graph was: How is the experimental quality measured? Ray Hyman makes the importance of this issue clear ("The Evidence for Psychic Functioning", by Ray Hyman):
Radin gives a description of thirteen criteria that are used to compute a single quality score. The description of each of the criteria is brief, however, and alternative criteria or evaluation procedures aren’t considered, so the reader can’t really make an informed judgement concerning the graph. Furthermore it seems to me that any such evaluation must contain a subjective judgement (as is indicated by Hyman), which should make us take Radin’s conclusion with a grain of salt. If we ignore this problem we are still left with the problem pointed out in the Blackmore quote at the beginning of this section: experiments not being performed according to the descriptions in the reports.
Radin
addresses the second criticism with the following words:
I have a
couple of objections to this argument. First of all why aren’t we allowed to
see a graph of this? (There are tons of graphs in the book, and this seems to be
an important issue). Why aren’t we told what the level of statistical
significance is? (Most results in the book are accompanied by a remark about how
‘unlikely’ they are, and how much faith we therefore should put in them).
Furthermore if we look at the two quality graphs on pages 136 and
142, we see that the quality is nowhere near optimal levels. The graph on page
136 starts at a score of about 4 and ends up around 7, and as far as I’m able
to tell this is on a scale from 0 to 13, so it seems that even at the end, the
average study fails at 6 of the 13 quality criteria. This seems to leave quite
a bit of room for errors. The graph on page 142 starts at around 3.5 and ends
up around 5, and it seems that the scale for this graph is 0 to 16, which seems
to indicate that the average experiment in the end fails at 11 out of 16
quality criteria. This doesn’t inspire much confidence. Furthermore the quality
change from a bit below 3.5/16 to 5/16 has taken around 28 years, so the
improvement has been very slow.
It is also interesting to compare the two graphs. If we do this, we
see that the graph on page 142 shows experiments with substantially lower
quality scores (and according to Radin the criteria used for them were
similar). Now please note that the graph on page 136 is for dice tossing
experiments, and the graph on page 142 is for RNG
experiments. Furthermore note that Radin states that RNG experiments are the
successors of the dice tossing experiments (see pages 138 and 212). Combining
these observations we see that the newer type of experiment, which is supposed
to be an improvement on the old type, seems to be of lower quality (5/16 versus
7/13). This seems to indicate that the critique is at least
partly correct.
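The arithmetic behind this comparison can be written out in a few lines. The scores are the approximate values I read off the two graphs, and treating each score as "number of criteria met out of the maximum" is my own reading of the scale, as noted above; the function name is just something I made up for the illustration.

```python
def criteria_summary(score: float, max_score: int) -> tuple[float, float]:
    """Return (share of criteria met, number of criteria not met).

    Assumes the quality score simply counts the criteria a study meets.
    """
    return score / max_score, max_score - score


# Approximate end-of-period scores read off the graphs on pages 136 and 142.
for label, score, max_score in [("dice tossing", 7, 13), ("RNG", 5, 16)]:
    share, missed = criteria_summary(score, max_score)
    print(f"{label}: {share:.0%} of criteria met, {missed} not met")
```

On this reading the dice experiments end up meeting a bit more than half of the criteria, while the RNG experiments meet less than a third.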
Another point of interest regarding the criticism that psi effects
diminish with increasing experiment quality, is the fact that on a graph on
page 106, Radin shows the effect sizes for several different types of
experiments. Two of these are ESP card tests and high security ESP card tests.
The ‘normal’ card tests have an effect size of roughly 65%, while the high
security card tests have an effect size of roughly 55% (compared to the 50%
chance expectation), and the 95% confidence intervals for these two figures
don’t overlap. This shows that when the security in the card tests was
increased the effect size decreased dramatically.
On page 102 (in connection with remote viewing) Radin presents
another defence against the critics that strikes me as being weird:
Why not? It doesn’t seem unlikely to me that a few participants
(selected by the experimenters) exploited (consciously or unconsciously) some
flaw in the experiments, while the others didn't. In the case of fraud it seems even more likely that a
few participants selected by the experimenters would yield the best results. If
I were a magician trying to perform some magic trick that needed the help of
other persons, then I would very much prefer to select these persons myself. It might also be the case that these test subjects had been cheating from the very beginning and were selected by the scientists, who were impressed by their rate of success.
Page 102 contains more interesting information. Among other things
Radin tells the reader that results become better, if the test subjects are
allowed to describe what they perceive freely, instead of being forced to
select between a few discrete possibilities. This
means that better results are obtained when subjective judging is needed, than
when a simple objective test is used. When I relate this to James Randi’s
comments about Targ and Puthoff’s judging procedures above, I start
getting suspicious, and it seems that the high security experiment
type gives the lowest effects, in accordance with the claims of the critics.
Continuing on page 102 we find that results also get better if feedback is given during the experiments. This also sets off
my alarms, since by giving feedback you risk giving the test person information
that he shouldn’t have access to (either verbally or through body language), and again it seems that the tightness of the
experimental security is negatively related to the measured effect sizes.
Going on to page 240, we see that Radin has another interesting way
of dealing with his critics:
So all Radin has to do is to say that the critics are irrelevant,
and then the critique can be ignored. Well, I have to admit that this is so much
more time saving than having to argue that something is wrong with the critique
that Randi delivers.
In connection with this quote it might be interesting to note that
the other irrelevant person (the claimed psychic Uri Geller) was actually the central
subject of some tests and a scientific paper by Radin's role model Hal Puthoff.
Experiment Design
On a related note I want to discuss the design of a few of the experiments that Radin was involved in. First I want to focus on field consciousness experiments, that is, experiments of the same kind as the O.J. Simpson broadcast experiment described above, where output from RNGs is monitored during periods of time when a lot of people are focused on the same thing.
On page 165 Radin describes an experiment based on a comedy show,
where two members of Radin’s research staff brought a notebook computer and an
RNG. During the show one of the researchers divided the show into high and low
interest periods, and it was checked whether the RNG output was more
‘unlikely’ during the high interest periods than during the low interest
periods. This turned out to be the case, but what worries me is that the
researcher who made the division into the two categories might have had access
to the RNG output. Radin doesn’t state that she didn’t and this seems strange
since it is very important to the credibility of the experiment, because
experimenter bias could invalidate the experiment if she had this access. This
also seems to be a problem in a similar experiment performed during the 1995
Academy Awards broadcast, because here it appears that Radin had one of the two
RNGs used for the experiment, with him in his home, where he was alone.
Experimenter bias could also be a problem with
another aspect of the 1995 Academy Awards experiment. Here Radin and an
assistant divided the broadcast into high and low interest periods. I’m
very puzzled as to why Radin didn’t choose to let someone else do this, to make
sure that criticisms about data manipulation on this count could be avoided.
When I look at all the experiments that Radin has conducted
concerning RNG measurements of field consciousness (at least those presented in
the book), I note that Radin seems to change the experiment design
between each experiment: Sometimes an experiment is divided into high interest
and low interest periods, sometimes it isn’t, and at other times a more complex
interest index is used. It also seems that sometimes the accumulated odds are
considered, and sometimes they aren’t. I wonder why the experiment
design was changed in these ways, since it makes them harder to compare, and
thereby makes claims of reproducibility less convincing. It also makes them
more vulnerable to critique, such as speculation that the experiment design was
selected after the measurements in order to obtain the most favourable results.
I’ll leave the field consciousness experiments behind for a while
and go on to pages 175 to 189 where Radin deals with psi and gambling. In the chapter
contained in these pages Radin presents lots of graphs showing correlations between the lunar
cycle and payouts from casinos and lotteries. That might be interesting, but
the first question that I would have attempted to answer in this connection
would not have been: Is payout related to the moon? But instead: Is the payout
on slot machines higher than expected (because it is influenced by psi)? Why is
Radin suddenly interested in correlations with the lunar cycle instead of
seeing whether psi is at work? How can he know that he isn’t testing astrology
instead of psi? Why wasn’t he interested in the lunar cycle in all the other
types of experiments? If I were of a more suspicious nature I might have proposed
that Radin focused on the lunar cycle, because there were no effects like the
ones claimed in the other types of experiment, but I ain’t, so I won’t.
It seems that Radin doesn’t really think that the Moon directly influences payouts, but rather that it is correlated with factors that do influence the payouts. Two factors are proposed:
If Radin believes that these factors actually are decisive, then I can’t understand why all the graphs (except one) relating to these things are expressed in terms of the lunar cycle instead of the geomagnetic field (GMF) and gravity. This problem gets even worse since sometimes the GMF is positively correlated with the lunar cycle (page 187), and sometimes it is negatively correlated (e.g. page 182). This seems to be a very roundabout way of tackling the problem of a connection between GMF and psi.
The
connection to gravity is (as mentioned above) tested via the lunar cycle; the
reasoning behind this is the following (page 180):
This seems to me to be an inefficient way of measuring gravitational effect. I would suggest performing the experiments at different times of the day, because the gravitational influences of the Sun-Moon system vary much more over a day than they do over a lunar cycle. As we all know, the changing of the tides is a daily phenomenon, not a monthly one. Therefore it should have been possible to measure much greater effects if the experiment had been designed in the way I propose.
The tests for correlation with the GMF are designed to test for correlation between the lunar cycle and casino payouts. Radin’s explanation of this is that the lunar cycle is correlated with the GMF. However, this is problematic since sometimes the correlation is positive and sometimes it is negative. I must admit to being baffled by Radin's decision to use the lunar cycle to test for payout-GMF correlations. If I were to test for a correlation between phenomena X and Y, I would test for a correlation between them directly. Instead Radin chooses to find a phenomenon Z that seems to correlate with X in a very complex manner, and then tests for correlation between Y and Z. This seems to complicate matters a lot, and it makes it harder to draw reasonable conclusions.
It also seems that if the GMF is really important for psi, then the location of laboratories should be important for psi experiments, since as far as I know the GMF varies around the Earth. Solar eruptions should also be taken into account since they seem to be the strongest influence on the GMF (see Committee on Solar and Space Physics: Space Weather: A Research Perspective).
Furthermore it seems to me that the chapter which Radin calls "Psi in the Casino" should have been called "The Moon in the Casino", since it mostly contains tests for correlations between payouts and the lunar cycle. I fail to see why this is evidence for psi. Why isn’t it evidence for astrology or the hypothesis that the geomagnetic field influences slot machines?
I also wonder why in one case a correlation between a phenomenon and
the lunar cycle indicates a connection between the phenomenon and gravity, and
in other experiments a correlation between a phenomenon and the lunar cycle
indicates a connection between the phenomenon and GMF. This seems very
inconsistent to me.
Statistics
As I mentioned in the beginning, I’m sure that Radin knows quite a lot more about statistics than I do. Nevertheless there are some statistical issues that I want to address. The first thing to note is that most of Radin's results are based on a statistical method called meta-analysis, where a lot of experiments are analysed together. The use Radin makes of this technique has been severely criticized, for example by Ray Hyman ("The Evidence for Psychic Functioning"):
And by statistician and astrophysicist Jeffrey Scargle:
Here Scargle uses the phrase "combined studies" instead of meta-analysis, and by "publication bias" he means that often only studies giving a positive result are reported, while studies giving negative results stay in the file drawer. This is illustrated by the following quote taken from James Randi’s book "Flim-Flam!" (page 143):
I’m in no position to judge whether Radin is right or not, but
Scargle’s arguments sound reasonable to me.
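To get a feeling for why the file drawer matters, here is a small simulation of my own. It is only a sketch under made-up assumptions: every "study" consists of pure guessing with a true hit rate of 50%, and only studies whose hit rate reaches an arbitrary threshold get published. None of the numbers are taken from Radin, Scargle or Randi.

```python
import random
from statistics import mean


def simulate_file_drawer(n_studies=1000, trials_per_study=100, seed=1):
    """Simulate studies of pure guessing (true hit rate 50%).

    Only studies whose hit rate reaches a hypothetical threshold count as
    published; the rest stay in the file drawer.
    Returns (mean of all studies, mean of published studies, number published).
    """
    rng = random.Random(seed)
    threshold = 0.58  # arbitrary cut-off for a 'positive' result
    all_rates = []
    for _ in range(n_studies):
        hits = sum(rng.random() < 0.5 for _ in range(trials_per_study))
        all_rates.append(hits / trials_per_study)
    published = [rate for rate in all_rates if rate >= threshold]
    return mean(all_rates), mean(published), len(published)


overall, published_only, n_published = simulate_file_drawer()
print(f"all studies:    {overall:.3f}")
print(f"published only: {published_only:.3f} ({n_published} studies)")
```

Although nothing but chance is at work, an analysis that combines only the published studies will report a hit rate well above 50%, simply because the unsuccessful studies never made it out of the drawer.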
Leaving the meta-analysis issue unresolved, I want to go
on to another statistical concept, called a confidence interval. If you
have some measurements for some property, you can compute the average value,
but you can’t be sure that the average value you get is actually equal to the
real value. Let’s take an example:
Imagine that you want to find out whether Americans
are on average taller than Germans. To do this you measure the heights of 1000 Americans and compute the average of their heights. After
that you measure the heights of 1000 Germans and compute the average of those
heights. Now you can’t be sure that the computed average values are equal to
the true averages of the populations, so therefore you use a statistical method
to make an interval of heights for each of the averages. This method makes sure that you are 95% sure that the population average is within this interval. These two
intervals are called 95% confidence intervals. If the intervals overlap we
can’t be confident that Americans are taller than
Germans, but if the American average is higher than the German one and the
intervals don’t overlap we can be reasonably confident that on average Americans are taller than Germans.
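For readers who like to see the arithmetic, the height example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "measurements" are randomly generated, the true averages (178 cm and 177 cm) and the spread (7 cm) are invented, and the interval uses the common normal approximation, which is reasonable for samples of 1000 people. The function names are my own.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width

def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Invented populations: the true averages and the spread are assumptions.
rng = random.Random(42)
americans = [rng.gauss(178, 7) for _ in range(1000)]
germans = [rng.gauss(177, 7) for _ in range(1000)]

ci_us = confidence_interval(americans)
ci_de = confidence_interval(germans)
print("Americans: %.1f to %.1f cm" % ci_us)
print("Germans:   %.1f to %.1f cm" % ci_de)
print("intervals overlap:", intervals_overlap(ci_us, ci_de))
```

With 1000 people in each group and a spread of 7 cm, each interval reaches a bit less than half a centimetre on either side of the computed average, so a true difference of one centimetre sits right at the edge of what this criterion can detect.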
Radin uses 95%-confidence intervals on most of his graphs (and this
is the standard practice), but for some reason he switches to 65%-confidence
intervals in some graphs (e.g. pages 120-124 and 182-185), which means that we
can be less confident of the results. It seems odd that Radin chooses to be
inconsistent in this way, and it is interesting to note that he uses
65%-confidence intervals on the graphs where the differences between the
numbers being compared aren’t very big. By using only 65%-confidence intervals
Radin makes the results look more convincing than they actually are. For many of
these graphs we simply can’t be confident that there is anything but chance
differences if we use the overlapping 95%-confidence interval criterion
described above, and on page 35 Radin himself states his agreement with this
criterion. So we conclude that quite a few of the graphs that Radin uses fail
to give us any result we can be confident about, according to the criterion that
Radin himself agrees to, and furthermore it seems that he tries to make the results
look more convincing by using 65% instead of 95%.
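How much difference the choice of confidence level makes can be shown with a small Python sketch. The hit rates and the standard error below are hypothetical numbers chosen for illustration, not values read off Radin's graphs, and the intervals again rest on the normal approximation.

```python
from statistics import NormalDist

def z_multiplier(level):
    """Number of standard errors on each side of the average for a level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def interval(centre, std_error, level):
    """(low, high) confidence interval around centre."""
    half_width = z_multiplier(level) * std_error
    return centre - half_width, centre + half_width

def overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical hit rates for two conditions; not taken from the book.
rate_a, rate_b, std_error = 0.52, 0.50, 0.008

for level in (0.65, 0.95):
    ci_a = interval(rate_a, std_error, level)
    ci_b = interval(rate_b, std_error, level)
    print("%d%% intervals: multiplier %.2f, overlap: %s"
          % (round(level * 100), z_multiplier(level), overlap(ci_a, ci_b)))
```

A 65% interval reaches a little under one standard error on each side of the average, against roughly two for a 95% interval, so the error bars are less than half as long. With the invented numbers above the 65% intervals do not overlap while the 95% intervals do, which is the kind of visual difference the choice of level can make.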
Even worse than the use of 65%-confidence intervals is
the fact that some of the graphs don’t have any confidence intervals at all
(e.g. pages 181 and 187), and it is therefore impossible to tell whether anything interesting is to be gleaned from them.
While we’re talking about the graph on page 181 (which shows a
connection between the lunar cycle and ESP ability) I would like to mention
that there are simply no data for one or two days around full moon, and this seems very strange since these days should be
the most important ones in the experiment. So why are these days left
out? And doesn’t it significantly reduce the value of the graph?
I would also like to point out that the graph showing a correlation
between the lunar cycle and casino payouts shows a positive correlation. The
casino data (on which the graph is based) are from 1991 to 1994. A couple of
pages after this graph Radin presents a graph for payouts from a lottery. On
this graph the correlation between the payout and the lunar cycle is negative,
that is the opposite of the result from the casino graph. This doesn’t bother Radin,
however, and he tells us that this is because during 1993 when the lottery
data were obtained, the lunar cycle was positively correlated with the GMF, which is the opposite of the normal situation. This information struck me as pretty odd, since 1993 is one of the years included in the casino study, so why are the correlations of opposite sign if the two studies are from
overlapping time frames? I have to wonder why Radin doesn’t notice this
problem.
The field consciousness experiments, which I have talked about in
connection with experiment design and the O.J. Simpson experiment, raise
further issues. First of all I would like to mention the graph on page 166,
where Radin plots accumulated RNG odds as a function of time for both high and
low interest periods of RNG recordings from a comedy show (that is one curve
for the high interest periods and one for the low interest periods). The high
interest curve slowly rises from low odds to rather unlikely results. Radin
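states that:
"Large fluctuations like this may occur by chance in shorter
random sequences, but progressive accumulation of such odds over longer
sequences is not as likely. This is why the long-term trend of the data
as they are accumulated within each condition is of interest, rather than
momentary fluctuations"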
The low interest curve stays at low odds all the time, but it is
much shorter than the high interest curve (apparently there were more
interesting than uninteresting parts), and the high interest curve stays nearly
as low as the low interest curve for the entire duration of the low interest
curve. So how can Radin really conclude that there is a difference between the
odds obtained on the two curves?
The above quote is also interesting in another way, because Radin
states that momentary fluctuations may occur by chance, but in the case of the
O.J. Simpson broadcast experiment, momentary fluctuations are all we get, and
Radin sees these as very important. On the comedy show graph Radin disregards a
spike of around 10 minutes duration that reaches odds 1 to 1000, but on the O.J.
graph Radin uses a spike with a duration of about 8 minutes reaching odds
around 1 to 300 (note that the scale on the graph is logarithmic) as evidence. But according
to his own argument this spike should be ignored, and
the O.J. graph therefore loses its significance.
Is Psi a Consistent Phenomenon?
I think that it is very interesting to examine whether psi seems to
be a consistent phenomenon, and Radin makes some remarks about this. For example
in connection with experiments where a person is trying to influence the
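tossing of a die or a RNG, Radin notes that (page 141):
"We see that the dice study results and the RNG study results are
remarkably similar, suggesting that the same mind-matter interaction effects
have been repeatedly observed"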
However
I fail to see why we should expect the same effect sizes in these two types of
experiment. A RNG is, after all, radically different from a die. But let’s humour
Radin and accept that we should expect similar effect sizes in different
experiments. If we do this it seems extremely odd that the Superbowl field
consciousness experiment gets results that have odds nearly 100 times lower
than the 1995 Academy Awards field consciousness experiment. And why are the
results of the 1996 Academy Awards field consciousness experiment also much lower than
those from 1995?
In connection with an
experiment where RNG data was recorded during an Olympics opening ceremony,
Radin states that all of the ceremony was of high interest, but nevertheless
the odds were very low in the first half of the broadcast, while in the O.J.
Simpson experiment significant results are argued to be achieved at once. Why
does it take two hours to obtain really unlikely results in the Olympics
experiment, while they were obtained instantaneously in the O.J. Simpson
experiment?
Similarly
the fact that the odds change between high and low interest periods in the
Academy Awards experiment seems to indicate that results should be immediate,
but as mentioned this isn’t the case for the Olympics experiment. And why don’t the odds drop to chance levels at once after
the ceremony ends, when this seems to happen in the other experiments?
We can also compare the results from the Superbowl experiments with
the Olympics experiment. In the Superbowl experiment very low odds are
obtained. Radin states that this is because there really aren’t very big
differences between high and low interest periods during a Superbowl broadcast, because it is fast moving all
the time and even the commercials are supposedly interesting.
So we can conclude that if everything is interesting then large odds won’t be
obtained. In the Olympics broadcast, however, all of the broadcast is deemed to
be interesting, but very large odds are obtained nevertheless.
Why doesn’t Radin point out that these results seem to be inconsistent?
There
are other inconsistencies in Radin’s results and arguments. In connection with
the field consciousness experiments Radin
states (on page 161) that attempts to focus on influencing the RNGs can destroy
the effect on the RNG. But then I would like to know how the experiments where a
human is to intentionally alter the output of a die throw or a RNG are possible,
because in these experiments the participants do in fact focus.
These two things seem to contradict each other, but Radin doesn’t seem
to notice this.
On page 173 Radin writes that he has assumed that the global
consciousness effect is "non-local, meaning that it would not drop off with
distance." Combine this with the fact that 40 people can produce
significant effects by attending a comedy show, and we reach the conclusion
that RNGs should always give significant effects, because it is hard to imagine
that there isn’t some interesting event going on with at least 40 people somewhere in the
world at any given time. So the RNGs should be outputting significant results
all the time, but this is clearly not the case, so Radin's assumption seems to
be contradicted.
Furthermore, if field consciousness is non-local then why doesn’t it
seem to influence the other psi experiments where a test subject attempts to
influence a die or a RNG? If non-local field consciousness is to be taken
seriously, then it seems to me that it should be taken into account in these
other experiments.
It is also interesting to consider the consequences of the
experiment that Radin seems to believe establishes a connection between gravity
and ESP potential. The gravity fluctuations considered in this experiment are
the changes due to the lunar cycle. But, as argued before, gravity changes more
during the Earth’s rotation than it does if one looks at the same time of day during a lunar cycle. So if the lunar cycle has any influence, then the time
of the day should have a much larger influence on psi ability than the lunar
cycle. This should then be considered a very important factor in every psi
experiment, but it seems that this is never considered. The geographical
location of a laboratory should also be significant since gravity is
stronger at the poles than at the Equator, but this factor also seems to be
neglected. Similar issues arise in connection with the lunar cycle and GMF.
As a final remark about consistency I would like to point out that
on page 102 Radin says that in connection with remote viewing a few test
participants seem to be doing much better than the rest, and this is also the
case for precognition (see page 115), but in connection with mind-matter
interactions Radin writes that there are "no "star" performers". Of course this might have some perfectly natural explanation, but it doesn’t strike
me as a feature of a consistent phenomenon.
Parting Comments
I don’t suffer from the delusion that my scribblings here will make
anyone reject the research presented in "The Conscious Universe" – this has
never been my intention and in my opinion "The Conscious Universe" is an
interesting book, whose subject area deserves closer inspection. What I have
tried to do in this article is to show that there seem to be some
discrepancies in the book, which I think are worth keeping in mind when
evaluating it. I also make no claim to have found the absolute truth about
anything, but what I have written is the result of my simple attempts at
careful reasoning. Since human reasoning has never been perfect (and my
non-perfection has been demonstrated often enough) I would really appreciate it
if anyone who finds a fault in this article, would contact me and correct my
mistake - I would also appreciate any other comment you might have.
Acknowledgements
Thanks to Claus Larsen for encouraging me to write this article and to Dann Simonsen for giving me a thorough list of corrections for the first version of it.
"I pick these claims not because I think they’re likely to be valid
(I don’t), but as examples of contentions that might
be true. The last three have at least some, although still dubious,
experimental support. Of course I could be wrong."
"The statistical results of the studies we examined are far beyond what is expected by chance. Arguments that
these results could be due to methodological flaws in the experiments are
soundly refuted. Effects of similar magnitudes to those found in
government-sponsored research ... have been replicated at a number of
laboratories across the world. Such consistency cannot be readily explained by
claims of flaws or fraud.... It is recommended that future experiments focus on
how to make it as useful as possible. There is little benefit to continuing
experiments designed to offer proof"
"The statistical departures from chance appear to be too large
and consistent to attribute to statistical flukes of any sort.... I tend to
agree with Professor Utts that real effects are occurring in these experiments.
Something other than chance departures from the null hypothesis has occurred in
these experiments."
It turned out that the following words are missing:
"Although I cannot dismiss the possibility that these rejections
of the null hypothesis might reflect limitations in the statistical model as an
approximation of the experimental situation"
"We disagree on key questions such as:
The remainder of this report will try to justify why I believe the answer to
these three questions is ‘no.’"
"Indeed, I do not believe that ‘the current collection of data’
justifies that an anomaly of any sort has been demonstrated, let alone a
paranormal anomaly. Although Utts and I -- in our capacities as coevaluators of
the Stargate project -- evaluated the same set of data, we came to very
different conclusions."
"a few minutes later [that is
later than 10:00] the order in all five RNGs suddenly peaked to its highest point
in the two hours of recorded data precisely when the court clerk read the
verdict."
Radin doesn’t have an explanation for it. It seems a bit strange to me to
simply ignore unexplained data of a magnitude that you consider significant in
other studies. If I were in a bad mood I might have described this as selecting
the data that fit your theory, and discarding the data that don’t. However
I’m not in a bad mood, so I won’t say that.
"These
experiments, which looked so beautifully designed in print, were in fact open
to fraud or error in several ways, and indeed I detected several errors and
failures to follow the protocol while I was there. I concluded that the
published papers gave an unfair impression of the experiments and that the
results could not be relied upon as evidence for psi. Eventually the
experimenters and I all published our different views of the affair"
, another quote shedding some light on this issue in connection with
autoganzfeld experiments is the following taken from "The Evidence for Psychic Functioning", by Ray Hyman:
"The
experimenter, who was not so well shielded from the sender as the subject,
interacted with the subject during the judging process. Indeed, during half of the
trials the experimenter deliberately prompted the subject during the judging
procedure."
"The judging procedure had been well designed–on paper, that is.
Judges were given a list of nine locations and a package of transcripts. Their
job was to match the locations with the correct transcripts. It was done with
great accuracy, and the case seemed proved. But when we find that three judges
appointed by other officials at SRI failed to get good results with the
matching procedure, we begin to get suspicious. Targ and Puthoff [Radin’s
role model], however, found two who were sympathetic, and these two did
just fine."
"Rhine believed that persons who disliked him guessed wrong to spite him.
Therefore, he felt it would be misleading to include their scores."
"if we were forced to dismiss
scientific claims in all fields where there have been a few cases of
experimenter fraud, we would have to throw out virtually every realm of
science–since fraud exists in all human endeavors."
"after Blackmore’s allegedly
‘marred’ studies were eliminated from the meta-analysis, the overall hit rate
in the remaining studies remained exactly the same as before. In other
words, Blackmore’s criticism was tested and it did not explain away the
ganzfeld results."
"As far as I can tell, I was the first person to do a
meta-analysis on parapsychological data. I did a meta-analysis of the original
ganzfeld experiments as part of my critique of those experiments. My analysis
demonstrated that certain flaws, especially quality of randomisation, did
correlate with outcome. Successful outcomes correlated with inadequate
methodology. In his reply to my critique, Charles Honorton did his own
meta-analysis of the same data. He too scored for flaws, but he devised scoring
schemes different from mine. In his analysis, his quality ratings did not
correlate with outcome. This came about because, in part, Honorton found more
flaws in unsuccessful experiments than I did. On the other hand I found more flaws
in successful experiments than Honorton did. Presumably, both Honorton and I
believed we were rating quality in an objective and unbiased way. Yet, both of
us ended up with results that matched our preconceptions."
"We
tested this argument by looking at the relationship between hit rates (in this
case, averaged by year) and the study quality averaged per year. We found that
the relationship was essentially flat, so the critique is not valid."
"in test after test, psi
performance among a small group of selected individuals far exceeded
performance among unselected volunteers. This was an important observation,
because if design problems accounted for successful experiments–as critics
often assumed–then the selected group would not have been able to perform consistently
better than unselected volunteers."
"nothing about the work of either
Geller or Randi is described in this book. They are actually so irrelevant
to the scientific evaluation of psi that not a single experiment involving
either person is included among the thousand studies reviewed in the
meta-analyses."
"One
way to do this was to conduct an experiment every day over the lunar cycle,
because the sun-moon system predictably changes the gravitational forces (i.e.,
tidal forces) felt on earth."
– thanks to Mogens Winther for this
link).
who states that:
"The major point I would make, however, is that drawing
conclusions from meta-analytic studies is like having your cake and eating it
too."
addresses the use of meta-analysis in "The Conscious Universe" in this way:
"The possibility that these results are spurious and due to
publication bias is considered, and then rejected [by Radin] because the
FSFD calculations yield huge values for the putative file drawer. It is my
opinion that, for the reasons described here, these values are meaningless and
that publication bias may well be responsible for the positive results derived
from combined studies."
"Hundreds
of experiments that were done by SRI in testing Price, Geller and Swann were
never reported. Instead, tests with favorable results were selected, in spite
of their poor control and heavily biased ambiguity, to be published as genuine
scientific results despite strenuous objections from more serious and careful
scientists."
"Large fluctuations like this may occur by chance in shorter
random sequences, but progressive accumulation of such odds over longer
sequences is not as likely. This is why the long-term trend of the data
as they are accumulated within each condition is of interest, rather than
momentary fluctuations"
"We see that the dice study results and the RNG study results are
remarkably similar, suggesting that the same mind-matter interaction effects
have been repeatedly observed"