Testing a non-existent claim

by Mark Tidwell

A commentary on a paper published by T.J. Robertson (TJR) and Archie E. Roy (AER) in the Journal of the Society for Psychical Research:

A Preliminary Study of the Acceptance by Non-Recipients of Mediums’ Statements to Recipients.
Robertson and Roy, JSPR.
Vol 65.2 No 863 April 2001 pg 91-106


This first paper (of two, the second to be published here) does not set out to provide evidence of mediumship; rather, it attempts to falsify a skeptical hypothesis. The authors define this hypothesis as:

“Statements made by mediums to recipients are so general that they could as readily be accepted by non-recipients.”

(Recipients are those to whom a medium has addressed a number of ostensibly relevant statements. Non-recipients are any of those to whom the statements have not been addressed.)

The statements were gathered by the authors over a two-year period from a variety of sources, including public meetings and controlled sessions with smaller groups. They were made by 10 mediums to 44 recipients, and 407 non-recipients were included in the study. Some sessions were tape-recorded and one was professionally taped by the BBC; in these and all other cases, the statements were recorded by TJR on a notepad.

The authors suggest that in some of the smaller sessions the mediums probably did not know any of the potential sitters, as TJR brought the mediums to audiences that were unknown to either author, having been assembled by a third party who did not know who the medium would be. The authors state that the mediums’ statements were not offered to the audience as a whole, but in all cases were directed at a single person.

The readings were transcribed into one-line statements, which were then grouped into sets that reflected the totality of a reading to an individual recipient. Sets of statements were given to recipients and non-recipients, who were asked to check off the statements that they felt applied to them (non-recipients were instructed to treat the sets as if they had been given to them by the medium). The 44 sets of statements were distributed among the 407 non-recipients to be scored. For each scorer, a fraction f is then calculated by dividing the number of statements that apply to the scorer by the total number of statements scored.
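
As a minimal sketch of the scoring step described above, the fraction f for one scorer could be computed as follows. The function name and the sample ticks are hypothetical and are not taken from the paper.

```python
def acceptance_fraction(ticks):
    """Return f: statements accepted divided by statements scored.

    `ticks` is a list of booleans, one per statement in a set;
    True means the scorer ticked the statement as applying to them.
    """
    if not ticks:
        raise ValueError("a set must contain at least one statement")
    return sum(ticks) / len(ticks)


# Hypothetical example: a scorer ticks 3 of the 10 statements in a set.
example_ticks = [True, False, False, True, False,
                 False, False, True, False, False]
f = acceptance_fraction(example_ticks)
print(f)  # 0.3
```

Note that f depends entirely on what each scorer decides to tick, so any looseness in that decision carries straight through to the result.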

A statistical analysis of the data follows which scores the fraction of correct statements as previously determined by the recipients and non-recipients. The authors then subject their data to additional analysis to account for the “weighting” of the data to determine its degree of generality or specificity. The advantage of this weighting procedure, it is claimed, is that unlike previous works, it does not depend on the subjective opinion of the sitters or the investigators, but rather is determined by statistical analysis.

The authors report that the fraction of statements accepted by the 44 recipients from the 44 sets of statements addressed to them averages 0.65 and that the median value of the fraction of statements accepted by the 407 non-recipients is 0.30. Based on this difference, the authors determine that the probability of the results being due to chance is less than 1 in 10,000 million.

Initial concerns
The hypothesis

The very premise of this study is a troubling one. The authors state that they are asking the question:

“Is it true that the statements made by the medium to a sitter or a member of a meeting could be accepted just as readily by anyone?”

It is described as a skeptical hypothesis (or a “naïve” skeptical hypothesis). Yet I am unable to find any authoritative reference ascribed to skeptics that states this hypothesis, naively or otherwise.

It is commonly suggested by skeptics that cold-readers frequently begin readings by offering very general statements. Then, once a sitter is chosen (usually by volunteering that the statements apply to them), the medium will begin to shape subsequent statements based on the sitter’s responses. As more information is derived from feedback and from visual and/or audio cues, the medium’s statements become more specific.

No skeptical critic, to my knowledge, has ever suggested that statements derived by cold-reading will be so general that they will apply to anyone. That is counter-intuitive. The very nature of cold-reading requires that statements become more specific to the person being addressed. What skeptics do suggest is that universally acceptable statements will occur when no room whatsoever is allowed for feedback between the medium and the sitter.

The authors seem to be aware of this, as they allow that feedback (intentional or otherwise) may influence the mediums’ statements. But if their hypothesis allows for this type of feedback, why is it mentioned as a concern? If it does not allow for feedback, why weren’t controls included to prevent it?

Lack of controls

The authors describe the smaller group meetings as controlled sessions, but few controls are described. The only control mentioned in these cases was that the groups of sitters were assembled by a third party who did not know which medium would be used. We are not told how the audience was selected, or by whom, or how the recipients were chosen to be read. Did the potential sitters record their personal information prior to the meeting? Was anything offered for confirmation to the investigators? Some of the sessions took place in public areas where controls were presumably not applied at all, yet these readings are included as well. Only the sessions that the authors consider controlled should have been included in the study.

Record keeping

Accuracy in reporting readings is extremely important, yet this task was left in most cases to one of the researchers writing on a notepad. Though surely every effort was made to be accurate, human hearing and memory are not always reliable. A simple tape recorder would have sufficed to ensure the fidelity of reporting.

What “applies”?

The recipients and non-recipients are both asked to tick off the statements that apply to them, but no criteria are given for what “applies”. No transcripts of any of the readings are given, so we cannot judge how well the statements of the mediums were transcribed and then translated into the statements that make up the set. Consider the hypothetical medium’s statement that I have invented below:

“Your father died in a car accident.”

Which of the following truths might be considered to “apply” to this statement?

  • The father is not dead and was never in an accident.
  • The father did not die in a car accident, but was in a near-fatal one the year before he died.
  • The father is dead, but not by a car accident.
  • The father is dead, not by a car accident, but was in one earlier in life.
  • The father committed suicide by carbon monoxide poisoning when he locked himself in his car.
  • The father died in a car collision.

Must a statement be 100% accurate to “apply”, or will a partially correct statement be considered a hit? This is not discussed, so it is difficult for the reader to determine how accurate this measure was. This is a significant issue, as this potentially highly subjective choice on the part of the scorers forms the basis of all the data that is statistically examined afterwards.
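
To make the point concrete, here is a small sketch of how the same set of statements could yield different scores depending on what the scorer counts as “applying”. The judgement labels, the function name and the data are all invented for illustration; nothing here comes from the paper.

```python
def fraction_accepted(judgements, accepted_levels):
    """Fraction of statements whose judgement is in `accepted_levels`."""
    hits = sum(1 for j in judgements if j in accepted_levels)
    return hits / len(judgements)


# Hypothetical judgements for one set of six statements: each statement is
# an exact match, a partial match, or no match for the scorer.
judgements = ["exact", "partial", "none", "partial", "exact", "none"]

# A strict scorer ticks only exact matches (2 of 6).
strict = fraction_accepted(judgements, {"exact"})

# A lenient scorer also ticks partial matches (4 of 6).
lenient = fraction_accepted(judgements, {"exact", "partial"})

print(round(strict, 2), round(lenient, 2))
```

In this invented case the lenient scorer accepts twice as many statements as the strict one, from identical material.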

I am not qualified to judge or even describe the nature of the statistical analysis performed in this study. I will leave that to more learned heads than mine. However, I can offer some limited observations. (I stand ready to be corrected on these issues.) Note the scores reported above: the average score of the recipients is compared to the median score of the non-recipients. This is inappropriate, as the two are different measures and are not directly comparable. The median is less affected by extreme or skewed scores than the average, and is typically considered more representative in such cases. The choice is even more curious since, based on my own crude estimation, using the median value for both groups would work to the advantage of falsifying the stated hypothesis.

The authors report an average of 0.65 for the fraction of statements accepted by the recipients and a median of 0.30 for the non-recipients. However, I estimate that the median value for recipients is closer to 0.80, which widens the gap between the two groups. Regardless of the inconsistency, a statistical analysis is only as good as the data that it is based on. For reasons listed above and for others that I will describe shortly, the data derived in this study is heavily flawed.
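
The following sketch shows how an average and a median can diverge when scores are skewed. The list of scores is invented purely for illustration, chosen so that a few low values pull the average down; it is not the authors’ data, and the variable names are my own.

```python
from statistics import mean, median

# Hypothetical recipient fractions, invented for illustration only.
# Most scorers accept a high fraction; two accept very little.
recipient_f = [0.9, 0.85, 0.8, 0.8, 0.8, 0.75, 0.2, 0.1]

recipient_mean = mean(recipient_f)      # pulled down by the two low scores
recipient_median = median(recipient_f)  # middle of the sorted scores

print(round(recipient_mean, 2))    # 0.65
print(round(recipient_median, 2))  # 0.8
```

With data shaped like this, quoting the average for one group and the median for another mixes two measures that answer different questions, so a like-for-like comparison should use the same measure for both groups.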

Specific concerns

The other, more specific problems that I have with this study are the same as those that the investigators bring to bear. They term these “normal factors” as opposed to paranormal factors. These include:

  1. A “different attitude” between the recipient and the non-recipient.
    Essentially, rater bias. The recipient knows that they have been read and may be eager to validate their experience by scoring their results higher. Conversely, the non-recipients know that they have not been read and may therefore not take the scoring process as seriously. This is a very serious flaw, present before the statements are even introduced into the weighting process. The authors state, “The problem of assessing this factor’s influence on the results is formidable and it might seem impossible to devise a means of obtaining a solution.” It’s hardly impossible to obtain a solution to this, and it is important to try. This factor is not adjusted for in the analysis.
  2. A different cultural background.
    When selecting non-recipients to score for recipients, the authors attempted to place them within the same socio-cultural categories: rural Scotsmen non-recipients for rural Scotsmen recipients, East-enders for East-enders, etc. This was to ensure that statements of the medium that may have been specific to a cultural type would be more appropriately scored within that group. This, in my opinion, actually works against the offered definition of the generalization hypothesis. If the hypothesis states that the medium’s statements might apply to anyone, then anyone it should be, and attempts to homogenize the experimental group therefore run counter to the proposed test. The authors suggest that future studies may expand this cultural base to determine how it may contribute.
  3. Deduction of information from recipient’s appearance.
  4. Deduction of information from recipient’s body language initially and/or verbal responses during the proffering of statements by the medium.
    The inference here is that the mediums may unconsciously assimilate this information from verbal and non-verbal cues from the sitter. Even unconsciously, the medium may infer a great degree of information from the way the recipients present themselves. Their clothing, their hairstyle, their voice, their reactions to inquiries, etc. may all contribute to an impression of their personality which the medium might incorporate into their reading.
  5. Deliberate cheating on the part of the medium.
    Basically, cold or hot reading. The authors dismiss this as a serious possibility based on their relationships with the mediums, who have been “known to both investigators for a number of years.” This is a very distressing statement. Trust has no place in scientific investigation, especially when investigating a field that has been fraught with accusations of deceit. The average reader does not know Messrs. Robertson and Roy, nor do they know the mediums used or even their names. I suggest no reason to doubt the honesty of the investigators, but the trustworthy scientist is the one who does not ask for trust. The trustworthy scientist provides adequate controls to ensure that trust is not required.


The authors offer the following conclusions to which I have added my comments.

  1. It is possible to devise an objective and quantifiable procedure to study a broad spectrum of mediums’ statements to recipients.
    Agreed. But this study does not represent it. There are too many variables, most of which the authors acknowledge, but do not compensate for.
  2. The data collected by applying this procedure are capable of being assessed statistically.
    Agreed. But if the data is flawed, then any subsequent statistical analysis will be flawed as well.
  3. An objective method of weighting a medium’s statements is realisable, enabling a criterion assessing the generality of a medium’s statements to be obtained.
    Agreed, the weighting process itself seems objective, but it is based on subjective judgements of the sitters. This is an issue that needs to be addressed. Perhaps by converting the statements into simple binary questions?
  4. The application of the procedure described in the present paper reveals a large and significant gap between the degree of acceptance of mediums’ statements by the recipients and the degree of acceptance of such statements by non-recipients.
    Strongly disagree. Although the gap is undeniably large, these numbers rely on too many variables that have not been accounted for. Garbage in, garbage out. The authors seem to be aware of these deficiencies and propose to address them in future studies.

Until then, this paper stands as a representative of everything that is wrong with the current style of scientific investigation of mediumship abilities, namely multiple inadequate controls and subjectively determined criteria.

I will address the authors’ proposed changes in part II, to be published in February 2003.