Re: RECENT MANUAL MEASUREMENTS OF OA AND OAA
On Mon, 23 Jan 2006, Phil Davis wrote:
> It would be much more constructive if Stevan spent time trying
> to find problems in their methodology and analysis...
As I said, the discrepancies between our test of the robot's
accuracy and Goodman et al's prompted us to try to find the basis
for the discrepancy, and we think we have found it:
The robot sends ISI reference queries to several search engines
and then tests up to the first 60 hits to see if any of them is
OA, stopping and returning "OA" as soon as its algorithm judges
that a hit is OA, and returning "NOA" if none of the (up to) 60
hits is OA.
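
To make the stopping rule concrete, here is a minimal sketch of that loop in Python. It is not the robot's actual code: the names classify_reference and looks_oa are hypothetical, the hits are assumed to have already been merged from the search engines into one ordered list, and the OA-detection step is passed in as a placeholder function because its details are not described here.

```python
from typing import Callable, Iterable, List, Tuple

MAX_HITS = 60  # the "up to the first 60 hits" limit described above


def classify_reference(
    hits: Iterable[str],
    looks_oa: Callable[[str], bool],  # hypothetical stand-in for the robot's OA test
    max_hits: int = MAX_HITS,
) -> Tuple[str, List[str]]:
    """Return the verdict ("OA" or "NOA") and the hits actually examined."""
    examined: List[str] = []
    for hit in hits:
        if len(examined) >= max_hits:
            break  # never look past the first max_hits hits
        examined.append(hit)
        if looks_oa(hit):
            # Stop at the first hit judged OA; later hits are never examined.
            return "OA", examined
    return "NOA", examined
```

Returning the examined hits alongside the verdict is the point of the sketch: those are the hits that would need to be saved for any later hand-check.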
The right way to check the robot's accuracy is to save all the
hits and hand-check them for a subset of references that
the robot judged "OA" and a subset that it judged "NOA". What
we instead did in our own small test sample was to do a search by
hand for a subsample of 100 references that the robot had judged
to be OA and 100 references it had judged NOA (in Biology).
Goodman et al. did the same for a sample about three times as big
in Biology, as well as in Sociology.
All three tests found very different accuracies. The reason now
seems clear: When one hand-checks the accuracy of a device, this
has to be on the *device*'s sample, not a different sample. All
of us had used a different sample (and even different search
engines). The right test of the robot's accuracy requires
hand-checking the (up to) 60 hits that the robot actually sampled
and processed and judged OA or NOA. We are now re-doing both the
searches and the tests, saving the hits for doing this
hand-checking.
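
As a rough illustration of what checking the device on the device's own sample could look like, here is a sketch in Python. The record layout (SavedResult), the function names, and the dictionary of human judgments are all assumptions made for illustration, not a description of our actual procedure; the only idea taken from the above is that the hand-check runs over the saved hits the robot itself examined.

```python
from typing import Dict, Iterable, List, NamedTuple


class SavedResult(NamedTuple):
    reference: str    # the ISI reference that was queried
    verdict: str      # the robot's judgment: "OA" or "NOA"
    hits: List[str]   # the (up to) 60 hits the robot actually examined


def hand_verdict(result: SavedResult, human_is_oa: Dict[str, bool]) -> str:
    """Human judgment over the robot's own hits, not over a fresh search."""
    return "OA" if any(human_is_oa.get(h, False) for h in result.hits) else "NOA"


def agreement_by_verdict(
    results: Iterable[SavedResult], human_is_oa: Dict[str, bool]
) -> Dict[str, float]:
    """Fraction of robot verdicts confirmed by hand, separately for OA and NOA."""
    counts = {"OA": [0, 0], "NOA": [0, 0]}  # verdict -> [confirmed, total]
    for r in results:
        counts[r.verdict][1] += 1
        if hand_verdict(r, human_is_oa) == r.verdict:
            counts[r.verdict][0] += 1
    return {v: ok / total for v, (ok, total) in counts.items() if total}
```

Reporting the two subsets separately matters because an error in the "OA" subset and an error in the "NOA" subset are different kinds of mistake.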
In other words, all three tests were biased against the robot --
being based on different samples, from different sources, united
only by whether or not the robot had judged the reference item to
have an OA version somewhere among the (up to) 60 hits in the
*first* sample. We had not noticed the bias earlier, because our
test had yielded such a strong accuracy despite the (unnoticed)
bias.
As I said before, I am glad Goodman et al. did the further test,
whose much weaker result alerted us to the fact that something
was amiss. We think we have found what was amiss, and it was not
in the robot's accuracy but in our test of the robot's accuracy.
Stay tuned for the results for both Biology and Sociology, which
are being completely re-done by the robot, but this time saving
all the hits; the robot accuracy test will be available soon for
a still larger subsample of these same data. We are also saving
all the hits (for all of Biology and Sociology, not just this
larger sample), so anyone else can hand-check them if they wish.
Stevan Harnad
> At 08:41 PM 1/22/2006, you wrote:
>>Before anyone gets too excited about the tiny Goodman et al. test
>>result, may I suggest waiting a couple of weeks, when we will be
>>reporting the results of a far bigger and more accurate test of
>>the robot's accuracy?
>>
>>Those who (for some reason) were hoping that the robot would
>>prove too inaccurate and that the findings on the OA advantage
>>would prove invalid may be disappointed with the outcome. I can
>>already say that overinterpretations of the tiny Goodman et al.
>>test as showing that the OA/OAA findings to date are "worthless"
>>are rather overstated even on the meagre evidence to date,
>>especially since two thirds of the published findings on the OA
>>citation advantage are not even robot-based!
>>
>>(This shrillness also seems to me to be trying to make rather
>>much out of having actually done rather little!)
>>
>>As to the separate issue of how to treat the OA journal article
>>counts (as opposed to the counts for the self-archived non-OA
>>journal articles): We count it all, of course, but only use the
>>non-OA journal article counts in calculating the OA advantage,
>>because those are (necessarily) within-journal ratios, and
>>citation ratios of zero and infinity are meaningless. Think about
>>it.
>
> [SNIP]
>
>>Stevan Harnad