I’ll let the poll (prior post) run for a while but as it winds down I wanted to explain why I posted it:
In the past, I’ve often run into scientists who, when defending their published or other research, respond something like this:
“Yeah those data (or methods) might be wrong but the conclusions are right regardless, so don’t worry.”
And I’ve said things like that before. However, I’ve since realized that this is a dangerous attitude, and in many contexts it is wrong.
If the data are guesses, as in the example I gave, then we might worry about them and want to improve them. The “data are guesses” context that I set the prior post in comes from Garland’s 1983 paper on the maximal speeds of mammals– you can download a pdf here if this link works (or Google it). Basically, the analysis shows that, as mammals get bigger, they don’t simply keep getting faster, as a linear analysis might suggest. Rather, at a moderate size of around 50-100 kg body mass, they hit a plateau of maximal speed, and bigger mammals then tend to move more slowly. However, all but a few of the data points in that paper are guesses, many coming from old literature. The African elephant data points, in particular, are excessively fast, and on a little blog-ish webpage from the early 2000s we chronicled the history of these data– it’s a fun read, I think. The most important, influential data plot from that paper by Garland is below, and I love it– this plot says a lot:
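(As an aside, to give a concrete feel for the shape of that relationship: the sketch below is not Garland’s data or code, and his actual analysis was more involved, but it shows the kind of curvilinear fit involved– regress log maximal speed on log body mass with a quadratic term, then find the body mass at which the fitted curve peaks. The numbers are invented purely for illustration.)

```python
# Minimal sketch of a curvilinear speed-vs-mass fit (invented numbers,
# NOT Garland's data): quadratic regression in log-log space.
import numpy as np

mass_kg = np.array([0.03, 0.5, 5.0, 50.0, 500.0, 3000.0])   # hypothetical body masses
speed_kmh = np.array([10.0, 25.0, 45.0, 70.0, 55.0, 35.0])  # hypothetical maximal speeds

x = np.log10(mass_kg)
y = np.log10(speed_kmh)

# np.polyfit returns coefficients from highest to lowest degree: y ~ a*x^2 + b*x + c
a, b, c = np.polyfit(x, y, 2)

# A downward-opening parabola (a < 0) peaks where dy/dx = 2*a*x + b = 0.
peak_mass = 10 ** (-b / (2 * a))
print(f"Fitted maximal speed peaks at roughly {peak_mass:.0f} kg body mass")
```

Again, that is only to illustrate the shape of the argument; the real pattern depends entirely on the quality of the underlying speed estimates, which is rather the point of this post.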
I’ve worried about the accuracy of those data points for a long time, especially as analyses keep re-using them– e.g. this paper, this one, and this one, by different authors. I’ve talked to several people about this paper over the past 20 years or so. The general feeling has been in agreement with Scientist 1 in the poll, or the quote above– it’s hard to imagine how the main conclusions of the paper could truly be wrong, despite the unavoidable flaws in the data. I’d still agree with that statement: I love that Garland paper after many years and many reads. It is a paper that is strongly related to hypotheses that my own research sets out to test. I’ve also tried to fill in some real empirical data on maximal speeds for mammals (mainly elephants; others have been less attainable), to improve data that could be put into, or compared with, such an analysis. But it is very hard to get good data on even near-maximal speeds for most non-domesticated, non-trained species. So the situation seems tolerable. Not ideal, but tolerable. Since 1983, science seems to be moving slowly toward a better understanding of the real-life patterns that the Garland paper first inferred, and that is good.
But…
My poll wasn’t really about that Garland paper. I could defend that paper: it makes the best of a tough situation, and it has stimulated a lot of research (197 citations according to Google– which actually seems low, considering the influence I feel the paper has had).
I decided to do the poll because thinking about the Garland paper’s “(educated) guesses as data” led me to think of another context in which someone might say “Yeah those data might be wrong but the conclusions are right regardless, so don’t worry.” They might say it to defend their own work, for instance to deflect concerns that a paper might be based on flawed data or methods that should be formally corrected. I’ve heard people say this a lot about their own work, and sometimes it might be defensible. But I think we should think harder about why we would say such things, and whether we are justified in doing so.
We may not just be making the best of a tough situation in our own research. Yes, indeed, science is normally wrong to some degree. A more disconcerting possibility is that our wrongs may be mistakes that others will propagate in the future. Part of the reasoning for being strict stewards of our own data is this: it’s our responsibility as scientists to protect the integrity of the scientific record, particularly that of our own published research, because we may know it best. We’re not funded (by whatever source, unless we’re independently wealthy) just to further our own careers, although that’s important too, as we’re not robots. We’re funded to generate useful knowledge (including data) that others can use, for the benefit of the society/institution that funds us. All the more reason to share our critical data as we publish papers, but I won’t go off on that important tangent right now.
In the context described in the previous paragraph, and in the admittedly simplistic poll, I’d tend to favour data over conclusions, especially if forced to answer the question as phrased. The poll reveals that, like me, most (~58%) respondents would also tend to favour data over conclusions (yes, a biased audience, perhaps– social media users might tend to be more savvy about data issues in science today; and a small sample size, sure, that too!). Whereas very few (~10%) would favour conclusions, in the context of the poll. The many excellent comments on the poll post reveal the trickier nuances behind the poll’s overly simplistic question, and why many (~32%) did not favour one answer over the other.
If you’ve followed this blog for a while, you may be familiar with a post in which I ruminated over my own responsibilities and the conundrums we face in work-life balance, personal happiness, and our desires to protect ourselves or judge/shame others. And if you’ve closely followed me on Twitter or Facebook, you may have noticed that we corrected a paper recently and retracted another. So I’ve stuck to my guns lately, as I long have, to correct my team’s work when I’m aware of problems. But along the way I’ve learned a lot, too, about myself, science, collaboration, humanity, how to improve research practice or scrutiny, and the pain of errors vs. the satisfaction of doing the right thing. I’ve had some excellent advice from senior management at the RVC along the way, which I am thankful for.
I’ve been realizing I should minimize my own usage of the phrase “The science may be flawed but the conclusions are right.” That can be a more-or-less valid defence, as in the case of the classic Garland paper. But it can also be a mask (unintentional or not) that hides fear that past science might have real problems (or even just minor ones that nonetheless deserve fixing) that could distract one away from the pressing issues of current science. Science doesn’t appreciate the “pay no attention to the person behind the curtain” defence, however. And we owe it to future science to tidy up past messes, ensuring the soundness of science’s data.
We’re used to moving forward in science, not backward. Indeed, the idea of moving backward, undoing one’s own efforts, can be terrifying to a scientist– especially an early career researcher, who may feel they have more at risk. But it is at the very core of science’s ethos to undo itself, to fix itself, and then to move on forward again.
I hope that this blog post inspires other scientists to think about their own research and how they balance the priorities of keeping their research chugging along while also looking backwards and reassessing it as they proceed. It should become less common to say “Yeah those data might be wrong but the conclusions are right regardless, so don’t worry.” Or it might become more common to politely question such a response in others. As I wrote before, there often are no simple, one-size-fits-all answers for how best to do science. Yet that means we should be wary of letting our own simple answers slip out, lest they blind us or others.
Maybe this is all bloody obvious or tedious to blog readers but I found it interesting to think about, so I’m sharing it. I’d enjoy hearing your thoughts.
Coming soon: more Mystery Anatomy, and a Richard Owen post I’ve long intended to do.
Thanks. Glad to have discovered your fascinating blog. Lots to explore and puzzle over! Regards from Thom at the immortal jukebox.
To say that Garland’s paper contains flawed data may be too strong. I believe that it contains data that are roughly correct with imprecisions that do not affect the average results, probably because errors upwards and downwards would tend to cancel each other.
What might be required is a theoretical validation of Garland’s conclusions. Such validation should be done at all levels of the theory, ranging from a set of simple equations (some of which are still lacking in the literature) to more or less complex biomechanical simulations. In the meantime it wouldn’t hurt to have updated measurements of top speed for large animals such as rhinos, giraffes, and hippos; medium-sized ones such as lions, tigers, and ostriches; and small animals such as domestic cats, hares, etc.
Measurements for large animals should be made with some urgency; we cannot be certain that those species will still be around a hundred years from now.
Regards,
Mauricio
Thanks Mauricio– well, the “flawed data” phrasing is certainly a matter of opinion and degree, and we’re in firm agreement that more good data are needed. But the point of the post was not that Garland’s paper is bad because of those data– quite to the contrary– but rather that it is a dangerous attitude to think that, if data are of poor or uncertain quality, we need not be concerned as long as the conclusions are still roughly right.
We’re supposed to be in science to generate knowledge (data, methods, conclusions) that others can use. To knowingly put bad data out into the literature, or to dismiss known, practically avoidable/fixable errors in data as unimportant because we don’t think they’d influence the conclusions, is a very dangerous attitude that may lead to future studies having flawed data and conclusions. We may be confident that our data and conclusions are OK, but that does not mean our data are OK enough for other studies to trust, and we should be mindful of that.