I'll make the assumption that astrophysics is science. That shouldn't be terribly surprising, given both that folks tend to talk about it as being science, often as being a particularly beautiful branch of science, and that I've studied it myself. But starting from that assumption suggests that either astrophysics is not science after all, or that many of the complaints about climatology are ill-founded. At least in the sense that if they should be taken seriously (speakers rejecting the idea that climatology is science) then astrophysics, and quite a lot of other sciences, should be rejected as well. Now I do know some folks who do reject astrophysics and some other sciences, so some people probably do mean that.
But let's think a bit more about what reproducibility means. I took it up lightly earlier, regarding pretty much just climate data set reproducibility. Please excuse the narrow focus as just attention to my own area. I'll broaden scope some here.
On one extreme end of notions of reproducibility is the idea that anybody, anywhere, should be able to reproduce your results -- or else you aren't doing science. I don't think that's actually been held to be a requirement for science at any time in history, so such folks are arguing for a change in how science is done. Maybe they're right; let's think about it. As with many ponderings, this goes for a while ....
In astrophysics, one of the things you do is collect data. The equipment can be fairly minor -- amateur astronomers contribute to variable star observing, which is then used for professional research. But often the equipment is major -- the Hubble Space Telescope, major ground-based observatories, exotic observatories for taking neutrino observations, and so on. Such instruments are far outside the ability of individuals to pay for. Since many people want to use these few instruments, few actually get to do so. Certainly I would not be able to get time on the Hubble to reproduce observations that I was interested in. And it'd be fun to try to reproduce the observation program that led to the conclusion that the universe is expanding at an accelerating rate.
Do we, then, conclude that conclusion is not science because you and I (bright and interested people) cannot reproduce the observing program ourselves? Seriously, there's no way they're going to let us have the hours (thousands of hours?) of Hubble time to explore distant supernovae. Same as they're not going to let us take over the Large Hadron Collider at CERN, or the accelerators at Fermilab or Argonne (which even have the advantage of being close to where I grew up) to look at whether I think the experiments really support the existence of quarks. Nor will we get to charter an ocean drilling ship (closer to my fields) to reproduce paleoclimate results. Not even to drill our own core on Greenland, where some interesting science is to be had, and where the two major cores in hand (GISP and GRIP) show some problematic differences. (That's all well-known to people in the field, and they have some good ideas as to what they'd do next, if they got to go out again and drill such long cores.) All areas that would have to be called 'not-science' if the standard were that anybody, anywhere had to have access to the equipment to reproduce the experiments.
Some will hold that hard line, in which case little science has ever been done. That looks strikingly contrary to what we see in the world around us. So let's consider stepping down the requirement some. That, even if the equipment to collect the data doesn't have to be available to everybody, the data themselves do have to be available to everybody.
But what does 'available' mean? Astrophysics collects at least petabytes (1000 terabytes, 1,000,000 gigabytes) of data. Climate, thanks to satellites, does as well. Particle physics experiments throw away vast amounts of data in order to trim their storage down to that order. And so on through, no doubt, many fields these days. The data exist, but where will you put them? And who is supposed to copy them for you? It'd take eons to download 1 petabyte to my desk at home, and I'd have to buy 1000 disks of 1 terabyte each (even at $100/terabyte, that's $100,000 -- plus making sure I had enough power to my house to spin them up) to store them. Or I could have the data source copy everything for me (are they working for free? $100 per disk drive, plus someone's time to verify that the data were copied correctly and so forth ...). Either way, even though the data are available -- for a price -- that price might still be too high for us as individuals to pay, or for the source we're trying to get the data from to be able to provide it. Again, all large data set science would have to be ruled 'not science'.
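To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The 1 petabyte volume, 1 terabyte disks, and $100/terabyte price are the figures used above; the 10 megabit/second home connection is purely an assumption for illustration, as are the function names.

```python
# Back-of-envelope cost of holding a 1 petabyte data set at home.
# Decimal units, as in the text: 1 PB = 1000 TB = 1,000,000 GB.
PETABYTE_BYTES = 10**15
TERABYTE_BYTES = 10**12


def disks_needed(total_bytes, disk_bytes=TERABYTE_BYTES):
    """Number of whole disks required (rounded up)."""
    return -(-total_bytes // disk_bytes)


def storage_cost_dollars(total_bytes, dollars_per_terabyte=100):
    """Cost of 1 TB disks at the quoted price per terabyte."""
    return disks_needed(total_bytes) * dollars_per_terabyte


def download_years(total_bytes, link_megabits_per_second):
    """Years of continuous downloading at a sustained link speed.

    The link speed is an assumed value, not a figure from the text.
    """
    bits = total_bytes * 8
    seconds = bits / (link_megabits_per_second * 10**6)
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    print(disks_needed(PETABYTE_BYTES))          # disks of 1 TB each
    print(storage_cost_dollars(PETABYTE_BYTES))  # dollars at $100/TB
    print(download_years(PETABYTE_BYTES, 10))    # years at an assumed 10 Mbit/s
```

At the assumed 10 Mbit/s, the download alone runs to a couple of decades of uninterrupted transfer, on top of the $100,000 for disks. A faster link shortens the wait, but the storage bill stays the same.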
Suppose, though, that we're concerned only with some topic where the data volume is modest -- a matter of gigabytes. I have downloaded that much to home myself. Sometimes I can be patient. Does that mean the data are really 'available'? For me, since I already understood the data format, yes. But data formats are many and varied. People familiar with GRIB aren't fond of HDF, and vice versa. While I've worked out how to use GRIB, I don't have that familiarity with HDF. Astrophysics has its own formats, and particle physics has (probably) still others. Each field has good, and some not so good, reasons for using the data formats they do. Certainly every format has its fans. But it means that just having a copy of the data file on your desk does not mean the data are truly 'available' to you. You also need to be able to decode the data into some usable form. So here's another opportunity to declare an area 'not science' -- the data formats aren't easy for the non-professional to use. Again, this would affect most science that's been done since computers came into use.
Maybe you have a friendly data provider and they also give you a program to decode the data. Are you home free? If you've read this far, you know how this goes. Of course not. The decoder is a program and runs only on certain computers, or is commercial and you have to buy it, or you can get the program itself, but you have to buy a commercial product to run it, or .... And, of course, the commercial software only runs on certain computers. (I've encountered all these at work as well.) I understand, from my reading in Science and Nature, that this problem -- of commercial software being used, and required, in doing scientific studies is particularly acute in biology. And it raises, again, concerns about reproducibility. Do we conclude that biology is not science? I don't think so, but there are certainly concerns about how the science will progress when relying on black boxes provided by companies that may not be in business 5 years from now.
Some data have restrictions. Hank Roberts in a comment mentioned this regarding the Arctic Ocean. It's much more general than just Arctic Ocean concerns. Biotechnology firms want to protect their assets -- which may include genes, how to conduct experiments, .... Nuclear physics not only has commercial concerns, but national safety concerns. If you want to set up nuclear experiments in your basement, you'll probably have some visitors from national or international security agencies.
Even things as 'minor' as observations of the sea surface temperature or air temperature have their commercial concerns. Ships do collect temperature (sea and air) observations. Many share that data with their national weather services. Cargo ships tend not to be concerned about other cargo companies knowing where their ship is (as far as I know). But that's not at all true of fishing vessels. There's substantial commercial advantage to knowing where the fish are -- and knowing where your competitors' ships are can be a clue to that. Consequently it can be very difficult to get fishing vessel captains to reveal their location information -- and a temperature without a location is no good for science or weather prediction. On top of that, in the 1980s and 1990s, many national weather services (Canada, New Zealand, European Union nations, ...) were partially privatized. One of the aspects of privatization was, surprise, charging -- some people -- for data that had formerly been free to them.
Moving along to programs ... even theoretical work in astrophysics, as for physics, many areas of biology, climatology, and so on, uses computers to test theoretical ideas. More straightforwardly, modelling requires computers. All prior comments about programs to decode data apply to us being able to run theoretical and modelling programs. On top of that, all comments about major facilities apply to the major programs. If you want to run one of the major galaxy interaction programs, or the evolution of the early universe, etc., you need not only the source code (and libraries, and ...), but a big enough computer. One standing truth is that no matter how big and fast today's computers are, scientists and engineers (I first heard this from one of my Electrical Engineering professors) will want to solve a problem slightly bigger than what those computers can handle. So, although the computer on my desk at home is extraordinarily powerful compared to what I had 20 years ago, it is still dwarfed by the supercomputers used today for the major astrophysical programs. Ditto, of course, climate programs, fluid dynamic programs for aircraft design, and on through science and engineering. As for the major observing instruments, time on the world's biggest computers is limited. If you have to be able to run the program yourself, in exactly the configuration of the research paper, before something is 'science', then nothing more involved than can be run on any average home computer can be considered 'science'. For climate modeling, that would mean something like the EdGCM is the limit of what could be considered scientific modeling -- a climate model with about 800 km resolution in the horizontal, and only 9 layers in the vertical. Modern climate modeling is more like 100 km in the horizontal and 64 layers in the vertical -- for just the atmosphere. Then it adds active ocean of similar resolution, and so on.
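The gap between those two resolutions can be estimated with a short Python sketch. It counts only atmospheric grid cells, using the 800 km / 9 layer and 100 km / 64 layer figures above. The function name is hypothetical, and the count is a lower bound on the extra work under the assumption that cost grows at least in proportion to the number of cells; finer grids generally also need shorter time steps, which this sketch ignores.

```python
def relative_grid_cells(coarse_km, coarse_layers, fine_km, fine_layers):
    """How many times more grid cells the finer model has.

    Horizontal cell count scales with the inverse square of the grid
    spacing; vertical cell count scales with the number of layers.
    """
    horizontal = (coarse_km / fine_km) ** 2
    vertical = fine_layers / coarse_layers
    return horizontal * vertical


if __name__ == "__main__":
    # EdGCM-like grid versus a modern atmosphere-only grid, per the text.
    print(relative_grid_cells(800, 9, 100, 64))
```

That works out to roughly 455 times as many cells for the atmosphere alone, before the ocean or any change in time step is counted.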
It seems a strange idea to cripple science down to what home amateurs can reproduce themselves. Certainly it's against the history of science. In our tour of where some of the obstacles are with that idea, I also didn't see any obvious reason that we should reject as science the astrophysics that uses data sets larger than the few GB I might download, or models too large for me to run. Nor that we should throw out much of biology and nuclear physics where there are restrictions on the data and on the conduct of experiments. But maybe some readers see why science should be limited in these ways.
Rambling longer than usual. I'm prompted in this by wondering what pedant-general might mean in response to the original article there, which links to my earlier note about data set reproducibility. I've asked the question there as to what the commenter means. I've seen other people mean some of the things I considered extreme above, so that alone is worth a bit of thought.
I do still think that the widest possible degree of reproducibility is a good and important goal. I've also worked with people who were working to make data distribution faster and easier (automatic reformatting, letting you extract only the subset you want, providing data analysis tools online, ...) so as to get around things which used to be major obstacles. For programs, I do encourage, and try to practice, writing them so as to be easily transported. And so on. While the expanding scale of science has made for new problems, new tools usually show up to cope with the expanded scale.
Plus, of course, I like the idea of being able to reproduce important, and interesting to me, experiments or analyses from the history of science. And I'd like everybody else to be able to as well. More fun to do my own versions of things, but I do like history, so some historic reproductions are nice.
In the meantime, though, it seems for now only climatology is being attacked over its 'nonreproducibility'; but the fundamental argument applies throughout science and engineering. If climatology is to be trashed for this, then all the rest of science and engineering follows in time.