Back to Walden Two Fan Site homepage
by John Shannonhouse
[This was originally a post on a listserv, copied with permission]
Hello, I have talked with Richard and others and thought a lot about what constitutes good science and bad science across a large number of fields (both "hard" and "soft"). IMO, some fields have a high tolerance for bad science and/or give too much credit to conclusions that would be considered preliminary or tentative in other fields. I don't like semantic confusion, so I looked up the word "empirical" to make sure we are all on the same page. As it turns out, I had been told (wrongly) that empirical meant observation of data independent of theory, philosophy, bias, etc. in controlled, reproduced experiments. In fact, controls are not necessary to be empirical. I wish to substitute "scientifically rigorous" for "empirical" in my last post.
By scientifically rigorous, I mean that the observations are recorded in an unbiased manner and always include as complete a description as possible of the conditions under which the observations were made. Experiments need to be done with all possible controls. If for some reason a control is not or cannot be done, its absence should be pointed out (the failure to mention such caveats* is the largest failing I see in experiments done in my field; unfortunately, researchers are often hammered pretty hard for caveats they mention, so mentioning them is discouraged; I always mention them and have suffered considerably as a result). Results must be reproducible to be believable. Sometimes the nature of the experiment prevents reproduction (e.g., it is very expensive). IOW, scientifically rigorous is empirical, controlled, reproduced and includes all caveats.
Being rigorous is distinct from being robust. Rigor refers to method and means carefully or properly done. A robust result is one that is the same or nearly the same in every trial or case (reproducible, having a very low p-value**, consistent with other data).
In the case of experiments where controls or reproductions cannot be done, the researcher must either always mention the caveats or rely on a convergence of evidence (see below).
*Caveat - from the Latin caveat emptor - "may the buyer beware." This is any reason why a conclusion cannot be entirely trusted. It is often a control that has not been done for whatever reason, or the fact that an experiment has not yet been (or may never be) reproduced. It is sometimes a flaw in the experiment's performance (e.g., the temperature was 2°C higher than it was supposed to be, but assuming that it doesn't make a difference, here is my conclusion).
**p-value - a statistical term for the probability of randomly producing a result that varies from the expected result by that much or more. For example, if you flip a coin 4 times and only get heads once, does that mean it is more likely to come up tails? The p-value of this result is 0.3125, or a 31.25% chance of getting one or fewer heads. In most fields, p-values must be <=0.05 to be considered statistically significant. Keep in mind, though, that a result with a p-value of >0.05 may still be real and a result with a p-value of <0.05 might be due to random chance. p-values are probabilities, not absolutes.
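The coin example in the p-value note can be checked directly. The sketch below is only an illustration: it assumes a fair coin and a one-sided test (one or fewer heads), as the note describes, and the function name lower_tail_p_value is made up for this example. With 4 flips there are 16 equally likely outcomes, 5 of which have one or fewer heads, so the exact value is 5/16 = 0.3125.

```python
from math import comb


def lower_tail_p_value(n_flips: int, n_heads: int, p_heads: float = 0.5) -> float:
    """Probability of getting n_heads or fewer heads in n_flips flips.

    Hypothetical helper for illustration; assumes independent flips and a
    one-sided (lower-tail) test.
    """
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads + 1)
    )


# One head in four flips of a fair coin
p = lower_tail_p_value(4, 1)
print(p)  # 0.3125
print(p <= 0.05)  # False: not statistically significant at the usual cutoff
```

A two-sided test (one or fewer heads, or one or fewer tails) would double this value, so the conclusion that the result is not significant does not depend on that choice.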
<<Only the extreme physical-science ends of the science spectrum (particularly mathematics and to a lesser extent physics) live close to empirical evidence and logical conclusions. As one passes through chemistry to biology to the social sciences, the "scientific method" changes radically and becomes "much softer," less empirical and logical. There are so-called "proofs" in math; then there is strong "statistical" evidence in some areas of physics; but by the time one gets all the way along the spectrum to the social sciences (even the "firmer" social sciences such as experimental psychology), one is often talking about p <.1 or p <.05.>>The need to use p-values does not make a conclusion less rigorous. p-values make a good measure of how robust a conclusion is. If a data set came out supporting or refuting a conclusion based on chance (which often happens with such high p-values), then it should not be reproducible. My science (genetics) relies heavily on statistics to come to many conclusions. After reproducing an experiment, having large trials, etc., the p-values can get down to <0.001 (which means a less than 0.1% chance of seeing this data by chance alone, meaning that it is quite safe to say the correlation is real). This principle applies in clinical trials of a drug to show effectiveness and can apply in some of the "softer" sciences described below. When I was ripping on military scientists, clinical psychologists and geneticists in my last post, it was not about how robust their conclusions were (which is what Richard seems to be talking about). It was about being sloppy with controls, ignoring scientifically rigorous experimental results while clinging to their expert opinions, or stating opinions without mentioning that there is no direct evidence to support them (there is often indirect evidence, but this is usually on par with historical evidence).
I had one professor who was great at mentioning caveats to what the prevailing opinions were (the "party line" he called them). In the clinical sciences, results are sometimes less robust than a researcher would like (not enough to be acceptable to a straight biologist, perhaps). That is no reason why they cannot be rigorously done (what I meant by empirical in my last post).
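The earlier point that large trials can push p-values below 0.001 can be illustrated with the same coin. This is a rough sketch under stated assumptions, not data from any real study: it keeps the observed proportion of heads fixed at 25% and only increases the number of flips. The trial sizes and the helper name lower_tail_p_value are invented for the illustration.

```python
from math import comb


def lower_tail_p_value(n_flips: int, n_heads: int) -> float:
    """Chance of n_heads or fewer heads in n_flips flips of a fair coin."""
    favorable = sum(comb(n_flips, k) for k in range(n_heads + 1))
    return favorable / 2**n_flips


# Hypothetical trials: always 25% heads, but with more and more flips
trials = [(4, 1), (40, 10), (100, 25)]
p_values = [lower_tail_p_value(n, h) for n, h in trials]

for (n, h), p in zip(trials, p_values):
    print(f"{h} heads in {n} flips: p = {p:.2g}")
```

The same 25% rate that is unremarkable in 4 flips becomes very hard to attribute to chance in 100 flips, which is why a weak-looking effect can still be a robust one once the trials are large or have been reproduced.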
<<"Evidence" in "softer" social sciences such as archeology and anthropology, and also economics, political science, etc. is even more imaginative--very different indeed from anything remotely acceptable in the physical sciences. And in the clinical "sciences" such as medicine and many aspects of psychology, even clinical "vignettes" and "informed opinions" are considered to have weight. And this is still "science.">>And this still causes confusion and more tangible problems. IMO, several sciences can be run in a more rigorous fashion than they are (even in hard sciences). There are some who would say murderers have walked away scot free because of an "informed opinion" without experimental evidence or mention of caveat. E.g., the doctor who says "He did not die as a result of poison. I have seen a large number of similar cases and this is an allergic reaction." when the more correct thing to say would be "It could have been poison, but it looks more like an allergic reaction. Here is why it looks more like an allergic reaction..." because evidence is consistent with poison, but more resembles allergies. This sort of expert opinion would allow an investigator or jury to consider other lines of evidence rather than believing poison is impossible, which brings me to: Convergence of the evidence - in order to be valid, a hypothesis must be consistent with all known data. Even when a hypothesis cannot be tested experimentally, it can often become upgraded to a theory with sufficient convergence of evidence. A hypothesis may make predictions that can be "tested" by comparing evidence from nonlaboratory conditions. Checking a hypothesis this way may require working from many areas of expertise. This falls under falsifiability (see below). Several hypotheses can explain the same phenomenon. They can be ordered from better to worse hypotheses using two principles:
Convergence of evidence is used when experiments cannot be done. It is the origin of the theory of evolution, various theories in cosmology, the earth sciences and other sciences.
Now for something different:
The above examples were taken from "Why People Believe Weird Things" by Michael Shermer. It gives an excellent outline of two debates that are scientifically resolved, but still burn on because the side with the untenable position uses these tactics (creationists v. evolutionists, Holocaust deniers v. historians). It also contains descriptions of common fallacies that lead to erroneous beliefs.
Here are some others I have seen:
<<No theory can be ruled out so long as it has some kind of evidence to back it up.>>It can be ruled out if it is falsified. There is evidence, for example, that protein is the genetic material. We know that protein is not the genetic material, though. There are other interpretations of the data supporting protein as the genetic material, and different experiments have shown that genetic material can be transmitted without protein, ergo that hypothesis can be ruled out. There are famous examples of some evidence supporting all kinds of hypotheses that are known to be false:
However, as I said, a hypothesis must be consistent with all evidence, not just some. None of the above examples are consistent with all evidence, so we know they are false. That is the power of falsification. Well, I have written enough for now. I should get back to work.
John Shannonhouse
Previous Post Follows:
Hello,
On 11 April 1999, Gale wrote: <<The results of a science court are (a) a set of agreed statements, (b) a set of opinions from the technical panel on statements on which there is disagreement by the parties, (c) supporting arguments by the interested parties, and (d) a summary by the moderator.>>Well, this sounds interesting. I have to state my opinion: There needs to be an itemized list of all scientifically rigorous evidence in the report from a "science court." Scientifically rigorous evidence should carry more weight than expert opinions. There are examples of scientifically rigorous evidence and expert opinions disagreeing even after clear-cut results are published. I know of studies in clinical psychology that clearly show that expert opinion can be of dubious value (e.g., studies show that treatments where treatment time and cost are determined by the expert judgement of therapists are no more effective than treatments where guidelines cap treatment time and cost). There is certainly no shortage of examples in genetics where long-held opinions across the field were overthrown by scientifically rigorous evidence (e.g., histocompatibility is not determined by many genes as was originally assumed, but is based almost entirely on a single gene: the MHC locus). There is one example I can think of where many geneticists have not acknowledged that recent scientifically rigorous evidence flies right in the face of some expert opinions they are stating (and it was a simple experiment that should have been done long ago, too). Without scientifically rigorous evidence (or a convergence of many lines of indirect evidence), expert opinions are just guesses. They may be better guesses than lay guesses, but they are still guesses. BTW, stating that evidence is scientifically rigorous does not make it so.
I had a discussion yesterday in which someone stated ex cathedra that it had been empirically determined that a soldier was no longer effective in combat after a certain length of time. I was surprised that I had not heard about any controlled experiments on the subject, or even that the military had EVER done a controlled experiment about a factor affecting combat effectiveness (as far as I can tell from my roommate's officer's manuals, they normally rely on expert opinion supported by circular arguments and circumstantial evidence). When I questioned him about it, I found out that only one type of serviceman was "tested" (fighter pilots) and it was generalized to all types of combat servicemen (but not, interestingly enough, to non-combat servicemen), that no controlled experiments had been done (the "experimental group" was Allied airmen and the "control group" was Axis airmen) and that at least one other factor he mentioned could account for the difference in performance (I immediately thought of two others). The experts (military officers) had decided by fiat that this was the main factor accounting for the difference between the two groups and called it an empirical determination (this is typical of military "science").
<<We would select the issues, such as nuclear energy in space, find parties who strongly advocate different positions on the issues, find a technical panel>>What science questions are you planning on answering? This approach sounds like it can rapidly mutate into political/economic issues. Science questions might be:
(2) If so, what specifically are the harmful effects of nuclear energy in space?
(3) What facts are not known that need to be known, and what experiments can be done to find out those facts?
(4) What can be done to minimize or eliminate harmful effects of nuclear energy in space? What specifically do the measures do to minimize the harmful effects and to what extent do they reduce the harmful effects?
(5) Is a given effect harmful?
Economic and political questions are:
Notice the difference between the two sets of questions. The first set, with the possible exception of #5, are all objective questions. They are about data, not what should be done. The second set's questions are subjective. Their answers are directives about what to do, and the distinction between facts and opinions can easily become muddled.
John Shannonhouse
This site was created by ex-member Nexus, but does not necessarily represent the opinions or policies of Twin Oaks Community.