Episode 94: Error and Uncertainty in Science
Recap: What a scientist means by "accuracy" and "precision" is usually different from common, everyday use. The process of formal error analysis, and the propagation of those errors into a formal uncertainty applied to a final result, is also generally not well understood. This episode discusses all of these, focusing on different types of error in scientific data and experiments.
Puzzler for Episode 94: The equatorial circumference of the earth is 21600 nautical miles, 24901.55 miles, or 40075.16 km. What is the circumference of the moon in nautical miles?
Answer to Puzzler from Episode 93: There was no puzzler in episode 93.
Q&A: This episode's question comes from Ken S., who asked about asteroid Apophis and whether or not it will hit Earth. NASA was giving certain odds for it hitting in 2036, 2037, and 2068, and then those odds changed based on the flyby of Apophis in 2013. After that flyby, NASA was listing the odds at practically zero that it would hit in 2036, and Ken wanted to know how NASA could calculate that and whether they really knew what they claimed they knew.
Ken asked this back in July, and I've held onto it for this episode because it all has to do with measurement errors in determining orbits. Instead of answering in a specific way based on all the types of error discussed in this episode, I'm going to explain it as a "for instance."
Let's say you observe a hot air balloon floating by. You make three separate measurements, all 1 second apart. Based on those, you extrapolate a path. You might be able to predict where it will be in 30 seconds pretty well. But, where it will be in 30 minutes is going to have a relatively large uncertainty because the uncertainties in the measurements you made, while small on the short term, grow larger in the long term. But, if you make another measurement in 10 minutes, you have just shrunk your uncertainty and can extrapolate to longer periods of time with smaller amounts of uncertainty.
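To put rough numbers on the balloon example, here is a minimal sketch in Python. It assumes the balloon moves in a straight line at constant speed and that every position measurement has the same independent 1-sigma error, and it uses the standard least-squares result for the uncertainty of a fitted straight line. The 1-meter measurement error and the function name are made up for illustration.

```python
import math

def prediction_sigma(times, sigma_pos, t_predict):
    """1-sigma uncertainty of a straight-line fit evaluated at t_predict.

    times: when each position measurement was made (seconds)
    sigma_pos: assumed 1-sigma error of each position measurement (meters)
    """
    n = len(times)
    t_mean = sum(times) / n
    spread = sum((t - t_mean) ** 2 for t in times)
    # Standard result for an unweighted least-squares line:
    # var(t) = sigma^2 * (1/n + (t - t_mean)^2 / sum((t_i - t_mean)^2))
    return sigma_pos * math.sqrt(1.0 / n + (t_predict - t_mean) ** 2 / spread)

three_quick = [0.0, 1.0, 2.0]             # three measurements, 1 second apart
with_later_look = three_quick + [600.0]   # plus one more, 10 minutes in

for label, t in [("30 seconds", 30.0), ("30 minutes", 1800.0)]:
    before = prediction_sigma(three_quick, 1.0, t)
    after = prediction_sigma(with_later_look, 1.0, t)
    print(f"{label}: +/-{before:.1f} m from 3 quick looks, "
          f"+/-{after:.1f} m with a 4th look at 10 minutes")
```

The uncertainty grows roughly in proportion to how far you extrapolate beyond the time span of your measurements, which is why one extra measurement much later does so much good.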
The same goes for orbits. While you want a lot of observations of an object, you also want them to be spread over a long range of time. You especially want to be able to observe it after it has been perturbed a large amount, like by a relatively close encounter with a planet, which is what happened with Apophis earlier this year. This allows you to better constrain the orbit, knowing the orbital elements to greater accuracy and more significant figures, and allows you to project the orbit further into the future with a reasonable idea of where it will be.
But, you still have an inherent uncertainty from only knowing the orbital elements to a finite level. Not only that, but after a point, our knowledge of fundamental physics and uncertainties associated with that will also play a role. We only know the Gravitational constant to about 6 significant figures, and the last two are uncertain, meaning the relative uncertainty is about 1 part in 10,000. So, if we can observe Apophis enough to know orbital elements to better accuracy, then the uncertainty in gravity will start to affect how well we can know the orbit at an arbitrary time in the future.
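As a quick check on that "1 part in 10,000" figure, here is the arithmetic as a small sketch. It assumes the CODATA 2010 recommended value of the gravitational constant; the episode itself does not say which measurement is being quoted, so treat the specific numbers as an assumption.

```python
# CODATA 2010 recommended value of G in m^3 kg^-1 s^-2 (an assumption here;
# the episode does not name the measurement it is quoting).
G_VALUE = 6.67384e-11
G_SIGMA = 0.00080e-11   # 1-sigma uncertainty, i.e. the last two digits

relative_uncertainty = G_SIGMA / G_VALUE
print(f"relative uncertainty: {relative_uncertainty:.1e}")
print(f"about 1 part in {1 / relative_uncertainty:,.0f}")
```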
What this all boils down to is that the very precise measurements of Apophis taken during its close flyby six months ago allowed scientists to constrain the orbit well enough that the uncertainty ellipse for where it will be in 2036 leaves less than a 1 in a billion chance of it hitting. That is why it is no longer listed on watch lists for that year, though Michael Horn is still trying to scare people by saying that Billy Meier says it'll hit. In 2068, the probability is still a tad high, with one chance in about 260,000. With more measurements, the uncertainty ellipse will be better constrained and will shrink, and the impact probability will either go up if the ellipse shrinks towards Earth, or go down if it shrinks away from Earth.
Claim: As with the last episode, this is more of a basic science one that does not deal with any specific claim, but the basics of it really come into play in practically EVERY episode of this podcast. What I'm going to cover is the difference between error and uncertainty in science, what the difference is between accuracy and precision, and what the different types of errors in science are and how they contribute to estimations of uncertainty.
To do this, I'm going to first describe an experiment, one that was actually a lab that I taught as part of an introductory astronomy class several years ago. It was called the "Eratosthenes Challenge," named for the Ancient Greek man, Eratosthenes, who was one of the first people to estimate the circumference of Earth. He did this by measuring shadow lengths, at the same time, in two cities separated by a known distance.
For the lab, we did it in reverse: Students walked between the campus observatory, which was at a known latitude, and Baseline Road in Boulder, CO, which runs on the 40° line of latitude. Based on the latitude difference, the percentage of a circle that difference was, and how far they measured that they had to walk, they were to calculate the circumference of Earth. We calibrated their step size by walking to an American football field on campus and seeing how many steps it took them to walk from one end of the field to the other, which was 100 yards.
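The calculation each student did can be sketched in a few lines of Python. The function name and the example numbers (step counts and latitude difference) are hypothetical, picked only to show the arithmetic; they are not the actual lab data.

```python
YARDS_PER_MILE = 1760.0

def earth_circumference_miles(steps_walked, steps_per_100_yards, latitude_difference_deg):
    """Estimate Earth's circumference from a north-south walk.

    steps_walked: steps counted between the two latitudes
    steps_per_100_yards: the walker's calibration from the football field
    latitude_difference_deg: difference in latitude between start and end
    """
    yards_walked = steps_walked / steps_per_100_yards * 100.0
    miles_walked = yards_walked / YARDS_PER_MILE
    fraction_of_circle = latitude_difference_deg / 360.0
    return miles_walked / fraction_of_circle

# Hypothetical student: 540 steps walked, 120 steps per 100 yards,
# and a latitude difference of 0.0037 degrees.
example = earth_circumference_miles(540, 120, 0.0037)
print(f"estimated circumference: {example:,.0f} miles")
```

Because the latitude difference is such a tiny fraction of a circle, a small error in the step count or the calibration gets multiplied up by a very large factor.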
After the lab was all said and done, we combined everyone's data from two lab sections and got a circumference of 25,000±2,000 miles, or about 40,000±3,000 km. Note that this is the POLAR circumference, which is currently estimated to be 39,941 km, or about 24,818 miles.
The questions to answer in this episode are: What is the accuracy of our data, what is the precision of our data, what are the sources of error and the kinds of error, and what is the uncertainty? A lot of these are, unfortunately, like the words "theory" and "hypothesis" -- they're used interchangeably by most people but mean very specific things to scientists.
Accuracy is a measure of how close a measured value is to the "true" (or generally accepted) value. If there is no known value yet, then it is impossible to measure accuracy. For example, if we did not know the circumference of Earth, then the Eratosthenes Challenge lab could not have a measured accuracy. Since we do know Earth's circumference, I can qualitatively state - as in, give a non-numerical statement - that the accuracy of our work was pretty good, since the aggregate results match the known value to within our uncertainty.
Precision is something different, and it can give you a measurement of uncertainty and error. Precision is how well multiple measurements agree with each other. For example, if one student got a value of 25,000 miles and another student got a value of 1,000 miles and another got a value of 42,000 miles, then those would not be very precise because the values differ significantly. If instead, three students got values for Earth's circumference of 25,000, 23,000, and 26,000 miles, then those would be more precise.
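One common way to put a number on precision is the standard deviation of the repeated measurements. Here is a small sketch using the two sets of student values just described; using the sample standard deviation is my choice for illustration, not the only possible measure of scatter.

```python
import statistics

scattered = [25_000, 1_000, 42_000]    # miles, the imprecise trio
clustered = [25_000, 23_000, 26_000]   # miles, the more precise trio

for name, values in [("scattered", scattered), ("clustered", clustered)]:
    mean = statistics.mean(values)
    spread = statistics.stdev(values)   # sample standard deviation
    print(f"{name}: mean {mean:,.0f} miles, scatter {spread:,.0f} miles")
```

Notice that the two averages are not wildly different, but the scatter is: the first set is far less precise even though its average lands in a similar place.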
To use a different, perhaps more common analogy, one can think of a target that you shoot 6 arrows at. If your arrows are all very close together, regardless of where they hit, then that is very precise. If they are all over the place but still would AVERAGE to hit in the middle of the target, then they are not precise but are considered to be very accurate.
It's a bit unintuitive that you can be precise but not accurate, or accurate but not precise, but that's how these terms are defined in science.
With that in mind, the idea of experimental error comes into play. An error is considered to be anything that makes your result less accurate and/or precise, and there are several different kinds of error. There are really two primary kinds of error in any experiment, though there are two additional sub-types.
The first main kind of error is called a "systematic error," and it happens when there is something that, no matter how many times you repeat the experiment, is still going to make you off in the same direction by some amount. In the Eratosthenes Challenge, an example of systematic error would be that we could not start exactly from the telescope dome that has the known latitude; we had to start a little in front of the building. That is going to affect our results because there will be a certain offset - an error - that cannot be changed no matter how many times we repeat the experiment. In the archery with a target example, my archery when I took the class in college always had a systematic error -- no matter how I moved my body or how I moved my sighting ring, I was always a bit to the left of center; I was highly precise, but I had low accuracy. If you've heard of a "bias" in data, the more formal name for it is a systematic error.
The second kind of error is called "random error," and it's random error that we try to decrease by repeating experiments over and over and over and over and over again. In the Eratosthenes Challenge, perhaps the clearest source of random error is variable step size since that was our measurement unit. Not only do people not take exactly the same stride length, but hills and valleys will vary the degrees of latitude that your stride takes you because of the slope. But, the idea of the random error is that it is random, and that over time and multiple repeats, those random errors will average out, and you can get a more accurate result. Not only that, but by taking the results from multiple people, you can hopefully get an even more accurate result.
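The difference between the two kinds of error shows up clearly in a simulation. This is a minimal sketch with made-up numbers: a fixed 500-mile bias stands in for a systematic error, and Gaussian scatter of 2,500 miles stands in for random error. Assuming the random part is Gaussian is a simplification for illustration.

```python
import random
import statistics

TRUE_VALUE = 24_818.0   # miles, the polar circumference quoted in this episode
BIAS = 500.0            # hypothetical systematic offset, same sign every time
SCATTER = 2_500.0       # hypothetical 1-sigma random error per measurement

def mean_of_measurements(n_measurements, rng):
    """Average n simulated measurements that share one bias plus random scatter."""
    values = [TRUE_VALUE + BIAS + rng.gauss(0.0, SCATTER)
              for _ in range(n_measurements)]
    return statistics.mean(values)

rng = random.Random(94)   # fixed seed so the run is repeatable
for n in (4, 40, 4_000, 400_000):
    offset = mean_of_measurements(n, rng) - TRUE_VALUE
    print(f"{n:>7} measurements: average is off by {offset:+.0f} miles")
```

With more and more measurements the random part averages away, but the average settles on the true value plus the bias, not on the true value itself. Repetition does nothing for a systematic error.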
Many times, scientists are most aware of the random errors and least aware of systematic errors. It is the systematic errors, especially, that lead to different results between research groups that they fight over, or that independent reviewers during the peer review process try to identify that might throw off your results.
For example, there was the announcement a few years ago by the OPERA experiment, which measures neutrinos sent from CERN to a detector in Italy, that neutrinos appeared to travel faster than light. They tried to reduce all the random errors they could, search for any systematic errors and correct for them, but in the end, they still appeared to find that neutrinos could travel faster than light. Later, they found that the problem was a cable that wasn't plugged in all the way, and the timing difference from plugging it in properly negated the effect. That is a systematic error because no matter how many times they repeated the experiment, they were still off by a certain amount. This is why systematic errors are perhaps the worst kind: they can be the hardest to find or consider, no matter how much you think you've removed or corrected for them.
With those in mind, I told you that there are two additional sub-types of error. The first is "reading error," also known as "measurement error." This is where any measuring tool has a finite amount of granularity or fineness. For example, my kitchen food scale reads to the 1 gram level. Therefore, assuming everything else is perfect, I can only know the weight of something to ±1 gram, and that is my measurement or reading error. In the Eratosthenes Challenge, there are two sources of measurement error: The measurement unit was footsteps, so you always have at least a ±1 footstep error, and the positions of the observatory and Baseline Road were known to 0.000001° latitude, which corresponds to about ±0.1 meters, or ±4-5 inches. In many experiments, reading errors are the smallest source of error.
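The conversion from a latitude reading error to a distance on the ground is just a fraction of the circumference. A small sketch, using the polar circumference figure quoted in this episode and treating Earth as a perfect circle along a meridian (a simplifying assumption):

```python
POLAR_CIRCUMFERENCE_KM = 39_941.0   # value quoted in this episode
METERS_PER_INCH = 0.0254

def latitude_error_in_meters(error_deg):
    """Ground distance spanned by a given number of degrees of latitude."""
    meters_per_degree = POLAR_CIRCUMFERENCE_KM * 1000.0 / 360.0
    return error_deg * meters_per_degree

reading_error_m = latitude_error_in_meters(0.000001)
reading_error_in = reading_error_m / METERS_PER_INCH
print(f"0.000001 degrees of latitude is about {reading_error_m:.2f} m "
      f"({reading_error_in:.1f} inches)")
```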
The final sub-type is "calibration error," which is, strangely enough, how well something is calibrated. In the Eratosthenes Challenge, to get a circumference of Earth in real units like miles or kilometers instead of an individual student's footsteps, we went to the American football field and each student measured how many steps they took to walk 100 yards between goal lines. Any random, measurement, or systematic errors that fed into that measurement of number of footsteps per 100 yards gives you the calibration error.
As a more mundane example, this idea of calibration error is really how well the measuring instrument is made, so if you have a crappy Chinese knockoff scale it might have a fairly high calibration error, whereas if you have a good ol' 'Mer'can-made scale, it might have a fairly low calibration error, meaning you can trust the numbers it gives you more. These are usually quoted by the manufacturer. As another example of this, in episode 82 I talked about designing a hyperdimensional physics experiment with an Accutron watch. When originally manufactured, those watches had very tiny calibration errors, but over the last several decades and after the many modifications that Richard has had done to it, the calibration errors are likely much higher.
All of the errors combine and propagate through the experiment to give you a final uncertainty. In MOST circumstances in real science labs, the uncertainty is almost entirely made of random errors. Calibration errors should be tiny because you should be using good equipment. Same with reading errors. You should try to think of all systematic errors you can and correct for them so they don't play a role. So, in most cases, you are mostly limited by random scatter, also called "noise," in your own measurements of something because of innumerable things that you cannot correct for but can be diminished by repeating the experiment over and over again.
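When the individual error sources are independent of each other, the usual way to combine them is "in quadrature": square each one, add them up, and take the square root. Here is a minimal sketch; the error budget numbers are invented for illustration and are not from the lab.

```python
import math

def combine_in_quadrature(*sigmas):
    """Combine independent 1-sigma errors into one total uncertainty."""
    return math.sqrt(sum(s * s for s in sigmas))

# Hypothetical error budget for a circumference measurement, in miles.
random_error = 2_400.0
calibration_error = 500.0
reading_error = 50.0

total = combine_in_quadrature(random_error, calibration_error, reading_error)
print(f"total uncertainty: {total:,.0f} miles")
print(f"random error alone: {random_error:,.0f} miles")
```

Because the errors are squared before they are added, the largest one dominates. That is why, with good equipment, the final uncertainty ends up being almost entirely the random scatter.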
In very formal, very tightly controlled physics experiments where you are trying to do things like determine the mass of the electron, every single source of error is propagated through the experiment to give a formal confidence level on the end result. Usually this is expressed as Gaussian or "bell-curve" uncertainties, where 1 standard deviation means that's where you expect about 68% of the data to fall, and 2 is about 95%, and so on.
A different way of thinking about it would be that you arrive at a result, and give a 1-sigma uncertainty, and that means you are 68% certain that the true value, or the accuracy of your result, is in that range. If you've been able to reduce your errors, then that range is going to be very small. If you have large errors, then that uncertainty range will be very large.
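Those percentages come straight from the shape of the bell curve. For a Gaussian, the fraction of values within k standard deviations of the mean is erf(k/√2), which Python's standard library can evaluate directly:

```python
import math

def fraction_within(k_sigma):
    """Fraction of a Gaussian distribution within +/- k_sigma of the mean."""
    return math.erf(k_sigma / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"{k}-sigma: {100.0 * fraction_within(k):.2f}% of values")
```

This only holds if the errors really are Gaussian, which is the usual assumption but still an assumption.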
In the Eratosthenes Challenge lab, after combining results from two different lab classes for a total of 41 different measurements, assuming that we were dominated by random error, we got a value of 25,000±2,000 miles, or 40,000±3,000 km. That "±" is the uncertainty, 1 standard deviation, given the range of values (the "random error") from 41 students. And, since that range does encompass the true value, I can say that our results, overall, were accurate.
Another part of what I just did, implicitly, is round the results to two significant figures. That's because I don't think the results are really precise enough, from this lab, to really measure the value to better than 1 part in 100. There are very formal rules for significant figures, which I never learned, but really I just try to say, "use common sense."
I once had a student turn in a lab where they had to calculate the mass of Jupiter using Kepler's Laws and moons orbiting Jupiter. The answer they gave me for the mass of Jupiter was -0.0523461234 grams. Besides making absolutely no sense to not only have a negative mass but also have a mass of less than 1/10th of a gram, the student gave me every digit their calculator spit out. And yet, there was no way in that lab for them to be able to have that kind of accuracy, to 1 part in a billion.
One could look at all the digits that programs are more than happy to give you. For example, the actual average of the values from the Eratosthenes Challenge lab was 24,688.4±2,458.9 miles. Given that the true polar circumference is 24,818 miles, one might be tempted to use three significant figures and say that our average was just 100 miles off from the true value, or 4 significant figures and say we were 130 miles off. One could say that, though I think that you have to be careful about doing so because the lab methods can't yield accuracies to four or five significant figures.
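Here is that rounding done explicitly, as a sketch. The `round_sig` helper is my own illustration of rounding to a number of significant figures; it is not one of the "very formal rules," just the common-sense version.

```python
import math

def round_sig(value, figures):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)

lab_mean, lab_sigma = 24_688.4, 2_458.9   # miles, from the lab
true_value = 24_818.0                     # miles, polar circumference

print(f"{round_sig(lab_mean, 2):,.0f} +/- {round_sig(lab_sigma, 1):,.0f} miles")
for figures in (3, 4):
    diff = round_sig(true_value, figures) - round_sig(lab_mean, figures)
    print(f"{figures} significant figures: off by {diff:.0f} miles")
print("true value inside the 1-sigma range:",
      abs(true_value - lab_mean) <= lab_sigma)
```

The "off by" numbers look impressive, but they are far smaller than the uncertainty of a couple thousand miles, so they say more about luck than about the lab method.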
But with that in mind, I have an obligatory Coast to Coast AM clip. The researcher is Stan Deyo. George Noory reads his bio as stating that he is a "research physicist [and] author ... [who] has worked undercover for the FBI, as well as holding top-secret security clearance ... [and worked in an] exclusive black project that specialized in flying saucer technology, and he is an expert when it comes to Earth changes." Stan is a frequent guest, and he was on the show on May 2, 2008, to discuss Nevada Earthquakes: [Clip from Coast to Coast AM, May 2, 2008, Hour 1, starting 05:55]
"As I was zooming down in from the altitude that they have the map on Google Earth, I realized I was looking at a regular pattern, like a grid square, where these earthquakes occurred. Right over the ... Reno [NV, USA], where the quakes were occurring. And I thought, now THAT must be an anomaly, some kind of computer sensor error or something. Surely they wouldn't have earthquakes that are regular in square patterns. I mean, like uh, 19 squares? You know, perfect squares with the same sides? No, it couldn't be. So I started going all over the planet looking at other Google Earth plots, there was about 1500 earthquakes to look at, and I scanned the hotspots and I couldn't find any of them with the same characteristics. So I went back and then I started looking at each individual dot, zoomed down close where that earthquake was, and in that grid, on the vertices in that grid ..., it's like a tic-tac-toe board, it's where the lines cross in it, is where these earthquakes all occurred at regular intervals. And so I started clicking on each one and found out it wasn't one earthquake in each of those places, it was sometimes 30 or 40 or 50 in that one spot! And so I thought, 'Wow,' and I said, 'Which ones of these vertices, in this regular grid, has these enormous numbers of earthquakes?' ... It was like the majority of these earthquakes that got triggered at these gridmarks were along a spot that they wanted to hit ... there's intelligent design behind it is what I'm trying to say. Someone has done something to make this earthquake series happen, and they've been pounding the heck out of these grid marks to make this bigger earthquake occur in-between these vertices. Once I realized that, it's obvious obvious obvious something is wrong here."
After that, the host and Stan went on to speculate about weather warfare and exotic technology to cause earthquakes.
As is hopefully obvious to you all, given the context of this quote in this episode, this is a very obvious case of rounding and significant figures. On Earth, if you round the latitude and longitude location of something to, say, just the 1/1,000th of a degree, then your resolution is only about 365 feet, or about 111 meters. If you round to the closest minute, which is 1/60th of a degree, and don't include seconds, which is 1/3,600th of a degree, then your resolution is only 1.1 miles, or 1.8 kilometers. Now, in fairness, I seem to recall that Stan came on many, many months later and just as a side-note mentioned that this was the case, that he had been wrong.
BUT, it illustrates very well how this stuff plays out and why it really does matter. And how, if you're already somewhat conspiratorial, you may not look for the obvious explanation.
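The grid effect is easy to reproduce. This sketch scatters made-up earthquake locations completely at random and then rounds the coordinates the way a catalog might. The rounding level of 0.01 degree, the box location, and the number of quakes are all hypothetical; I don't know what rounding the actual data source used.

```python
import random
from collections import Counter

ROUND_TO_DECIMALS = 2   # hypothetical: positions stored to 0.01 degree

rng = random.Random(2008)   # fixed seed so the run is repeatable
# 1,500 made-up epicenters scattered at random in a 0.1 x 0.1 degree box.
# The corner of the box is arbitrary, only loosely near Reno, NV.
true_positions = [
    (39.5 + rng.uniform(0.0, 0.1), -119.9 + rng.uniform(0.0, 0.1))
    for _ in range(1500)
]

rounded = [
    (round(lat, ROUND_TO_DECIMALS), round(lon, ROUND_TO_DECIMALS))
    for lat, lon in true_positions
]
stacks = Counter(rounded)

print("distinct true positions:", len(set(true_positions)))
print("distinct plotted dots after rounding:", len(stacks))
print("most quakes stacked on a single dot:", max(stacks.values()))
```

Random points become a tidy square lattice with dozens of events piled on each vertex, with no intelligent design required, only rounding.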
And, the same is really true for this entire discussion: If you don't have a good understanding of how science is done, of how your own measurements may not be as certain as you think, then your results can be incredibly off the mark and, combined with an already conspiratorial or pseudoscientific mindset, can easily lead to many of the topics I discuss on this podcast.