## Bed Time Stories

First of all, shouts to my “peeps” (meine Leuts!) who faithfully read this crap. Especially Bess, who gets her name mentioned right back. But you’re seriously gonna struggle through this one. I don’t know where I got it.

Sometimes when I jump into bed, instead of falling asleep, my mind just starts rattling off all the crazy random stuff it’s been turning over all day. Or week. Tonight is one of those nights, and I figured I’d let it out, since it’s more of a pondering point than a thought. It’s a little musing in the arena of statistics (hey, I loved Discrete Mathematics in college!). Here’s the story.

### Background

I took my little GPS receiver with me to Europe, and as a result, I got some pretty nifty logs. All over the places I went, I’ve got an overlay for a Google map that shows me exactly where I was and when. I suppose to some, that’s kind of creepy, but moving on.

Because GPS was developed by the military for, um, military purposes, it’s actually quite an incredible system. Wikipedia has a lengthy article on the technical workings of GPS, but the key thing is this: most civilian GPS receivers in the last few years are accurate to a few meters - about 16 ft. If you’re a real geek, the Wikipedia article also explains in depth why civilian GPS measurements are inherently inaccurate.

The GPS receiver then generally knows exactly where it is at any given moment in time - you just have to ask it. So along with the receiver, I brought a logger. The logger’s job was to record where the GPS says it was at any given moment, and this is where we get to the statistics. Since it would be very technically challenging to record absolutely every positional data point the receiver can produce, we have to sample it, much like a digital audio recording is actually a sampling of an analog signal.

### The Experiment

I set up my logger to poll the receiver at regular intervals. Depending on where we were going and what we were doing, I varied that poll rate from one second to five minutes. Obviously the more frequent polling times produce more accurate logs, but they also result in significantly more data. The flight from Atlanta to Zürich, for example, at a one-second poll rate would have produced over 34,200 data points! But since we were traveling in a relatively predictable manner, I scaled it back to just a few hundred data points.
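Here is the arithmetic behind that figure as a quick sketch. The 9.5-hour flight time is my assumption, back-calculated from the 34,200 number, and the slower poll intervals are just examples of what "scaled back" could look like.

```python
def sample_count(duration_s: float, interval_s: float) -> int:
    """Number of fixes logged when polling once every interval_s seconds."""
    return int(duration_s // interval_s)

FLIGHT_HOURS = 9.5  # assumed flight time, not a figure from my logs
flight_s = FLIGHT_HOURS * 3600

# Compare a one-second poll rate against a few slower ones.
for interval_s in (1, 60, 120, 300):
    print(f"{interval_s:>3} s poll -> {sample_count(flight_s, interval_s):>6} points")
```

Anything between a one-minute and a five-minute poll lands in "a few hundred points" territory for a flight of that length.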

Okay, I’m getting to the point, I promise.

Generally, that accuracy figure means a GPS receiver can report its current location with an error of up to 16 feet. And that 16 feet could be in any direction, as if the receiver were in the center of a 32-foot diameter Bubble Boy hamster ball. But the reverse of that statement is also true: a receiver may report its location with an error as little as zero feet - it could be dead on the money. I observed this effect in practice by looking at my GPS logs superimposed on a map. Sometimes the log point was exactly where I remember standing, and sometimes it was a ways off.

I used my GPS sometimes to measure how far I had traveled that day as well, sometimes walking as much as six or seven miles in one day. But the idea that the GPS is often inaccurate by a meter or more got me to thinking. With all that information in mind, consider this:

### The Question

While on an excursion, would the GPS location samples, over time, average out to a spot-on measurement of distance traveled? Would the accuracy of the GPS samples increase with more frequent samplings or less frequent samplings? Would the samplings average more accurately over time, grow infinitely more exaggerated, or would the error level off in time, reaching a point of diminishing inaccuracy? Finally, what impact would my route have on the GPS’s measurements? Would a straight line path (such as an overseas flight) produce more or less accurate averages than a route that randomly meanders through the streets of an open market?

My comments after the break, but please, leave yours!

### The Hypothesis

My theory is that there are quite a number of factors involved. I think we can safely agree that GPS receivers have what I will call a Range of Error (ROE). A receiver can have at best a 0% margin of error, therefore the ROE varies from zero at one end to some known value at the other (16 ft.). This gives us a number line to plot sample points on, using their margin of error as the value (0-16). If we assume the errors are spread evenly along that line, we can expect the mean to fall around eight feet of error, with about 50% of the points in the two inner quartiles (4-12 ft. error range). So we will assume for the next bit that each GPS location is off by eight feet.
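A quick sanity check of those numbers, as a sketch. It assumes the size of the error is spread uniformly between 0 and 16 ft, which is my simplification and almost certainly not how a real receiver behaves. The function name and the sample count are made up for illustration.

```python
import random

MAX_ERROR_FT = 16.0  # top of the Range of Error

def simulate_error_magnitudes(n, seed=0):
    """Draw n error sizes, assumed uniform between 0 and MAX_ERROR_FT."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, MAX_ERROR_FT) for _ in range(n)]

errors = simulate_error_magnitudes(100_000)
mean_error = sum(errors) / len(errors)
# Share of samples landing in the 4-12 ft band (the two inner quartiles).
inner_share = sum(4.0 <= e <= 12.0 for e in errors) / len(errors)

print(f"mean error: {mean_error:.2f} ft")
print(f"share in 4-12 ft: {inner_share:.1%}")
```

Under that assumption the mean should hover near eight feet and the inner band should hold roughly half the samples.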

The next bit of assumption depends on the path we take. If our trip is a purely linear, straight line progression from point A to point B, then we can conjecture that the longer our trip is, the more inaccurate the distance measured by the GPS. This is because in a linear path, there are exactly two positions that the GPS can err to that are either a location we have already passed through or a location we will soon reach. These two points can be negated since they effectively cancel each other out (in theory, in a large enough sample size). We can probably assume that the majority of samplings, however, will not result in one of those two points. For every sample that does not fall dead on our true path, the sampled location is then “out of the way” - our path’s distance must be increased to be routed through that point. Much like the two short sides of a triangle together are always longer than the long side by itself, our total distance measured for the trip is likely to increasingly deviate by a small amount from the actual distance. On a short route, this might amount to a few inches at most. On a longer route, however, this could add up to a much larger amount, possibly reaching a statistically significant error.
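If you want to poke at the straight-line case yourself, here is a rough simulation sketch. Everything in it is an assumption on my part: each fix is nudged in a random direction by a random amount up to 16 ft, the true path runs along the x-axis, and a fix is logged every 50 ft. The function names are invented for this example.

```python
import math
import random

MAX_ERROR_FT = 16.0  # worst-case error quoted above

def noisy_fix(x, y, rng):
    """Assumed error model: random direction, magnitude uniform in 0..16 ft."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.uniform(0.0, MAX_ERROR_FT)
    return (x + radius * math.cos(angle), y + radius * math.sin(angle))

def path_length(points):
    """Sum of the straight-line hops between consecutive logged points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def logged_straight_trip(true_length_ft, spacing_ft, seed=0):
    """Log a trip along the x-axis, one noisy fix every spacing_ft feet."""
    rng = random.Random(seed)
    n_fixes = int(true_length_ft // spacing_ft) + 1
    return [noisy_fix(i * spacing_ft, 0.0, rng) for i in range(n_fixes)]

for true_length_ft in (1_000, 10_000, 100_000):
    logged = path_length(logged_straight_trip(true_length_ft, spacing_ft=50.0))
    extra = logged - true_length_ft
    print(f"true {true_length_ft:>7} ft  logged {logged:10.1f} ft  extra {extra:8.1f} ft")
```

If the triangle argument holds, the "extra" column should grow as the trip gets longer. Changing `spacing_ft` is a cheap way to see what more or less frequent sampling does to it.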

Conversely, you could argue that because the error increases linearly with the actual distance, the margin of error is then constant. We could then determine if the margin of error is statistically significant for any trip (assuming, for example, that an error of one mile on a 100 mile trip is no less acceptable than an error of one foot on a 100 foot trip).
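The "constant margin" idea in numbers, as a small sketch. The per-hop excess of 0.5 ft and the 50 ft spacing are placeholder values I picked for illustration, not measurements.

```python
FEET_PER_MILE = 5280

def relative_error(error, actual):
    """Error expressed as a fraction of the true distance."""
    return error / actual

# One mile off over 100 miles is the same fraction as one foot off over 100 feet.
print(relative_error(1 * FEET_PER_MILE, 100 * FEET_PER_MILE))
print(relative_error(1, 100))

def total_excess(trip_ft, spacing_ft, excess_per_hop_ft):
    """Total extra distance if every hop adds the same (assumed) excess."""
    return (trip_ft / spacing_ft) * excess_per_hop_ft

# If the excess per hop is constant, the fraction does not depend on trip length.
for trip_ft in (1_000, 10_000, 100_000):
    excess = total_excess(trip_ft, spacing_ft=50.0, excess_per_hop_ft=0.5)
    print(trip_ft, excess, relative_error(excess, trip_ft))
```

So the absolute error can keep growing while the percentage stays put, which is the whole point of the argument.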

However, if we are measuring the distance traveled for a non-linear path (a random one, for example), then we can suggest that every possible error point has an equal and opposite error point that would cancel it out, thus negating all error entirely. For example, assume a trip that followed an arc from point A to point B. The erroneous GPS location is just as likely to err 16 ft. to the inside of the arc as it is to the outside of the arc. Erring on the inside would result in a path that is marginally shorter than the actual path. Erring on the outside would produce a path that is marginally longer than the actual path. Hence, a random path will most likely reduce its margin of error in measuring distance traveled as the trip gets progressively longer!
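The arc example is easy enough to put to the test with a little simulation. This is only a sketch: the half-circle of radius 1,000 ft, the error model (random direction, up to 16 ft) and the function names are all my assumptions, and I am not promising which way the result comes out.

```python
import math
import random

MAX_ERROR_FT = 16.0  # worst-case error quoted above

def arc_points(radius_ft, n_points):
    """Evenly spaced true positions along a half-circle from A to B."""
    step = math.pi / (n_points - 1)
    return [(radius_ft * math.cos(i * step), radius_ft * math.sin(i * step))
            for i in range(n_points)]

def add_gps_error(points, rng):
    """Assumed error model: random direction, magnitude uniform in 0..16 ft."""
    noisy = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, MAX_ERROR_FT)
        noisy.append((x + radius * math.cos(angle), y + radius * math.sin(angle)))
    return noisy

def path_length(points):
    """Sum of the straight-line hops between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def mean_signed_error(radius_ft, n_points, trials=200, seed=0):
    """Average of (logged length - true arc length) over many simulated trips."""
    rng = random.Random(seed)
    true_length = math.pi * radius_ft
    truth = arc_points(radius_ft, n_points)
    diffs = [path_length(add_gps_error(truth, rng)) - true_length
             for _ in range(trials)]
    return sum(diffs) / trials

for n_points in (11, 51, 201):
    print(n_points, round(mean_signed_error(1000.0, n_points), 1))
```

An average near zero would back up the cancelling-out idea. A consistently positive average would mean the errors inflate the distance on a curved path too.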

### The End

So that was a most ridiculous thought, and now that I’ve gotten it out, you too can wonder what the hell my brain runs on, because I doubt it’s running on sanity!

-- Weather When Posted --

Location: Atlanta, De Kalb-Peachtree Airport

Temperature: 42.8°F, Humidity: 57%