Hey You… Take a Sad Estimator and Make it Better: The Rao-Blackwell Theorem

Image Credit: Bureau of Land Management, CC BY 2.0, Image Cropped

A common goal of ecologists is to understand the population abundance of a particular species. We might be looking for the California condor as part of assessing how well the recovery project is going. This requires some field work, going out to a variety of sites and counting animals that we see. How do we choose which sites to go to? Even in the era of camera traps, we still need to know where to put our extra set of eyes. It would be a shame to have a particular camera not get any action due to an unlucky placement. We don’t have infinite time and money after all!

If a species is fairly prevalent, a random sample of sites might let us see plenty of animals. However, we know species distributions are rarely even across a given region. More likely the species is a bit more rare in certain areas (especially for our critically endangered condor) and/or individuals tend to cluster together. A random sample could lead us to a bunch of unfruitful site visits, despite the fact that the species is quite common in other areas close to those sites.

Adaptive Sampling

But what if we had a little information about the uneven distribution of condors? Adaptive sampling methods allow us to incorporate information about the structure that we’ve observed so far to help us decide where to sample next.

We head out, starting with a small random sample of sites to visit. For every site where we see at least one condor, we also sample all of that site’s neighbors. We keep doing this until none of the newly visited neighbors has a condor. At that point we’ve reached the end of the cluster. We refer to a site that doesn’t have any condors but that is in the neighborhood of a site that does as an “edge unit.”
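The procedure above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are ours: sites sit on a rectangular grid, a site’s “neighbors” are the cells directly above, below, left, and right of it, and `counts` is a made-up table of condor counts. The function names are hypothetical too.

```python
import random

def neighbors(site, n_rows, n_cols):
    """Up/down/left/right neighbors of a grid cell (an assumed neighborhood)."""
    r, c = site
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols]

def adaptive_sample(counts, n_initial, rng):
    """Visit a random starting sample, then expand around every sighting.

    counts: 2D list of condor counts per site (hypothetical ground truth).
    Returns (initial_sites, visited), where visited maps site -> count.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    all_sites = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    initial_sites = rng.sample(all_sites, n_initial)

    visited = {}
    to_visit = list(initial_sites)
    while to_visit:
        site = to_visit.pop()
        if site in visited:
            continue  # a revisit tells us nothing new
        r, c = site
        visited[site] = counts[r][c]
        if counts[r][c] > 0:
            # At least one condor seen: also sample all of this site's neighbors.
            to_visit.extend(neighbors(site, n_rows, n_cols))
    return initial_sites, visited

def edge_units(visited, n_rows, n_cols):
    """Visited sites with no condors that neighbor a visited site with condors."""
    return {
        site for site, count in visited.items()
        if count == 0
        and any(visited.get(nb, 0) > 0 for nb in neighbors(site, n_rows, n_cols))
    }

# Toy example: one small cluster in the middle of a 4 x 4 region.
counts = [
    [0, 0, 0, 0],
    [0, 2, 3, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
initial_sites, visited = adaptive_sample(counts, n_initial=3, rng=random.Random(1))
print(sorted(visited), edge_units(visited, 4, 4))
```

The loop stops on its own once every newly visited neighbor turns out to be empty, which is exactly the “end of the cluster” condition.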

Now we have more information about sites where condors are actually present, but it comes at a cost. The sample mean abundance or even the mean of cluster means can be biased under this type of adaptive sampling design (since the final set of sites is no longer a simple random sample). What do we do with this data now?

Luckily, there are other estimators of abundance out there that account for this bias. (Want the details? Check out this review.) A simple estimator takes advantage of the fact that our sampling started with a small random sample. We could use the sample mean of this starting sample as a simple estimator of condor abundance. But we went through all this trouble to collect additional data using a new sampling method. Can we do better than this simple approach?

The Rao-Blackwell Theorem

Now it’s time to switch gears and learn about some statistics theory. I promise it’ll be (relatively) painless. There is an important theorem that tells us how to improve estimators of a particular parameter of interest. In our case, the theorem will help us find a better way to estimate abundance than taking the sample mean of our starting sample. Sign us up!

The Rao-Blackwell theorem tells us that if we have an estimator, then we can obtain a new estimator that is never worse than the original. How do we do that? We take the conditional expectation of the original estimator given a sufficient statistic T. This becomes our new, Rao-Blackwellized, estimator.
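For readers who like symbols, here is the same idea written out. The notation is ours: θ is the parameter of interest, θ̂ is the original estimator, and θ̂_RB is the Rao-Blackwellized one. Sufficiency of T is what guarantees the conditional expectation does not depend on the unknown θ, so we can actually compute it from data.

```latex
\hat{\theta}_{\mathrm{RB}} = \mathbb{E}\big[\hat{\theta} \mid T\big]

% Averaging over T gives back the original mean, so the bias is unchanged:
\mathbb{E}\big[\hat{\theta}_{\mathrm{RB}}\big] = \mathbb{E}\big[\hat{\theta}\big]

% The law of total variance shows the variance can only shrink:
\operatorname{Var}\big(\hat{\theta}\big)
  = \operatorname{Var}\big(\hat{\theta}_{\mathrm{RB}}\big)
  + \mathbb{E}\big[\operatorname{Var}\big(\hat{\theta} \mid T\big)\big]
  \;\ge\; \operatorname{Var}\big(\hat{\theta}_{\mathrm{RB}}\big)
```

Same bias plus smaller (or equal) variance is why the new estimator is never worse in the mean-squared error sense.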

That sounds great, but what is a sufficient statistic? Informally, a function of the data T is sufficient if, once we know T, the rest of the data can’t tell us anything more about the parameter of interest. For example, if our data come from a normal distribution with known variance and we are trying to estimate the population mean, we could do so using only the sample mean as our T. We wouldn’t need any of the original sample data to make a decision about our estimate, hence the sample mean is sufficient.

Calyampudi Radhakrishna Rao and David Blackwell, whom the theorem is named after (Image Credits: Prateek Karandikar, George M. Bergman, CC BY-SA 4.0)

What do we mean by an estimator being better? The new Rao-Blackwellized estimator will have a mean-squared error that is less than or equal to that of the original. In fact, more general versions of the theorem even let us pick our favorite loss function (as long as it is convex, which many commonly used ones are), and this is still true. Score!
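To see the mean-squared error drop in action, here is a small simulation using a classic textbook example rather than our condor design. Everything in it is an assumption made for illustration: counts are drawn from a Poisson distribution with rate `lam`, the target is the probability that a site has zero animals, and the function names are hypothetical. The sad estimator looks only at the first site; conditioning on the total count T (which is sufficient for a Poisson rate) turns it into ((n - 1) / n) ** T.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def naive_estimator(sample):
    """Crude but unbiased guess at P(count = 0): 1 if the first site is empty."""
    return 1.0 if sample[0] == 0 else 0.0

def rao_blackwell_estimator(sample):
    """E[naive | T] with T = sum(sample), which works out to ((n - 1) / n) ** T."""
    n = len(sample)
    return ((n - 1) / n) ** sum(sample)

def compare_mse(lam=1.5, n=10, n_reps=20000, seed=0):
    """Monte Carlo estimate of each estimator's mean-squared error."""
    rng = random.Random(seed)
    target = math.exp(-lam)  # true P(count = 0) under the assumed Poisson model
    se_naive = se_rb = 0.0
    for _ in range(n_reps):
        sample = [poisson_draw(lam, rng) for _ in range(n)]
        se_naive += (naive_estimator(sample) - target) ** 2
        se_rb += (rao_blackwell_estimator(sample) - target) ** 2
    return se_naive / n_reps, se_rb / n_reps

mse_naive, mse_rb = compare_mse()
print(f"naive MSE: {mse_naive:.4f}, Rao-Blackwellized MSE: {mse_rb:.4f}")
```

The naive estimator can only ever say 0 or 1, while the Rao-Blackwellized version uses every site in the sample, so its error should come out much smaller.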

In our adaptive sampling example, the sufficient statistic is the set of unique observations, labeled with their site ID. In our data collection process we might revisit particular sites if they are neighbors of multiple condor sightings. We don’t need information about the double counting to help us estimate average abundance, hence the unique observations are sufficient.
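Here is a brute-force sketch of what conditioning on the unique observations could look like for the starting-sample mean. It rests on assumptions that are ours: sites sit on a grid with up/down/left/right neighbors, the starting sample is a simple random sample without replacement (so every starting sample that would have led to the same set of unique observations is equally likely), and the region is tiny enough to enumerate. The function names and the `counts` table are hypothetical, and the grid helper is spelled out here so the example runs on its own.

```python
from itertools import combinations

def neighbors(site, n_rows, n_cols):
    """Up/down/left/right neighbors of a grid cell (an assumed neighborhood)."""
    r, c = site
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n_rows and 0 <= j < n_cols]

def expand(initial_sites, counts):
    """Grow the final set of unique observations from a given starting sample."""
    n_rows, n_cols = len(counts), len(counts[0])
    visited, to_visit = {}, list(initial_sites)
    while to_visit:
        site = to_visit.pop()
        if site in visited:
            continue  # double counting carries no extra information
        visited[site] = counts[site[0]][site[1]]
        if visited[site] > 0:
            to_visit.extend(neighbors(site, n_rows, n_cols))
    return visited

def initial_mean(initial_sites, counts):
    """The simple estimator: mean count over the starting sample."""
    return sum(counts[r][c] for r, c in initial_sites) / len(initial_sites)

def rao_blackwell_mean(initial_sites, counts):
    """Average the simple estimator over every starting sample that is
    compatible with (would reproduce) the observed unique sites."""
    observed = expand(initial_sites, counts)
    n = len(initial_sites)
    compatible = [
        combo for combo in combinations(sorted(observed), n)
        if expand(combo, counts) == observed
    ]
    return sum(initial_mean(combo, counts) for combo in compatible) / len(compatible)

counts = [
    [0, 0, 0, 0],
    [0, 2, 3, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
start = [(1, 2), (3, 3), (0, 3)]
print(initial_mean(start, counts), rao_blackwell_mean(start, counts))
```

In this toy run the starting sample happened to land on the busiest site in the cluster. The Rao-Blackwellized version averages over the other cluster sites we could just as easily have started from, which pulls the estimate back toward the cluster’s average.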

Investigations of the benefits of adaptive sampling over random sampling show that the efficiency gains (less work for more information) depend on whether the within-network variance is large enough. Since we often expect large variability in the abundance of a species even within clusters of sites, this is good news for ecologists.

Why is this theorem so important? It basically means that if we design an estimator, even if it’s a wild guess, we always have a concrete way to improve it. This is especially helpful if finding even a starting point for an estimator is hard. Think about how complicated our new design is; we need all the help we can get.

Let’s close with some info on the people behind this theorem. Both of the theorem’s namesakes are powerhouses in the field of statistics. Calyampudi Radhakrishna Rao is an Indian-American mathematician and statistician who has won a variety of awards including the prestigious National Medal of Science (he also has a bound named after him). Read more about him here. David Blackwell was an American mathematician and statistician with his own set of accolades including being a member of the National Academy of Sciences (he was the first African American to be included). Learn more about him from the transcript of his oral history.

Have a quantitative term or concept that mystifies you? Want it explained simply? Suggest a topic for next month → @sastoudt