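A simple form of the Central Limit Theorem:

For independent and identically distributed (i.i.d.) random vectors $X_1, X_2, \dots$ with mean $0$ and covariance $K$, let $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$.

If $n \to \infty$, $S_n \to \mathcal{N}(0, K)$ in distribution.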
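The statement can be checked numerically in the simplest possible setting. The sketch below is an illustration chosen for this post, not part of the theorem: it assumes the $X_i$ are fair $\pm 1$ coin flips (mean $0$, $K = 1$), computes the exact distribution of $S_n$, and measures the largest gap between its CDF and the standard normal CDF. The function names are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n: int) -> float:
    """sup_x |P(S_n <= x) - Phi(x)| for S_n = (X_1 + ... + X_n) / sqrt(n),
    where the X_i are i.i.d. fair +/-1 coin flips (mean 0, K = 1).

    S_n takes the value (2k - n) / sqrt(n) with probability C(n, k) / 2^n,
    so its CDF is an exact step function. Since Phi is increasing, the
    supremum only needs to be checked at the atoms, using the CDF just
    before and at each atom.
    """
    dist = 0.0
    cdf = 0.0
    for k in range(n + 1):
        phi = normal_cdf((2 * k - n) / math.sqrt(n))
        dist = max(dist, abs(cdf - phi))  # left limit of the CDF at the atom
        cdf += math.comb(n, k) / 2 ** n
        dist = max(dist, abs(cdf - phi))  # value of the CDF at the atom
    return dist

for n in (10, 100, 1000):
    print(n, round(kolmogorov_distance(n), 4))
```

The gap shrinks roughly like $1/\sqrt{n}$ here, which is the kind of convergence the theorem promises (for lattice variables like coin flips, the CDF can only approach $\Phi$, never match it exactly).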

When I first learned the central limit theorem, I was always curious why the sum of a large number of random variables is approximately normally (Gaussian) distributed. The answer is simple: that's where the normal distribution comes from. The normal distribution was found when people first tried to find the limiting distribution of the sum of a large number of random variables. In other words, the normal distribution can be viewed as the limit distribution of the (suitably normalized) sum of a large number of random variables.

First, using the i.i.d. condition, it's easy to check that $E[S_n] = 0$ and $\mathrm{Cov}(S_n) = K$.
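Spelled out, the check uses linearity of expectation and, for the covariance, the fact that independence and zero mean make every cross term vanish: $E[X_i X_j^T] = E[X_i]\,E[X_j]^T = 0$ for $i \ne j$.

```latex
E[S_n] = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} E[X_i] = 0,
\qquad
\mathrm{Cov}(S_n) = E[S_n S_n^T]
  = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} E[X_i X_j^T]
  = \frac{1}{n} \sum_{i=1}^{n} E[X_i X_i^T]
  = \frac{1}{n} \cdot nK = K.
```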

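Before we move on, let's review the basic idea of entropy. The (differential) entropy of a random variable $X$ with density $f$ is $h(X) = -\int f(x) \log f(x)\, dx$; for a discrete random variable with probabilities $p(x)$, it is $H(X) = -\sum_x p(x) \log p(x)$. Entropy is a measure of uncertainty: the less information we have about something, the larger its entropy.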
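The discrete formula is easy to evaluate. A minimal sketch (entropy in bits, i.e. base-2 logarithms; the function name is only for illustration) shows that a certain outcome has zero entropy and that, on a fixed number of outcomes, the uniform distribution (the one we know least about) has the largest entropy:

```python
import math
from typing import Sequence

def entropy_bits(p: Sequence[float]) -> float:
    """Shannon entropy H = -sum p_i log2 p_i of a discrete distribution, in bits.
    Terms with p_i = 0 contribute 0 (the limit of p log p as p -> 0)."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

print(entropy_bits([1.0, 0.0]))            # certain outcome: 0.0
print(entropy_bits([0.5, 0.5]))            # fair coin: 1.0
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))  # skewed, 4 outcomes: about 1.357
print(entropy_bits([0.25] * 4))            # uniform, 4 outcomes: 2.0
```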

Now, let's ignore the central limit theorem for a while and imagine what the limit distribution of the sum of a large number of random variables could be.

Note that $S_n = \sqrt{n}\,\bar{X}_n$, where $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, is a rescaled sample mean. The more we “mean” (average), the more individual characteristics are lost. Averaging hides individual characteristics, makes individuals indistinct, and hence reduces the information we have. So it's natural to expect that the more we “mean”, the larger the entropy will be.

OK, that's enough. From this point of view, we can expect the limit distribution to have a large entropy. In other words, the normal distribution should have a large entropy.
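This intuition can be checked in one concrete case. Because rescaling a variable by $a$ shifts its differential entropy by $\log|a|$, entropies are only comparable at a fixed variance, which is what the $1/\sqrt{n}$ normalization provides. The sketch below is an illustration of my own choosing (the uniform summands, the Irwin-Hall density and the function names are assumptions, not taken from the references): it numerically integrates the entropy, in nats, of the standardized sum of $n$ i.i.d. Uniform(0, 1) variables and compares it with the Gaussian value $\frac{1}{2}\ln(2\pi e) \approx 1.4189$.

```python
import math

def irwin_hall_pdf(x: float, n: int) -> float:
    """Density of T = U_1 + ... + U_n with U_i i.i.d. Uniform(0, 1) (Irwin-Hall)."""
    if x < 0 or x > n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
                for k in range(math.floor(x) + 1))
    return total / math.factorial(n - 1)

def entropy_of_standardized_sum(n: int, steps_per_unit: int = 4000) -> float:
    """Differential entropy (nats) of Z_n = (T - n/2) / sqrt(n/12).

    Z_n has mean 0 and variance 1. h(T) = -int f ln f is integrated with the
    midpoint rule on [0, n]; then h(aT + b) = h(T) + ln|a| with a = 1/sqrt(n/12).
    """
    m = n * steps_per_unit
    dx = n / m
    h_t = 0.0
    for i in range(m):
        f = irwin_hall_pdf((i + 0.5) * dx, n)
        if f > 0:  # skip zero (or round-off negative) density values
            h_t -= f * math.log(f) * dx
    return h_t - 0.5 * math.log(n / 12)

gaussian_h = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1)
for n in range(1, 7):
    print(n, round(entropy_of_standardized_sum(n), 4), "vs Gaussian", round(gaussian_h, 4))
```

For this example the entropy starts at $\frac{1}{2}\ln 12 \approx 1.2425$ for a single uniform and climbs toward, but stays below, the Gaussian value as more terms are averaged.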

The fact is that:

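**Theorem 8.6.5** ([1], p. 254)

Let the random vector $X \in \mathbb{R}^d$ have zero mean and covariance $K = E[XX^T]$. Then $h(X) \le \frac{1}{2} \log\left((2\pi e)^d |K|\right)$, with equality iff $X \sim \mathcal{N}(0, K)$.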
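For a quick sanity check of this theorem, the sketch below uses standard closed-form entropies (in nats) of a few zero-mean scalar distributions with the same variance; the choice of distributions and the function names are my own illustration. It also evaluates the Gaussian value $\frac{1}{2}\log\left((2\pi e)^d |K|\right)$ for a general covariance matrix, assuming $K$ is positive definite.

```python
import math
from typing import Sequence

# Closed-form differential entropies (nats) of zero-mean scalar distributions
# with variance sigma2. With d = 1 and K = sigma2, Theorem 8.6.5 says none of
# them can exceed the Gaussian value 0.5 * ln(2 pi e sigma2).

def h_gaussian(sigma2: float) -> float:
    return 0.5 * math.log(2 * math.pi * math.e * sigma2)

def h_uniform(sigma2: float) -> float:
    # Uniform on [-a, a]: variance a^2 / 3, entropy ln(2a) = ln(sqrt(12 sigma2)).
    return 0.5 * math.log(12 * sigma2)

def h_laplace(sigma2: float) -> float:
    # Laplace(0, b): variance 2 b^2, entropy 1 + ln(2b) with b = sqrt(sigma2 / 2).
    return 1 + math.log(2 * math.sqrt(sigma2 / 2))

def h_gaussian_vector(K: Sequence[Sequence[float]]) -> float:
    """0.5 * ln((2 pi e)^d |K|) for a positive-definite d x d covariance K.
    |K| is the product of the pivots of Gaussian elimination (no row swaps
    are needed for a positive-definite matrix)."""
    a = [[float(v) for v in row] for row in K]
    d = len(a)
    log_det = 0.0
    for i in range(d):
        pivot = a[i][i]
        if pivot <= 0:
            raise ValueError("K must be positive definite")
        log_det += math.log(pivot)
        for r in range(i + 1, d):
            factor = a[r][i] / pivot
            for c in range(i, d):
                a[r][c] -= factor * a[i][c]
    return 0.5 * (d * math.log(2 * math.pi * math.e) + log_det)

for s2 in (0.5, 1.0, 4.0):
    print(s2, round(h_uniform(s2), 4), round(h_laplace(s2), 4), round(h_gaussian(s2), 4))
```

The gaps $h_{\text{Gauss}} - h_{\text{uniform}} = \frac{1}{2}\ln(2\pi e/12)$ and $h_{\text{Gauss}} - h_{\text{Laplace}} = \frac{1}{2}\ln(\pi/e)$ do not depend on the variance, so the ordering holds at every scale.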

**Theorem 8.6.6** ([1], p. 255, Estimation error and differential entropy)

For any random variable $X$ and estimator $\hat{X}$, $E[(X - \hat{X})^2] \ge \frac{1}{2\pi e} e^{2h(X)}$, with equality if and only if $X$ is Gaussian and $\hat{X}$ is the mean of $X$.
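To see what the bound says numerically: for a Gaussian, $\frac{1}{2\pi e} e^{2h(X)}$ is exactly the variance, so the estimator $\hat{X} = E[X]$ meets it with equality, while for other distributions with the same variance the bound is strictly smaller. A small sketch, with the comparison distributions chosen only for illustration and entropies in nats:

```python
import math

def mse_lower_bound(h_nats: float) -> float:
    """Theorem 8.6.6 bound: E[(X - Xhat)^2] >= exp(2 h(X)) / (2 pi e), h in nats."""
    return math.exp(2 * h_nats) / (2 * math.pi * math.e)

sigma2 = 1.0
# Closed-form entropies (nats) of zero-mean distributions with variance sigma2.
entropies = {
    "gaussian": 0.5 * math.log(2 * math.pi * math.e * sigma2),
    "uniform": 0.5 * math.log(12 * sigma2),
    "laplace": 1 + 0.5 * math.log(2 * sigma2),
}
# With the mean as the estimator, the mean squared error equals the variance.
for name, h in entropies.items():
    print(f"{name:9s} bound = {mse_lower_bound(h):.4f}  MSE of mean = {sigma2}")
```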

**Corollary:** Given side information $Y$ and an estimator $\hat{X}(Y)$, it follows that $E[(X - \hat{X}(Y))^2] \ge \frac{1}{2\pi e} e^{2h(X \mid Y)}$.
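A worked example of the corollary, under the extra assumption (not made in [1]) that $(X, Y)$ is jointly Gaussian with $\mathrm{Var}(X) = \sigma_X^2$ and correlation coefficient $\rho$. Given $Y$, $X$ is then Gaussian with variance $\sigma_X^2(1 - \rho^2)$ for every value of $Y$, so with entropy in nats the bound is attained by the conditional mean:

```latex
h(X \mid Y) = \tfrac{1}{2} \log\!\bigl(2\pi e\, \sigma_X^2 (1 - \rho^2)\bigr)
\;\Longrightarrow\;
\frac{1}{2\pi e}\, e^{2h(X \mid Y)} = \sigma_X^2 (1 - \rho^2)
  = E\bigl[(X - E[X \mid Y])^2\bigr].
```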

**8a.6** ([2], p. 532) $N_p$ as a Distribution with Maximum Entropy

The multivariate normal distribution has the maximum entropy ($-\int f(x) \log f(x)\, dx$) subject to the condition that the mean and covariance are fixed.
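One standard way to see why (a sketch via relative entropy; [2] may argue differently): let $\varphi$ be the $N_p(\mu, K)$ density and $f$ any density with the same mean $\mu$ and covariance $K$. Since $\log \varphi$ is quadratic in $x$, $\int f \log \varphi$ depends on $f$ only through its first two moments, so it equals $\int \varphi \log \varphi$. Non-negativity of relative entropy then gives the result (natural logarithms), with equality iff $f = \varphi$ almost everywhere:

```latex
\int f \log \varphi \, dx
  = -\tfrac{1}{2} \log\!\bigl((2\pi)^p |K|\bigr) - \tfrac{1}{2}\,\mathrm{tr}(K^{-1} K)
  = -\tfrac{1}{2} \log\!\bigl((2\pi e)^p |K|\bigr)
  = \int \varphi \log \varphi \, dx,

0 \le D(f \,\|\, \varphi)
  = \int f \log f \, dx - \int f \log \varphi \, dx
  = -h(f) - \int \varphi \log \varphi \, dx
  = h(\varphi) - h(f).
```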

For a given mean and covariance, the Gaussian distribution is the maximum entropy distribution. “It gives the lowest log likelihood to its members on average. That means Gaussian distribution is the safest assumption when the true distribution is unknown” ([3]). That also explains why many people tend to “abuse” the Gaussian assumption so much.

**References:**

[1] Elements of Information Theory, Second Edition, Thomas M. Cover, Joy A. Thomas, 2006

[2] Linear Statistical Inference and Its Applications, Second Edition, C. Radhakrishna Rao, 2002

[3] A Short Introduction to Model Selection, Kolmogorov Complexity and Minimum Description Length (MDL), Volker Nannen, 2003

Bruce, is it possible to prove the Cramer-Rao Limit using calculus of variations?

Maybe; I don't know. I know only a little about the calculus of variations. There is an old book on applications of the calculus of variations in statistics (Variational Methods in Statistics by Jagdish S. Rustagi, 1976). For the C-R bound, most proofs I've seen are based on the Cauchy-Schwarz inequality.
