Let $X_1, \dots, X_n$ be iid random vectors with density or probability function $f(x, \theta)$, where $\theta$ is the unknown parameter, and suppose the true value of $\theta$ is $\theta_0$.
The likelihood function:
$$L(\theta) = \prod_{i=1}^{n} f(X_i, \theta)$$
The log-likelihood function:
$$l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(X_i, \theta)$$
Note that these two functions are built from the joint probability function of the iid data $X_1, \dots, X_n$; we call them “likelihood” functions instead of “probability” functions because we now consider them as functions of the unknown parameter $\theta$.
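To make the two definitions concrete, here is a minimal sketch that evaluates both functions on a small sample. The exponential density and the data values are assumptions chosen only for illustration; they do not come from the references.

```python
import math

def exponential_density(x, rate):
    """Density rate * exp(-rate * x) of the exponential distribution (assumed model)."""
    if x < 0:
        return 0.0
    return rate * math.exp(-rate * x)

def likelihood(data, rate):
    """Product of the densities of the observations, viewed as a function of rate."""
    result = 1.0
    for x in data:
        result *= exponential_density(x, rate)
    return result

def log_likelihood(data, rate):
    """Sum of the log densities (valid for nonnegative data); avoids tiny products."""
    return sum(math.log(rate) - rate * x for x in data)

data = [0.5, 1.2, 0.3, 2.0]  # made-up observations, for illustration only
for rate in (0.5, 1.0, 2.0):
    print(rate, likelihood(data, rate), log_likelihood(data, rate))
```

The data stay fixed while the parameter varies, which is exactly the change of viewpoint from “probability” to “likelihood”.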
The maximum likelihood estimator is found by maximizing the likelihood function, and we will show the idea behind this procedure.
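Jensen's inequality states that for a concave function $g$, $E[g(Y)] \le g(E[Y])$ for any random variable $Y$. The function $\log$ is concave, so under $\theta_0$,
$$E_{\theta_0} \log \frac{f(X_1, \theta)}{f(X_1, \theta_0)} \le \log E_{\theta_0} \frac{f(X_1, \theta)}{f(X_1, \theta_0)} = 0.$$
That is to say, $E_{\theta_0} \log f(X_1, \theta) \le E_{\theta_0} \log f(X_1, \theta_0)$, since $E_{\theta_0} \frac{f(X_1, \theta)}{f(X_1, \theta_0)} = \int f(x, \theta)\,dx = 1$ (with a sum in place of the integral in the discrete case).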
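The inequality can be checked numerically in a case where the expectation has a closed form. The sketch below assumes a Bernoulli model, for which the expected log density under the true parameter is `theta0 * log(theta) + (1 - theta0) * log(1 - theta)`; the model, the value of `theta0`, and the grid are all illustrative assumptions.

```python
import math

def expected_log_density(theta, theta0):
    """Expected log density under theta0 for a Bernoulli(theta) model, in closed form."""
    return theta0 * math.log(theta) + (1 - theta0) * math.log(1 - theta)

theta0 = 0.3  # assumed "true" parameter
grid = [k / 100 for k in range(1, 100)]  # candidate values 0.01, ..., 0.99

# The candidate with the largest expected log density should be theta0 itself.
best = max(grid, key=lambda t: expected_log_density(t, theta0))
print(best)
```

No candidate on the grid beats the true parameter, which is what the inequality predicts.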
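Now a natural criterion for finding $\theta_0$ is to look for the parameter $\theta$ that maximizes $E_{\theta_0} \log f(X_1, \theta)$.
However, the function we want to maximize is unknown, because the expectation $E_{\theta_0}$ depends on the unknown $\theta_0$.
But note that $\theta_0$ appears only in the mean operator $E_{\theta_0}$, so we can overcome this problem by using the sample mean operator instead.
Finally, we arrive at an estimator obtained by maximizing $\frac{1}{n} \sum_{i=1}^{n} \log f(X_i, \theta) = \frac{1}{n} l(\theta)$.
There is a loss in replacing the true mean with the sample mean, and the quality of this estimator depends on the law of large numbers, which is what makes the sample mean close to the true mean when $n$ is large.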
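The effect of the sample size can be seen in a small simulation. This is only a sketch under assumptions: the exponential model, the true rate of 2.0, the seed, and the grid search are all chosen for illustration, and the grid search stands in for whatever optimizer one would actually use.

```python
import math
import random

def mean_log_likelihood(data, rate):
    """Sample mean of the log density for the model rate * exp(-rate * x)."""
    return sum(math.log(rate) - rate * x for x in data) / len(data)

def grid_mle(data, grid):
    """Return the grid point that maximizes the sample-mean log-likelihood."""
    return max(grid, key=lambda rate: mean_log_likelihood(data, rate))

random.seed(0)
true_rate = 2.0  # assumed true parameter
grid = [k / 100 for k in range(1, 501)]  # candidate rates 0.01, ..., 5.00

for n in (10, 100, 2000):
    data = [random.expovariate(true_rate) for _ in range(n)]
    closed_form = 1 / (sum(data) / n)  # known MLE for this model: 1 / sample mean
    print(n, grid_mle(data, grid), closed_form)
```

For this model the maximizer is known in closed form, so the grid result can be compared with it directly; the estimates should settle near the true rate as `n` grows.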
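(Update: 2012/Feb/17) We want to maximize $l(\theta)$ or, when $l$ is differentiable and the maximum is in the interior of the parameter space, to solve
$$\frac{\partial l(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial f(X_i, \theta) / \partial \theta}{f(X_i, \theta)} = 0.$$
The above formula can be interpreted as saying that, for each term, we want
1. $f(X_i, \theta) \to \infty$, or $f(X_i, \theta)$ as large as possible,
2. $\partial f(X_i, \theta) / \partial \theta \to 0$, or $|\partial f(X_i, \theta) / \partial \theta|$ as small as possible.
These two points imply that we hope our estimate makes the probability of the observed data as large as possible and, at the same time, we want some kind of stability of that probability, i.e., we hope the rate of change in $f(X_i, \theta)$, that is $\partial f(X_i, \theta) / \partial \theta$, is as small as possible.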
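As a worked illustration of setting the derivative of the log-likelihood to zero, assume (this example is not taken from the references) that the data are exponential with density $f(x, \theta) = \theta e^{-\theta x}$ for $x \ge 0$. Then

$$l(\theta) = \sum_{i=1}^{n} \log\left(\theta e^{-\theta X_i}\right) = n \log \theta - \theta \sum_{i=1}^{n} X_i,$$

$$\frac{\partial l(\theta)}{\partial \theta} = \frac{n}{\theta} - \sum_{i=1}^{n} X_i = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}.$$

Since $\partial^2 l(\theta) / \partial \theta^2 = -n / \theta^2 < 0$, this stationary point is indeed a maximum.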
A Course in Large Sample Theory (Lecture notes), Xianyi Wu
A Course in Large Sample Theory, Thomas S. Ferguson