Stochastic o and O symbols


Stochastic o and O symbols are the basic notation of asymptotic statistics, also known as large sample theory.

(i) A_{n}=o_{p}(B_{n}): if |\frac{A_{n}}{B_{n}}|\stackrel{P}{\to}0.

The sequence of random variables A_{n} is of smaller order in probability than the sequence B_{n}.

In particular, A_{n}=o_{p}(1), “small oh-P-one”, if and only if A_{n}\stackrel{P}{\to}0; so A_{n}=o_{p}(B_{n}) means A_{n}=Y_{n}B_{n} for some Y_{n}\stackrel{P}{\to}0.

Example: X_{n}=o_{p}(1) means X_{n}\stackrel{P}{\to}0, and X_{n}=o_{p}(n^{-1/2}) means n^{1/2}X_{n}\stackrel{P}{\to}0, i.e. X_{n} goes to 0 faster than \frac{1}{n^{1/2}} in probability (for example X_{n}=\frac{1}{n}).
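As a quick check of this example, here is a minimal simulation sketch; it assumes Python with NumPy, and the choice X_{n}=Z/n with Z standard normal is purely illustrative (none of this appears in the original definitions).

import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in [10, 100, 1000, 10000]:
    # Take X_n = Z / n with Z ~ N(0, 1), so n^{1/2} X_n = Z / n^{1/2}
    Z = rng.standard_normal(100_000)
    X_n = Z / n
    prob = np.mean(np.abs(np.sqrt(n) * X_n) > eps)
    print("n=%6d  P(|sqrt(n) * X_n| > %.1f) ~ %.4f" % (n, eps, prob))

The printed probabilities shrink toward zero as n grows, which is exactly the statement n^{1/2}X_{n}\stackrel{P}{\to}0, i.e. X_{n}=o_{p}(n^{-1/2}).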

(ii) A_{n}=O_{p}(B_{n}): if for every \epsilon>0, there exist a constant M=M(\epsilon) and an integer n_{0}=n_{0}(\epsilon) such that P(|A_{n}|\leq M|B_{n}|)\geq1-\epsilon for all n>n_{0}.

The sequence A_{n} is of order less than or equal to that of B_{n} in probability.

In particular, A_{n}=O_{p}(1), “big oh-P-one”, if for any \epsilon>0 there exist a constant M and an integer n_{0} such that P(|A_{n}|\leq M)\geq1-\epsilon for all n>n_{0}; such an A_{n} is said to be bounded in probability (or tight). Thus A_{n}=O_{p}(B_{n}) means A_{n}=Y_{n}B_{n} for some Y_{n}=O_{p}(1).

It’s easy to see from the definition that O_{p}(1)=O_{p}(C) for any constant 0<C<\infty.
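To see the difference between a sequence that is bounded in probability and one that is not, here is another hedged Python/NumPy sketch; the two sequences A_{n}\sim N(0,1) and A_{n}=n+N(0,1) are hypothetical choices made only for illustration.

import numpy as np

rng = np.random.default_rng(1)
M = 3.0
for n in [10, 100, 1000]:
    tight = rng.standard_normal(100_000)           # A_n ~ N(0, 1) for every n: O_p(1)
    drifting = n + rng.standard_normal(100_000)    # A_n = n + N(0, 1): not O_p(1)
    print("n=%5d  P(|N(0,1)| <= M) ~ %.3f   P(|n + N(0,1)| <= M) ~ %.3f"
          % (n, np.mean(np.abs(tight) <= M), np.mean(np.abs(drifting) <= M)))

For the first sequence a single M keeps the probability near one for every n, while for the second the probability collapses to zero, so no fixed M can work: the first sequence is O_{p}(1), the second is not.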

(iii) A_{n}\asymp_{p}B_{n}: if for every \epsilon>0, there exist constants 0<m<M<\infty and an integer n_{0} such that P[m<|\frac{A_{n}}{B_{n}}|<M]\geq1-\epsilon for all n>n_{0}.

The sequence A_{n} is said to be of the same order as B_{n} in probability.

Some facts:

o_{p}(1)+o_{p}(1)=o_{p}(1): If X_{n}\stackrel{P}{\to}0 and Y_{n}\stackrel{P}{\to}0, then Z_{n}=X_{n}+Y_{n}\stackrel{P}{\to}0 (an instance of the continuous-mapping theorem).

o_{p}(1)+O_{p}(1)=O_{p}(1)

O_{p}(1)o_{p}(1)=o_{p}(1): If the sequence \{Y_{n},n=1,2,\cdots\} is bounded in probability and if \{C_{n}\} is a sequence of random variables tending to 0 in probability, then C_{n}Y_{n}\stackrel{P}{\to}0 (see the simulation sketch after this list).

(1+o_{p}(1))^{-1}=O_{p}(1)

o_{p}(R_{n})=R_{n}o_{p}(1)

O_{p}(R_{n})=R_{n}O_{p}(1)

o_{p}(O_{p}(1))=o_{p}(1)
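As referenced in the list above, here is a small simulation check of the product rule O_{p}(1)o_{p}(1)=o_{p}(1); again a Python/NumPy sketch with arbitrarily chosen normal sequences, not something from the original facts.

import numpy as np

rng = np.random.default_rng(2)
eps = 0.1
for n in [10, 100, 1000, 10000]:
    Y_n = rng.standard_normal(100_000)               # O_p(1): bounded in probability
    C_n = rng.standard_normal(100_000) / np.sqrt(n)  # o_p(1): tends to 0 in probability
    print("n=%6d  P(|C_n * Y_n| > %.1f) ~ %.4f" % (n, eps, np.mean(np.abs(C_n * Y_n) > eps)))

The product C_{n}Y_{n} tends to 0 in probability, consistent with the rule.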

Lemma: Let R be a function defined on a domain in \mathcal{R}^{k} such that R(0)=0. Let X_{n} be a sequence of random vectors with values in the domain of R that converges in probability to zero. Then, for every p>0,

(i) if R(h)=o(||h||^{p}) as h\to0, then R(X_{n})=o_{p}(||X_{n}||^{p});

(ii) if R(h)=O(||h||^{p}) as h\to0, then R(X_{n})=O_{p}(||X_{n}||^{p}); a simulation sketch of this part follows.
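As a hedged illustration of part (ii), take R(h)=e^{h}-1-h, which is O(h^{2}) as h\to0 (in fact h^{2}/2+O(h^{3})), and X_{n}=Z/\sqrt{n}\stackrel{P}{\to}0. The Python/NumPy sketch below (illustrative choices only, not from the lemma itself) checks that R(X_{n})/X_{n}^{2} stays bounded in probability.

import numpy as np

rng = np.random.default_rng(3)
for n in [10, 100, 1000, 10000]:
    X_n = rng.standard_normal(100_000) / np.sqrt(n)   # X_n ->_P 0
    ratio = (np.exp(X_n) - 1.0 - X_n) / X_n**2        # R(X_n) / |X_n|^2
    print("n=%6d  P(|ratio| > 1) ~ %.4f   median ratio ~ %.4f"
          % (n, np.mean(np.abs(ratio) > 1.0), np.median(ratio)))

The ratio concentrates near 1/2, so R(X_{n})=O_{p}(||X_{n}||^{2}), as the lemma asserts.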

Result: For a random variable S with ES and Var(S) finite, S=ES+O_{p}(\sqrt{Var(S)}).

Proof:

We only need to prove that (S-ES)/\sqrt{Var(S)}=O_{p}(1), or equivalently, that for any \epsilon>0 there exist a constant M and an integer n_{0} such that P(|(S-ES)/\sqrt{Var(S)}|\leq M)\geq1-\epsilon for all n>n_{0}.

Let NS=(S-ES)/\sqrt{Var(S)}.

By Chebyshev's (Markov's) inequality, P(|NS|>M)\leq E(NS^{2})/M^{2}=[E(S-ES)^{2}/Var(S)]/M^{2}=1/M^{2}\to0 as M\to\infty, so P(|NS|\leq M)\geq1-1/M^{2}, which exceeds 1-\epsilon once M is chosen large enough.

From the proof above we know that any normalized random variable NS=(S-ES)/\sqrt{Var(S)} satisfies NS=O_{p}(1), i.e. NS is bounded in probability. The reason is natural: if a sequence of random variables is not bounded in probability, either its mean is too large (E(S_{n})\to\infty) or it varies too much (Var(S_{n})\to\infty), and normalization eliminates both possibilities. On the other hand, for a single random variable S, if ES<\infty and Var(S)<\infty, then S=ES+O_{p}(1); in particular, when ES=0, S=O_{p}(1).
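A quick numerical illustration of the proof: the sketch below (Python/NumPy, with S taken to be a sum of n Exp(1) variables purely for illustration, so ES=n and Var(S)=n) estimates P(|NS|\leq M) and compares it with the Chebyshev lower bound 1-1/M^{2}.

import numpy as np

rng = np.random.default_rng(4)
M = 5.0
for n in [10, 100, 1000]:
    # S = sum of n Exp(1) variables, generated directly as Gamma(n, 1)
    S = rng.gamma(shape=n, scale=1.0, size=100_000)
    NS = (S - n) / np.sqrt(n)                         # normalized version of S
    print("n=%5d  P(|NS| <= M) ~ %.4f   (Chebyshev lower bound 1 - 1/M^2 = %.4f)"
          % (n, np.mean(np.abs(NS) <= M), 1.0 - 1.0 / M**2))

The estimated probabilities stay above the Chebyshev bound for every n, so the single constant M works uniformly, which is exactly NS=O_{p}(1).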

Example: from the central limit theorem we know that \sqrt{n}(\bar{X}-EX)\stackrel{d}{\to}N(0,DX), where DX denotes Var(X). Then we have

\sqrt{n}(\bar{X}-EX)=N(0,DX)+o_{p}(1)=\sqrt{DX}\,O_{p}(1)+o_{p}(1)=O_{p}(1),

\bar{X}=EX+O_{p}(1)\times n^{-1/2}=EX+O_{p}(n^{-1/2}).

More directly, P(|\frac{\sqrt{n}(\bar{X}-EX)}{\sqrt{DX}}|>M)\to P(|Z|>M),\ Z\sim N(0,1), and P(|Z|>M) can be made smaller than any \epsilon>0 by taking M large enough, so \frac{\sqrt{n}(\bar{X}-EX)}{\sqrt{DX}}=O_{p}(1), or \bar{X}=EX+O_{p}(n^{-1/2}).

The weak law of large numbers states that \bar{X}\stackrel{P}{\to}EX, so we have

\bar{X}-EX=o_{p}(1).
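Putting the two displays together, a small Python/NumPy simulation (the Exp population with mean 2 is an arbitrary illustrative choice) shows \bar{X}-EX=o_{p}(1) while \sqrt{n}(\bar{X}-EX)=O_{p}(1), i.e. \bar{X}=EX+O_{p}(n^{-1/2}).

import numpy as np

rng = np.random.default_rng(5)
for n in [10, 100, 1000]:
    X = rng.exponential(scale=2.0, size=(10_000, n))   # EX = 2, Var(X) = 4
    dev = X.mean(axis=1) - 2.0                         # Xbar - EX
    print("n=%5d  P(|Xbar - EX| > 0.1) ~ %.4f   P(|sqrt(n)*(Xbar - EX)| <= 6) ~ %.4f"
          % (n, np.mean(np.abs(dev) > 0.1), np.mean(np.abs(np.sqrt(n) * dev) <= 6.0)))

The raw deviation goes to zero in probability, while the \sqrt{n}-scaled deviation stays inside a fixed band with probability close to one.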

(Update: 2012/Feb/17) Similarly, let X_{n} be a sequence of random vectors. Using Markov's inequality, P(|X_{n}|>M)\leq\frac{E|X_{n}|^{k}}{M^{k}}, we have the following (a small numerical check of the first criterion appears after this list):

1. If there is a number k>0 such that E|X_{n}|^{k} is bounded, then X_{n}=O_{p}(1);
similarly, if E|X_{n}|^{k}\leq ca_{n}, where c is a constant and a_{n} is a sequence of positive numbers,
then X_{n}=O_{p}(a_{n}^{1/k}).

2. If there is a number k>0 such that E|X_{n}|^{k}\to0, then X_{n}=o_{p}(1) (take M=\epsilon in Markov's inequality);
similarly, if E|X_{n}|^{k}\leq ca_{n}, where c is a constant and a_{n} is a sequence of positive numbers,
then X_{n}=o_{p}(b_{n}) for any sequence b_{n}>0 such that b_{n}^{-1}a_{n}^{1/k}\to0.

3. If there are sequences of vectors \{\mu_{n}\} and nonsingular matrices \{A_{n}\} such that A_{n}(X_{n}-\mu_{n}) converges in distribution, then X_{n}=\mu_{n}+O_{p}(||A_{n}^{-1}||).
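As mentioned before the list, here is a small check of criterion 1; the Python/NumPy sketch uses the hypothetical choice X_{n}=Z/\sqrt{n}, so that E|X_{n}|^{2}=1/n=a_{n} with k=2.

import numpy as np

rng = np.random.default_rng(6)
M = 10.0
for n in [10, 100, 1000, 10000]:
    # X_n = Z / sqrt(n): E|X_n|^2 = 1/n = a_n, so criterion 1 gives X_n = O_p(a_n^{1/2})
    X_n = rng.standard_normal(100_000) / np.sqrt(n)
    scaled = X_n / n**(-0.5)                          # X_n / a_n^{1/k} with k = 2
    print("n=%6d  P(|X_n / a_n^(1/2)| <= M) ~ %.4f" % (n, np.mean(np.abs(scaled) <= M)))

The scaled sequence stays in a fixed band [-M, M] with probability close to one, uniformly in n, i.e. X_{n}=O_{p}(a_{n}^{1/2})=O_{p}(n^{-1/2}).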

