## Stochastic o and O symbols

Stochastic $o$ and $O$ symbols are the basic notation of asymptotic statistics (large sample theory).

(i) $A_{n}=o_{p}(B_{n})$: if $|\frac{A_{n}}{B_{n}}|\stackrel{P}{\to}0$.

The sequence of random variables $A_{n}$ is then said to be of smaller order in probability than the sequence $B_{n}$.

In particular, $A_{n}=o_{p}(1)$, “small oh-P-one”, if and only if $A_{n}\stackrel{P}{\to}0$; so $A_{n}=o_{p}(B_{n})$ means $A_{n}=Y_{n}B_{n}$ for some $Y_{n}\stackrel{P}{\to}0$.

Example: $X=o_{p}(1)$ means $X\stackrel{P}{\to}0$, and $X=o_{p}(n^{-1/2})$ means $n^{1/2}X\stackrel{P}{\to}0$, i.e., $X$ goes to $0$ faster than $\frac{1}{n^{1/2}}$ in probability (for instance, $X=\frac{1}{n}$).
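
A minimal Monte Carlo sketch of the second statement (the concrete choice $X=Z/n$ with $Z\sim N(0,1)$ is an illustrative assumption, not part of the text above): the estimated probability $P(|n^{1/2}X|>\epsilon)$ should shrink to zero as $n$ grows.

```python
import numpy as np

# Sketch: X = Z/n with Z ~ N(0,1), so sqrt(n)*X = Z/sqrt(n) -> 0 in probability,
# i.e. X = o_p(n^{-1/2}).  The setup is illustrative only.
rng = np.random.default_rng(0)
eps, reps = 0.1, 100_000
for n in [10, 100, 1_000, 10_000]:
    X = rng.standard_normal(reps) / n          # draws of X = Z/n
    p = np.mean(np.abs(np.sqrt(n) * X) > eps)  # estimate P(|sqrt(n)*X| > eps)
    print(f"n={n:6d}  P(|sqrt(n)*X| > {eps}) = {p:.4f}")
```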

(ii) $A_{n}=O_{p}(B_{n})$: if for every $\epsilon>0$, there exist a constant $M=M(\epsilon)$ and an integer $n_{0}=n_{0}(\epsilon)$ such that $P(|A_{n}|\leq M|B_{n}|)\geq1-\epsilon$ for all $n>n_{0}$.

The sequence $A_{n}$ is then said to be of order less than or equal to that of $B_{n}$ in probability.

In particular, $A_{n}=O_{p}(1)$, “big oh-P-one”, if for any $\epsilon>0$ there exist a constant $M$ and an integer $n_{0}$ such that $P(|A_{n}|\leq M)\geq1-\epsilon$ for all $n>n_{0}$; such an $A_{n}$ is said to be bounded in probability (or tight). Thus $A_{n}=O_{p}(B_{n})$ means $A_{n}=Y_{n}B_{n}$ for some $Y_{n}=O_{p}(1)$.

It’s easy to see from the definition that $O_{p}(C)=O_{p}(1)$ for any constant $0<C<\infty$.

(iii) $A_{n}\asymp_{p}B_{n}$: if given $\epsilon>0$, there exist constants $0<m<M<\infty$ and an integer $n_{0}$ such that $P\left[m<|\frac{A_{n}}{B_{n}}|<M\right]\geq1-\epsilon$ for all $n>n_{0}$.

The sequence $A_{n}$ is then said to be of the same order as $B_{n}$ in probability.
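
A quick illustration (my own example, added for concreteness): if $X_{1},X_{2},\dots$ are i.i.d. with $EX_{i}=\mu\neq0$ and $S_{n}=\sum_{i=1}^{n}X_{i}$, then $S_{n}/(n\mu)\stackrel{P}{\to}1$ by the weak law of large numbers, so $|S_{n}/n|$ eventually lies in $(|\mu|/2,\,2|\mu|)$ with probability at least $1-\epsilon$, and hence $S_{n}\asymp_{p}n$.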

Some facts:

$o_{p}(1)+o_{p}(1)=o_{p}(1)$: If $X_{n}\stackrel{P}{\to}0$ and $Y_{n}\stackrel{P}{\to}0$, then $Z_{n}=X_{n}+Y_{n}\stackrel{P}{\to}0$. (An instance of the continuous-mapping theorem applied to $(x,y)\mapsto x+y$.)

$o_{p}(1)+O_{p}(1)=O_{p}(1)$

$O_{p}(1)o_{p}(1)=o_{p}(1)$: If the sequence $\{Y_{n},n=1,2,\cdots\}$ is bounded in probability and $\{C_{n}\}$ is a sequence of random variables tending to $0$ in probability, then $C_{n}Y_{n}\stackrel{P}{\to}0$ (a simulation sketch of this product rule follows the list).

$(1+o_{p}(1))^{-1}=O_{p}(1)$

$o_{p}(R_{n})=R_{n}o_{p}(1)$

$O_{p}(R_{n})=R_{n}O_{p}(1)$

$o_{p}(O_{p}(1))=o_{p}(1)$
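
Here is a minimal Monte Carlo sketch of the product rule $O_{p}(1)o_{p}(1)=o_{p}(1)$; the concrete choices $Y_{n}\sim N(0,1)$ (tight) and $C_{n}=W/\sqrt{n}$ with $W\sim N(0,1)$ (tending to $0$ in probability) are assumptions made purely for illustration.

```python
import numpy as np

# Sketch of O_p(1) * o_p(1) = o_p(1): Y_n ~ N(0,1) is bounded in probability,
# C_n = W/sqrt(n) -> 0 in probability, so P(|C_n*Y_n| > eps) should vanish.
rng = np.random.default_rng(1)
eps, reps = 0.1, 100_000
for n in [10, 100, 1_000, 10_000]:
    Y = rng.standard_normal(reps)               # Y_n = O_p(1)
    C = rng.standard_normal(reps) / np.sqrt(n)  # C_n = o_p(1)
    p = np.mean(np.abs(C * Y) > eps)            # estimate P(|C_n*Y_n| > eps)
    print(f"n={n:6d}  P(|C_n*Y_n| > {eps}) = {p:.4f}")
```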

Lemma: Let $R$ be a function defined on a domain in $\mathcal{R}^{k}$ such that $R(0)=0$. Let $X_{n}$ be a sequence of random vectors with values in the domain of $R$ that converges in probability to zero. Then, for every $p>0$,

(i) if $R(h)=o(||h||^{p})$ as $h\to0$, then $R(X_{n})=o_{p}(||X_{n}||^{p})$;

(ii) if $R(h)=O(||h||^{p})$ as $h\to0$, then $R(X_{n})=O_{p}(||X_{n}||^{p})$.
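
A standard application (stated here as an illustration, not taken from the text above): take $R(h)=\log(1+h)-h$, so $R(0)=0$ and $R(h)=O(h^{2})$ as $h\to0$ by Taylor expansion. If $X_{n}\stackrel{P}{\to}0$, part (ii) of the lemma gives

$$\log(1+X_{n})=X_{n}+O_{p}(X_{n}^{2}),$$

which is how deterministic Taylor expansions are transferred to stochastic arguments.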

Result: For a random variable $S$ with finite variance, $S=ES+O_{p}(\sqrt{Var(S)})$.

Proof:

We only need to prove that $(S-ES)/\sqrt{Var(S)}=O_{p}(1)$, or equivalently, that for any $\epsilon>0$ there exist a constant $M$ and an integer $n_{0}$ such that $P(|(S-ES)/\sqrt{Var(S)}|\leq M)\geq1-\epsilon$ for all $n>n_{0}$.

Let $NS=(S-ES)/\sqrt{Var(S)}$. By Chebyshev's inequality (Markov's inequality applied to $NS^{2}$),

$P(|NS|>M)\leq E(NS^{2})/M^{2}=[E(S-ES)^{2}/Var(S)]/M^{2}=1/M^{2}\to0$ as $M\to\infty$,

so $P(|NS|\leq M)\geq1-1/M^{2}\geq1-\epsilon$ once $M$ is chosen large enough. $\square$

From the proof above we know that any normalized random variable $NS=(S-ES)/\sqrt{Var(S)}$ satisfies $NS=O_{p}(1)$, i.e., $NS$ is bounded in probability. The reason is natural: if a sequence of random variables is not bounded in probability, either its mean drifts off ($E(S_{n})\to\infty$) or it varies too much ($Var(S_{n})\to\infty$), and normalization rules out both possibilities. Conversely, for a fixed random variable $S$ with $ES<\infty$ and $Var(S)<\infty$, we get $S=ES+O_{p}(1)$; in particular, $S=O_{p}(1)$ when $ES=0$.
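
As a concrete instance (a worked example added here): for an i.i.d. sum $S_{n}=\sum_{i=1}^{n}X_{i}$ with $EX_{i}=\mu$ and $Var(X_{i})=\sigma^{2}<\infty$, we have $ES_{n}=n\mu$ and $Var(S_{n})=n\sigma^{2}$, so the result gives

$$S_{n}=n\mu+O_{p}(\sqrt{n\sigma^{2}})=n\mu+O_{p}(\sqrt{n}),$$

i.e., the fluctuation of the sum around its mean grows only at rate $\sqrt{n}$.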

Example: from the central limit theorem we know that $\sqrt{n}(\bar{X}-EX)\stackrel{d}{\to}N(0,DX)$, writing $DX$ for $Var(X)$; then we have

$\sqrt{n}(\bar{X}-EX)=N(0,DX)+o_{p}(1)=\sqrt{DX}\,O_{p}(1)+o_{p}(1)=O_{p}(1)$,

$\bar{X}=EX+O_{p}(1)\times n^{-1/2}=EX+O_{p}(n^{-1/2})$.

More precisely, $P\left(\left|\frac{\sqrt{n}(\bar{X}-EX)}{\sqrt{DX}}\right|>M\right)\to P(|Z|>M)$ with $Z\sim N(0,1)$, and $P(|Z|>M)$ can be made smaller than any $\epsilon>0$ by taking $M$ large enough; so $\frac{\sqrt{n}(\bar{X}-EX)}{\sqrt{DX}}=O_{p}(1)$, i.e., $\bar{X}=EX+O_{p}(n^{-1/2})$.

The weak law of large numbers states that $\bar{X}\stackrel{P}{\to}EX$, so we have

$\bar{X}-EX=o_{p}(1)$.
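
The two statements can be watched side by side in a small simulation (a sketch under the assumed toy setup of i.i.d. $N(0,1)$ data, so $EX=0$ and $DX=1$): $|\bar{X}-EX|$ itself shrinks (the $o_{p}(1)$ statement), while the $95\%$ quantile of $|\sqrt{n}(\bar{X}-EX)|$ stays roughly constant (so the $O_{p}(n^{-1/2})$ rate is sharp).

```python
import numpy as np

# Toy setup (assumption): i.i.d. N(0,1) data, so EX = 0 and DX = 1.
rng = np.random.default_rng(2)
reps = 2_000
for n in [10, 100, 1_000, 10_000]:
    xbar = rng.standard_normal((reps, n)).mean(axis=1)       # replications of the sample mean
    q_raw = np.quantile(np.abs(xbar), 0.95)                  # shrinks: xbar - EX = o_p(1)
    q_scaled = np.quantile(np.sqrt(n) * np.abs(xbar), 0.95)  # stable: sqrt(n)(xbar - EX) = O_p(1)
    print(f"n={n:6d}  q95(|xbar|)={q_raw:.4f}  q95(sqrt(n)*|xbar|)={q_scaled:.4f}")
```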

(Update: 2012/Feb/17) Similarly, let $X_{n}$ be a sequence of random vectors; using Markov's inequality, $P(|X_{n}|>M)\leq\frac{E|X_{n}|^{k}}{M^{k}}$, we have:

1. If there is a number $k>0$ such that $E|X_{n}|^{k}$ is bounded, then $X_{n}=O_{p}(1)$;
similarly, if $E|X_{n}|^{k}\leq ca_{n}$, where $c$ is a constant and $a_{n}$ is a sequence of positive numbers,
then $X_{n}=O_{p}(a_{n}^{1/k})$ (see the worked example after this list).

2. If there is a number $k>0$ such that $E|X_{n}|^{k}\to0$ (so the bound holds with $M$ replaced by any fixed $\epsilon>0$), then $X_{n}=o_{p}(1)$;
similarly, if $E|X_{n}|^{k}\leq ca_{n}$, where $c$ is a constant and $a_{n}$ is a sequence of positive numbers,
then $X_{n}=o_{p}(b_{n})$ for any sequence $b_{n}>0$ such that $b_{n}^{-1}a_{n}^{1/k}\to0$.

3. If there are sequences of vectors $\{\mu_{n}\}$ and nonsingular matrices $\{A_{n}\}$ such that $A_{n}(X_{n}-\mu_{n})$ converges in distribution, then $X_{n}=\mu_{n}+O_{p}(||A_{n}^{-1}||)$.
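
As a worked instance of item 1 (with $k=2$; the example is added here for illustration): for the sample mean of i.i.d. data with finite variance, $E|\bar{X}-EX|^{2}=Var(X)/n$, so taking $a_{n}=1/n$ gives

$$\bar{X}-EX=O_{p}(n^{-1/2}),$$

recovering the rate from the CLT example above without invoking any distributional limit.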
