Properties of Markov chain

Chentian Wu

2023-08-04

1. Properties

$\textbf{Theorem 5.1.}\quad$ Given a distribution $\lambda$ and a transition probability matrix $P$ on a countable state space $I$, $(X_t){t⩾0} \sim \text{Markov}(\lambda, P)$ if and only if, $$ \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right)=\lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}} $$ $\forall x_0, x_1, · · · , x_n \in I$.

$$Proof.$$ First, we prove the “if” statement. ( $$\Leftarrow$$ )

$X_0 \sim \lambda$ (Markov Chain definition)
Next, we start from the definition of the conditional probability of $$X_n$$ given the history and prove the Markov property for $(X_t)_{t⩾0}$

\begin{aligned} \mathbb{P}\left(X_{n}=x_{n} \mid X_{0}\right. & \left.=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)=\frac{\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right)}{\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)} \\ & =\frac{\lambda_{x_{0}} \cdot p_{x_{0} x_{1}} \cdot p_{x_{1} x_{2}} \cdots p_{x_{n-2} x_{n-1}} \cdot p_{x_{n-1} x_{n}}}{\lambda_{x_{0}} \cdot p_{x_{0} x_{1}} \cdot p_{x_{1} x_{2}} \cdots p_{x_{n-2} x_{n-1}}} \\ & =p_{x_{n-1} x_{n}} \\ & =\mathbb{P}\left(X_{n}=x_{n} \mid X_{n-1}=x_{n-1}\right)\end{aligned}

This proves that, if $\forall x_0, x_1,\cdots, x_n \in I:\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right) = \displaystyle \lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}}$ , then $(X_t)_{t⩾0} \sim\text{Markov}(\lambda, P)$ .

Now, we prove the “only if” statement ( $$\Rightarrow$$ ). We begin with the definition of conditional probability followed by the Markov property.

\begin{array}{l}\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right) \\ =\mathbb{P}\left(X_{n}=x_{n} \mid X_{0}=x_{0}, \ldots X_{n-1}\right) \cdot \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right) \\ =p_{x_{n-1} x_{n}} \cdot \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)\end{array}

We recursively use Markov property. Specifically, in the above expression we try to rewrite $\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)$ as a conditional probability followed by use of Markov property.

\begin{array}{l}\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right) \\ =p_{x_{n-1} x_{n}} \cdot \mathbb{P}\left(X_{n-1}=x_{n-1}\mid X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-2}=x_{n-2}\right)\\ =p_{x_{n-1} x_{n}} \cdot p_{x_{n-2} x_{n-1}}\cdot \mathbb{P}\left(X_{n-2}=x_{n-2}\mid X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-3}=x_{n-3}\right)\\ =\ldots\\ =\displaystyle \lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}} \end{array}

We next consider some consequences of Markov property.

$\textbf{Definition 5.2.}\quad$ We call $\delta_x$ as the degenerate distribution (退化分布) at $x$ if $X \sim \delta_x$, implies that $P(X = x) = 1$.

$\textbf{Remark 5.3.}$ As a consequence of the Markov property ( $\textbf{Theorem 5.1}$ ), we can show that if $(X_t)_{t⩾0}\sim \text{Markov}(\lambda, P)$ , then given $$X_k= x$$ , we have $(X_t)_{t⩾k}\sim \text{Markov}(\delta_x, P)$ .

This remark reinforces the idea that Markov chains do not have memory. If we consider the state of a Markov chain at any arbitrary time instant, the random process starting that instant is also Markov. It has the same transition probability matrix; however, the initial distribution is degenerate.

$\textbf{Lemma 5.4.}$ If $$P$$ and $$Q$$ are transition matrices on $$I$$ , then $$P Q$$ is a transition matrix on $$I$$ .

$$Proof.$$ The entries of product matrix $$P Q$$ can be written as

\forall i,j: (P Q)_{i j}=\sum_{k \in I} p_{i k} q_{k j} \geqslant 0

The sum of all entries in a row is computed as the sum of $(P Q)_{ij}$ overall $$j$$ , as

\begin{aligned} \sum_{j}(P Q)_{i j}=\sum_{j} \sum_{k} p_{i k} q_{k j} & =\sum_{k}\left(\sum_{j} q_{k j}\right) p_{i k} &(\text{Fubini's Theorem}) \\ & =\sum_{k} p_{i k} &(\forall k: \sum_{j} q_{k j}=1)\\ & =1 &(\forall i: \sum_{k} p_{i k}=1) \end{aligned}

Since every entry in the $$P Q$$ matrix is non-negative and the row sum of the matrix is 1, we know that $$P Q$$ is a transition probability matrix (row stochastic).

$\textbf{Corollary 5.5.}$ If $$P$$ is a transition probability matrix, then $$P$$ n is also a transition probability matrix.

$$Proof.$$ We can prove this statement easily by induction. The base case is when $$n = 1$$ . This is true trivially. We prove, given that the statement is true for $$n = k-1$$ , it is also true for $$n = k$$ . Begin by rewriting $$P^k$$ as a product of $$P$$ and $P^{k−1}$ . From our assumption, $P^{k−1}$ is a transition probability matrix. We know from $\textbf{Lemma 5.4}$ that since both $$P$$ and $P^{k−1}$ are transition probability matrix, $$P^k$$ is also a transition matrix.

Note that the converse is not true. $$P^n$$ being a transition matrix is insufficient for $$P$$ to be a transition matrix. A counter-example can be easily constructed.

$\textbf{Counter-example}\ (n = 2)$ : Consider a transition matrix $A^{2}=\left[\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right]$ . We can write $$A^2 = AA$$ where $A=\left[\begin{array}{cc}1 / \sqrt{2} & 1 / \sqrt{2} \\ 1 / \sqrt{2} & -1 / \sqrt{2}\end{array}\right]$ . This counter-example proves that $$A^2$$ being a transition matrix is not sufficient for $$A$$ to be a transition matrix.

$\textbf{Remark 5.6.}$ If $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$ , then

$\mathbb{P}\left(X_{n}=i\right)=\left(\lambda P^{n}\right)_{i}$
$\mathbb{P}\left(X_{n}=j \mid X_{0}=i\right)=\left(P^{n}\right)_{i j}$

$$Proof.$$

We know from $\textbf{Theorem 5.1}$ , $\displaystyle \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right)=\lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}}$ . We can write the probability of a state being $$i$$ as,

\begin{aligned} \mathbb{P}\left(X_{n}=i\right) & =\sum_{x_{n-1}} \sum_{x_{n-2}} \ldots \sum_{x_{0}} \lambda_{x_{0}} p_{x_{0} x_{1}} \ldots p_{x_{n-1} x_{n-1}} p_{x_{n-1} i} \\ & =\sum_{x_{n-2}} \ldots \sum_{x_{0}} \lambda_{x_{0}} p_{x_{0} x_{1}} \ldots\left(\sum_{x_{n-1}} p_{x_{n-1} x_{n-1}} p_{x_{n-1} i}\right) &\text{(Fubini’s Theorem)}\\ & =\left(\lambda P^{n}\right)_{i} &\text{(Recursively using Fubini’s Theorem)} \end{aligned}

We begin by using the definition of conditional probability and $\textbf{Theorem 5.1.}$ Moreover, we use $\text{Fubini’s theorem}$ recursively.

\begin{aligned} \mathbb{P}\left(X_{n}=j \mid X_{0}=i\right) & =\frac{\mathbb{P}\left(X_{n}=j, X_{0}=i\right)}{\mathbb{P}\left(X_{0}=i\right)} \\ & =\frac{\sum_{x_{n-1}} \sum_{x_{n-1}} \ldots \sum_{x_{1}} \lambda_{i} p_{i x_{1}} \ldots p_{x_{n-1} x_{n-1}} p_{x_{n-1} i}}{\lambda_{i}} \\ & =\left(P^{n}\right)_{i, j} &\text{(Fubini’s Theorem)} \end{aligned}

$\textbf{Corollary 5.7.}$ Let $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$ . Then,

\forall a\ge 0,b\ge 1: (X_{a+bt})_{t⩾0}\sim \text{Markov}(\lambda P^a , P^b)

2. Hitting Time

$\textbf{Definition 5.8.}\quad$ Consider a discrete-time Markov Chain $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$ taking values in $I$ and a set $A \subseteq I$. We define the hitting time of $A$, denoted by $H^A$, as, $$ H^A := \inf (\{n ⩾ 0 \mid X_n \in A\}). $$ We denote the hitting time of $A$ given initial state as $H_i^A = (H^A | X_0 = i)$.

Intuitively, $$H_i^A$$ 指的是在初始状态为 $$i$$ 的情况下达到状态 $$A$$ 所需要的时间

$\textbf{Remark 5.9.}$ Let us look at hitting time of $$A$$ for some special cases:

$$H^A = 0$$ if $X_0 \in A$
$$H^A = 1$$ if $X_0 \not \in A$ and $X_1 \in A$
$$H^A = n$$ if $X_0, X_1, .\ldots , X_{n−1} \not \in A$ and $X_n \in A$

Note that the hitting time of a set is a random variable that takes values in $\{0, 1, 2, \ldots\} \cup \{∞\}$

$\textbf{Remark 5.10.}$ We denote the probability that the hitting time of set $$A$$ is finite as $$h^A$$ . That is

h^A = \mathbb P(H^A < ∞).

We can also consider probability of finite hitting time conditioned on the initial state,

h_i^A =\mathbb P(H^A < ∞ \mid X_0 = i).

Clearly, using law of total probability.

h^A=\sum_i\lambda_ih_i^A

$\textbf{Corollary 5.11.}$ If $$h_i^A < 1$$ , then, $\mathbb E[H_i^A ] = ∞$

The proof for this corollary is simple. One can write down the definition of expectation and note that since $$h_i^A < 1$$ for some $$i$$ , hitting time when starting at $$i$$ is infinity $$H_i^A = \infty$$ with non-zero probability. Hence the expectation of hitting time is infinity. Consider the probability that $$H_i^A$$ is finite, $h_i^A = \mathbb P_i(H^A < ∞)$ . The following result is true.

$\textbf{Theorem 5.12.}\quad$ The finite hitting time probabilities $(h_i^A)_{i∈I}$ satisfy the following equation - $$ h_{i}^{A}=\left\{\begin{array}{ll}1 & \text { if } i \in A \\ \displaystyle \sum_{j \in I} p_{i j} h_{j}^{A} & \text { if } i \notin A .\end{array}\right. $$ Moreover, if $(x_i)_{i∈I}$ is another non-negative solution of the above equation, then $(h_i^A)_{i∈I}$ is the smallest non-negative solution of the above equation.