$$\huge\textbf{Properties of Markov chain} $$

1. Properties

$\textbf{Theorem 5.1.}\quad$ Given a distribution $\lambda$ and a transition probability matrix $P$ on a countable state space $I$, $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$ if and only if, $$ \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right)=\lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}} $$ $\forall x_0, x_1, \cdots, x_n \in I$.

$Proof.$ First, we prove the "if" direction ($⇐$): assume the product formula holds for all $x_0, x_1, \cdots, x_n \in I$.

  1. Taking $n = 0$ in the product formula gives $\mathbb{P}(X_{0}=x_{0})=\lambda_{x_{0}}$, so $X_0 \sim \lambda$.
  2. Next, we start from the definition of the conditional probability of $X_n$ given the history and verify the Markov property for $(X_t)_{t⩾0}$:
$$\begin{aligned} \mathbb{P}\left(X_{n}=x_{n} \mid X_{0}\right. & \left.=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)=\frac{\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right)}{\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)} \\ & =\frac{\lambda_{x_{0}} \cdot p_{x_{0} x_{1}} \cdot p_{x_{1} x_{2}} \cdots p_{x_{n-2} x_{n-1}} \cdot p_{x_{n-1} x_{n}}}{\lambda_{x_{0}} \cdot p_{x_{0} x_{1}} \cdot p_{x_{1} x_{2}} \cdots p_{x_{n-2} x_{n-1}}} \\ & =p_{x_{n-1} x_{n}} \\ & =\mathbb{P}\left(X_{n}=x_{n} \mid X_{n-1}=x_{n-1}\right)\end{aligned} $$

This proves that, if $\forall x_0, x_1,\cdots, x_n \in I:\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right) = \displaystyle \lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}}$, then $(X_t)_{t⩾0} \sim\text{Markov}(\lambda, P)$.

Now, we prove the "only if" direction ($⇒$). We begin with the definition of conditional probability followed by the Markov property.

$$\begin{array}{l}\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right) \\ =\mathbb{P}\left(X_{n}=x_{n} \mid X_{0}=x_{0}, \ldots, X_{n-1}=x_{n-1}\right) \cdot \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right) \\ =p_{x_{n-1} x_{n}} \cdot \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)\end{array} $$

We apply this step recursively. Specifically, in the above expression we rewrite $\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n-1}=x_{n-1}\right)$ as a conditional probability times a joint probability and use the Markov property again.

$$\begin{array}{l}\mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \ldots, X_{n}=x_{n}\right) \\ =p_{x_{n-1} x_{n}} \cdot \mathbb{P}\left(X_{n-1}=x_{n-1}\mid X_{0}=x_{0}, \ldots, X_{n-2}=x_{n-2}\right)\cdot \mathbb{P}\left(X_{0}=x_{0}, \ldots, X_{n-2}=x_{n-2}\right)\\ =p_{x_{n-1} x_{n}} \cdot p_{x_{n-2} x_{n-1}}\cdot \mathbb{P}\left(X_{0}=x_{0}, \ldots, X_{n-2}=x_{n-2}\right)\\ =\ldots\\ =\displaystyle \lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}} \end{array} $$
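The factorization in Theorem 5.1 is easy to check numerically. Below is a minimal sketch, assuming a hypothetical 3-state chain with made-up $\lambda$ and $P$: it simulates the chain and compares the empirical frequency of one fixed path with $\lambda_{x_0} p_{x_0 x_1} p_{x_1 x_2}$.

```python
# Minimal numerical check of Theorem 5.1 (hypothetical 3-state chain):
# the empirical frequency of a fixed path (x0, x1, x2) should be close to
# lambda_{x0} * p_{x0 x1} * p_{x1 x2}.
import numpy as np

rng = np.random.default_rng(0)

lam = np.array([0.5, 0.3, 0.2])           # initial distribution lambda (made up)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])           # transition matrix (made up)

path = (0, 1, 2)                          # the fixed path to test
trials = 100_000

hits = 0
for _ in range(trials):
    x = int(rng.choice(3, p=lam))         # X_0 ~ lambda
    traj = [x]
    for _ in range(len(path) - 1):
        x = int(rng.choice(3, p=P[x]))    # X_{t+1} ~ row X_t of P
        traj.append(x)
    hits += tuple(traj) == path

empirical = hits / trials
exact = lam[path[0]] * P[path[0], path[1]] * P[path[1], path[2]]
print(empirical, exact)                   # the two numbers should agree closely
```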

We next consider some consequences of the Markov property.

$\textbf{Definition 5.2.}\quad$ We call $\delta_x$ the degenerate distribution at $x$: $X \sim \delta_x$ means that $\mathbb{P}(X = x) = 1$.

$\textbf{Remark 5.3.}$ As a consequence of the Markov property ($\textbf{Theorem 5.1}$), we can show that if $(X_t)_{t⩾0}\sim \text{Markov}(\lambda, P)$, then given $X_k= x$, we have $(X_t)_{t⩾k}\sim \text{Markov}(\delta_x, P)$.

This remark reinforces the idea that Markov chains have no memory. If we observe the state of a Markov chain at an arbitrary time instant, the random process starting at that instant is also Markov. It has the same transition probability matrix; however, the initial distribution is the degenerate distribution at the observed state.


$\textbf{Lemma 5.4.}$ If $P$ and $Q$ are transition matrices on $I$, then $P Q$ is a transition matrix on $I$.

$Proof.$ The entries of the product matrix $P Q$ can be written as

$$\forall i,j: (P Q)_{i j}=\sum_{k \in I} p_{i k} q_{k j} \geqslant 0 $$

The sum of all entries in row $i$ of $P Q$ is computed by summing $(P Q)_{ij}$ over all $j$:

$$\begin{aligned} \sum_{j}(P Q)_{i j}=\sum_{j} \sum_{k} p_{i k} q_{k j} & =\sum_{k}\left(\sum_{j} q_{k j}\right) p_{i k} &(\text{Fubini's Theorem}) \\ & =\sum_{k} p_{i k} &(\forall k: \sum_{j} q_{k j}=1)\\ & =1 &(\forall i: \sum_{k} p_{i k}=1) \end{aligned} $$

Since every entry of $P Q$ is non-negative and each of its rows sums to 1, $P Q$ is a transition probability matrix (row-stochastic).
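As a quick sanity check of Lemma 5.4, here is a sketch with two made-up $3 \times 3$ transition matrices, verifying that their product has non-negative entries and unit row sums.

```python
# Sanity check of Lemma 5.4: the product of two row-stochastic matrices
# is row-stochastic (matrices below are made up for illustration).
import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.1, 0.8, 0.1],
              [0.4, 0.4, 0.2]])
Q = np.array([[0.3, 0.3, 0.4],
              [0.5, 0.25, 0.25],
              [0.0, 0.6, 0.4]])

PQ = P @ Q
assert np.all(PQ >= 0)                    # every entry is non-negative
assert np.allclose(PQ.sum(axis=1), 1.0)   # every row sums to 1
print(PQ)
```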


$\textbf{Corollary 5.5.}$ If $P$ is a transition probability matrix, then $P^n$ is also a transition probability matrix for every $n \geqslant 1$.

$Proof.$ We prove this statement by induction. The base case, $n = 1$, is trivially true. Now suppose the statement holds for $n = k-1$; we show it holds for $n = k$. Rewrite $P^k$ as the product of $P$ and $P^{k-1}$. By the induction hypothesis, $P^{k-1}$ is a transition probability matrix. Since both $P$ and $P^{k-1}$ are transition probability matrices, $\textbf{Lemma 5.4}$ implies that $P^k$ is also a transition probability matrix.

Note that the converse is not true: $P^n$ being a transition matrix is not sufficient for $P$ to be a transition matrix. A counter-example is easily constructed.

$\textbf{Counter-example}\ (n = 2)$: Consider the transition matrix $A^{2}=\left[\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right]$. We can write $A^2 = AA$ where $A=\left[\begin{array}{cc}1 / \sqrt{2} & 1 / \sqrt{2} \\ 1 / \sqrt{2} & -1 / \sqrt{2}\end{array}\right]$, which is not a transition matrix: it has a negative entry, and its rows do not sum to 1. This shows that $A^2$ being a transition matrix is not sufficient for $A$ to be a transition matrix.
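The counter-example can be verified directly; this snippet checks that $AA$ is the identity even though $A$ itself fails to be row-stochastic.

```python
# Verifying the counter-example: A has a negative entry (so it is not a
# transition matrix), yet A @ A is the identity matrix, which is one.
import numpy as np

s = 1 / np.sqrt(2)
A = np.array([[s,  s],
              [s, -s]])
print(A @ A)              # approximately [[1, 0], [0, 1]]
```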


$\textbf{Remark 5.6.}$ If $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$, then

  1. $\mathbb{P}\left(X_{n}=i\right)=\left(\lambda P^{n}\right)_{i}$
  2. $\mathbb{P}\left(X_{n}=j \mid X_{0}=i\right)=\left(P^{n}\right)_{i j}$

$Proof.$

  1. We know from $\textbf{Theorem 5.1}$ that $\displaystyle \mathbb{P}\left(X_{0}=x_{0}, X_{1}=x_{1}, \cdots, X_{n}=x_{n}\right)=\lambda_{x_{0}} \cdot \prod_{i=1}^n p_{x_{i-1} x_{i}}$. We can write the probability that the chain is in state $i$ at time $n$ by summing over all possible paths:
$$\begin{aligned} \mathbb{P}\left(X_{n}=i\right) & =\sum_{x_{n-1}} \sum_{x_{n-2}} \ldots \sum_{x_{0}} \lambda_{x_{0}} p_{x_{0} x_{1}} \ldots p_{x_{n-2} x_{n-1}} p_{x_{n-1} i} \\ & =\sum_{x_{n-2}} \ldots \sum_{x_{0}} \lambda_{x_{0}} p_{x_{0} x_{1}} \ldots\left(\sum_{x_{n-1}} p_{x_{n-2} x_{n-1}} p_{x_{n-1} i}\right) &\text{(Fubini's Theorem)}\\ & =\left(\lambda P^{n}\right)_{i} &\text{(Recursively using Fubini's Theorem)} \end{aligned} $$
  2. We begin by using the definition of conditional probability and $\textbf{Theorem 5.1}$, and then use $\text{Fubini's theorem}$ recursively; a numerical check of both identities follows the proof.
$$\begin{aligned} \mathbb{P}\left(X_{n}=j \mid X_{0}=i\right) & =\frac{\mathbb{P}\left(X_{n}=j, X_{0}=i\right)}{\mathbb{P}\left(X_{0}=i\right)} \\ & =\frac{\sum_{x_{n-1}} \sum_{x_{n-2}} \ldots \sum_{x_{1}} \lambda_{i} p_{i x_{1}} \ldots p_{x_{n-2} x_{n-1}} p_{x_{n-1} j}}{\lambda_{i}} \\ & =\left(P^{n}\right)_{i j} &\text{(Fubini's Theorem)} \end{aligned} $$
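Below is a minimal simulation sketch of Remark 5.6, again assuming a made-up 3-state chain: the empirical distribution of $X_n$ should be close to $\lambda P^n$, while part 2 is read off directly as the entries of $P^n$.

```python
# Simulation check of Remark 5.6 (made-up 3-state chain):
# the empirical distribution of X_n should approach (lambda P^n).
import numpy as np

rng = np.random.default_rng(1)
lam = np.array([0.5, 0.3, 0.2])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
n, trials = 4, 100_000

counts = np.zeros(3)
for _ in range(trials):
    x = int(rng.choice(3, p=lam))         # X_0 ~ lambda
    for _ in range(n):
        x = int(rng.choice(3, p=P[x]))    # one step of the chain
    counts[x] += 1

Pn = np.linalg.matrix_power(P, n)         # P^n
print(counts / trials)                    # simulated distribution of X_n
print(lam @ Pn)                           # (lambda P^n): should be close
# Part 2 needs no simulation: P(X_n = j | X_0 = i) is simply Pn[i, j].
```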

$\textbf{Corollary 5.7.}$ Let $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$. Then,

$$\forall a\ge 0,b\ge 1: (X_{a+bt})_{t⩾0}\sim \text{Markov}(\lambda P^a , P^b) $$
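Corollary 5.7 can also be checked by simulation. Here is a sketch using the same made-up chain as above, with hypothetical parameters $a = 2$, $b = 3$: the subsampled chain $Y_t = X_{a+bt}$ should have initial distribution $\lambda P^a$ and transition matrix $P^b$.

```python
# Simulation check of Corollary 5.7 with a = 2, b = 3 (made-up chain):
# Y_t = X_{a + b*t} should be Markov(lambda P^a, P^b).  We estimate the
# distribution of Y_0 and the one-step transition frequencies Y_0 -> Y_1.
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([0.5, 0.3, 0.2])
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
a, b, trials = 2, 3, 100_000

y0_counts = np.zeros(3)
trans_counts = np.zeros((3, 3))
for _ in range(trials):
    x = int(rng.choice(3, p=lam))
    traj = [x]
    for _ in range(a + b):                # simulate up to time a + b
        x = int(rng.choice(3, p=P[x]))
        traj.append(x)
    y0, y1 = traj[a], traj[a + b]         # Y_0 = X_a, Y_1 = X_{a+b}
    y0_counts[y0] += 1
    trans_counts[y0, y1] += 1

print(y0_counts / trials)                 # vs lambda P^a below
print(lam @ np.linalg.matrix_power(P, a))
print(trans_counts / trans_counts.sum(axis=1, keepdims=True))  # vs P^b below
print(np.linalg.matrix_power(P, b))
```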

2. Hitting Time

$\textbf{Definition 5.8.}\quad$ Consider a discrete-time Markov Chain $(X_t)_{t⩾0} \sim \text{Markov}(\lambda, P)$ taking values in $I$ and a set $A \subseteq I$. We define the hitting time of $A$, denoted by $H^A$, as, $$ H^A := \inf (\{n ⩾ 0 \mid X_n \in A\}), $$ with the convention $\inf \emptyset = \infty$. We denote the hitting time of $A$ given the initial state $i$ as $H_i^A = (H^A \mid X_0 = i)$.

Intuitively, $H_i^A$ is the time needed to reach the set $A$ when the initial state is $i$.

$\textbf{Remark 5.9.}$ Let us look at the hitting time of $A$ in some special cases:

  1. $H^A = 0$ if $X_0 \in A$
  2. $H^A = 1$ if $X_0 \not \in A$ and $X_1 \in A$
  3. $H^A = n$ if $X_0, X_1, \ldots , X_{n−1} \not \in A$ and $X_n \in A$

Note that the hitting time of a set is a random variable that takes values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$.
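These cases translate directly into a small helper. Here is a sketch (the function name `hitting_time` is ours): it scans a trajectory and returns the first index landing in $A$, with infinity when the set is never entered within the simulated horizon.

```python
# Hitting time of a set A along a (finite) trajectory, per Definition 5.8.
# Returns inf{n >= 0 : traj[n] in A}; float("inf") stands in for H^A = infinity
# when A is not reached within the simulated horizon.
def hitting_time(traj, A):
    for n, x in enumerate(traj):
        if x in A:
            return n
    return float("inf")

print(hitting_time([0, 1, 2], A={0}))     # 0: X_0 is already in A
print(hitting_time([2, 0, 1], A={0}))     # 1: X_0 not in A, X_1 in A
print(hitting_time([1, 2, 1], A={0}))     # inf: A not hit in this window
```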


$\textbf{Remark 5.10.}$ We denote the probability that the hitting time of the set $A$ is finite by $h^A$. That is,

$$h^A = \mathbb P(H^A < ∞). $$

We can also consider the probability of a finite hitting time conditioned on the initial state:

$$h_i^A =\mathbb P(H^A < ∞ \mid X_0 = i). $$

By the law of total probability,

$$h^A=\sum_i\lambda_ih_i^A $$

$\textbf{Corollary 5.11.}$ If $h_i^A < 1$, then $\mathbb E[H_i^A ] = \infty$.


The proof of this corollary is simple. Write down the definition of expectation and note that, since $h_i^A < 1$, the hitting time starting from $i$ satisfies $H_i^A = \infty$ with non-zero probability; hence the expectation of the hitting time is infinite. Now consider the probability that $H^A$ is finite when starting from $i$, $h_i^A = \mathbb P_i(H^A < \infty)$. The following result holds.


$\textbf{Theorem 5.12.}\quad$ The finite hitting time probabilities $(h_i^A)_{i∈I}$ satisfy the following system of equations: $$ h_{i}^{A}=\left\{\begin{array}{ll}1 & \text { if } i \in A \\ \displaystyle \sum_{j \in I} p_{i j} h_{j}^{A} & \text { if } i \notin A .\end{array}\right. $$ Moreover, $(h_i^A)_{i∈I}$ is the minimal non-negative solution: if $(x_i)_{i∈I}$ is any other non-negative solution of the above system, then $h_i^A \leqslant x_i$ for all $i \in I$.
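To illustrate both the system and the minimality claim, here is a sketch on a small made-up example: gambler's ruin on $\{0,1,2,3,4\}$ with absorbing endpoints and $A = \{0\}$. Iterating the equations from $h \equiv 0$ converges monotonically to the minimal non-negative solution, $h_i = 1 - i/4$; in particular $h_4 = 0$, even though any constant value at the absorbing state $4$ would also solve the equations.

```python
# Theorem 5.12 on gambler's ruin over {0,...,4}: A = {0}, endpoints absorbing,
# interior states step left/right with probability 1/2 each.  Fixed-point
# iteration from h = 0 converges to the minimal non-negative solution.
import numpy as np

N = 4
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0                   # absorbing endpoints
for i in range(1, N):
    P[i, i - 1] = P[i, i + 1] = 0.5       # symmetric random walk inside

A = {0}
h = np.zeros(N + 1)                       # start from the zero vector
for _ in range(1_000):
    h = P @ h                             # h_i <- sum_j p_ij h_j for i not in A
    for i in A:
        h[i] = 1.0                        # h_i = 1 for i in A

print(h)                                  # ~ [1.0, 0.75, 0.5, 0.25, 0.0]
```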