On the concept of entropy of a finite probabilistic scheme

D. K. Faddeev, 1956.

English translation by Arina Zinovyeva.

$\newcommand{\half}{\frac{1}{2}}$

Section 1. A. Ya. Khinchin's article [1] established that the entropy $H(p_1 , p_2 , ..., p_n )$ of a finite probabilistic scheme $(A_1 , A_2 , ..., A_n )$ can be characterised by the following system of axioms.

1. $H(p_1 , p_2 , ..., p_n )$ is a continuous function of $p_1 , p_2 , ..., p_n$ in the domain $0\le p_i\le 1, p_1 + ... + p_n = 1$ .

2. $H(p_1 , p_2 , ..., p_n )$ is a symmetric function of $p_1 , p_2 , ..., p_n$ .

3. $H(p_1 , p_2 , ..., p_n , 0) = H(p_1 , p_2 , ..., p_n )$ .

4. $H(q_{11} , ...,q_{1m} ; q_{21} , ... , q_{2m} ; ...; q_{n1} , ... , q_{nm} ) = H(p_1 , p_2 , ..., p_n ) + \sum_{i=1}^n p_i H(\frac{q_{i1}}{p_i},...,\frac{q_{im}}{p_i})$

where $p_i = q_{i1} + ... + q_{im}\ \ (i = 1, ..., n)$ (this axiom is formulated in the article [1] in other terms).

5. $H(p_1 , p_2 , ..., p_n ) \le H(\frac{1}{n},\frac{1}{n},..., \frac{1}{n}).$

Entropy is defined by these axioms uniquely up to a positive multiplier. The aim of this note is a simplification of this system of axioms. Namely, we offer the following axioms:

1'. $H(p, 1-p)$ is continuous, with $0 \le p \le1,$ and positive at least at one point.

2'. $H(p_1 , p_2 , ..., p_n )$ is a symmetric function of $p_1 , p_2 , ..., p_n$ .

3'. With $n \ge 2,$ $H(p_1 , ...., p_{n-1} ; q_1 , q_2 ) = H(p_1 , ...., p_{n-1} , p_n ) + p_n H(\frac{q_1}{p_n}, \frac{q_2}{p_n}).$ Here $p_n = q_1 + q_2$ .

The difference between these two systems of axioms is, first of all, axiom 5 (extremality) is replaced with the requirement of positivity of the entropy at one point, and second of all, axioms 3 and 4 are replaced with axiom 3' that is a very natural one if we consider entropy as a measure of the "indefiniteness" of a scheme. Truly, the "indefiniteness" of a scheme $\left( \begin{array}{ccccc} A_1, & ..., & A_{n-1}; & B_1, & B_2 \\ p_1, & ..., & p_{n-1}; & q_1, & q_2 \end{array}\right)$ is different from the "indefiniteness" of the scheme $\left( \begin{array}{cccc} A_1, & ..., & A_{n-1}, & A_n \\ p_1, & ..., & p_{n-1}, & p_n \end{array}\right)$ occurring from dividing events $A_n$ into two sub-events $B_1$ , $B_2$ with conditional probabilities $\frac{q_1}{p_n},\frac{q_2}{p_n}$ . This "indefiniteness" is overcome only when the event $A_n$ is realised, the probability of which equals $p_n$ .

Section 2. First of all, we establish that axioms 2, 3, 4 are equivalent to 2', 3'.

Lemma 1. From 2, 3, 4, we conclude $H(1, 0) = 0.$

Proof. We calculate $H(\frac{1}{2}, \frac{1}{2}, 0, 0)$ with two methods. On one hand, by the consequence of 3, we have $H(\frac{1}{2}, \frac{1}{2}, 0, 0) = H(\frac{1}{2}, \frac{1}{2} ).$ On the other hand, $H(\frac{1}{2}, \frac{1}{2}, 0, 0) = H(\frac{1}{2}, 0, \frac{1}{2}, 0)$ and by the consequence of 4, we get $H(\frac{1}{2}, 0, \frac{1}{2}, 0) = H(\frac{1}{2}, \frac{1}{2})+\frac{1}{2}H(1, 0) + \frac{1}{2}H(1, 0).$ Therefore, $H(1, 0)=0.$

Lemma 2. From 2, 3, 4 we conclude 3'.

Proof. $H(p_1, 0; ...; p_{n-1}, 0; q_1, q_2 )$ equals, by the consequence of 2 and 3, $H(p_1, 0; ... p_{n-1}, 0; q_1, q_2 )$ , which equals, by the consequence of 4, $H(p_1, ..., p_{n-1}, p_n ) + p_1 H(1, 0) + ... + p_{n-1} H(1, 0) + p_n H(\frac{q_1}{p_n}, \frac{q_2}{p_n}) = H(p_1, ..., p_{n-1}, p_n ) + p_n H(\frac{q_1}{p_n}, \frac{q_2}{p_n}).$

Lemma 3. From 2', 3', we conclude that $H(1, 0) = 0.$

Proof. We calculate with two methods $H(\half, \half, 0).$ On one hand, by the consequence of 3', we have $H(\half; \half, 0) = H(\half, \half ) + \half H(1, 0).$ On the other hand, $H(\half, \half, 0) = H(0;\half,\half) = H(0, 1) + H(\half, \half ).$ Therefore, $H(1, 0) = 0.$

Lemma 4. From 2', 3' we conclude 3.

Proof. $H(p_1, ... p_{n-1}; p_n, 0) = H(p_1,...,p_{n-1}, p_n) + p_n H(1, 0) = H(p_1, ..., p_{n-1}, p_n ).$

Lemma 5. From 3', we conclude that: $H(p_1, ..., p_{n-1} ; q_1, ..., q_m) = H(p_1, ..., p_{n-1}, p_n) + p_n H\Bigl(\frac{q_1}{p_n},...,\frac{q_m}{p_n}\Bigr).$ Here $n\ge2$ , $p_n = q_1 + ... + q_m.$

Proof. With $m=2$ the statement of lemma is the same as axiom 3', with $m\ge3$ the lemma is easily proven by induction on $m$ .

Lemma 6. From 2', 3', we conclude that: $H(q_{11}, ..., q_{1m_1} ; q_{21}, ..., q_{2m_2}; ...; q_{n1}, ..., q_{nm_n}) = H(p_1, ..., p_n) + \sum_{i=1}^n p_i H\Bigl(\frac{q_{i1}}{p_i},...,\frac{q_{im_i}}{p_i}\Bigr).$ Here $p_i = q_{i1} + ... + q_{im_i}.$

Proof. We need to apply lemma 5, $n$ times to each group of arguments, which is possible by the consequence of symmetry. Setting $m_1 = m_2 = ... = m_n = m,$ we get that from 2', 3' we can conclude 4.

So, we have established the equivalence of axiom groups 2, 3, 4 and 2', 3'.

Section 3. Assume, following the article [1], $F(n) = H\Bigl(\frac{1}{n},\frac{1}{n},..., \frac{1}{n}\Bigr)$ with $n\ge2$ and $F(1) = 0.$ Applying lemma 6 to the case $m_1 =... =m_n=m , q_{ij} = \frac{1}{mn}$ we get $\begin{align} F(mn) = F(m) + F(n) \end{align}$ with $m\ge 2, n\ge 2.$ With $m =1$ or $n=1,$ equation (1) is trivial. Further, applying lemma 5 to $H\bigl(\frac{1}{n};\frac{1}{n},...,\frac{1}{n}\bigr)$ , we get $H\biggl(\frac{1}{n},\frac{1}{n},...,\frac{1}{n}\biggr) = H\biggl(\frac{1}{n}, \frac{n-1}{n}\biggr) + \frac{n-1}{n} H\biggl(\frac{1}{n-1},...,\frac{1}{n-1}\biggr),$ from that we conclude: $\eta_n = H\biggl(\frac{1}{n}, \frac{n-1}{n}\biggr) = F(n) - \frac{n-1}{n} F(n-1).$

Lemma 7. With $n \to \infty,$ $\mu_n = \frac{F(n)}{n}\to 0 \ \ \mathrm{and}\ \ \lambda_n=F(n)-F(n-1)\to 0.$

Proof. By the consequence of the continuity of $H(p,1-p)$ with $p=\eta_n=H\bigl(\frac{1}{n}, \frac{n-1}{n}\bigr)\to H(0,1)=0.$

Furthermore, $n\eta_n=nF(n)-(n-1)F(n-1),$ where $nF(n)=\sum_{k=1}^{n}k\eta_k$ and $\frac{F(n)}{n} = \frac{1}{n^2}\sum_{k=1}^{n} k\eta_k = \frac{n+1}{2n}\cdot\frac{2}{n(n+1)}\sum_{k=1}^{n}k\eta_k.$ But $\frac{2}{n(n+1)}\sum_{k=1}^n k\eta_k$ is an arithmetic average of the first $\frac{n(n+1)}{2}$ members of the sequence $\eta_1,\eta_2,\eta_2,\eta_3,\eta_3,\eta_3,...$ which tends to zero. Therefore, $\frac{2}{n(n+1)}\sum_{k=1}^n k\eta_k\to 0$ and $\mu_n=\frac{F(n)}{n}\to 0.$ Furthermore, $\lambda_n=F(n)-F(n-1)=\eta_n-\frac{1}{n}F(n-1)\to 0.$ The lemma is proven.

By the consequence of equation (1), $F(n)$ will be defined for all natural $n$ , as soon as we set the values of the $F$ function on prime numbers. Precisely, if $n=p_1^{\alpha_1},...,p_s^{\alpha_s}$ where $p_1,...,p_s$ are prime, then $F(n)=\alpha_1 F(p_1)+...+\alpha_s F(p_s).$ Assuming $F(p)=c_p \log p$ Then $F (n) = \alpha_1 c_{p_1} \log p_1 + ... + \alpha_s c_{p_s} \log p_s.$

Lemma 8. There is a greatest number among $c_p (p=2, 3, 5, 7, ...)$ .

Proof. We assume the opposite and show that this assumption will contradict the assumption about the continuity of $H(p, 1-p)$ at $p=0.$ It's true, if there is no greatest number in the sequence $c_p (p=2,3,5,7,...),$ then we can build an infinite sequence of prime numbers $p_1, p_2,...,p_i$ so that $p_1=2,$ $p_i$ is the smallest prime number such that $c_{p_i} > c_{p_{i-1}}.$ It is clear from the method of building this sequence that when a prime number $q$ is smaller than $p_i$ , then $c_q\le c_{p_i}.$

Let $i>1$ and $q_1^{\alpha_1}...q_s^{\alpha_s}$ be a canonical decomposition of the number $p_i-1$ . Consider $\begin{align*} \lambda_{p_i} &= F(p_i)-F(p_i-1) \\ &= F(p_i) - \frac{F(p_i)}{\log p_i} \log(p_i-1) + c_{p_i} \log(p_i-1) - F(p_i-1) \\ &= \frac{F(p_i)}{p_i}\cdot \frac{p_i}{\log p_i} \log \frac{p_i}{p_i-1} + \sum_{j=1}^s \alpha_j(c_{p_i} - c_{q_j})\log q_j. \end{align*}$ It is clear that $\begin{align*} \sum_{j=1}^s \alpha_j (c_{p_i}-c_{q_j}) \log q_j \ge (c_{p_i} - c_2)\log 2 \ge (c_{p_2} - c_2)\log 2, \end{align*}$ because $q_j<p_i, c_{q_j}\le c_{p_i}$ and by the consequence of the parity of $p_i-1$ among $q_j,$ there is a number 2 with a nonzero index. Then with $i\to\infty\ \ \frac{F(p_i)}{p_i}\to 0,\ \frac{p_i}{\log p_i}\log\frac{p_i}{p_i-1}\to 0$ and $\lambda_{p_i}\to 0.$

Therefore, $(c_{p_2}-c_2)\log 2\le 0$ which is impossible.

Using exactly the same method, we establish that there is a smallest among $c_p.$

Lemma 9. $F (n) = c\log n,$ where c is a constant.

Proof. It's enough to establish that all $c_p$ are equal.

Assume that such a prime number $p'$ is found, that $c_{p'}>c_2.$ We set a prime number $p,$ for which $c_p$ takes the greatest value. Then $c_p>c_2$ and $p>2.$

Let $m$ be a natural number and $q_1^{\alpha_1}...q_s^{\alpha_s}$ a canonical decomposition of $p^m-1$ . We consider $\begin{align*} \lambda_{p^m} &= F(p^m)-F(p^m-1) \\ &= F(p^m)-\frac{F(p^m)}{\log p^m} \log(p^m-1) + c_p\log(p^m-1)-F(p^m-1) \\ &= \frac{F(p^m)}{p^m}\cdot\frac{p^m}{\log p^m} \log \frac{p^m}{p^m-1} + \sum_{j=1}^s \alpha_j (c_p-c_{q_j})\log q_j \\ &\ge \frac{F(p^m)}{p^m}\cdot\frac{p^m}{\log p^m}\log\frac{p^m}{p^m-1} + (c_p-c_2)\log 2. \end{align*}$ Under the transition to the limit $m\to\infty$ we establish: $(c_p-c_2)\log 2\le 0,$ which is impossible. Therefore, $c_{p'}<c_2$ takes place for all $p'.$

Using exactly the same method, it's established that $c_{p'}\ge c_2$ for all $p'$ . The lemma is proven.

Theorem. $H(p_1,p_2,...,p_n) = c\sum_{i=1}^n p_i \log \frac{1}{p_i}\ \ \ (c>0).$

Proof. We first consider the case $n=2.$ Let $p = {r\over s}$ with r and s whole. We apply lemma 6 to $H\bigl(\frac{1}{s},...,\frac{1}{s}\bigr),$ by combining arguments into two groups with $r$ and $s-r$ elements. We get: $H\biggl(\frac{1}{s},...,\frac{1}{s}\biggr) = H\biggl(\frac{r}{s}, 1-\frac{r}{s}\biggr) + \frac{r}{s} H\biggl(\frac{1}{r},...,\frac{1}{r}\biggr) + \frac{s-r}{s} H\biggl(\frac{1}{s-r},...,\frac{1}{s-r}\biggr),$ Where $\begin{align*} H(p, 1-p) &= F(s) - pF(r)-(1-p)F(s-r) \\ &= c\log s - pc\log r - (1-p)c\log(s-r) \\ &= c\Bigl( p\log\frac{s}{r} + (1-p)\log\frac{s}{s-r}\Bigr) \\ &= c\Bigl( p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p}\Bigr). \end{align*}$

By the consequence of the continuity of $H(p, 1-p)$ , the formula that we get can be also applied for irrational values of $p.$ $c > 0$ by the consequence of positivity of $H(p, 1-p)$ at least at one point.

The transition to the general case is realised by induction, based on axiom 3'.

Citations:

[1] A. Ya. Khinchin, On the concept of entropy in the theory of probabilities, Uspekhi Mat. Nauk, VIII, issue 3 (55), (1953).