back to arrowtheory.com

## On the concept of entropy of a finite probabilistic scheme

English translation by Arina Zinovyeva.

Section 1. A. Ya. Khinchin's article [1] established that the entropy $H(p_1 , p_2 , ..., p_n )$ of a finite probabilistic scheme $(A_1 , A_2 , ..., A_n )$ can be characterised by the following system of axioms.

1. $H(p_1 , p_2 , ..., p_n )$ is a continuous function of $p_1 , p_2 , ..., p_n$ in the domain $0\le p_i\le 1, p_1 + ... + p_n = 1$.

2. $H(p_1 , p_2 , ..., p_n )$ is a symmetric function of $p_1 , p_2 , ..., p_n$ .

3. $H(p_1 , p_2 , ..., p_n , 0) = H(p_1 , p_2 , ..., p_n )$.

4. $H(q_{11} , ...,q_{1m} ; q_{21} , ... , q_{2m} ; ...; q_{n1} , ... , q_{nm} ) = H(p_1 , p_2 , ..., p_n ) + \sum_{i=1}^n p_i H(\frac{q_{i1}}{p_i},...,\frac{q_{im}}{p_i})$

where (this axiom is formulated in the article [1] in other terms).

5. $H(p_1 , p_2 , ..., p_n ) \le H(\frac{1}{n},\frac{1}{n},..., \frac{1}{n}).$

Entropy is defined by these axioms uniquely up to a positive multiplier. The aim of this note is a simplification of this system of axioms. Namely, we offer the following axioms:

1'. $H(p, 1-p)$ is continuous, with $0 \le p \le1,$ and positive at least at one point.

2'. $H(p_1 , p_2 , ..., p_n )$ is a symmetric function of $p_1 , p_2 , ..., p_n$ .

3'. With $n \ge 2,$ Here $p_n = q_1 + q_2$.

The difference between these two systems of axioms is, first of all, axiom 5 (extremality) is replaced with the requirement of positivity of the entropy at one point, and second of all, axioms 3 and 4 are replaced with axiom 3' that is a very natural one if we consider entropy as a measure of the "indefiniteness" of a scheme. Truly, the "indefiniteness" of a scheme is different from the "indefiniteness" of the scheme occurring from dividing events $A_n$ into two sub-events $B_1$, $B_2$ with conditional probabilities $\frac{q_1}{p_n},\frac{q_2}{p_n}$. This "indefiniteness" is overcome only when the event $A_n$ is realised, the probability of which equals $p_n$ .

Section 2. First of all, we establish that axioms 2, 3, 4 are equivalent to 2', 3'.

Lemma 1. From 2, 3, 4, we conclude $H(1, 0) = 0.$

Proof. We calculate $H(\frac{1}{2}, \frac{1}{2}, 0, 0)$ with two methods. On one hand, by the consequence of 3, we have $H(\frac{1}{2}, \frac{1}{2}, 0, 0) = H(\frac{1}{2}, \frac{1}{2} ).$ On the other hand, $H(\frac{1}{2}, \frac{1}{2}, 0, 0) = H(\frac{1}{2}, 0, \frac{1}{2}, 0)$ and by the consequence of 4, we get $H(\frac{1}{2}, 0, \frac{1}{2}, 0) = H(\frac{1}{2}, \frac{1}{2})+\frac{1}{2}H(1, 0) + \frac{1}{2}H(1, 0).$ Therefore, $H(1, 0)=0.$

Lemma 2. From 2, 3, 4 we conclude 3'.

Proof. $H(p_1, 0; ...; p_{n-1}, 0; q_1, q_2 )$ equals, by the consequence of 2 and 3, $H(p_1, 0; ... p_{n-1}, 0; q_1, q_2 )$, which equals, by the consequence of 4, $H(p_1, ..., p_{n-1}, p_n ) + p_1 H(1, 0) + ... + p_{n-1} H(1, 0) + p_n H(\frac{q_1}{p_n}, \frac{q_2}{p_n}) = H(p_1, ..., p_{n-1}, p_n ) + p_n H(\frac{q_1}{p_n}, \frac{q_2}{p_n}).$

Lemma 3. From 2', 3', we conclude that $H(1, 0) = 0.$

Proof. We calculate with two methods $H(\half, \half, 0).$ On one hand, by the consequence of 3', we have $H(\half; \half, 0) = H(\half, \half ) + \half H(1, 0).$ On the other hand, $H(\half, \half, 0) = H(0;\half,\half) = H(0, 1) + H(\half, \half ).$ Therefore, $H(1, 0) = 0.$

Lemma 4. From 2', 3' we conclude 3.

Proof. $H(p_1, ... p_{n-1}; p_n, 0) = H(p_1,...,p_{n-1}, p_n) + p_n H(1, 0) = H(p_1, ..., p_{n-1}, p_n ).$

Lemma 5. From 3', we conclude that: Here $n\ge2$, $p_n = q_1 + ... + q_m.$

Proof. With $m=2$ the statement of lemma is the same as axiom 3', with $m\ge3$ the lemma is easily proven by induction on $m$.

Lemma 6. From 2', 3', we conclude that: Here $p_i = q_{i1} + ... + q_{im_i}.$

Proof. We need to apply lemma 5, $n$ times to each group of arguments, which is possible by the consequence of symmetry. Setting $m_1 = m_2 = ... = m_n = m,$ we get that from 2', 3' we can conclude 4.

So, we have established the equivalence of axiom groups 2, 3, 4 and 2', 3'.

Section 3. Assume, following the article [1], $F(n) = H\Bigl(\frac{1}{n},\frac{1}{n},..., \frac{1}{n}\Bigr)$ with $n\ge2$ and $F(1) = 0.$ Applying lemma 6 to the case $m_1 =... =m_n=m , q_{ij} = \frac{1}{mn}$ we get with $m\ge 2, n\ge 2.$ With $m =1$ or $n=1,$ equation (1) is trivial. Further, applying lemma 5 to $H\bigl(\frac{1}{n};\frac{1}{n},...,\frac{1}{n}\bigr)$, we get from that we conclude:

Lemma 7. With $n \to \infty,$

Proof. By the consequence of the continuity of $H(p,1-p)$ with $p=\eta_n=H\bigl(\frac{1}{n}, \frac{n-1}{n}\bigr)\to H(0,1)=0.$

Furthermore, $n\eta_n=nF(n)-(n-1)F(n-1),$ where and But $\frac{2}{n(n+1)}\sum_{k=1}^n k\eta_k$ is an arithmetic average of the first $\frac{n(n+1)}{2}$ members of the sequence $\eta_1,\eta_2,\eta_2,\eta_3,\eta_3,\eta_3,...$ which tends to zero. Therefore, $\frac{2}{n(n+1)}\sum_{k=1}^n k\eta_k\to 0$ and $\mu_n=\frac{F(n)}{n}\to 0.$ Furthermore, $\lambda_n=F(n)-F(n-1)=\eta_n-\frac{1}{n}F(n-1)\to 0.$ The lemma is proven.

By the consequence of equation (1), $F(n)$ will be defined for all natural $n$, as soon as we set the values of the $F$ function on prime numbers. Precisely, if $n=p_1^{\alpha_1},...,p_s^{\alpha_s}$ where $p_1,...,p_s$ are prime, then $F(n)=\alpha_1 F(p_1)+...+\alpha_s F(p_s).$ Assuming $F(p)=c_p \log p$ Then

Lemma 8. There is a greatest number among $c_p (p=2, 3, 5, 7, ...)$.

Proof. We assume the opposite and show that this assumption will contradict the assumption about the continuity of $H(p, 1-p)$ at $p=0.$ It's true, if there is no greatest number in the sequence $c_p (p=2,3,5,7,...),$ then we can build an infinite sequence of prime numbers $p_1, p_2,...,p_i$ so that $p_1=2,$ $p_i$ is the smallest prime number such that $c_{p_i} > c_{p_{i-1}}.$ It is clear from the method of building this sequence that when a prime number $q$ is smaller than $p_i$, then $c_q\le c_{p_i}.$

Let $i>1$ and $q_1^{\alpha_1}...q_s^{\alpha_s}$ be a canonical decomposition of the number $p_i-1$. Consider It is clear that because $q_j and by the consequence of the parity of $p_i-1$ among $q_j,$ there is a number 2 with a nonzero index. Then with $i\to\infty\ \ \frac{F(p_i)}{p_i}\to 0,\ \frac{p_i}{\log p_i}\log\frac{p_i}{p_i-1}\to 0$ and $\lambda_{p_i}\to 0.$

Therefore, $(c_{p_2}-c_2)\log 2\le 0$ which is impossible.

Using exactly the same method, we establish that there is a smallest among $c_p.$

Lemma 9. $F (n) = c\log n,$ where c is a constant.

Proof. It's enough to establish that all $c_p$ are equal.

Assume that such a prime number $p'$ is found, that $c_{p'}>c_2.$ We set a prime number $p,$ for which $c_p$ takes the greatest value. Then $c_p>c_2$ and $p>2.$

Let $m$ be a natural number and $q_1^{\alpha_1}...q_s^{\alpha_s}$ a canonical decomposition of $p^m-1$. We consider Under the transition to the limit $m\to\infty$ we establish: which is impossible. Therefore, $c_{p'} takes place for all $p'.$

Using exactly the same method, it's established that $c_{p'}\ge c_2$ for all $p'$. The lemma is proven.

Theorem.

Proof. We first consider the case $n=2.$ Let $p = {r\over s}$ with r and s whole. We apply lemma 6 to $H\bigl(\frac{1}{s},...,\frac{1}{s}\bigr),$ by combining arguments into two groups with $r$ and $s-r$ elements. We get: Where

By the consequence of the continuity of $H(p, 1-p)$, the formula that we get can be also applied for irrational values of $p.$ $c > 0$ by the consequence of positivity of $H(p, 1-p)$ at least at one point.

The transition to the general case is realised by induction, based on axiom 3'.

Citations:

[1] A. Ya. Khinchin, On the concept of entropy in the theory of probabilities, Uspekhi Mat. Nauk, VIII, issue 3 (55), (1953).