By Alfred Renyi

Now let's do the decoding using the majority principle: if $r \ge s+1$, then the signal repeated $2s+1$ times will be taken as 1, while if $r \le s$, it will be taken to be 0. Even in this way, it is possible to be mistaken when decoding, but the probability of this can be made as small as we want, if 0
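The majority rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `r` is taken to be the number of 1s among the $2s+1$ received symbols, and each repetition is assumed to be flipped independently with the same probability `p` (a binary symmetric channel). The function names are hypothetical and do not come from the text.

```python
from math import comb


def majority_decode(received):
    """Decode a block of 2s+1 received bits by majority vote.

    Assumption: r is the number of 1s in the received block.
    """
    n = len(received)
    if n % 2 == 0:
        raise ValueError("block length must be odd (2s+1)")
    s = n // 2
    r = sum(received)
    # r >= s+1 -> decode as 1, r <= s -> decode as 0
    return 1 if r >= s + 1 else 0


def block_error_probability(p, s):
    """Probability that majority decoding gives the wrong bit.

    Assumption: each of the 2s+1 repetitions is flipped independently
    with probability p. Decoding fails when at least s+1 symbols flip.
    """
    n = 2 * s + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1, n + 1))


if __name__ == "__main__":
    print(majority_decode([1, 0, 1]))
    for s in (1, 2, 5, 10):
        print(s, block_error_probability(0.1, s))
```

Under these assumptions, and for a flip probability below 1/2, the computed error probability shrinks as `s` grows, which is the behavior the passage describes.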

... the number $p'_N = p_N + p_{N+1}$. Now we have a probability distribution of N elements and, as assumed previously, we know how to construct a primitive code of minimal average word length, or its code tree, having the numbers $p_1, \ldots, p_{N-1}, p'_N$ assigned to its N terminal nodes. On this tree, let's branch out two new branches from the node with $p'_N$, and put the numbers $p_N$ and $p_{N+1}$ at the two terminal nodes. ... $(p_1, \ldots, p_N, p_{N+1})$ distribution. An example can make this process crystal clear. Let N=5 and the probabilities of the messages be the following: Pi 1 3' 1 1 1 1 ' 5 ' P.
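The merge-then-split step described here is the recursion generally known as Huffman's construction. A minimal Python sketch follows. The function names are hypothetical, and the five probabilities in the usage part are made up for illustration; they are not the values of the example in the text.

```python
from fractions import Fraction


def minimal_code(probs):
    """Return binary codewords for probs via the merge-then-split recursion."""
    n = len(probs)
    if n == 1:
        return [""]
    if n == 2:
        return ["0", "1"]
    # Indices in descending order of probability; the last two are the smallest.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    a, b = order[-2], order[-1]
    keep = order[:-2]
    # Reduced distribution: the two smallest probabilities merged into one node.
    reduced = [probs[i] for i in keep] + [probs[a] + probs[b]]
    reduced_codes = minimal_code(reduced)
    codes = [None] * n
    for pos, i in enumerate(keep):
        codes[i] = reduced_codes[pos]
    # Branch the merged node into two new terminal nodes.
    merged = reduced_codes[-1]
    codes[a] = merged + "0"
    codes[b] = merged + "1"
    return codes


def average_length(probs, codes):
    """Average codeword length under the given probabilities."""
    return sum(p * len(c) for p, c in zip(probs, codes))


if __name__ == "__main__":
    # Hypothetical distribution, chosen only for illustration.
    probs = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5),
             Fraction(1, 10), Fraction(1, 10)]
    codes = minimal_code(probs)
    print(codes, average_length(probs, codes))
```

Ties between equal probabilities can be broken in more than one way, so different runs of the same idea may give different codewords with the same average length.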

That is why $I(\xi, \eta)$ is usually called the relative information of $\xi$ and $\eta$. The lecturer made a very interesting theoretical comment about this last characteristic. He said that the deep cause of the equality $I(\xi, \eta) = I(\eta, \xi)$ is as follows: if we investigate two entities that are random and to a certain extent dependent on each other, then we cannot, by using information theory, deduce which of the two is the cause and which is the effect in their relationship. The only thing that can be established is how close their dependence is.
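The symmetry can be checked numerically. The sketch below assumes the standard definition $I(\xi, \eta) = \sum_{i,j} p_{ij} \log_2 \frac{p_{ij}}{p_i q_j}$, which is not spelled out in this passage, and uses a made-up joint distribution. Swapping the roles of the two variables amounts to transposing the joint table.

```python
from math import log2


def mutual_information(joint):
    """I(xi, eta) in bits, where joint[i][j] = P(xi = i, eta = j).

    Assumption: the standard definition
    sum over i, j of p_ij * log2(p_ij / (p_i * q_j)).
    """
    p_xi = [sum(row) for row in joint]          # marginal of xi
    p_eta = [sum(col) for col in zip(*joint)]   # marginal of eta
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            if p_ij > 0:
                total += p_ij * log2(p_ij / (p_xi[i] * p_eta[j]))
    return total


def transpose(joint):
    """Swap the roles of xi and eta."""
    return [list(col) for col in zip(*joint)]


if __name__ == "__main__":
    # Hypothetical joint distribution, chosen only for illustration.
    joint = [[0.3, 0.2, 0.1],
             [0.1, 0.1, 0.2]]
    print(mutual_information(joint))
    print(mutual_information(transpose(joint)))
```

Both calls give the same value up to rounding, so the quantity measures how closely the two variables depend on each other without singling out either one as the cause.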