November 2005
<<October November December>>
Answer:
Fix p and let x(n) be the probability that a random binary tree, T, selected (without throwing out infinite trees) has depth at least n. Then x(n+1)=2*p*x(n) - (p*x(n))**2 (by the principle of inclusion and exclusion since if T is has depth at least n+1 then some subtree of a son of the root of T has depth at least n but we don't want to double count the case where the root of T has two sons and each is the root of a subtree of depth at least n). We are interested in the behavior of x(n) as n goes to infinity. Clearly x(n) has a limit as it is non-increasing and bounded below by 0. Let this limit be x. Then x satisfies the equation x=2*p*x - (p*x)**2. The equation has two solutions x=0 or x=(2*p-1)/(p*p). It is easy to show the limit is equal to the first solution, x=0, for 0<p<.5, and that the limit is equal to the second solution, x=(2*p-1)/(p*p) for .5<p<1. Clearly x amounts to the probability that a random binary tree has infinite size. So 1-x is the probability that it is finite. This is 1 for 0<p<.5 and ((1-p)/p)**2 for .5<p<1.
Now we can compute the expected size, s, of a random binary tree (with the removal of infinite trees). Suppose first 0<p<.5. There are 2**n vertices at depth n which could be included in a random binary tree. Each will be included if each of the n edges in the path to the root is chosen. This will occur with probability p**n. So the expected number of vertices at depth n is (2*p)**n. So summing over all depths s=1+2*p+(2*p)**2+...+(2*p)**n+... . This is a simple geometric series. Since p<.5, 2*p<1. So the sum converges and s=1/(1-2*p).
Next suppose .5<p<1. We have the following equation for s by expressing the expected size of a random binary tree, T, in terms of the expected sizes of the subtrees whose roots are the sons of the root of T.
s = ((1-p)**2 + 2*p*(1-p)*q*(1+s) + p*p*q*q*(1+2*s))/
((1-p)**2 + 2*p*(1-p)*q + p*p*q*q )
Here q = ((1-p)/p)**2 which as shown above is the probability that a random binary tree is finite. The equation for s has three terms depending on how many edges out of the root are retained. So for example the (1-p)**2 term corresponds to the case where T consists solely of the root. The factors of q account for the fact that we are throwing out the cases where a subtree (and therefore T) is infinite. The divisor, d, is included to account for the fact that we are only averaging over the cases where T is finite. Note by substituting for q and simplifying we have d=((1-p)/p)**2=q as expected. Substituting into the equation for s we obtain
s = p**2 + 2*p*(1-p)*(1+s) + ((1-p)**2)*(1+2*s)
= 1 + s*(2*p*(1-p) + 2*(1-p)**2)
= 1 + s*2*(1-p)
so solving for s we have s=1/(2*p-1). This argument is not entirely rigorous as we are assuming s exists. However I believe it could be made rigorous by defining s(n) as the expected number of vertices in the random tree within distance n from the root and then letting n go to infinity.
An alternative approach for .5<p<1 is to show that the distribution of random binary trees is the same as for q=1-p. For consider any finite binary tree with m edges. Any such tree will have m+2 places where an edge was not selected (else the tree would be bigger). Hence the tree will be chosen (without throwing out infinite trees) with probability (p**m)*((1-p)**(m+2)). But this can be rewritten as ((1-p)**m)*(p**(m+2))*(((1-p)/p)**2). So the distribution of finite trees for p is the same as the distribution of finite trees for q=1-p multiplied by the constant factor ((1-p)/p)**2. The sum of the probabilities over all finite trees will be 1 for q. Hence it will be ((1-p)/p)**2 for p. This agrees with the argument above that ((1-p)/p)**2 is the probability that the random binary tree is finite. So when we exclude infinite trees the constant multiplier in the distribution for p will disappear making the distribution of random binary trees exactly the same as for q=1-p. This means the expected size will be 1/(1-2*q) = 1/(2*p-1) as above.
If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com