About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
November 2005
<<October November December>>
Answer:
Fix p and let x(n) be the probability that a random binary tree, T, selected (without throwing out infinite trees) has depth at least n. Then x(n+1)=2*p*x(n) - (p*x(n))**2 (by the principle of inclusion and exclusion since if T is has depth at least n+1 then some subtree of a son of the root of T has depth at least n but we don't want to double count the case where the root of T has two sons and each is the root of a subtree of depth at least n). We are interested in the behavior of x(n) as n goes to infinity. Clearly x(n) has a limit as it is non-increasing and bounded below by 0. Let this limit be x. Then x satisfies the equation x=2*p*x - (p*x)**2. The equation has two solutions x=0 or x=(2*p-1)/(p*p). It is easy to show the limit is equal to the first solution, x=0, for 0<p<.5, and that the limit is equal to the second solution, x=(2*p-1)/(p*p) for .5<p<1. Clearly x amounts to the probability that a random binary tree has infinite size. So 1-x is the probability that it is finite. This is 1 for 0<p<.5 and ((1-p)/p)**2 for .5<p<1.
Now we can compute the expected size, s, of a random binary tree (with the removal of infinite trees). Suppose first 0<p<.5. There are 2**n vertices at depth n which could be included in a random binary tree. Each will be included if each of the n edges in the path to the root is chosen. This will occur with probability p**n. So the expected number of vertices at depth n is (2*p)**n. So summing over all depths s=1+2*p+(2*p)**2+...+(2*p)**n+... . This is a simple geometric series. Since p<.5, 2*p<1. So the sum converges and s=1/(1-2*p).
Next suppose .5<p<1. We have the following equation for s by expressing the expected size of a random binary tree, T, in terms of the expected sizes of the subtrees whose roots are the sons of the root of T.
s = ((1-p)**2 + 2*p*(1-p)*q*(1+s) + p*p*q*q*(1+2*s))/
((1-p)**2 + 2*p*(1-p)*q + p*p*q*q )
Here q = ((1-p)/p)**2 which as shown above is the probability that a random binary tree is finite. The equation for s has three terms depending on how many edges out of the root are retained. So for example the (1-p)**2 term corresponds to the case where T consists solely of the root. The factors of q account for the fact that we are throwing out the cases where a subtree (and therefore T) is infinite. The divisor, d, is included to account for the fact that we are only averaging over the cases where T is finite. Note by substituting for q and simplifying we have d=((1-p)/p)**2=q as expected. Substituting into the equation for s we obtain
s = p**2 + 2*p*(1-p)*(1+s) + ((1-p)**2)*(1+2*s)
= 1 + s*(2*p*(1-p) + 2*(1-p)**2)
= 1 + s*2*(1-p)
so solving for s we have s=1/(2*p-1). This argument is not entirely rigorous as we are assuming s exists. However I believe it could be made rigorous by defining s(n) as the expected number of vertices in the random tree within distance n from the root and then letting n go to infinity.
An alternative approach for .5<p<1 is to show that the distribution of random binary trees is the same as for q=1-p. For consider any finite binary tree with m edges. Any such tree will have m+2 places where an edge was not selected (else the tree would be bigger). Hence the tree will be chosen (without throwing out infinite trees) with probability (p**m)*((1-p)**(m+2)). But this can be rewritten as ((1-p)**m)*(p**(m+2))*(((1-p)/p)**2). So the distribution of finite trees for p is the same as the distribution of finite trees for q=1-p multiplied by the constant factor ((1-p)/p)**2. The sum of the probabilities over all finite trees will be 1 for q. Hence it will be ((1-p)/p)**2 for p. This agrees with the argument above that ((1-p)/p)**2 is the probability that the random binary tree is finite. So when we exclude infinite trees the constant multiplier in the distribution for p will disappear making the distribution of random binary trees exactly the same as for q=1-p. This means the expected size will be 1/(1-2*q) = 1/(2*p-1) as above.
If you have any problems you think we might enjoy, please send them in. All replies should be sent to: ponder@il.ibm.com