Balls and Bins: A Study in Negative Dependence

Devdatt P. Dubhashi, Desh Ranjan


This paper investigates the notion of negative dependence amongst random variables and attempts to advocate its use as a simple and unifying paradigm for
the analysis of random structures and algorithms.
The assumption of independence between random variables is often very convenient for several reasons. Firstly, it makes analyses and calculations much simpler. Secondly, one has at hand a whole array of powerful mathematical concepts and tools from classical probability theory, such as laws of large numbers, central limit theorems and large deviation bounds, which are usually derived under the assumption of independence. Unfortunately, the analysis of most randomized algorithms involves random variables that are not independent. In this case, classical tools from standard probability
theory like large deviation theorems, that are valid under the assumption of independence between the random variables involved, cannot be used as such. It is
then necessary to determine under what conditions of dependence one can still use the classical tools.
It has been observed before [32, 33, 38, 8], that in some situations, even though the variables involved are not independent, one can still apply some of the standard
tools that are valid for independent variables (directly or in suitably modified form), provided that the variables are dependent in specific ways. Unfortunately, it
appears that in most cases somewhat ad hoc stratagems have been devised, tailored to the specific situation at hand, and that a unifying underlying theory that delves
deeper into the nature of dependence amongst the variables involved is lacking.

A frequently occurring scenario underlying the analysis of many randomised
algorithms and processes involves random variables that are, intuitively, dependent in the following negative way: if one subset of the variables is "high" then a disjoint
subset of the variables is "low". In this paper, we bring to the forefront and systematize some precise notions of negative dependence in the literature, analyse
their properties, compare them relative to each other, and illustrate them with several applications.
One specific paradigm involving negative dependence is the classical "balls and bins" experiment. Suppose we throw m balls into n bins independently at random.
For i in [n], let Bi be the random variable denoting the number of balls in the ith bin. We will often refer to these variables as occupancy numbers. This is a
classical probabilistic paradigm [16, 22, 26] (see also [31, sec. 3.1]) that underlies the
analysis of many probabilistic algorithms and processes. In the case when the balls
are identical, this gives rise to the well-known multinomial distribution [16, sec VI.9]: there are m repeated independent trials (balls) where each trial (ball) can result in one of the outcomes E1, ..., En (bins). The probability of the realisation of event Ei is pi for i in [n] for each trial. (Of course the probabilities are subject to the condition
Sum_i pi = 1.) Under the multinomial distribution, for any integers
m1, ..., mn such that
Sum_i mi = m, the probability that for each i in [n], event Ei occurs mi times is

    m! / (m1! ... mn!) p1^m1 ... pn^mn.
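As a concrete check, the formula above can be evaluated directly. The sketch below is ours, not from the paper (the helper name multinomial_pmf is an assumption):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability that bin i receives counts[i] balls when
    m = sum(counts) identical balls are thrown, ball-by-ball
    independently, with bin probabilities probs."""
    m = sum(counts)
    coeff = factorial(m)
    for c in counts:
        coeff //= factorial(c)  # multinomial coefficient m!/(m1!...mn!)
    p = 1.0
    for c, q in zip(counts, probs):
        p *= q ** c  # p1^m1 ... pn^mn
    return coeff * p

# 3 balls, 2 equally likely bins: P(B1 = 2, B2 = 1) = 3 * (1/2)^3
print(multinomial_pmf([2, 1], [0.5, 0.5]))  # 0.375
```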
The balls and bins experiment is a generalisation of the multinomial distribution:
in the general case, one can have an arbitrary set of probabilities for each ball: the
probability that ball k goes into bin i is pi,k, subject only to the natural restriction
that for each ball k, Sum_i pi,k = 1. The joint distribution function correspondingly
has a more complicated form.
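The general experiment is nevertheless easy to simulate: draw each ball's bin from that ball's own distribution and tally the occupancy numbers. A minimal sketch (the function name throw_balls is ours):

```python
import random

def throw_balls(prob_matrix, rng=random):
    """prob_matrix[k][i] is the probability that ball k lands in bin i.
    Returns the occupancy numbers B_1, ..., B_n."""
    n = len(prob_matrix[0])
    occupancy = [0] * n
    for probs in prob_matrix:
        # each ball is thrown independently, with its own bin distribution
        i = rng.choices(range(n), weights=probs)[0]
        occupancy[i] += 1
    return occupancy

# Two balls with different bin preferences over three bins
B = throw_balls([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])
print(B)  # e.g. [1, 0, 1]; the entries always sum to 2
```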
A fundamental natural question of interest is: how are these Bi related? Note
that even though the balls are thrown independently of each other, the Bi variables are not independent; in particular, their sum is fixed to m. Intuitively, the Bi's
are negatively dependent on each other in the manner described above: if one set
of variables is "high", a disjoint set is "low". However, establishing such assertions
precisely by a direct calculation from the joint distribution function, though possible
in principle, appears to be quite a formidable task, even in the case where the balls are assumed to be identical.
One of the major contributions of this paper is establishing that the Bi are negatively dependent in a very strong sense. In particular, we show that the Bi variables satisfy negative association and negative regression, two strong notions of negative dependence that we define precisely below. All the intuitively obvious assertions of negative dependence in the balls and bins experiment follow as easy corollaries. We illustrate the usefulness of these results by showing how to streamline and simplify many existing probabilistic analyses in the literature.
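The negative dependence of the occupancy numbers can also be observed numerically: for m identical balls thrown uniformly into n bins, the multinomial distribution gives Cov(Bi, Bj) = -m/n^2 for i != j. A small Monte Carlo sketch (the function name and parameters are ours, for illustration only):

```python
import random

def estimate_cov(m, n, trials, seed=0):
    """Monte Carlo estimate of Cov(B1, B2) for m identical balls
    thrown uniformly at random into n bins (true value: -m/n^2)."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0.0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(m):
            counts[rng.randrange(n)] += 1  # one uniform ball
        s1 += counts[0]
        s2 += counts[1]
        s12 += counts[0] * counts[1]
    t = trials
    return s12 / t - (s1 / t) * (s2 / t)

cov = estimate_cov(m=10, n=4, trials=20000)
print(cov)  # close to -10/16 = -0.625, and in particular negative
```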


ISSN: 0909-0878 
