### Balls and Bins: A Study in Negative Dependence

#### Abstract

This paper investigates the notion of negative dependence amongst random variables and advocates its use as a simple and unifying paradigm for the analysis of random structures and algorithms.

The assumption of independence between random variables is often very convenient, for several reasons. Firstly, it makes analyses and calculations much simpler. Secondly, one has at hand a whole array of powerful mathematical concepts and tools from classical probability theory, such as laws of large numbers, central limit theorems, and large deviation bounds, which are usually derived under the assumption of independence. Unfortunately, the analysis of most randomized algorithms involves random variables that are not independent. In this case, classical tools from standard probability theory, such as large deviation theorems, which are valid under the assumption of independence between the random variables involved, cannot be used as such. It is then necessary to determine under what conditions of dependence one can still use the classical tools.

It has been observed before [32, 33, 38, 8] that in some situations, even though the variables involved are not independent, one can still apply some of the standard tools that are valid for independent variables (directly or in suitably modified form), provided that the variables are dependent in specific ways. Unfortunately, it appears that in most cases somewhat ad hoc stratagems have been devised, tailored to the specific situation at hand, and that a unifying underlying theory that delves deeper into the nature of dependence amongst the variables involved is lacking. A frequently occurring scenario underlying the analysis of many randomised algorithms and processes involves random variables that are, intuitively, dependent in the following negative way: if one subset of the variables is "high", then a disjoint subset of the variables is "low". In this paper, we bring to the forefront and systematize some precise notions of negative dependence from the literature, analyse their properties, compare them to each other, and illustrate them with several applications.

One specific paradigm involving negative dependence is the classical "balls and bins" experiment. Suppose we throw m balls into n bins independently at random. For i in [n], let Bi be the random variable denoting the number of balls in the ith bin. We will often refer to these variables as occupancy numbers. This is a classical probabilistic paradigm [16, 22, 26] (see also [31, sec. 3.1]) that underlies the analysis of many probabilistic algorithms and processes. In the case when the balls are identical, this gives rise to the well-known multinomial distribution [16, sec. VI.9]: there are m repeated independent trials (balls), where each trial (ball) can result in one of the outcomes E1, ..., En (bins). The probability of the realisation of event Ei is pi for i in [n] for each trial. (Of course the probabilities are subject to the condition Sum_i pi = 1.) Under the multinomial distribution, for any integers m1, ..., mn such that Sum_i mi = m, the probability that for each i in [n], event Ei occurs mi times is

    m! / (m1! · · · mn!) · p1^m1 · · · pn^mn.
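As a concrete illustration of the multinomial probability above, the following sketch (the function name `multinomial_pmf` is ours, not from the paper) evaluates it directly from the formula:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability that outcome Ei occurs counts[i] times in
    m = sum(counts) independent trials with P(Ei) = probs[i]."""
    m = sum(counts)
    coeff = factorial(m)          # m!
    for mi in counts:
        coeff //= factorial(mi)   # divide by m1! ... mn!
    return coeff * prod(p ** mi for p, mi in zip(probs, counts))

# m = 4 identical balls into n = 2 equally likely bins:
# P(B1 = 2, B2 = 2) = 4!/(2! 2!) * (1/2)^4 = 6/16 = 0.375
print(multinomial_pmf([2, 2], [0.5, 0.5]))  # → 0.375
```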

The balls and bins experiment is a generalisation of the multinomial distribution: in the general case, one can have an arbitrary set of probabilities for each ball: the probability that ball k goes into bin i is pi,k, subject only to the natural restriction that for each ball k, Sum_i pi,k = 1. The joint distribution function correspondingly has a more complicated form.
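The general experiment is nonetheless easy to simulate; a minimal sketch (function and parameter names are ours, for illustration), where row k of the probability matrix gives ball k's distribution over bins:

```python
import random

def throw_balls(prob_matrix, rng=random):
    """Simulate the general balls-and-bins experiment.
    prob_matrix[k][i] = probability that ball k lands in bin i;
    each row sums to 1. Returns the occupancy numbers B1, ..., Bn."""
    n = len(prob_matrix[0])
    occupancy = [0] * n
    for row in prob_matrix:
        bin_i = rng.choices(range(n), weights=row)[0]
        occupancy[bin_i] += 1
    return occupancy

# Identical balls (every row equal) recover the multinomial case:
print(throw_balls([[0.5, 0.5]] * 10))  # e.g. [6, 4]; entries sum to 10
```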

A fundamental natural question of interest is: how are these Bi related? Note that even though the balls are thrown independently of each other, the Bi variables are not independent; in particular, their sum is fixed at m. Intuitively, the Bi's are negatively dependent on each other in the manner described above: if one set of variables is "high", a disjoint set is "low". However, establishing such assertions precisely by a direct calculation from the joint distribution function, though possible in principle, appears to be quite a formidable task, even in the case where the balls are assumed to be identical.
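While a rigorous proof is delicate, the pairwise negative behaviour is easy to observe empirically. A Monte Carlo sketch (ours, restricted to identical balls thrown uniformly, where the exact value Cov(Bi, Bj) = -m·pi·pj is a standard fact about the multinomial distribution):

```python
import random

def estimate_cov(m, n, trials=20000, seed=1):
    """Monte Carlo estimate of Cov(B1, B2) for m identical balls
    thrown uniformly into n bins (exact value: -m / n^2)."""
    rng = random.Random(seed)
    s1 = s2 = s12 = 0.0
    for _ in range(trials):
        occ = [0] * n
        for _ in range(m):
            occ[rng.randrange(n)] += 1
        s1 += occ[0]; s2 += occ[1]; s12 += occ[0] * occ[1]
    return s12 / trials - (s1 / trials) * (s2 / trials)

# The estimate comes out negative, consistent with Cov(B1, B2) = -m/n^2.
print(estimate_cov(10, 5))
```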

One of the major contributions of this paper is establishing that the Bi are negatively dependent in a very strong sense. In particular, we show that the Bi variables satisfy negative association and negative regression, two strong notions of negative dependence that we define precisely below. All the intuitively obvious assertions of negative dependence in the balls and bins experiment follow as easy corollaries. We illustrate the usefulness of these results by showing how to streamline and simplify many existing probabilistic analyses in the literature.


#### Full Text

PDF

DOI: http://dx.doi.org/10.7146/brics.v3i25.20006

ISSN: 0909-0878

Hosted by the State and University Library and Aarhus University Library