Saturday, 15 November 2014

pr.probability - Constructing Bernoulli random variables with prescribed correlation

Here's a generalization of gowers's construction that is practical for Bernoulli RVs with $p\ne1/2$. You want to generate $n$ Bernoulli RVs, each taking the value 0 or 1, each with mean $p<1/2$. (For $p>1/2$, do as described below for $1-p$, then complement the results.) Let $d=\sqrt{2}\,\text{erfc}^{-1}\left(2p\right)$. (This is just the inverse survival function of the standard normal distribution, evaluated at $p$.) Take unit vectors $v_1,\ldots,v_n$ as before. Generate a random $n$-vector $z$ whose components are IID standard normal RVs. Let $B_i=1$ iff $z\cdot v_i>d$. Since $v_i$ is a unit vector, $z\cdot v_i$ is standard normal, so $B_i$ has the desired mean $p$.
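The sampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `threshold` and `sample_bernoullis` are made up for this example, the unit vectors are assumed to be given already, and the standard library's `statistics.NormalDist` stands in for the $\text{erfc}^{-1}$ formula (both give the inverse survival function).

```python
import random
from statistics import NormalDist

def threshold(p):
    """Return d with P(Z > d) = p for a standard normal Z (hypothetical helper)."""
    return NormalDist().inv_cdf(1.0 - p)

def sample_bernoullis(vectors, p, rng=random):
    """Draw one joint sample (B_1, ..., B_n) given unit vectors v_i and mean p < 1/2."""
    d = threshold(p)
    dim = len(vectors[0])
    # One shared Gaussian vector z; all the B_i are functions of the same z.
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(zk * vk for zk, vk in zip(z, v)) > d else 0
            for v in vectors]
```

Calling `sample_bernoullis` repeatedly with the same `vectors` gives independent draws of the whole correlated tuple; for $p>1/2$ one would call it with $1-p$ and flip every bit, as described above.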



What about correlations? As in gowers's construction, these depend only on the angle between the vectors $v_i$ and $v_j$. Let $c_{ij}$ be the coincidence frequency between $B_i$ and $B_j$, i.e., the frequency with which both are 1, which is related to the correlation. If $\theta_{ij}=\cos^{-1}\left(v_i\cdot v_j\right)$, then



$$c_{ij}=\int_d^\infty \Phi\left(\frac{u\cos\theta_{ij}-d}{\sin\theta_{ij}}\right)\phi\left(u\right)du$$



where $\Phi(z)$ and $\phi(z)$ are the standard normal CDF and PDF, respectively. $c$ decreases monotonically from $p$ at $\theta=0$ to 0 at $\theta=\pi$. In a practical problem you'd probably want the inverse: you'd know $p$ and $c$ and want to get $\theta$. I doubt that can be done other than numerically, but $c$ is a single function of two bounded variables $p$ and $\theta$, so you can tabulate it numerically once and invert the interpolated function if you're going to be doing a lot of this.
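As a rough sketch of the numerical route, the integral for $c$ can be evaluated by quadrature and inverted by bisection, using the monotonicity in $\theta$. The names `coincidence` and `angle_for`, the choice of Simpson's rule, and the truncation of the integral at $d+10$ are all assumptions made for this illustration, not part of the construction itself; accuracy degrades as $\theta\to0$, where the integrand approaches a step.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal

def coincidence(p, theta, steps=2000, span=10.0):
    """Approximate c = P(B_i = 1 and B_j = 1) for mean p and angle theta."""
    d = _N.inv_cdf(1.0 - p)
    s, c = math.sin(theta), math.cos(theta)
    if s < 1e-12:
        # Degenerate angles: theta = 0 gives c = p, theta = pi gives c = 0.
        return p if c > 0 else 0.0

    def f(u):
        return _N.cdf((u * c - d) / s) * _N.pdf(u)

    # Composite Simpson's rule on [d, d + span]; steps must be even.
    h = span / steps
    total = f(d) + f(d + span)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(d + k * h)
    return total * h / 3.0

def angle_for(p, c_target, tol=1e-10):
    """Bisect for theta in [0, pi] such that coincidence(p, theta) = c_target."""
    lo, hi = 0.0, math.pi  # coincidence decreases from p to 0 on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coincidence(p, mid) > c_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A quick sanity check is the independent case: at $\theta=\pi/2$ the integrand factorizes and $c=p^2$, so `angle_for(p, p * p)` should come back close to $\pi/2$. If many inversions are needed, the tabulate-and-interpolate approach mentioned above would avoid repeating the quadrature.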



Now that you know what all the dot products $v_i\cdot v_j=\cos\theta_{ij}$ need to be, it is simple to construct vectors at these angles. Let $v_1=\left(1,0,\ldots,0\right)$. Then $v_2=\left(\cos\theta_{12},\sin\theta_{12},0,\ldots,0\right)$. For $v_3$, solve



$$\pmatrix{v_{11}&v_{12}\cr v_{21}&v_{22}}\pmatrix{v_{31}\cr v_{32}}=\pmatrix{1&0\cr \cos\theta_{12}&\sin\theta_{12}}\pmatrix{v_{31}\cr v_{32}}=\pmatrix{\cos\theta_{13}\cr\cos\theta_{23}}$$



... then let $v_{33}=\sqrt{1-v_{31}^2-v_{32}^2}$. Continue to generate the rest of the $v_i$. Since the matrix at every stage is lower triangular, the solution is unique as long as the diagonal is positive. The construction fails only if the norm of the first $i-1$ components of $v_i$ is $\ge1$. I'm going to speculate that that occurs only if you give it a set of impossible coincidence frequencies (for instance, $c_{12}=c_{13}=p$, $c_{23}=0$), but I haven't attempted to show that.
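The row-by-row solve above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name `vectors_from_cosines`, the 0-based indexing, and the small tolerance `eps` are assumptions of the example. The procedure amounts to a Cholesky factorization of the matrix of target cosines.

```python
import math

def vectors_from_cosines(cosines, eps=1e-12):
    """Build unit vectors v_1..v_n with v_i . v_j = cosines[i][j] (0-based).

    Raises ValueError when the leading components of some v_i already have
    norm >= 1, which is the failure case described in the text.
    """
    n = len(cosines)
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Forward substitution: enforce v_j . v_i = cosines[i][j].
            partial = sum(v[i][k] * v[j][k] for k in range(j))
            v[i][j] = (cosines[i][j] - partial) / v[j][j]
        rest = 1.0 - sum(x * x for x in v[i][:i])
        if rest <= eps:
            raise ValueError(f"construction fails at vector {i + 1}")
        v[i][i] = math.sqrt(rest)  # last nonzero component fixes the unit norm
    return v
```

For example, the impossible input from the text ($c_{12}=c_{13}=p$, $c_{23}=0$, i.e. $\theta_{12}=\theta_{13}=0$ and $\theta_{23}=\pi$) corresponds to `[[1, 1, 1], [1, 1, -1], [1, -1, 1]]` and raises `ValueError` at the second vector.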



Edit: Nope, I was too optimistic. For instance, if you have three mutually exclusive Bernoulli RVs with $p\le1/3$, which is clearly possible, this construction fails. Alas.
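One way to see why, sketched from the formula for $c_{ij}$ above: the integrand is strictly positive whenever $\theta_{ij}<\pi$, so $c_{ij}=0$ forces $\theta_{ij}=\pi$ for every pair. Three unit vectors cannot be pairwise antipodal, since

$$v_2=-v_1,\quad v_3=-v_1\quad\Longrightarrow\quad v_2\cdot v_3=1\ne-1=\cos\theta_{23}.$$

So no choice of vectors realizes three mutually exclusive variables in this scheme, even though such variables exist.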
