This is related to How to define an n-variate empirical distribution function probability for any n?

and

RandomVariate from 2-dimensional probability distribution

but I don’t think either question (or its answers) answers the question below. (The first one constructs the distribution from data rather than from a priori specified probability weights, and hence can use NProbability; and the second one seems like extreme overkill for a simple discrete distribution, where one doesn’t really need to roll a custom random number generator.)

I want to construct a multivariate discrete distribution so that I can use the full functionality of RandomVariate and things of that sort.

In one dimension, I can use EmpiricalDistribution. For instance, for $X \sim \mathrm{Bernoulli}(p)$ with $p = 1/2$, it is simply

gdist = EmpiricalDistribution[{0.5, 0.5} -> {0, 1}]

and from this, I can go on to compute mean and variances via Expectation, say

Expectation[ 2*x + 1, x \[Distributed] gdist]

and likewise draw samples with RandomVariate and all its friends.
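For concreteness, a few of the operations available on this one-dimensional distribution (a sketch; RandomVariate output is of course random):

Mean[gdist]             (* 0.5 *)
Variance[gdist]         (* 0.25 *)
RandomVariate[gdist, 5] (* e.g. {0, 1, 1, 0, 1} *)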

Question: How does one do that for a multivariate discrete distribution (whether via EmpiricalDistribution or not)? That is, suppose we consider,

\begin{equation}
(X,Y) =
\begin{cases}
(1,0), & \text{with probability } p_{10} \\
(0,1), & p_{01} \\
(0,0), & p_{00} \\
(1,1), & p_{11},
\end{cases}
\end{equation}

where of course $p_{10} + p_{01} + p_{00} + p_{11} = 1$ are the probability weights. How does one implement the above distribution, say labelled gmultdist, so that we can compute Expectation[2*x + 3*y, {x, y} \[Distributed] gmultdist]?

=================

2 Answers

=================

The weights do not have to be equal, or even numerical.

$Version

(* "10.2.0 for Mac OS X x86 (64-bit) (July 7, 2015)" *)

Format[p[x_, y_]] := Subscript[p, Row[{x, y}]]

assume = {Thread[0 <= {p[0, 0], p[0, 1], p[1, 0], p[1, 1]} <= 1],
    p[0, 0] + p[0, 1] + p[1, 0] + p[1, 1] == 1} // Flatten;

gmultdist = EmpiricalDistribution[{p[0, 0], p[0, 1], p[1, 0], p[1, 1]} ->
    {{0, 0}, {0, 1}, {1, 0}, {1, 1}}];

Using a replacement rule for the total probability gives the expected form of the result.

PDF[gmultdist, #] & /@ {{0, 0}, {0, 1}, {1, 0}, {1, 1}} /.

p[0, 0] + p[0, 1] + p[1, 0] + p[1, 1] -> 1

However, using assumptions gives the result in terms of three independent variables rather than the shorter form with four variables.

Assuming[assume,
  (PDF[gmultdist, #] & /@ {{0, 0}, {0, 1}, {1, 0}, {1, 1}}) // Simplify]

Assuming[assume,
  (Probability[x == #[[1]] && y == #[[2]],
      {x, y} \[Distributed] gmultdist] & /@
    {{0, 0}, {0, 1}, {1, 0}, {1, 1}}) // Simplify]
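With gmultdist defined this way, the Expectation asked for in the question also works with symbolic weights (a sketch; Mathematica may arrange the symbolic result differently):

Expectation[2*x + 3*y, {x, y} \[Distributed] gmultdist]
(* equivalent to 2 (p[1, 0] + p[1, 1]) + 3 (p[0, 1] + p[1, 1]) *)

Mean[gmultdist]
(* componentwise means, i.e. {P(x == 1), P(y == 1)} *)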

This is a better solution than mine 🙂

– user32416

Aug 23 ’15 at 19:24

I’d subsequently realized that I could simply just do

gmultdist = EmpiricalDistribution[ {0.25, 0.25, 0.25, 0.25} -> { {0, 0}, {0, 1}, {1, 0}, {1, 1} } ]

for the case of all equal weights. My original mistake was "flipping" the order of -> in EmpiricalDistribution, i.e., writing values -> weights instead of weights -> values.
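As a quick check with these equal weights (a sketch; RandomVariate draws are random):

Expectation[2*x + 3*y, {x, y} \[Distributed] gmultdist]
(* 2.5, since E[2 X + 3 Y] = 2 (0.25 + 0.25) + 3 (0.25 + 0.25) *)

RandomVariate[gmultdist, 3]
(* e.g. {{1, 0}, {0, 0}, {1, 1}} *)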