Consider a population partitioned into two groups, $A$ and $B$, containing $a$ and $b$ people respectively. We sample $n$ people from it without replacement.

Let $X$ be the random variable equal to the number of people from group $A$ in the sample. Then $X$ is said to have a **Hypergeometric distribution** with parameters $a$, $b$, and $n$. Following the textbook, we write $X \sim \text{HGeom}(a, b, n)$.

Note that we are sampling *without* replacement to get our group of $n$ people.
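A quick simulation can make the definition concrete. The sketch below uses Python's standard `random` module; the function name `sample_hgeom` and the example values $a=5$, $b=7$, $n=4$ are illustrative choices, not part of the original notes.

```python
import random

def sample_hgeom(a: int, b: int, n: int, rng: random.Random) -> int:
    """Draw n people without replacement from a population with a people in
    group A and b people in group B; return how many came from group A."""
    population = ["A"] * a + ["B"] * b
    sample = rng.sample(population, n)  # each person can be picked at most once
    return sample.count("A")

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_hgeom(a=5, b=7, n=4, rng=rng) for _ in range(10_000)]
print(sum(draws) / len(draws))  # empirical average of X
```

Because `random.Random.sample` never returns the same list position twice, each simulated draw is a genuine sample without replacement.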

What distribution would $X$ have if we sampled *with* replacement? Each of the $n$ draws would then be an independent Bernoulli trial, because the population is restored after every draw, and every draw would have the same success probability (picking a person from group $A$) of $\frac{a}{a+b}$.

Then $X \sim \text{Bin}\left(n, \frac{a}{a+b}\right)$.
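The with-replacement case can be checked the same way. In this sketch the helper names `binom_pmf` and `sample_with_replacement` are hypothetical and the parameters are illustrative; `random.Random.choices` draws with replacement, and the empirical frequencies are printed next to the $\text{Bin}\left(n, \frac{a}{a+b}\right)$ probabilities.

```python
import random
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def sample_with_replacement(a: int, b: int, n: int, rng: random.Random) -> int:
    """Draw n people *with* replacement; each draw is an independent
    Bernoulli trial with success probability a / (a + b)."""
    population = ["A"] * a + ["B"] * b
    return rng.choices(population, k=n).count("A")

a, b, n = 5, 7, 4
p = a / (a + b)
rng = random.Random(1)
trials = 20_000
counts = [0] * (n + 1)  # counts[k] = number of trials with exactly k from A
for _ in range(trials):
    counts[sample_with_replacement(a, b, n, rng)] += 1

for k in range(n + 1):
    # k, empirical frequency, binomial probability
    print(k, round(counts[k] / trials, 3), round(binom_pmf(k, n, p), 3))
```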

## PMF

Each subset of $n$ people is a valid outcome of the experiment. There are $\binom{a+b}{n}$ such subsets, and they are all equally likely.

To find $P(X=k)$, we count the samples of size $n$ that contain exactly $k$ people from group $A$:

- $\binom{a}{k}$ ways to choose $k$ people from group $A$.
- $\binom{b}{n-k}$ ways to choose the remaining $n-k$ people from group $B$.

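By the multiplication rule there are $\binom{a}{k}\binom{b}{n-k}$ such samples. Dividing by the total number of samples gives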

$$P(X=k)=\frac{\binom{a}{k}\binom{b}{n-k}}{\binom{a+b}{n}}$$

for integers $0 \le k \le a$ and $0 \le n-k \le b$, and $P(X=k)=0$ otherwise.
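The counting argument can be verified on a small case by listing every subset. In this sketch `hgeom_pmf` and `brute_force_pmf` are hypothetical helper names; exact `Fraction` arithmetic avoids floating-point rounding, so the two results can be compared for equality.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def hgeom_pmf(k: int, a: int, b: int, n: int) -> Fraction:
    """Exact P(X = k) for X ~ HGeom(a, b, n), using the counting formula."""
    # Outside the support 0 <= k <= a, 0 <= n - k <= b the probability is 0
    # (math.comb raises ValueError on negative arguments, so guard first).
    if not (0 <= k <= a and 0 <= n - k <= b):
        return Fraction(0)
    return Fraction(comb(a, k) * comb(b, n - k), comb(a + b, n))

def brute_force_pmf(a: int, b: int, n: int) -> dict[int, Fraction]:
    """P(X = k) obtained by listing every equally likely n-person subset
    and counting how many contain exactly k members of group A."""
    people = ["A"] * a + ["B"] * b
    subsets = list(combinations(range(a + b), n))  # people indexed 0..a+b-1
    tally: dict[int, int] = {}
    for subset in subsets:
        k = sum(people[i] == "A" for i in subset)
        tally[k] = tally.get(k, 0) + 1
    return {k: Fraction(count, len(subsets)) for k, count in tally.items()}

a, b, n = 5, 7, 4
exact = {k: hgeom_pmf(k, a, b, n) for k in range(n + 1)}
print(exact == brute_force_pmf(a, b, n))  # formula agrees with enumeration
print(sum(exact.values()))                # probabilities sum to 1
```

If SciPy is available, `scipy.stats.hypergeom` computes the same probabilities but uses a different parameterization: `hypergeom.pmf(k, M, n, N)` takes the population size `M`, the number of group-$A$ people `n`, and the sample size `N`, so $\text{HGeom}(a, b, n)$ corresponds to `hypergeom.pmf(k, a + b, a, n)`.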

## Expectation

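We can use indicator random variables for the Hypergeometric distribution as well. Given $X \sim \text{HGeom}(a, b, n)$, we write $X$ as the sum of $n$ indicator variables, where $I_i = 1$ if the $i$th person drawn is from group $A$ and $I_i = 0$ otherwise. These indicators are not independent, since drawing without replacement makes later draws depend on earlier ones, but linearity of expectation does not require independence. By symmetry, the $i$th person drawn is equally likely to be any of the $a+b$ people, so $E(I_i) = \frac{a}{a+b}$.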

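$$
\begin{aligned}
X &= \sum_{i=1}^{n} I_i, \\
E(X) &= \sum_{i=1}^{n} E(I_i) = n \cdot \frac{a}{a+b} = \frac{na}{a+b}.
\end{aligned}
$$

Here $X$ counts the people from group $A$ in the sample of size $n$. The same argument with the roles of the groups swapped shows that the expected number of people from group $B$ in the sample is $\frac{nb}{a+b}$.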
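Both the symmetry claim $E(I_i) = \frac{a}{a+b}$ and the final formula can be checked exactly on small cases. The sketch below (helper names `hgeom_mean` and `indicator_means` are hypothetical) enumerates every ordered sample of $n$ distinct people, which are all equally likely under sampling without replacement, and separately computes $\sum_k k\,P(X=k)$ from the PMF.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def hgeom_mean(a: int, b: int, n: int) -> Fraction:
    """E(X) computed directly from the PMF as the sum of k * P(X = k)."""
    total = comb(a + b, n)
    return sum(
        (Fraction(k * comb(a, k) * comb(b, n - k), total)
         for k in range(max(0, n - b), min(n, a) + 1)),  # support of X
        Fraction(0),
    )

def indicator_means(a: int, b: int, n: int) -> list[Fraction]:
    """E(I_i) for each draw position i, by enumerating every ordered sample
    of n distinct people (all equally likely without replacement)."""
    people = ["A"] * a + ["B"] * b
    ordered = list(permutations(range(a + b), n))
    return [
        Fraction(sum(people[draw[i]] == "A" for draw in ordered), len(ordered))
        for i in range(n)
    ]

a, b, n = 3, 4, 3
print(indicator_means(a, b, n))  # each entry equals a / (a + b)
print(hgeom_mean(a, b, n), Fraction(n * a, a + b))
```

Enumerating ordered samples (rather than subsets) is what makes each position $i$ well defined, so the per-draw probability $E(I_i)$ can be read off directly.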