Confidence Interval for the Difference of Two Population Proportions

Formula for confidence interval for difference of two proportions. C.K. Taylor

By Courtney Taylor

Updated on March 06, 2017

Confidence intervals are one part of inferential statistics. The basic idea behind this topic is to estimate the value of an unknown population parameter by using a statistical sample. Not only can we estimate the value of a single parameter, but we can also adapt our methods to estimate the difference between two related parameters. For example, we may want to find the difference between the percentage of the male U.S. voting population that supports a particular piece of legislation and the percentage of the female voting population that does.

We will see how to do this type of calculation by constructing a confidence interval for the difference of two population proportions. In the process we will examine some of the theory behind this calculation. Along the way we will see similarities to the construction of a confidence interval for a single population proportion and to the construction of a confidence interval for the difference of two population means.

Generalities

Before looking at the specific formula that we will use, let's consider the overall framework that this type of confidence interval fits into. The confidence interval that we will look at has the following form:

Estimate +/- Margin of Error

Many confidence intervals are of this type. There are two numbers that we need to calculate. The first of these values is the estimate for the parameter. The second value is the margin of error. This margin of error accounts for the fact that we have only an estimate, which will vary from sample to sample. The confidence interval provides us with a range of plausible values for our unknown parameter.

Conditions

We should make sure that all of the conditions are satisfied before doing any calculation. To find a confidence interval for the difference of two population proportions, we need to make sure that the following hold:

We have two simple random samples from large populations. Here "large" means that the population is at least 20 times larger than the size of the sample. The sample sizes will be denoted by n₁ and n₂.
Our individuals have been chosen independently of one another.
There are at least ten successes and ten failures in each of our samples.
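
The numerical conditions above can be checked before any calculation. The following is a minimal sketch, not a standard routine: the function name conditions_met and the optional population-size arguments pop1 and pop2 are hypothetical, and the counts in the example calls are made up for illustration. Random sampling and independence depend on how the data were collected and cannot be verified from the counts alone.

```python
def conditions_met(k1, n1, k2, n2, pop1=None, pop2=None):
    """Check the numerical conditions for the two-proportion interval.

    k1, k2: number of successes in each sample
    n1, n2: sample sizes
    pop1, pop2: population sizes, if known (hypothetical optional arguments)
    """
    # At least ten successes and ten failures in each sample.
    counts_ok = all(c >= 10 for c in (k1, n1 - k1, k2, n2 - k2))
    # Each population at least 20 times the size of its sample, when known.
    pops_ok = all(
        pop is None or pop >= 20 * n
        for pop, n in ((pop1, n1), (pop2, n2))
    )
    return counts_ok and pops_ok


# Made-up counts for illustration only.
print(conditions_met(k1=45, n1=100, k2=60, n2=120))  # True
print(conditions_met(k1=4, n1=100, k2=60, n2=120))   # False: only 4 successes
```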

Samples and Population Proportions

Now we are ready to construct our confidence interval. We start with the estimate for the difference between our population proportions. Both of these population proportions are estimated by a sample proportion. These sample proportions are statistics that are found by counting the number of successes in each sample and then dividing by the respective sample size.

The first population proportion is denoted by p₁. If the number of successes in our sample from this population is k₁, then we have a sample proportion of k₁ / n₁.

We denote this statistic by p̂₁. We read this symbol as "p₁-hat" because it looks like the symbol p₁ with a hat on top.

In a similar way we can calculate a sample proportion from our second population. The parameter from this population is p₂. If the number of successes in our sample from this population is k₂, then our sample proportion is p̂₂ = k₂ / n₂.

These two statistics become the first part of our confidence interval. The estimate of p₁ is p̂₁. The estimate of p₂ is p̂₂. So the estimate for the difference p₁ - p₂ is p̂₁ - p̂₂.
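
As a small illustration of the estimate, the sketch below computes both sample proportions and their difference. The function name difference_estimate and the counts (45 successes out of 100, 60 out of 120) are made up for this example and do not come from real data.

```python
def difference_estimate(k1, n1, k2, n2):
    """Return (p_hat1, p_hat2, p_hat1 - p_hat2) from success counts and sample sizes."""
    p_hat1 = k1 / n1  # sample proportion from the first population
    p_hat2 = k2 / n2  # sample proportion from the second population
    return p_hat1, p_hat2, p_hat1 - p_hat2


# Hypothetical counts for illustration only.
p_hat1, p_hat2, diff = difference_estimate(k1=45, n1=100, k2=60, n2=120)
print(round(p_hat1, 3), round(p_hat2, 3), round(diff, 3))  # 0.45 0.5 -0.05
```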

Sampling Distribution of the Difference of Sample Proportions

Next we need to obtain the formula for the margin of error. To do this we will first consider the sampling distribution of p̂₁. The number of successes k₁ follows a binomial distribution with n₁ trials and probability of success p₁, and p̂₁ is this count divided by n₁. The mean of the sampling distribution of p̂₁ is the proportion p₁, and its variance is p₁(1 - p₁)/n₁. The standard deviation is the square root of this variance.

The sampling distribution of p̂₂ is similar to that of p̂₁. Simply change all of the indices from 1 to 2, and we have a distribution with mean p₂ and variance p₂(1 - p₂)/n₂.

We now need a few results from mathematical statistics in order to determine the sampling distribution of p̂₁ - p̂₂. The mean of this distribution is p₁ - p₂. Because the two samples are independent, the variances add together, and we see that the variance of the sampling distribution is p₁(1 - p₁)/n₁ + p₂(1 - p₂)/n₂. The standard deviation of the distribution is the square root of this formula.
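
Written out, the results used here are the linearity of expectation and the rule that variances of independent random variables add. The short derivation below is a sketch under the assumption, taken from the conditions above, that the two samples are independent:

```latex
\begin{align*}
E[\hat{p}_1 - \hat{p}_2] &= E[\hat{p}_1] - E[\hat{p}_2] = p_1 - p_2, \\
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  &= \operatorname{Var}(\hat{p}_1) + (-1)^2 \operatorname{Var}(\hat{p}_2)
   = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}, \\
\sigma_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.
\end{align*}
```

The variances add even though the proportions are subtracted, because the coefficient -1 is squared.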

There are a couple of adjustments that we need to make. The first is that the formula for the standard deviation of p̂₁ - p̂₂ uses the unknown parameters p₁ and p₂. Of course, if we really knew these values, then it would not be an interesting statistical problem at all. We would not need to estimate the difference between p₁ and p₂. Instead we could simply calculate the exact difference.

This problem can be fixed by calculating a standard error rather than a standard deviation. All that we need to do is replace the population proportions with sample proportions. Standard errors are calculated from statistics instead of parameters. A standard error is useful because it estimates a standard deviation. What this means for us is that we no longer need to know the values of the parameters p₁ and p₂. Since the sample proportions are known, the standard error is given by the square root of the following expression:

p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂
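
The standard error can be computed directly from the success counts and sample sizes. This is a minimal sketch; the function name standard_error and the example counts are hypothetical.

```python
import math


def standard_error(k1, n1, k2, n2):
    """Standard error of p_hat1 - p_hat2, using sample proportions in place of parameters."""
    p_hat1 = k1 / n1
    p_hat2 = k2 / n2
    # Estimated variance: each sample contributes p_hat * (1 - p_hat) / n.
    variance_estimate = p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    return math.sqrt(variance_estimate)


# Hypothetical counts for illustration only.
print(round(standard_error(45, 100, 60, 120), 4))  # 0.0675
```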

The second item that we need to address is the particular form of our sampling distribution. It turns out that we can use a normal distribution to approximate the sampling distribution of p̂₁ - p̂₂. The reason for this is somewhat technical, but is outlined in the next paragraph.

Both p̂₁ and p̂₂ have sampling distributions that come from a binomial distribution. Under the conditions listed above, each of these may be approximated quite well by a normal distribution. The difference p̂₁ - p̂₂ is a linear combination of these two independent random variables, and a linear combination of independent normal random variables is itself normal. Therefore the sampling distribution of p̂₁ - p̂₂ is also approximately normal.

Confidence Interval Formula

We now have everything we need to assemble our confidence interval. The estimate is (p̂₁ - p̂₂) and the margin of error is z* [p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂]^0.5. The value that we enter for z* is dictated by the level of confidence C. Commonly used values for z* are 1.645 for 90% confidence and 1.96 for 95% confidence. The value z* is the point on the standard normal distribution for which C percent of the distribution lies between -z* and z*.
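
The value of z* for any confidence level can be looked up in a table or computed. One way to compute it, sketched here with Python's standard-library statistics.NormalDist (the helper name critical_value is hypothetical), is to find the point that leaves half of the remaining area in each tail:

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z* such that the central `confidence` area of N(0, 1) lies in (-z*, z*)."""
    # A central area of `confidence` leaves (1 - confidence) / 2 in each tail.
    upper_tail_point = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_point)


print(round(critical_value(0.90), 3))  # 1.645
print(round(critical_value(0.95), 3))  # 1.96
```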

The following formula gives us a confidence interval for the difference of two population proportions:

(p̂₁ - p̂₂) +/- z* [p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂]^0.5
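
Putting the pieces together, the formula can be sketched as a single function. This is an illustration under the conditions stated earlier, not a reference implementation: the function name two_proportion_ci and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(k1, n1, k2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 from success counts k1, k2 and sample sizes n1, n2."""
    p_hat1 = k1 / n1
    p_hat2 = k2 / n2
    estimate = p_hat1 - p_hat2
    # Standard error: square root of the estimated variance of p_hat1 - p_hat2.
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se
    return estimate - margin, estimate + margin


# Hypothetical counts: 45 of 100 in the first sample, 60 of 120 in the second.
low, high = two_proportion_ci(45, 100, 60, 120)
print(f"({low:.3f}, {high:.3f})")  # (-0.182, 0.082)
```

With these made-up counts the 95% interval contains zero, so the data would be consistent with no difference between the two population proportions.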

Taylor, Courtney. "Confidence Interval for the Difference of Two Population Proportions." ThoughtCo, Apr. 5, 2023, thoughtco.com/difference-of-two-population-proportions-4061672.