A 2-sample test of means is useful for answering questions such as, “Is the mean age of people who favor the legalization of marijuana lower than the mean age of people who oppose the legalization of marijuana?”
In this article, I will use that question as a worked example of how a 2-sample test of means can be used to form a conclusion.
By using descriptive statistics in Microsoft Excel, I have summarized the sample results for the ages of the group that “favors the legalization of marijuana” and for the group that “opposes the legalization of marijuana.” For the 2-sample test of means, I will need to record the sample sizes, the mean age of each group (denoted as u1 and u2), the standard deviations, and the 95% confidence interval estimate.
Sample 1 contains n1 = 234 individuals who favor the legalization of marijuana, and sample 2 contains n2 = 767 individuals who oppose it.
The Means & Standard Deviation
From the data, I found the means to be u1 = 41.744 and u2 = 47.432. The standard deviation was 14.242 for the first group and 17.907 for the second group.
95% Confidence Interval Estimate
To find the confidence interval estimate for the difference between two independent population means, I subtracted the second mean, 47.432, from the first mean, 41.744, which gives -5.688. Then I found my t-value (use the Excel function TINV and enter .05 for the probability and 233 for the degrees of freedom, which is the smaller sample size minus 1); it comes out to about 1.97. Next, I squared the first standard deviation and divided by sample size 1, and then squared the second standard deviation and divided by sample size 2. Adding these two numbers together gives about 1.285, and the square root of that sum is the standard error.
-5.688 +/- 1.97(square root of 1.285)
= -5.688 +/- 2.2331
Therefore, I am 95% confident that the true mean difference in age between people who favor and people who oppose the legalization of marijuana can be captured within the interval from -7.9211 to -3.4549.
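As a cross-check on the arithmetic, here is a minimal Python sketch of the same confidence interval calculation. It uses only the summary statistics quoted in this article; the variable names are illustrative, and the critical value 1.97 is simply the rounded TINV result used above rather than something computed in the script.

```python
import math

# Summary statistics reported in the article
n1, mean1, sd1 = 234, 41.744, 14.242   # group that favors legalization
n2, mean2, sd2 = 767, 47.432, 17.907   # group that opposes legalization

# Assumption: rounded critical value from Excel's TINV(.05, 233)
t_crit = 1.97

# Point estimate: difference between the two sample means
diff = mean1 - mean2

# Standard error for two independent samples (variances kept separate)
std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Margin of error and the two interval endpoints
margin = t_crit * std_error
ci_lower, ci_upper = diff - margin, diff + margin

print(f"difference     = {diff:.3f}")
print(f"standard error = {std_error:.4f}")
print(f"95% CI         = ({ci_lower:.4f}, {ci_upper:.4f})")
```

Because the whole interval sits below zero, it already hints that the group favoring legalization is younger on average.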
Always check that both sample sizes are greater than 30 (n1 > 30 and n2 > 30). This means that your sample sizes are large enough to run a 2-sample test of means.
Seeing that both of our sample sizes, 234 and 767, are larger than 30, we need to decide whether or not to “pool” the data. Pooling before running a 2-sample test of means combines the two sample variances into a single estimate, which only makes sense when the two sets of data are similar. In order to pool, the sample sizes and the variances need to be fairly similar. In my example, we will not pool the data because the sample sizes are not similar.
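To see what the pooling decision changes in practice, the sketch below computes the standard error both ways from the summary statistics in this article. The pooled formula is the standard textbook one (a weighted average of the two variances); the variable names are my own and this is an illustration, not part of the original Excel workflow.

```python
import math

# Sample sizes and standard deviations reported in the article
n1, sd1 = 234, 14.242
n2, sd2 = 767, 17.907

# Unpooled: each group keeps its own variance
se_unpooled = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Pooled: one variance estimate, weighted by each group's degrees of freedom
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"unpooled standard error = {se_unpooled:.4f}")
print(f"pooled standard error   = {se_pooled:.4f}")
```

With sample sizes this different, the two standard errors do not agree, which is one way to see why the unpooled version is the safer choice here.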
The other assumption you must check involves the shape of your data. Is the data symmetrical? Does the data follow a normal, bell-shaped curve? You can check this assumption by creating a histogram or stem and leaf plot.
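If you are not working in Excel, a rough text histogram is enough for a quick look at the shape. The sketch below is a minimal example; the function name `text_histogram` and the list of ages are hypothetical and are NOT the survey data behind this article.

```python
from collections import Counter

def text_histogram(values, bin_width=10):
    """Count values in equal-width bins; returns {bin_start: count}."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages for illustration only (not the article's data)
sample_ages = [19, 23, 27, 31, 34, 36, 38, 41, 42, 44,
               45, 47, 52, 55, 58, 63, 67, 72]

# One row per bin; the bar length equals the number of values in the bin
for start, count in text_histogram(sample_ages).items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

A roughly symmetric, single-peaked pattern of bars is what you are hoping to see; a long tail on one side would be a reason to look more closely.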
Now we can get back to business! It is time to write out our two hypotheses: H0 and H1, which are the null hypothesis and alternative hypothesis, respectively. When we are comparing two means, the null hypothesis states that the two means are equal. H0: u1 – u2 = 0. Since we are trying to determine if the mean age of people who favor the legalization of marijuana is lower than the mean age of people who oppose the legalization of marijuana, our alternative hypothesis will be H1: u1 – u2 < 0.
Test Statistic for a 2-sample t-Test
We will now need to calculate our t-statistic in order to determine our p-value.
t = (sample statistic – null value) / standard error
My sample statistic is the difference of my two means, -5.688. The null value is 0. For the standard error, I squared the first standard deviation and divided by sample size 1 and then squared the second standard deviation and divided by sample size 2. Then I took the square root of these two numbers added together. Sound familiar?
For my data set, the equation yielded a t-value of -5.018 (the value was negative because group 1 had a lower mean value than group 2).
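The same t-statistic can be reproduced outside of Excel. This is a small Python sketch using the summary statistics from this article; the variable names are illustrative.

```python
import math

# Summary statistics reported in the article
n1, mean1, sd1 = 234, 41.744, 14.242   # group that favors legalization
n2, mean2, sd2 = 767, 47.432, 17.907   # group that opposes legalization

null_value = 0.0                        # H0: u1 - u2 = 0
sample_statistic = mean1 - mean2        # observed difference in means
standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# t = (sample statistic - null value) / standard error
t_stat = (sample_statistic - null_value) / standard_error
print(f"t = {t_stat:.3f}")
```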
I used the TDIST function in Excel to get my p-value. I plugged in the absolute value of -5.018 (the function does not accept negative numbers) for the X field. I used 233 for degrees of freedom (the smaller sample size minus 1) and I entered a 1 for the number of tails because this is a one-tailed test. My p-value equaled 0.000000518119, or 5.18119E-07. This is an extremely small p-value and therefore I will reject the null hypothesis and conclude that there is sufficient evidence to support H1.
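For readers without Excel, the one-tailed p-value can be approximated with the Python standard library alone. The sketch below integrates the t-distribution density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail`, the upper integration limit of 60, and the step count are my own choices for illustration; this is not how TDIST works internally, and the result should land close to, though not necessarily digit-for-digit on, the Excel figure.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t_abs, df, upper=60.0, steps=20000):
    """Approximate P(T > t_abs) with Simpson's rule on [t_abs, upper].

    Assumes the density beyond `upper` is negligible and `steps` is even.
    """
    h = (upper - t_abs) / steps
    total = t_pdf(t_abs, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(t_abs + i * h, df)
    return total * h / 3

# Values from the article: |t| = 5.018, degrees of freedom = 233
p_value = t_upper_tail(5.018, 233)
print(f"one-tailed p-value = {p_value:.3e}")
```

Passing the absolute value of t mirrors the TDIST restriction mentioned above: for a lower-tailed test with a negative t, the area below t equals the area above its absolute value because the distribution is symmetric.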
There is enough evidence to conclude that the mean age of individuals who favor the legalization of marijuana is lower than the mean age of individuals who oppose the legalization of marijuana.