---
Course:
tags:
- statistics
- hypothesis-testing
- t-test
- welch
- mann-whitney
- nonparametric
- parametric
- comparison
---

## 1. Overview and Purpose

This note systematically compares three commonly used statistical tests for two independent groups: Student's t-test, Welch's t-test, and the Mann-Whitney U test. The tests differ in their assumptions, in the question each one answers, and in the situations where each is appropriate.

## 2. Quick Reference Table

| Test | Type | Key Assumptions | When to Use | Effect Size |
|------|------|----------------|-------------|-------------|
| **Student's t-test** | Parametric | Normality, equal variances, independence | Normal data with equal variances | Cohen's d |
| **Welch's t-test** | Parametric | Normality, independence | Normal data with unequal variances | Cohen's d |
| **Mann-Whitney U** | Nonparametric | Independence, ordinal/continuous data | Non-normal data, ordinal data | Rank-biserial correlation |

## 3. Detailed Test Characteristics

### 3.1. Student's t-test (Independent Samples)

**Definition**: A parametric test comparing means of two independent groups assuming equal population variances.

**Test Statistic**:
$$
t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
$$

Where:

- $\bar{X}_1$, $\bar{X}_2$ = sample means
- $n_1$, $n_2$ = sample sizes
- $s_p$ = pooled standard deviation

**Pooled Standard Deviation**:
$$
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}
$$

**Degrees of Freedom**:
$$
df = n_1 + n_2 - 2
$$

**Key Assumptions**:

1. **Normality**: Data in each group are normally distributed
2. **Homogeneity of variances**: Population variances are equal
3. **Independence**: Observations are independent
4. **Interval/ratio scale**: Data are continuous

**R Implementation**:
```R
# Student's t-test (equal variances assumed)
result <- t.test(group1, group2, var.equal = TRUE)

# With formula interface
result <- t.test(score ~ group, data = dataset, var.equal = TRUE)
```

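The formulas above can be checked by hand. The following is a minimal sketch in plain Python (standard library only), applied to the exam-score vectors this note uses in Example 7.1. The function name `student_t` is illustrative and not part of any library.

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Return (t, df, s_p) for the pooled-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator)
    s1_sq, s2_sq = variance(x), variance(y)
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, s_p


# Exam scores from Example 7.1 of this note
class_A = [78, 82, 85, 76, 79, 81, 83, 77, 80, 84]
class_B = [75, 78, 72, 79, 76, 74, 77, 73, 75, 78]

t, df, s_p = student_t(class_A, class_B)
print(f"t = {t:.3f}, df = {df}, s_p = {s_p:.3f}")
```

Because the sketch applies the same formulas, the printed `t` and `df` should agree with what `t.test(class_A, class_B, var.equal = TRUE)` reports in R for the same data.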
### 3.2. Welch's t-test

**Definition**: A parametric test comparing means without assuming equal variances between groups.

**Test Statistic**:
$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

**Degrees of Freedom** (Welch-Satterthwaite equation):
$$
df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}
$$

**Key Assumptions**:

1. **Normality**: Data in each group are normally distributed
2. **Independence**: Observations are independent
3. **Interval/ratio scale**: Data are continuous
4. **Unequal variances allowed**: No homogeneity of variances assumption

**R Implementation**:
```R
# Welch's t-test, explicit specification
result <- t.test(group1, group2, var.equal = FALSE)

# Same test: var.equal = FALSE is the default in R
result <- t.test(group1, group2)

# With formula interface
result <- t.test(score ~ group, data = dataset)
```

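As a hedged illustration of the Welch-Satterthwaite equation, the sketch below computes the Welch statistic and its degrees of freedom in plain Python for the reaction-time vectors this note uses in Example 7.2. The function name `welch_t` is an assumption made for this example.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # s_i^2 / n_i
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Reaction times from Example 7.2 of this note
young = [210, 195, 225, 240, 205, 215, 230, 220, 200, 210]
elderly = [280, 295, 270, 310, 320, 290, 300, 285, 315, 305]

t, df = welch_t(young, elderly)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The fractional degrees of freedom are expected: the Welch-Satterthwaite value is generally not an integer and never exceeds the pooled value $n_1 + n_2 - 2$.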
### 3.3. Mann-Whitney U Test (Wilcoxon Rank-Sum Test)

**Definition**: A nonparametric test determining if one group tends to have larger values than another.

**Test Procedure**:

1. Combine all observations from both groups
2. Rank them from smallest to largest, giving tied values the average of their ranks
3. Calculate U statistics, where $R_1$ and $R_2$ are the rank sums of groups 1 and 2:
   - $U_1 = R_1 - \frac{n_1(n_1+1)}{2}$
   - $U_2 = R_2 - \frac{n_2(n_2+1)}{2}$
4. Test statistic: $U = \min(U_1, U_2)$

**Key Assumptions**:

1. **Independence**: Observations are independent
2. **Ordinal/continuous data**: Data can be ranked
3. **Similar shape distributions**: For location shift interpretation
4. **No normality assumption**: Distribution-free

**R Implementation**:

```R
# Mann-Whitney U test
result <- wilcox.test(group1, group2)

# With formula interface
result <- wilcox.test(score ~ group, data = dataset)

# Extract results
# Note: R reports the statistic as W, which equals U_1 (the U for the
# first group), not min(U_1, U_2)
U_statistic <- result$statistic
p_value <- result$p.value
```

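The ranking procedure above can be sketched in plain Python. The example below uses the satisfaction ratings from Example 7.3 of this note and assigns tied values the average of their ranks (midranks). The helper names `midranks` and `mann_whitney_u` are hypothetical.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) computed from the rank sums of the pooled sample."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    return r1 - n1 * (n1 + 1) / 2, r2 - n2 * (n2 + 1) / 2


# Ratings from Example 7.3 of this note
store_A = [4, 3, 5, 2, 4, 3, 5, 4, 3, 4]
store_B = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]

u1, u2 = mann_whitney_u(store_A, store_B)
print(f"U1 = {u1}, U2 = {u2}, U = {min(u1, u2)}")
```

A quick sanity check on any such computation is that $U_1 + U_2 = n_1 n_2$.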
## 4. Decision Framework

### 4.1. Test Selection Algorithm

```mermaid
graph TD
A[Start: Compare Two Independent Groups] --> B{Data Normal?};
B -->|Yes| C{Equal Variances?};
B -->|No| D[Mann-Whitney U Test];
C -->|Yes| E[Student's t-test];
C -->|No| F[Welch's t-test];

style D fill:#e1f5fe
style E fill:#f3e5f5
style F fill:#e8f5e8
```

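The flowchart can also be written as a small function. This is only a sketch: the function name `choose_test` and its two boolean arguments are assumptions, and the booleans stand for the analyst's judgment after checking assumptions rather than for any automatic test.

```python
def choose_test(is_normal: bool, equal_variances: bool) -> str:
    """Mirror the flowchart: check normality first, then equality of variances."""
    if not is_normal:
        return "Mann-Whitney U test"
    if equal_variances:
        return "Student's t-test"
    return "Welch's t-test"


# Walk through all four combinations of the two decisions
for normal in (True, False):
    for equal_var in (True, False):
        print(normal, equal_var, "->", choose_test(normal, equal_var))
```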
### 4.2. Detailed Selection Criteria

| Scenario | Recommended Test | Rationale |
|----------|-----------------|-----------|
| **Normal data, equal variances** | Student's t-test | Maximizes power when assumptions are met |
| **Normal data, unequal variances** | Welch's t-test | Robust to variance heterogeneity |
| **Non-normal data** | Mann-Whitney U test | Distribution-free, robust to outliers |
| **Ordinal data** | Mann-Whitney U test | Designed for ranked data |
| **Small samples** | Mann-Whitney U test | Less sensitive to the shape of the distribution |
| **Unequal sample sizes** | Welch's t-test | Keeps Type I error near nominal when group sizes and variances both differ |
| **Default choice** | Welch's t-test | More robust, recommended by many statisticians |

## 5. Assumption Checking Procedures

### 5.1. Normality Testing

**Shapiro-Wilk Test**:

```R
# Test normality for each group
shapiro.test(group1)
shapiro.test(group2)
```

**Visual Inspection**:

- Q-Q plots
- Histograms
- Density plots

### 5.2. Homogeneity of Variances

**Levene's Test**:

```R
library(car)
leveneTest(score ~ group, data = dataset)
```

**F-test**:

```R
var.test(group1, group2)
```

**Bartlett's Test**:

```R
bartlett.test(score ~ group, data = dataset)
```

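For intuition about what the F-test compares, the statistic that `var.test` reports as `F` is the ratio of the two sample variances. The sketch below computes only that ratio and its degrees of freedom for the exam scores from Example 7.1; it does not compute a p-value, since the F distribution is not in the Python standard library. The function name `variance_ratio` is hypothetical.

```python
from statistics import variance


def variance_ratio(x, y):
    """Return (F, df1, df2) where F = s_x^2 / s_y^2."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1


# Exam scores from Example 7.1 of this note
class_A = [78, 82, 85, 76, 79, 81, 83, 77, 80, 84]
class_B = [75, 78, 72, 79, 76, 74, 77, 73, 75, 78]

f_stat, df1, df2 = variance_ratio(class_A, class_B)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df")
```

A ratio close to 1 is what equal population variances would tend to produce; how far from 1 counts as evidence of a difference depends on the degrees of freedom.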
### 5.3. Independence

- Determined by the research design rather than by the data
- No general statistical test is available for it
- Ensure random sampling and random assignment, with no participant contributing to both groups

## 6. Effect Size Measures

### 6.1. For Parametric Tests (Student's and Welch's t-tests)

**Cohen's d**:
$$
d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}
$$

Where:
$$
s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
$$

**Interpretation** (Cohen's conventional benchmarks, applied to $|d|$):

- Small: $d = 0.2$
- Medium: $d = 0.5$
- Large: $d = 0.8$

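As a worked instance of the formula, the sketch below computes Cohen's d in plain Python for the exam scores from Example 7.1. The function name `cohens_d` is an assumption for this example, not a library call.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    s_pooled = math.sqrt(
        ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    )
    return (mean(x) - mean(y)) / s_pooled


# Exam scores from Example 7.1 of this note
class_A = [78, 82, 85, 76, 79, 81, 83, 77, 80, 84]
class_B = [75, 78, 72, 79, 76, 74, 77, 73, 75, 78]

d = cohens_d(class_A, class_B)
print(f"d = {d:.3f}")
```

The sign of `d` only reflects the order of the groups; swapping the arguments flips it, so the benchmarks are compared against its absolute value.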
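### 6.2. For Mann-Whitney U Test

**Rank-biserial correlation**:
$$
r = 1 - \frac{2U}{n_1n_2}
$$

With $U = \min(U_1, U_2)$ this gives the magnitude of the effect; the direction follows from which group has the larger rank sum.

**Common language effect size**:

- Probability that a random observation from group 1 exceeds a random observation from group 2 (ties counted as one half)
- $CL = \frac{U_1}{n_1n_2}$, where $U_1$ is the U statistic for group 1, not the minimum of $U_1$ and $U_2$
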
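Both effect sizes can be illustrated with a short sketch that counts pairs directly, which is equivalent to the rank-sum definition of U. It uses the satisfaction ratings from Example 7.3; the helper name `pairwise_u` is hypothetical. Note that the common language effect size is computed from the U of group 1 itself, not from the smaller of the two U values.

```python
def pairwise_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j, ties counting 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Ratings from Example 7.3 of this note
store_A = [4, 3, 5, 2, 4, 3, 5, 4, 3, 4]
store_B = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]

n1, n2 = len(store_A), len(store_B)
u_a = pairwise_u(store_A, store_B)
u_b = pairwise_u(store_B, store_A)

cl = u_a / (n1 * n2)                   # P(A > B), ties counted as one half
r = 1 - 2 * min(u_a, u_b) / (n1 * n2)  # magnitude of the rank-biserial r
print(f"U_A = {u_a}, U_B = {u_b}, CL = {cl:.2f}, r = {r:.2f}")
```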
## 7. Practical Examples

### 7.1. Example 1: Student's t-test

**Scenario**: Comparing exam scores between two classes with similar variance.

```R
# Data
class_A <- c(78, 82, 85, 76, 79, 81, 83, 77, 80, 84)
class_B <- c(75, 78, 72, 79, 76, 74, 77, 73, 75, 78)

# Assumption checking
shapiro.test(class_A)       # if p > 0.05, no evidence against normality
shapiro.test(class_B)       # if p > 0.05, no evidence against normality
var.test(class_A, class_B)  # if p > 0.05, no evidence of unequal variances

# Student's t-test
t.test(class_A, class_B, var.equal = TRUE)
```

### 7.2. Example 2: Welch's t-test

**Scenario**: Comparing reaction times between two age groups whose variances may differ.

```R
# Data
young <- c(210, 195, 225, 240, 205, 215, 230, 220, 200, 210)
elderly <- c(280, 295, 270, 310, 320, 290, 300, 285, 315, 305)

# Assumption checking
shapiro.test(young)       # if p > 0.05, no evidence against normality
shapiro.test(elderly)     # if p > 0.05, no evidence against normality
var.test(young, elderly)  # if p < 0.05, variances differ; Welch's test is valid either way

# Welch's t-test
t.test(young, elderly)    # var.equal = FALSE by default
```

### 7.3. Example 3: Mann-Whitney U Test

**Scenario**: Comparing customer satisfaction ratings (ordinal scale 1-5).

```R
# Data
store_A <- c(4, 3, 5, 2, 4, 3, 5, 4, 3, 4)
store_B <- c(3, 2, 3, 1, 2, 3, 2, 1, 3, 2)

# Mann-Whitney U test
# The ratings contain ties, so R warns that it cannot compute an exact
# p-value and uses a normal approximation instead
wilcox.test(store_A, store_B)
```

## 8. Power and Sample Size Considerations

### 8.1. Relative Power

- **Student's t-test**: Most powerful of the three when normality and equal variances both hold
- **Welch's t-test**: Slightly less powerful than Student's when variances are equal, but better Type I error control when they are not
- **Mann-Whitney U**: Asymptotic relative efficiency of $3/\pi \approx 0.955$ against the t-test for normal data (about 95% as efficient); often more powerful for skewed or heavy-tailed data

### 8.2. Sample Size Guidelines

| Test | Minimum Sample Size | Recommended per Group |
|------|---------------------|----------------------|
| Student's t-test | 15-20 | 30+ |
| Welch's t-test | 15-20 | 30+ |
| Mann-Whitney U | 5-10 | 20+ |

## 9. Common Pitfalls and Best Practices

### 9.1. Common Mistakes

1. **Using Student's t-test without checking variances**
2. **Applying parametric tests to non-normal data**
3. **Ignoring effect sizes**
4. **Not reporting assumption checks**
5. **Using multiple tests without correction**

### 9.2. Best Practices

1. **Always check assumptions first**
2. **Use Welch's t-test as default for parametric comparisons**
3. **Report both p-values and effect sizes**
4. **Use visualizations to support statistical findings**
5. **Consider the research question when choosing tests**

## 10. Advanced Considerations

### 10.1. Transformations

When data violate normality assumptions:

- **Log transformation**: For right-skewed, strictly positive data
- **Square root transformation**: For count data
- **Arcsine square-root transformation**: For proportions

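A minimal sketch of the three transformations in plain Python follows. The function names and the input values are made up for illustration, and the arcsine variant shown is the common $\arcsin(\sqrt{p})$ form.

```python
import math


def log_transform(values):
    """Natural log; requires strictly positive values."""
    return [math.log(v) for v in values]


def sqrt_transform(counts):
    """Square root; requires non-negative values."""
    return [math.sqrt(c) for c in counts]


def arcsine_transform(proportions):
    """arcsin(sqrt(p)) in radians; requires 0 <= p <= 1."""
    return [math.asin(math.sqrt(p)) for p in proportions]


# Illustrative inputs only, not data from this note
print(log_transform([1, 10, 100]))
print(sqrt_transform([0, 4, 9, 16]))
print(arcsine_transform([0.0, 0.5, 1.0]))
```

After transforming, the test is run on the transformed values, so conclusions refer to the transformed scale (for example, to differences in mean log scores).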
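### 10.2. Robust Alternatives

- **Trimmed means**: Discard a fixed proportion of the most extreme values in each tail before averaging
- **Bootstrap methods**: Resample the data with replacement to approximate the sampling distribution
- **Permutation tests**: Reshuffle the group labels to build the null distribution; exact when every relabeling is enumerated
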
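To make the permutation idea concrete, here is a sketch of an exact two-sided permutation test for a difference in means, in plain Python. It is one possible formulation rather than a reference implementation: the function name `exact_permutation_p` is hypothetical, and it assumes integer-valued data so that comparisons are exact and samples small enough to enumerate. It is applied to the exam scores from Example 7.1.

```python
from itertools import combinations


def exact_permutation_p(x, y):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every relabeling of the pooled data, so it is only
    practical for small samples (two groups of 10 give 184,756 relabelings).
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    total = sum(pooled)

    def scaled_diff(s):
        # n1 * n2 * (mean of group 1 - mean of group 2), given group-1 sum s
        return n2 * s - n1 * (total - s)

    observed = abs(scaled_diff(sum(x)))
    extreme = count = 0
    for group1 in combinations(pooled, n1):
        count += 1
        if abs(scaled_diff(sum(group1))) >= observed:
            extreme += 1
    return extreme / count


# Exam scores from Example 7.1 of this note
class_A = [78, 82, 85, 76, 79, 81, 83, 77, 80, 84]
class_B = [75, 78, 72, 79, 76, 74, 77, 73, 75, 78]

p = exact_permutation_p(class_A, class_B)
print(f"exact permutation p = {p:.4f}")
```

For larger samples, full enumeration becomes infeasible and a random subset of relabelings is typically drawn instead, which gives an approximate rather than exact p-value.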
### 10.3. Software Implementation

**Python**:
```python
from scipy import stats

# group1 and group2 are array-like samples (lists or NumPy arrays)

# Student's t-test
stats.ttest_ind(group1, group2, equal_var=True)

# Welch's t-test
stats.ttest_ind(group1, group2, equal_var=False)

# Mann-Whitney U test (two-sided alternative stated explicitly)
stats.mannwhitneyu(group1, group2, alternative="two-sided")
```

## 11. Summary and Recommendations

### 11.1. Key Takeaways

1. **Student's t-test**: Use only when normality and equal variances are both plausible
2. **Welch's t-test**: Recommended default for parametric comparisons
3. **Mann-Whitney U**: Go-to choice for non-normal or ordinal data
4. **Always validate assumptions** before test selection
5. **Report comprehensive results** including effect sizes and assumption checks

### 11.2. Final Decision Matrix

| Data Characteristic | Preferred Test |
|---------------------|----------------|
| Normal + equal variances | Student's t-test |
| Normal + unequal variances | Welch's t-test |
| Non-normal data | Mann-Whitney U test |
| Ordinal data | Mann-Whitney U test |
| Small samples | Mann-Whitney U test |
| Default choice | Welch's t-test |

### 11.3. Related Tests

- **Paired t-test**: For dependent samples
- **One-way ANOVA**: For comparing >2 groups
- **Kruskal-Wallis test**: Nonparametric alternative to ANOVA
- **Bootstrapping**: For complex data situations