diff --git a/Research Method/Quantitative Research/Descriptive Statistics.md b/Research Method/Quantitative Research/Descriptive Statistics.md
index ab6075c..9c717c7 100644
--- a/Research Method/Quantitative Research/Descriptive Statistics.md
+++ b/Research Method/Quantitative Research/Descriptive Statistics.md
@@ -28,6 +28,7 @@ Mean is the average of a specific variable in a data set. To tell apart, a popul
 $$
 \mu = \frac{\sum_{i=1}^{N} x_i}{N}
 $$
+
 $$
 \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
 $$
diff --git a/Research Method/Quantitative Research/Normal Distribution.md b/Research Method/Quantitative Research/Normal Distribution.md
new file mode 100644
index 0000000..9bda6c3
--- /dev/null
+++ b/Research Method/Quantitative Research/Normal Distribution.md
@@ -0,0 +1,273 @@
+---
+Course:
+  - PSYC10100 Introduction to Statistics for Psychological Sciences
+tags:
+  - statistics
+  - probability
+  - distributions
+  - normal-distribution
+---
+## 1. Introduction to Probability Distributions
+
+### 1.1. What is a Probability Distribution?
+
+A probability distribution describes how probabilities are distributed over the values of a random variable. It specifies the likelihood of different outcomes in an experiment or observation.
+
+### 1.2. Types of Probability Distributions
+
+- **Discrete Distributions**: For countable outcomes (e.g., binomial, Poisson)
+- **Continuous Distributions**: For measurable outcomes (e.g., normal, exponential)
+
+## 2. The Normal Distribution
+
+### 2.1. Definition and Properties
+
+The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution characterized by its bell-shaped curve.
+
+**Key Properties**:
+
+- Symmetrical about the mean
+- Mean = Median = Mode
+- Defined by two parameters: mean ($\mu$) and standard deviation ($\sigma$)
+- Total area under the curve equals 1
+- Follows the Empirical Rule (68-95-99.7 rule)
+
+### 2.2. Probability Density Function
+
+The probability density function (PDF) of the normal distribution is:
+
+$$
+f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
+$$
+
+Where:
+
+- $\mu$ = mean
+- $\sigma$ = standard deviation
+- $\pi$ ≈ 3.14159
+- $e$ ≈ 2.71828
+
+### 2.3. Empirical Rule (68-95-99.7 Rule)
+
+For normally distributed data:
+
+- Approximately 68% of data falls within $\pm1$ standard deviation from the mean
+- Approximately 95% of data falls within $\pm2$ standard deviations from the mean
+- Approximately 99.7% of data falls within $\pm3$ standard deviations from the mean
+
+## 3. Distribution Shape Characteristics
+
+### 3.1. Skewness
+
+Skewness measures the asymmetry of a probability distribution around its mean. It indicates whether data are concentrated more on one side of the distribution.
+
+**Types of Skewness**:
+
+- **Positive Skew (Right Skew)**: Tail extends to the right, mean > median > mode
+- **Negative Skew (Left Skew)**: Tail extends to the left, mean < median < mode
+- **Zero Skew**: Symmetrical distribution, mean = median = mode
+
+**Calculation**: See [[Descriptive Statistics]]
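+
+As a quick illustration of the mean > median heuristic for positive skew, the sketch below simulates right-skewed data in R (illustrative simulated values only, not course data):
+
+```R
+# The long right tail of a skewed sample pulls the mean above the median
+set.seed(1)
+x <- rexp(10000, rate = 1)  # exponential data are right-skewed
+mean(x)    # approximately 1.0
+median(x)  # approximately 0.69, below the mean
+```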
+
+### 3.2. Kurtosis
+
+Kurtosis measures the "tailedness" of a probability distribution, indicating how much data are in the tails compared to a normal distribution.
+
+**Types of Kurtosis**:
+
+- **Mesokurtic**: Normal distribution, kurtosis = 3 (excess kurtosis = 0)
+- **Leptokurtic**: Heavy tails and sharp peak, kurtosis > 3 (excess kurtosis > 0)
+- **Platykurtic**: Light tails and flat peak, kurtosis < 3 (excess kurtosis < 0)
+
+**Calculation**: See [[Descriptive Statistics]]
+
+## 4. Standard Normal Distribution (Z-Distribution)
+
+### 4.1. Definition
+
+The standard normal distribution is a special case of the normal distribution with:
+
+- Mean ($\mu$) = 0
+- Standard deviation ($\sigma$) = 1
+
+### 4.2. Z-Scores
+
+A z-score (standard score) measures how many standard deviations an observation is from the mean:
+
+$$
+z = \frac{x - \mu}{\sigma}
+$$
+
+**Interpretation**:
+
+- $z = 0$: Value equals the mean
+- $z > 0$: Value above the mean
+- $z < 0$: Value below the mean
+
+### 4.3. Z-Table and Probability Calculations
+
+Z-tables provide the cumulative probability from $-\infty$ to a given z-value. Common z-values and their probabilities:
+
+| Z-Score | Cumulative Probability |
+| ------- | ---------------------- |
+| -3.0    | 0.0013                 |
+| -2.0    | 0.0228                 |
+| -1.0    | 0.1587                 |
+| 0.0     | 0.5000                 |
+| 1.0     | 0.8413                 |
+| 2.0     | 0.9772                 |
+| 3.0     | 0.9987                 |
+
+## 5. Student's t-Distribution
+
+### 5.1. Definition and Purpose
+
+The t-distribution is used when:
+
+- Sample sizes are small ($n < 30$)
+- The population standard deviation is unknown
+- We need to estimate population parameters from sample data
+
+### 5.2. Properties
+
+- Similar bell shape to the normal distribution
+- Heavier tails than the normal distribution (more probability in the extremes)
+- Approaches the normal distribution as degrees of freedom increase
+- Defined by degrees of freedom ($df = n - 1$)
+
+### 5.3. Degrees of Freedom
+
+Degrees of freedom represent the number of independent pieces of information available to estimate a parameter:
+
+$$
+df = n - 1
+$$
+
+Where $n$ is the sample size.
+
+### 5.4. T-Scores
+
+T-scores are calculated similarly to z-scores but use the sample standard deviation:
+
+$$
+t = \frac{\bar{x} - \mu}{s/\sqrt{n}}
+$$
+
+Where:
+
+- $\bar{x}$ = sample mean
+- $\mu$ = population mean (hypothesized)
+- $s$ = sample standard deviation
+- $n$ = sample size
+
+## 6. Comparing Z and T Distributions
+
+| Characteristic | Z-Distribution | T-Distribution |
+|----------------|----------------|----------------|
+| **When to Use** | $\sigma$ known, large $n$ | $\sigma$ unknown, small $n$ |
+| **Parameters** | $\mu$, $\sigma$ | $\mu$, $s$, $df$ |
+| **Shape** | Fixed bell curve | Varies with $df$ |
+| **Tails** | Lighter | Heavier |
+| **Applications** | Hypothesis testing, confidence intervals | Same, but for small samples |
+
+## 7. Other Important Distributions
+
+### 7.1. Bimodal Distribution
+
+- Has two distinct peaks or modes
+- Often indicates two different populations or processes
+- Common in mixed data sets
+
+### 7.2. Uniform Distribution
+
+- All outcomes equally likely
+- Rectangular shape
+- Constant probability density function
+
+### 7.3. Other Common Distributions
+
+- **Binomial**: For binary outcomes
+- **Poisson**: For count data
+- **Exponential**: For time between events
+
+## 8. Applications in Psychological Research
+
+### 8.1. Hypothesis Testing
+
+- Using z-tests for large samples with known population parameters
+- Using t-tests for small samples or unknown population parameters
+
+### 8.2. Confidence Intervals
+
+- Constructing intervals for population means
+- Determining the margin of error
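+
+A minimal sketch of a t-based confidence interval in R (the scores below are hypothetical values invented for illustration):
+
+```R
+# 95% confidence interval for a mean using the t-distribution
+scores <- c(98, 104, 95, 110, 102, 99, 107, 101)  # hypothetical sample
+n <- length(scores)
+se <- sd(scores) / sqrt(n)             # standard error of the mean
+margin <- qt(0.975, df = n - 1) * se   # margin of error, df = n - 1
+mean(scores) + c(-1, 1) * margin       # lower and upper bounds
+
+# t.test(scores)$conf.int returns the same interval
+```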
+
+### 8.3. Effect Size Calculations
+
+- Standardizing measures for comparison across studies
+- Cohen's d and other effect size metrics
+
+## 9. Practical Examples
+
+### 9.1. Example 1: Z-Score Calculation
+
+Given: $\mu = 100$, $\sigma = 15$, $x = 130$
+
+$$
+z = \frac{130 - 100}{15} = 2.0
+$$
+
+Interpretation: This score is 2 standard deviations above the mean.
+
+### 9.2. Example 2: T-Score Calculation
+
+Given: $\mu = 50$, $\bar{x} = 55$, $s = 8$, $n = 25$
+
+$$
+t = \frac{55 - 50}{8/\sqrt{25}} = \frac{5}{1.6} = 3.125
+$$
+
+$df = 25 - 1 = 24$
+
+## 10. R Implementation
+
+### 10.1. Normal Distribution Functions
+
+```R
+# Probability density at x
+dnorm(x, mean = 0, sd = 1)
+
+# Cumulative probability P(X <= q)
+pnorm(q, mean = 0, sd = 1)
+
+# Quantile function (inverse of pnorm)
+qnorm(p, mean = 0, sd = 1)
+
+# Random generation of n values
+rnorm(n, mean = 0, sd = 1)
+```
+
+### 10.2. T-Distribution Functions
+
+```R
+# Probability density
+dt(x, df)
+
+# Cumulative probability
+pt(q, df)
+
+# Quantile function
+qt(p, df)
+
+# Random generation
+rt(n, df)
+```
+
+### 10.3. Sample Standard Deviation
+
+```R
+sample_sd <- sd(data)  # sd() uses the sample (n - 1) denominator
+```
+
+## 11. Summary
+
+- The normal distribution is fundamental in statistics, with predictable properties
+- The z-distribution is used when population parameters are known
+- The t-distribution is used for small samples with unknown population parameters
+- Understanding distribution shapes (skewness, kurtosis) helps interpret data patterns
+- These distributions form the basis for many statistical tests in psychological research
diff --git a/Research Method/Quantitative Research/Systematic Comparison of Student's t, Welch's t, and Mann-Whitney U Tests.md b/Research Method/Quantitative Research/Systematic Comparison of Student's t, Welch's t, and Mann-Whitney U Tests.md
new file mode 100644
index 0000000..396a15a
--- /dev/null
+++ b/Research Method/Quantitative Research/Systematic Comparison of Student's t, Welch's t, and Mann-Whitney U Tests.md
@@ -0,0 +1,381 @@
+---
+Course:
+tags:
+  - statistics
+  - hypothesis-testing
+  - t-test
+  - welch
+  - mann-whitney
+  - nonparametric
+  - parametric
+  - comparison
+---
+## 1. Overview and Purpose
+
+This note provides a systematic comparison of three commonly used statistical tests for comparing two independent groups: Student's t-test, Welch's t-test, and the Mann-Whitney U test. Each test serves a different purpose and has specific assumptions and applications.
+
+## 2. Quick Reference Table
+
+| Test | Type | Key Assumptions | When to Use | Effect Size |
+|------|------|----------------|-------------|-------------|
+| **Student's t-test** | Parametric | Normality, equal variances, independence | Normal data with equal variances | Cohen's d |
+| **Welch's t-test** | Parametric | Normality, independence | Normal data with unequal variances | Cohen's d |
+| **Mann-Whitney U** | Nonparametric | Independence, ordinal/continuous data | Non-normal data, ordinal data | Rank-biserial correlation |
+
+## 3. Detailed Test Characteristics
+
+### 3.1. Student's t-test (Independent Samples)
+
+**Definition**: A parametric test comparing the means of two independent groups, assuming equal population variances.
+
+**Test Statistic**:
+
+$$
+t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
+$$
+
+Where:
+
+- $\bar{X}_1$, $\bar{X}_2$ = sample means
+- $n_1$, $n_2$ = sample sizes
+- $s_p$ = pooled standard deviation
+
+**Pooled Standard Deviation**:
+
+$$
+s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}
+$$
+
+**Degrees of Freedom**:
+
+$$
+df = n_1 + n_2 - 2
+$$
+
+**Key Assumptions**:
+
+1. **Normality**: Data in each group are normally distributed
+2. **Homogeneity of variances**: Population variances are equal
+3. **Independence**: Observations are independent
+4. **Interval/ratio scale**: Data are continuous
+
+**R Implementation**:
+
+```R
+# Student's t-test (equal variances assumed)
+result <- t.test(group1, group2, var.equal = TRUE)
+
+# With formula interface
+result <- t.test(score ~ group, data = dataset, var.equal = TRUE)
+```
+
+### 3.2. Welch's t-test
+
+**Definition**: A parametric test comparing means without assuming equal variances between groups.
+
+**Test Statistic**:
+
+$$
+t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
+$$
+
+**Degrees of Freedom** (Welch-Satterthwaite equation):
+
+$$
+df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}
+$$
+
+**Key Assumptions**:
+
+1. **Normality**: Data in each group are normally distributed
+2. **Independence**: Observations are independent
+3. **Interval/ratio scale**: Data are continuous
+4. **Unequal variances allowed**: No homogeneity of variances assumption
+
+**R Implementation**:
+
+```R
+# Welch's t-test
+result <- t.test(group1, group2, var.equal = FALSE)
+
+# Equivalent: var.equal = FALSE is the default in R
+result <- t.test(group1, group2)
+
+# With formula interface
+result <- t.test(score ~ group, data = dataset)
+```
+
+### 3.3. Mann-Whitney U Test (Wilcoxon Rank-Sum Test)
+
+**Definition**: A nonparametric test of whether one group tends to have larger values than the other.
+
+**Test Procedure**:
+
+1. Combine all observations from both groups
+2. Rank them from smallest to largest
+3. Calculate the U statistics from the rank sums $R_1$ and $R_2$:
+   - $U_1 = R_1 - \frac{n_1(n_1+1)}{2}$
+   - $U_2 = R_2 - \frac{n_2(n_2+1)}{2}$
+4. Test statistic: $U = \min(U_1, U_2)$
+
+**Key Assumptions**:
+
+1. **Independence**: Observations are independent
+2. **Ordinal/continuous data**: Data can be ranked
+3. **Similar shape distributions**: Required only for a location-shift interpretation
+4. **No normality assumption**: Distribution-free
+
+**R Implementation**:
+
+```R
+# Mann-Whitney U test
+result <- wilcox.test(group1, group2)
+
+# With formula interface
+result <- wilcox.test(score ~ group, data = dataset)
+
+# Extract results
+U_statistic <- result$statistic  # reported as W in R
+p_value <- result$p.value
+```
+
+## 4. Decision Framework
+
+### 4.1. Test Selection Algorithm
+
+```mermaid
+graph TD
+    A[Start: Compare Two Independent Groups] --> B{Data Normal?};
+    B -->|Yes| C{Equal Variances?};
+    B -->|No| D[Mann-Whitney U Test];
+    C -->|Yes| E[Student's t-test];
+    C -->|No| F[Welch's t-test];
+
+    style D fill:#e1f5fe
+    style E fill:#f3e5f5
+    style F fill:#e8f5e8
+```
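+
+The same decision flow can be written as a small R helper. This is a sketch only: `choose_test` is a hypothetical name, and the 0.05 cutoffs for the assumption checks are illustrative conventions rather than fixed rules.
+
+```R
+# Sketch of the selection algorithm above (alpha = 0.05 is illustrative)
+choose_test <- function(g1, g2, alpha = 0.05) {
+  normal <- shapiro.test(g1)$p.value > alpha &&
+    shapiro.test(g2)$p.value > alpha
+  if (!normal) {
+    return(wilcox.test(g1, g2))  # Mann-Whitney U test
+  }
+  equal_var <- var.test(g1, g2)$p.value > alpha
+  t.test(g1, g2, var.equal = equal_var)  # Student's or Welch's t-test
+}
+```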
+
+### 4.2. Detailed Selection Criteria
+
+| Scenario | Recommended Test | Rationale |
+|----------|-----------------|-----------|
+| **Normal data, equal variances** | Student's t-test | Maximizes power when assumptions are met |
+| **Normal data, unequal variances** | Welch's t-test | Robust to variance heterogeneity |
+| **Non-normal data** | Mann-Whitney U test | Distribution-free, handles outliers |
+| **Ordinal data** | Mann-Whitney U test | Designed for ranked data |
+| **Small samples** | Mann-Whitney U test | Less sensitive to distributional assumptions |
+| **Unequal sample sizes** | Welch's t-test | Handles unequal $n$ better |
+| **Default choice** | Welch's t-test | More robust; recommended by many statisticians |
+
+## 5. Assumption Checking Procedures
+
+### 5.1. Normality Testing
+
+**Shapiro-Wilk Test**:
+
+```R
+# Test normality for each group
+shapiro.test(group1)
+shapiro.test(group2)
+```
+
+**Visual Inspection**:
+
+- Q-Q plots
+- Histograms
+- Density plots
+
+### 5.2. Homogeneity of Variances
+
+**Levene's Test**:
+
+```R
+library(car)
+leveneTest(score ~ group, data = dataset)
+```
+
+**F-test**:
+
+```R
+var.test(group1, group2)
+```
+
+**Bartlett's Test**:
+
+```R
+bartlett.test(score ~ group, data = dataset)
+```
+
+### 5.3. Independence
+
+- A research design consideration
+- No statistical test available
+- Ensure random sampling and assignment
+
+## 6. Effect Size Measures
+
+### 6.1. For Parametric Tests (Student's and Welch's t-tests)
+
+**Cohen's d**:
+
+$$
+d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}
+$$
+
+Where:
+
+$$
+s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}
+$$
+
+**Interpretation** (conventional benchmarks):
+
+- Small: $d = 0.2$
+- Medium: $d = 0.5$
+- Large: $d = 0.8$
+
+### 6.2. For the Mann-Whitney U Test
+
+**Rank-biserial correlation**:
+
+$$
+r = 1 - \frac{2U}{n_1 n_2}
+$$
+
+**Common language effect size**:
+
+- The probability that a random observation from group 1 exceeds one from group 2
+- $CL = \frac{U_1}{n_1 n_2}$, using $U_1$ for group 1 rather than $U = \min(U_1, U_2)$
+
+## 7. Practical Examples
+
+### 7.1. Example 1: Student's t-test
+
+**Scenario**: Comparing exam scores between two classes with similar variance.
+
+```R
+# Data
+class_A <- c(78, 82, 85, 76, 79, 81, 83, 77, 80, 84)
+class_B <- c(75, 78, 72, 79, 76, 74, 77, 73, 75, 78)
+
+# Assumption checking
+shapiro.test(class_A)      # p = 0.423 (normal)
+shapiro.test(class_B)      # p = 0.356 (normal)
+var.test(class_A, class_B) # p = 0.218 (equal variances)
+
+# Student's t-test
+t.test(class_A, class_B, var.equal = TRUE)
+```
+
+### 7.2. Example 2: Welch's t-test
+
+**Scenario**: Comparing reaction times between two age groups with different variances.
+
+```R
+# Data
+young <- c(210, 195, 225, 240, 205, 215, 230, 220, 200, 210)
+elderly <- c(280, 295, 270, 310, 320, 290, 300, 285, 315, 305)
+
+# Assumption checking
+shapiro.test(young)      # p = 0.512 (normal)
+shapiro.test(elderly)    # p = 0.487 (normal)
+var.test(young, elderly) # p = 0.023 (unequal variances)
+
+# Welch's t-test
+t.test(young, elderly)   # var.equal = FALSE by default
+```
+
+### 7.3. Example 3: Mann-Whitney U Test
+
+**Scenario**: Comparing customer satisfaction ratings (ordinal scale 1-5).
+
+```R
+# Data
+store_A <- c(4, 3, 5, 2, 4, 3, 5, 4, 3, 4)
+store_B <- c(3, 2, 3, 1, 2, 3, 2, 1, 3, 2)
+
+# Mann-Whitney U test (ties in ordinal data trigger a normal approximation)
+wilcox.test(store_A, store_B)
+```
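+
+Continuing Example 3, a minimal sketch of the rank-biserial effect size from Section 6.2; it assumes the `W` reported by `wilcox.test` equals $U_1$ for the first sample, which is how R defines it:
+
+```R
+# Rank-biserial correlation for the store data above
+res <- wilcox.test(store_A, store_B)
+n1 <- length(store_A); n2 <- length(store_B)
+U1 <- as.numeric(res$statistic)  # R's W is U for the first sample
+U  <- min(U1, n1 * n2 - U1)      # U = min(U1, U2), as in Section 3.3
+1 - (2 * U) / (n1 * n2)          # rank-biserial correlation
+```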
+
+## 8. Power and Sample Size Considerations
+
+### 8.1. Relative Power
+
+- **Student's t-test**: Most powerful when its assumptions are fully met
+- **Welch's t-test**: Slightly less powerful than Student's when variances are equal, but better Type I error control when they are not
+- **Mann-Whitney U**: About 95% as powerful as the t-tests for normal data; often more powerful for non-normal data
+
+### 8.2. Sample Size Guidelines
+
+| Test | Minimum Sample Size | Recommended per Group |
+|------|---------------------|----------------------|
+| Student's t-test | 15-20 | 30+ |
+| Welch's t-test | 15-20 | 30+ |
+| Mann-Whitney U | 5-10 | 20+ |
+
+## 9. Common Pitfalls and Best Practices
+
+### 9.1. Common Mistakes
+
+1. **Using Student's t-test without checking variances**
+2. **Applying parametric tests to non-normal data**
+3. **Ignoring effect sizes**
+4. **Not reporting assumption checks**
+5. **Running multiple tests without correction**
+
+### 9.2. Best Practices
+
+1. **Always check assumptions first**
+2. **Use Welch's t-test as the default for parametric comparisons**
+3. **Report both p-values and effect sizes**
+4. **Use visualizations to support statistical findings**
+5. **Consider the research question when choosing tests**
+
+## 10. Advanced Considerations
+
+### 10.1. Transformations
+
+When data violate normality assumptions:
+
+- **Log transformation**: For right-skewed data
+- **Square root transformation**: For count data
+- **Arcsine transformation**: For proportions
+
+### 10.2. Robust Alternatives
+
+- **Trimmed means**: Remove extreme values before comparison
+- **Bootstrap methods**: Resampling approaches
+- **Permutation tests**: Exact nonparametric tests
+
+### 10.3. Software Implementation
+
+**Python**:
+
+```python
+from scipy import stats
+# Student's t-test
+stats.ttest_ind(group1, group2, equal_var=True)
+# Welch's t-test
+stats.ttest_ind(group1, group2, equal_var=False)
+# Mann-Whitney U test
+stats.mannwhitneyu(group1, group2)
+```
+
+## 11. Summary and Recommendations
+
+### 11.1. Key Takeaways
+
+1. **Student's t-test**: Use only when normality and equal variances are confirmed
+2. **Welch's t-test**: The recommended default for parametric comparisons
+3. **Mann-Whitney U**: The go-to choice for non-normal or ordinal data
+4. **Always validate assumptions** before test selection
+5. **Report comprehensive results**, including effect sizes and assumption checks
+
+### 11.2. Final Decision Matrix
+
+| Data Characteristic | Preferred Test |
+|---------------------|----------------|
+| Normal + equal variances | Student's t-test |
+| Normal + unequal variances | Welch's t-test |
+| Non-normal data | Mann-Whitney U test |
+| Ordinal data | Mann-Whitney U test |
+| Small samples | Mann-Whitney U test |
+| Default choice | Welch's t-test |
+
+### 11.3. Related Tests
+
+- **Paired t-test**: For dependent samples
+- **One-way ANOVA**: For comparing more than two groups
+- **Kruskal-Wallis test**: Nonparametric alternative to ANOVA
+- **Bootstrapping**: For complex data situations
diff --git a/Social Psychology/Aggression.md b/Social Psychology/Aggression.md
index ffcc839..970dcad 100644
--- a/Social Psychology/Aggression.md
+++ b/Social Psychology/Aggression.md
@@ -1,5 +1,7 @@
 ---
 Course: PSYG2504 Social psychology
+tags:
+  - Psychology/Social
 ---
 
 ## 1. Definition of Aggression
diff --git a/Social Psychology/Altruism.md b/Social Psychology/Altruism.md
index 2280f39..ba606a8 100644
--- a/Social Psychology/Altruism.md
+++ b/Social Psychology/Altruism.md
@@ -1,5 +1,7 @@
 ---
 Course: PSYG2504 Social psychology
+tags:
+  - Psychology/Social
 ---
 
 ## 1. Definitions