2025-10-12 16:40:14

doc: M Research Method/Quantitative Research/Descriptive Statistics.md
2025-10-12 16:40:14 +01:00
parent 901418b65e
commit 40ce1eff2d
1 changed files with 13 additions and 0 deletions
--- a/Research/Descriptive
+++ b/Research/Descriptive
@@ -66,16 +66,19 @@ The **median** is the middle value in an ordered data set. It divides the data i
 - For an even number of observations: Average of the two middle values

 **Calculation**:
+
 - Order the data from smallest to largest
 - If $n$ is odd: $\text{median} = x_{(n+1)/2}$
 - If $n$ is even: $\text{median} = \frac{x_{n/2} + x_{n/2 + 1}}{2}$

 **Characteristics**:
+
 - Robust to outliers and skewed distributions
 - More appropriate than mean for ordinal data
 - Represents the 50th percentile (second quartile)

 **Example**: 
+
 - Data: {3, 1, 7, 4, 2} → Ordered: {1, 2, 3, 4, 7} → Median = 3
 - Data: {3, 1, 7, 4} → Ordered: {1, 3, 4, 7} → Median = (3+4)/2 = 3.5
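
The calculation steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `median` is an assumption, and the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics


def median(values):
    """Return the median using the odd/even rule from the notes."""
    ordered = sorted(values)          # order the data from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the single middle value, x_{(n+1)/2} in 1-based indexing
        return ordered[mid]
    # even n: average of the two middle values, x_{n/2} and x_{n/2 + 1}
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([3, 1, 7, 4, 2])    # the odd-length example from the notes
even_result = median([3, 1, 7, 4])      # the even-length example from the notes

# Cross-check against the standard library implementation.
matches_stdlib = (
    odd_result == statistics.median([3, 1, 7, 4, 2])
    and even_result == statistics.median([3, 1, 7, 4])
)
```

Python lists are 0-based, so the 1-based positions in the formulas shift down by one in the code.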

@@ -109,11 +112,13 @@ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}
 $$

 where:
+
 - $n$ = sample size
 - $\bar{x}$ = sample mean
 - $x_i$ = individual values

 **Key Differences**:
+
 - **Denominator**: Population uses $N$, sample uses $n-1$ (Bessel's correction)
 - **Purpose**: Population variance describes the entire population; sample variance estimates the population variance from a sample
 - **Bias**: Using $n$ instead of $n-1$ in sample variance creates a biased estimator that underestimates the population variance on average
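
To make the denominator difference concrete, here is a small sketch that computes both variances by hand. The function names and the data set are illustrative assumptions; `statistics.pvariance` and `statistics.variance` from the standard library serve as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """Variance with denominator N (treats the data as the whole population)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Variance with denominator n - 1 (Bessel's correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical data, chosen so that the mean is exactly 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)    # sum of squares 32 divided by 8
samp_var = sample_variance(data)       # sum of squares 32 divided by 7

# Both hand-rolled versions should agree with the standard library.
agrees = (
    math.isclose(pop_var, statistics.pvariance(data))
    and math.isclose(samp_var, statistics.variance(data))
)
```

The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by a smaller number.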
@@ -233,6 +238,7 @@ $$
 $$

 where:
+
 - $Q_1$ = First quartile (25th percentile)
 - $Q_3$ = Third quartile (75th percentile)

@@ -245,17 +251,20 @@ There are several methods to calculate quartiles. Common approaches include:

 **Outlier Detection**:
 IQR is commonly used to identify outliers using the "1.5×IQR rule":
+
 - **Lower fence**: $Q_1 - 1.5 \times \text{IQR}$
 - **Upper fence**: $Q_3 + 1.5 \times \text{IQR}$
 - Values outside these fences are considered potential outliers

 **Box Plot Relationship**:
 IQR forms the "box" in box plots:
+
 - Box extends from Q1 to Q3
 - Line inside box represents median
 - Whiskers extend to the most extreme non-outlier values

 **Advantages**:
+
 - **Robust**: Largely unaffected by extreme values, since it depends only on the middle 50% of the data
 - **Intuitive**: Easy to interpret and visualize
 - **Standardized**: Widely used in exploratory data analysis
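
The 1.5×IQR rule can be sketched as below. Because quartile methods differ, the result depends on which one is chosen; this example assumes the `"inclusive"` method of the standard-library `statistics.quantiles`, and the function name `iqr_fences` and the data are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) using the k x IQR rule."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical data with one obviously extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

lower, upper = iqr_fences(data)

# Values outside the fences are flagged as potential outliers.
outliers = [x for x in data if x < lower or x > upper]
```

With this data and quartile method, Q1 is 3 and Q3 is 7, so the fences are -3 and 13 and only the value 100 is flagged. A different quartile method can shift the fences slightly.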
@@ -378,6 +387,7 @@ g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i
 $$

 **Interpretation**:
+
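 - **Excess Kurtosis = 0**: Tail weight matches the normal distribution (mesokurtic)
 - **Excess Kurtosis > 0**: Heavier tails than the normal distribution (leptokurtic, more outliers)
 - **Excess Kurtosis < 0**: Lighter tails than the normal distribution (platykurtic, fewer outliers)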
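
As a rough illustration of the sign of excess kurtosis, the sketch below uses the common moment-based definition (fourth central moment divided by the squared second central moment, minus 3). Whether this matches the exact estimator used elsewhere in these notes is an assumption, and the function name and data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3


# Hypothetical data: all mass at two points, so no tails at all.
light_tailed = [-1, 1, -1, 1]

# Hypothetical data: mostly zeros with two extreme values.
heavy_tailed = [0] * 8 + [-10, 10]

light = excess_kurtosis(light_tailed)   # negative: lighter tails than normal
heavy = excess_kurtosis(heavy_tailed)   # positive: heavier tails than normal
```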
@@ -467,6 +477,7 @@ $$
 $$

 where:
+
 - $\bar{x}$ = sample mean
 - $s$ = sample standard deviation
 - $n$ = sample size
@@ -474,11 +485,13 @@ where:
 - $t_{\alpha/2, df}$ = critical t-value with $df = n-1$

 **Common Confidence Levels**:
+
 - **90% CI**: $\alpha = 0.10$, $z_{0.05} = 1.645$
 - **95% CI**: $\alpha = 0.05$, $z_{0.025} = 1.96$
 - **99% CI**: $\alpha = 0.01$, $z_{0.005} = 2.576$

 **Interpretation**:
+
 - "We are 95% confident that the true population mean lies between [lower, upper]"
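 - Does NOT mean there's a 95% probability that the specific interval contains the parameter; the 95% refers to the long-run proportion of intervals, built this way from repeated samples, that contain it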
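
The critical values listed above can be reproduced, and an interval computed, with the standard library alone. This is a sketch under stated assumptions: it uses the z critical value (a large-sample approximation) because the standard library has no t-distribution, so for small samples the $t_{\alpha/2, df}$ value should replace `z`. The function name `z_confidence_interval` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def z_critical(confidence):
    """Two-sided standard normal critical value, z_{alpha/2}."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_confidence_interval(values, confidence=0.95):
    """Approximate CI for the mean: x_bar +/- z * s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    margin = z_critical(confidence) * stdev(values) / math.sqrt(n)
    centre = mean(values)
    return centre - margin, centre + margin


# Reproduce the table of common critical values.
z90 = z_critical(0.90)
z95 = z_critical(0.95)
z99 = z_critical(0.99)

# Hypothetical sample, for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
lower, upper = z_confidence_interval(sample, confidence=0.95)
```

Raising the confidence level widens the interval, since the critical value grows while $s$ and $n$ stay fixed.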