1. Introduction to Probability Distributions
1.1. What is a Probability Distribution?
A probability distribution describes how probability is allocated across the possible values of a random variable. It specifies how likely each outcome (or range of outcomes) is in an experiment or observation.
1.2. Types of Probability Distributions
- Discrete Distributions: For countable outcomes (e.g., binomial, Poisson)
- Continuous Distributions: For measurable outcomes (e.g., normal, exponential)
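The discrete/continuous distinction can be checked numerically. The sketch below is illustrative only: the binomial settings (`size = 10`, `prob = 0.5`) are arbitrary values chosen for the example, not taken from these notes.

```r
# Discrete: binomial probabilities over the countable outcomes 0..10 sum to 1.
# (size = 10 and prob = 0.5 are arbitrary illustrative values.)
binom_probs <- dbinom(0:10, size = 10, prob = 0.5)
discrete_total <- sum(binom_probs)

# Continuous: a density is integrated rather than summed.
# The standard normal density integrates to 1 over the real line.
continuous_total <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# For a continuous variable, probability belongs to intervals, not single points.
p_interval <- pnorm(1) - pnorm(0)  # P(0 < Z < 1)

print(c(discrete_total, continuous_total, p_interval))
```

Discrete probabilities are summed with a `d*` mass function, while continuous probabilities come from areas under a density, which is why `pnorm` differences are used for the interval.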
2. The Normal Distribution
2.1. Definition and Properties
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution characterized by its bell-shaped curve.
Key Properties:
- Symmetrical about the mean
- Mean = Median = Mode
- Defined by two parameters: mean ($\mu$) and standard deviation ($\sigma$)
- Total area under the curve equals 1
- Follows the Empirical Rule (68-95-99.7 rule)
2.2. Probability Density Function
The probability density function (PDF) of the normal distribution is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Where:
- $\mu$ = mean
- $\sigma$ = standard deviation
- $\pi \approx 3.14159$
- $e \approx 2.71828$
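As a quick check of the formula, the density can be written out by hand and compared with R's built-in `dnorm`. This is a minimal sketch; the function name `normal_pdf` and the test values (mean 100, SD 15) are assumptions made for illustration.

```r
# Hand-written normal density, term by term from the PDF formula.
# normal_pdf is a hypothetical helper name, not a base R function.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x <- c(70, 85, 100, 115, 130)  # illustrative values
manual <- normal_pdf(x, mu = 100, sigma = 15)
builtin <- dnorm(x, mean = 100, sd = 15)

# TRUE when the hand-written formula agrees with dnorm
print(all.equal(manual, builtin))
```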
2.3. Empirical Rule (68-95-99.7 Rule)
For normally distributed data:
- Approximately 68% of data falls within $\pm 1$ standard deviation of the mean
- Approximately 95% of data falls within $\pm 2$ standard deviations of the mean
- Approximately 99.7% of data falls within $\pm 3$ standard deviations of the mean
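The three percentages follow directly from the standard normal cumulative distribution function. A short sketch using `pnorm` (the variable names are illustrative):

```r
# Area within k standard deviations of the mean: P(-k < Z < k)
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within_", k, "_sd")

print(round(coverage, 4))
```

The rounded values (about 0.6827, 0.9545, and 0.9973) are where the 68-95-99.7 shorthand comes from.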
3. Distribution Shape Characteristics
3.1. Skewness
Skewness measures the asymmetry of a probability distribution around its mean. It indicates whether data are concentrated more on one side of the distribution.
Types of Skewness:
- Positive Skew (Right Skew): Tail extends to the right; typically mean > median > mode
- Negative Skew (Left Skew): Tail extends to the left; typically mean < median < mode
- Zero Skew: Symmetrical distribution, mean = median = mode
Calculation: See Descriptive Statistics
3.2. Kurtosis
Kurtosis measures the "tailedness" of a probability distribution, indicating how much data are in the tails compared to a normal distribution.
Types of Kurtosis:
- Mesokurtic: Normal distribution, kurtosis = 3 (excess kurtosis = 0)
- Leptokurtic: Heavy tails, often with a sharper peak, kurtosis > 3 (excess kurtosis > 0)
- Platykurtic: Light tails, often with a flatter peak, kurtosis < 3 (excess kurtosis < 0)
Calculation: See Descriptive Statistics
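For a quick numerical feel for skewness and kurtosis, one common approach is to use standardized central moments. The sketch below assumes that moment-based definition (with an $n$ denominator); the Descriptive Statistics note may use a different, bias-corrected formula. The function names and simulated data are illustrative assumptions.

```r
# Moment-based skewness: third central moment over variance^(3/2)
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: fourth central moment over variance^2
# (a normal distribution gives about 3; subtract 3 for excess kurtosis)
moment_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

set.seed(1)
symmetric <- rnorm(10000)    # simulated normal data
right_skewed <- rexp(10000)  # simulated positively skewed data

print(c(skew_normal = moment_skewness(symmetric),
        kurt_normal = moment_kurtosis(symmetric),
        skew_exp = moment_skewness(right_skewed),
        kurt_exp = moment_kurtosis(right_skewed)))
```

The normal sample should give skewness near 0 and kurtosis near 3, while the exponential sample should show clearly positive skew and heavier tails.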
4. Standard Normal Distribution (Z-Distribution)
4.1. Definition
The standard normal distribution is a special case of the normal distribution with:
- Mean ($\mu$) = 0
- Standard deviation ($\sigma$) = 1
4.2. Z-Scores
A z-score (standard score) measures how many standard deviations an observation is from the mean:
$$z = \frac{x - \mu}{\sigma}$$
Interpretation:
- $z = 0$: Value equals the mean
- $z > 0$: Value above the mean
- $z < 0$: Value below the mean
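The z-score formula is a one-line function in R. In this sketch the helper name `z_score` and the raw scores are hypothetical; the population values (mean 100, SD 15) are chosen to match an IQ-style scale.

```r
# z = (x - mu) / sigma, vectorised over x
# z_score is a hypothetical helper name.
z_score <- function(x, mu, sigma) (x - mu) / sigma

scores <- c(85, 100, 115, 130)  # hypothetical raw scores
z <- z_score(scores, mu = 100, sigma = 15)

print(z)  # -1 0 1 2
```

Negative, zero, and positive results correspond to values below, at, and above the mean.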
4.3. Z-Table and Probability Calculations
Z-tables provide the cumulative probability from $-\infty$ to a given z-value. Common z-values and their probabilities:
Z-Score | Cumulative Probability |
---|---|
-3.0 | 0.0013 |
-2.0 | 0.0228 |
-1.0 | 0.1587 |
0.0 | 0.5000 |
1.0 | 0.8413 |
2.0 | 0.9772 |
3.0 | 0.9987 |
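The table values can be reproduced with `pnorm`, which returns the cumulative probability for a given z. The sketch also shows two common follow-up calculations, an upper-tail probability and the probability between two z-values; the specific z-values used there are arbitrary examples.

```r
# Rebuild the z-table rows from the cumulative distribution function
z_values <- c(-3, -2, -1, 0, 1, 2, 3)
z_table <- data.frame(
  z = z_values,
  cumulative_probability = round(pnorm(z_values), 4)
)
print(z_table)

# Upper-tail probability: P(Z > 1.5)
upper_tail <- pnorm(1.5, lower.tail = FALSE)

# Probability between two z-values: P(-2 < Z < 1)
between <- pnorm(1) - pnorm(-2)

print(c(upper_tail = upper_tail, between = between))
```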
5. Student's t-Distribution
5.1. Definition and Purpose
The t-distribution is used when:
- Sample sizes are small (by a common rule of thumb, $n < 30$)
- Population standard deviation is unknown
- We need to estimate population parameters from sample data
5.2. Properties
- Similar bell shape to normal distribution
- Heavier tails than normal distribution (more probability in extremes)
- Approaches normal distribution as degrees of freedom increase
- Defined by degrees of freedom ($df = n - 1$)
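The heavier tails and the convergence to the normal distribution can both be seen by comparing critical values. This sketch uses the 97.5th percentile (the usual two-sided 5% cutoff); the set of degrees of freedom is an arbitrary illustrative choice.

```r
# 97.5th percentile of t for increasing degrees of freedom
df_values <- c(2, 5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = df_values)
z_crit <- qnorm(0.975)

comparison <- data.frame(
  df = df_values,
  t_critical = round(t_crit, 3),
  z_critical = round(z_crit, 3)
)
print(comparison)

# Heavier tails: probability of exceeding 3 is larger under t (df = 5) than under z
tail_t5 <- pt(3, df = 5, lower.tail = FALSE)
tail_z <- pnorm(3, lower.tail = FALSE)
print(c(tail_t5 = tail_t5, tail_z = tail_z))
```

The t critical value shrinks toward the z critical value as the degrees of freedom grow, while staying above it.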
5.3. Degrees of Freedom
Degrees of freedom represent the number of independent pieces of information available to estimate a parameter. For a single sample mean:
$$df = n - 1$$
Where $n$ is the sample size.
5.4. T-Scores
T-scores are calculated similarly to z-scores but use sample standard deviation:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Where:
- $\bar{x}$ = sample mean
- $\mu$ = population mean (hypothesized)
- $s$ = sample standard deviation
- $n$ = sample size
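The t formula can be computed from raw data and cross-checked against base R's `t.test`. This is a sketch on made-up data: the `scores` vector and the hypothesized mean `mu0` are hypothetical values, not results from any study.

```r
scores <- c(52, 58, 49, 61, 55, 57, 50, 60)  # hypothetical sample
mu0 <- 50                                    # hypothesized population mean

n <- length(scores)
# t = (sample mean - mu0) / (s / sqrt(n)), using the sample SD
t_manual <- (mean(scores) - mu0) / (sd(scores) / sqrt(n))
deg_free <- n - 1

# One-sample t-test from base R for comparison
builtin <- t.test(scores, mu = mu0)

print(c(t_manual = t_manual,
        t_builtin = unname(builtin$statistic),
        df = deg_free))
```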
6. Comparing Z and T Distributions
Characteristic | Z-Distribution | T-Distribution |
---|---|---|
When to Use | $\sigma$ known, large $n$ | $\sigma$ unknown, small $n$ |
Parameters | $\mu$, $\sigma$ | $\mu$, $s$, $df$ |
Shape | Fixed bell curve | Varies with df |
Tails | Lighter | Heavier |
Applications | Hypothesis testing, confidence intervals | Same, but for small samples |
7. Other Important Distributions
7.1. Bimodal Distribution
- Has two distinct peaks or modes
- Often indicates two different populations or processes
- Common in mixed data sets
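A bimodal shape can be simulated by pooling two groups with different means. In this sketch the group means (40 and 70), the common SD (5), and the group sizes are all arbitrary assumptions chosen to make the two peaks obvious.

```r
set.seed(42)
group_a <- rnorm(500, mean = 40, sd = 5)  # hypothetical subpopulation A
group_b <- rnorm(500, mean = 70, sd = 5)  # hypothetical subpopulation B
mixed <- c(group_a, group_b)

# Kernel density estimate of the pooled data
dens <- density(mixed)

# Estimated density at each group centre and at the midpoint between them
dens_at <- approx(dens$x, dens$y, xout = c(40, 55, 70))$y
names(dens_at) <- c("at_40", "at_55", "at_70")

print(round(dens_at, 4))
print(mean(mixed))  # the overall mean falls in the low-density valley
```

The overall mean sits between the two peaks, where few observations fall, which is why a single mean can describe mixed data poorly.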
7.2. Uniform Distribution
- All outcomes equally likely
- Rectangular shape
- Constant probability density function
7.3. Other Common Distributions
- Binomial: For binary outcomes
- Poisson: For count data
- Exponential: For time between events
8. Applications in Psychological Research
8.1. Hypothesis Testing
- Using z-tests for large samples with known population parameters
- Using t-tests for small samples or unknown population parameters
8.2. Confidence Intervals
- Constructing intervals for population means
- Determining margin of error
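A 95% confidence interval for a mean can be built from the t-distribution as mean ± critical value × standard error. The sketch below uses a hypothetical `scores` vector and compares the result with the interval reported by `t.test`.

```r
scores <- c(52, 58, 49, 61, 55, 57, 50, 60)  # hypothetical sample

n <- length(scores)
std_error <- sd(scores) / sqrt(n)  # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
margin <- t_crit * std_error       # margin of error

ci <- c(lower = mean(scores) - margin, upper = mean(scores) + margin)
print(round(ci, 2))

# Cross-check against the interval from base R's one-sample t-test
print(t.test(scores)$conf.int)
```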
8.3. Effect Size Calculations
- Standardizing measures for comparison across studies
- Cohen's d and other effect size metrics
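As an illustration, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. This sketch assumes that pooled-SD version (other variants exist); the function name `cohens_d` and both data vectors are hypothetical.

```r
# Cohen's d with a pooled standard deviation (independent groups)
# cohens_d is a hypothetical helper, not a base R function.
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

treatment <- c(24, 27, 30, 26, 28)  # hypothetical scores
control <- c(20, 23, 25, 22, 20)    # hypothetical scores

d <- cohens_d(treatment, control)
print(d)
```

Because d is expressed in standard deviation units, it can be compared across studies that use different measurement scales.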
9. Practical Examples
9.1. Example 1: Z-Score Calculation
Given: $\mu = 100$, $\sigma = 15$, $x = 130$
$$z = \frac{130 - 100}{15} = 2.0$$
Interpretation: This score is 2 standard deviations above the mean.
9.2. Example 2: T-Score Calculation
Given: $\mu = 50$, $\bar{x} = 55$, $s = 8$, $n = 25$
$$t = \frac{55 - 50}{8/\sqrt{25}} = \frac{5}{1.6} = 3.125$$
$$df = 25 - 1 = 24$$
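Both worked examples can be verified in R, and the resulting statistics can be turned into probabilities. The two-sided p-value for Example 2 is an extra step added here for illustration; the notes themselves stop at the t-score and degrees of freedom.

```r
# Example 1: z-score and the proportion of the population below x = 130
z <- (130 - 100) / 15
percentile <- pnorm(z)

# Example 2: t-score, degrees of freedom, and a two-sided p-value
t_stat <- (55 - 50) / (8 / sqrt(25))
deg_free <- 25 - 1
p_two_sided <- 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE)

print(c(z = z, percentile = percentile))
print(c(t = t_stat, df = deg_free, p_two_sided = p_two_sided))
```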
10. R Implementation
10.1. Normal Distribution Functions
# Probability density at x = 1.96
dnorm(1.96, mean = 0, sd = 1)
# Cumulative probability P(X <= 1.96)
pnorm(1.96, mean = 0, sd = 1)
# Quantile function: value below which 97.5% of the distribution lies
qnorm(0.975, mean = 0, sd = 1)
# Random generation: 10 draws
rnorm(10, mean = 0, sd = 1)
10.2. T-Distribution Functions
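# Probability density at x = 2 with 24 degrees of freedom
dt(2, df = 24)
# Cumulative probability P(T <= 2)
pt(2, df = 24)
# Quantile function: value below which 97.5% of the distribution lies
qt(0.975, df = 24)
# Random generation: 10 draws
rt(10, df = 24)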
10.3. Sample Standard Deviation
scores <- c(52, 58, 49, 61, 55) # Example data (hypothetical values)
sample_sd <- sd(scores) # Sample standard deviation (n - 1 denominator)
11. Summary
- The normal distribution is fundamental in statistics with predictable properties
- Z-distribution is used when the population standard deviation is known
- T-distribution is used when the population standard deviation is unknown and estimated from the sample, which matters most for small samples
- Understanding distribution shapes (skewness, kurtosis) helps interpret data patterns
- These distributions form the basis for many statistical tests in psychological research