The Law of Large Numbers

Checking hypotheses using statistics

Terms defined: *t* distribution, *t* test, Bessel correction, Bonferroni correction, Central Limit Theorem, Z test, alternative hypothesis, confidence interval, cumulative distribution function, degrees of freedom, false negative, false positive, normal distribution, null hypothesis, probability density function, probability mass function, sample variance, standard normal distribution, standard uniform distribution, uniform distribution

What is the law of large numbers?
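The law of large numbers says that as a sample grows, its mean converges to the true mean of the distribution it was drawn from. A minimal simulation (our own illustration, not part of the lesson's code; the die example and seed are arbitrary choices):

```python
import random

# Seed for reproducibility (arbitrary choice).
random.seed(42)

# Draw increasingly large samples from a fair six-sided die
# (true mean 3.5) and watch the sample mean converge.
for n in [10, 100, 10_000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    mean = sum(rolls) / n
    print(f"n={n:>6}: sample mean = {mean:.3f}")
```

The small sample's mean can be far from 3.5, but the large sample's mean is reliably close.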

What is the normal distribution and why do we care?

\[\begin{align*} f(x) & = & \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(x - \mu)^2}{2 \sigma^2}} \end{align*}\]
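The formula above can be translated directly into code. This sketch (ours, using only the standard library) evaluates the normal probability density function term by term:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density function of the normal distribution,
    (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# The standard normal (mu=0, sigma=1) peaks at x=0.
print(normal_pdf(0.0))  # ≈ 0.3989, i.e. 1 / sqrt(2*pi)
```

The density is symmetric about \(\mu\), which is one reason two-tailed tests treat deviations in either direction alike.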

How can we use this to quantify confidence?
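One common answer is a confidence interval around the sample mean. The sketch below is our illustration (not the lesson's code): it uses the large-sample normal approximation, where 1.96 is the z value that leaves 2.5% in each tail for a 95% interval, and the sample data are made up:

```python
import math

def confidence_interval_95(sample):
    """95% confidence interval for the mean, using the
    normal approximation (reasonable for large samples)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance with Bessel's correction (divide by n-1).
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    sem = math.sqrt(var / n)   # standard error of the mean
    margin = 1.96 * sem        # 1.96 = z for a two-tailed 95% interval
    return mean - margin, mean + margin

low, high = confidence_interval_95([2, 4, 4, 4, 5, 5, 7, 9])
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For small samples we should widen the interval by using the *t* distribution instead of 1.96, which is exactly why Student's t-distribution comes up next.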

Two-Tailed Significance Test

Student’s t-distribution

\[\begin{align*} s^2 & = & \frac{1}{n-1} \sum_{i=1}^{n}(X_i - \bar{X})^2 \\ & = & \frac{\sum X_i^2 - n\bar{X}^2}{n - 1} \end{align*}\]
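The two lines of the derivation above are algebraically equivalent, and we can check that in code. This sketch (ours; the data are arbitrary) implements both the definition and the computational shortcut, each with Bessel's correction:

```python
def sample_variance(xs):
    """Sample variance: sum of squared deviations over n-1
    (Bessel's correction)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Equivalent computational form: (sum(x^2) - n*mean^2) / (n-1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))           # both forms agree
print(sample_variance_shortcut(data))
```

Dividing by \(n - 1\) rather than \(n\) corrects the bias introduced by estimating \(\bar{X}\) from the same sample, at the cost of one degree of freedom.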

How can we compare the means of two datasets?

from scipy.stats import ttest_ind

def main():
    # ...parse arguments...
    # ...read data and calculate actual means and difference...

    # test and report
    result = ttest_ind(data_left, data_right)
    print(result)
python bin/ --left ../hypothesis-testing/data/javascript-counts.csv --right ../hypothesis-testing/data/python-counts.csv --low 1 --high 200
Ttest_indResult(statistic=-269.67014904687954, pvalue=0.0)
Programmer Hours (Weekday vs. Weekend)
python bin/ --data data/programmer-hours.csv
weekday mean 6.804375000071998
weekend mean 3.232482993312492
Ttest_indResult(statistic=12.815512046971827, pvalue=6.936182610195961e-31)

Higher standards

  • Recall the earlier discussion of \(p\)-hacking
    • If we analyze the data enough different ways, one of them will be “significant”
  • Use the Bonferroni correction
    • The more tests we do, the more stringent our significance criteria must be
  • The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size grows.
  • Use a Z-test or t-test to determine whether two samples were likely drawn from the same population.
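The Bonferroni correction itself is a one-liner: divide the overall significance level by the number of tests performed. A minimal sketch (ours; the alpha, test count, and p-values are made-up illustrations):

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold that keeps the overall
    chance of a false positive at roughly alpha."""
    return alpha / num_tests

# With 20 tests at an overall alpha of 0.05, each individual
# test must reach p < 0.0025 to be called significant.
threshold = bonferroni_threshold(0.05, 20)

p_values = [0.001, 0.004, 0.03, 0.2]
significant = [p for p in p_values if p < threshold]
print(threshold)
print(significant)
```

With the uncorrected threshold of 0.05, three of these four results would look "significant"; after correction, only one survives.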