Introduction

6.1. Introduction

6.1.1. About This Chapter

In this chapter we introduce the basic tools of statistical inference: confidence intervals, hypothesis tests, and linear regression. Our goal is to show how each of these can be more fully understood using the computational tools of Python and Jupyter notebooks.

We have chosen to write the following sections as an interactive demonstration of the tools of statistical inference for students. Rather than a simple guide to the statistical inference tools in Python, we provide code and plots which show how you can illustrate the theory in your classroom and use the tools to solve relevant problems. We have striven as much as possible to come up with examples that are interesting to students and relevant to an international audience, but you might choose to make some changes to the examples to suit your own context and audience.

6.1.2. Statistics Packages in Python

The main Python packages that provide statistics tools are scipy and statsmodels. Unfortunately, at the time of writing, neither package alone provides all the functionality we need, so we use both. They are loaded into the Python environment with the following commands, which appear at the top of each of the interactive notebooks. These commands will only run if you have the packages installed in your environment. You can find installation instructions in the introductory sections of this book.

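import scipy
import scipy.stats  # load the stats submodule explicitly; a bare `import scipy` does not load it in older SciPy versions
import statsmodels.api  # importing the api submodule also loads the submodules used later, so they can be reached as sm.<submodule>
import statsmodels as sm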

The scipy.stats module gives access to random variables and distributions, e.g., scipy.stats.norm for the normal distribution. From a distribution object you can draw random samples with the .rvs(...) method, calculate cumulative probabilities with the .cdf(...) method, and evaluate the density function with .pdf(...). Each of these is illustrated later in the text. We also use scipy.stats to calculate confidence intervals with the normal distribution and to perform one-sample t-tests.
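As a quick preview of these features, the following minimal sketch works with a standard normal distribution. The sample size, the seed, and the hypothesised mean of 0 are our own illustrative choices, not values used later in the chapter.

```python
import numpy as np
import scipy.stats

# Standard normal distribution (mean 0, standard deviation 1)
dist = scipy.stats.norm(loc=0, scale=1)

# Draw a reproducible random sample of 1000 values (seed chosen arbitrarily)
sample = dist.rvs(size=1000, random_state=42)

# Cumulative probability P(Z <= 1.96)
p_below = dist.cdf(1.96)

# Value of the density function at 0
density_at_zero = dist.pdf(0.0)

# 95% confidence interval for the mean, based on the normal distribution
sample_mean = sample.mean()
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
ci_low, ci_high = scipy.stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)

# One-sample t-test of the null hypothesis that the population mean is 0
t_stat, p_value = scipy.stats.ttest_1samp(sample, popmean=0)

print(f"P(Z <= 1.96) = {p_below:.4f}")
print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```

The confidence level is passed to .interval(...) positionally because the name of that parameter has changed between SciPy versions.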

The statsmodels library provides a wide range of high-level statistics functionality, but here we make use of only a few of its functions: hypothesis tests about a population proportion, two-sample t-tests, and linear regression.

For static plots, we primarily use seaborn and matplotlib. You can find more information on plotting and statistical graphs using Python earlier in this book.

Some plots and figures in the text are mostly for teaching and illustrating the theory, rather than demonstrating a statistician’s actual work process, and you will find that the code for these plots is hidden by default. A number of these plots are interactive, generated using the plotly package.

We import the graphics libraries in the following way.

import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
%matplotlib inline

import plotly.graph_objects as go
import plotly.offline
plotly.offline.init_notebook_mode(connected=True)

Finally, we also make use of the numpy, pandas and math libraries, and assume the reader is familiar with these. You can find more information on these libraries in the Introduction to Python sections earlier in the book.

# import libraries
# Always run this cell first!
import numpy as np
import pandas as pd
import math
