# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf
# Figure styling
sns.set_style('whitegrid')
sns.set_palette('Set2')

Simple Linear Regression
Python implementation with residual diagnostics.
- Understand the theoretical definition of simple linear regression.
- Derive the ordinary least squares (OLS) estimators analytically.
- Understand the statistical properties of the OLS estimators.
- Implement simple linear regression in Python with residual diagnostics.
Regression
What is Regression?
Suppose we are given \(n\) observations \((y_i, x_{1i}, \ldots, x_{pi})\), \(i = 1, \ldots, n\), of a response \(Y\) and predictors \(X = (X_1, \ldots, X_p)\). We define the conditional expectation function: \[ \mu(x) = \mathbb{E}[Y \mid X = x]. \]
We wish to construct an estimate of the conditional expectation function from the observed data. Since this function can take any shape, we cannot estimate it from a finite sample without first making some assumptions. For example, we may assume that the conditional expectation function is a linear function of the predictor variables: \[ \mu(x) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p. \]
By assuming this functional form, we are able to derive closed-form parameter estimates using methods such as ordinary least squares (OLS).
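As a minimal sketch of this idea, the linear form of \(\mu(x)\) can be estimated by least squares with NumPy's `np.linalg.lstsq`. Everything specific here is an assumption made for illustration: the data are simulated, the coefficients in `beta_true` are made up, the seed is arbitrary, and `mu_hat` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical setup: n observations, p = 2 predictors, made-up coefficients.
n, p = 500, 2
beta_true = np.array([1.0, 2.0, -0.5])  # (beta_0, beta_1, beta_2), illustrative values

X = rng.normal(size=(n, p))
design = np.column_stack([np.ones(n), X])  # column of ones carries the intercept
y = design @ beta_true + rng.normal(scale=0.5, size=n)  # assumed noise level

# Least-squares solution of design @ beta ~ y
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

def mu_hat(x_new):
    """Estimated conditional mean E[Y | X = x_new] under the linear assumption."""
    return beta_hat[0] + np.asarray(x_new) @ beta_hat[1:]

print("estimated coefficients:", beta_hat)
print("estimated mean at x = (0, 0):", mu_hat([0.0, 0.0]))
```

With this much simulated data, the estimated coefficients should land close to the made-up true values, and `mu_hat` evaluated at the origin returns the estimated intercept.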
Simple Linear Regression
We first consider simple linear regression, the case \(p=1\). For observations \((x_i, y_i)\), we have the following model: \[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \ldots, n, \]
where:
- \(y_i\) is the response variable.
- \(x_i\) is the predictor variable.
- \(\beta_0\) is the intercept.
- \(\beta_1\) is the slope.
- \(\varepsilon_i\) is the error term.
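To make the model concrete, the sketch below simulates data from it and fits a straight line by least squares. The parameter values, the uniform design for \(x_i\), and the normally distributed errors are all assumptions chosen for illustration; the model above does not require any of them. The fit uses `np.polyfit` with degree 1, which returns the least-squares slope and intercept.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Hypothetical parameter values for the simulation.
n = 200
beta_0, beta_1 = 2.0, 0.7
sigma = 1.0  # assumed standard deviation of the error term

x = rng.uniform(0, 10, size=n)        # predictor
eps = rng.normal(0, sigma, size=n)    # error term (normality is an assumption here)
y = beta_0 + beta_1 * x + eps         # response generated by the model

# Degree-1 polynomial fit: coefficients come back as [slope, intercept].
slope_hat, intercept_hat = np.polyfit(x, y, deg=1)

# Residuals: observed response minus fitted value.
residuals = y - (intercept_hat + slope_hat * x)

print(f"intercept estimate: {intercept_hat:.3f}")
print(f"slope estimate:     {slope_hat:.3f}")
```

The `residuals` array is the raw material for the residual diagnostics, for example a plot of residuals against `x`.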