PSTAT100: Data Science - Concepts and Analysis
May 6, 2026
Model
A model is a simplified, mathematical representation of a real-world system or process. It captures the essential features of the system while abstracting away unnecessary detail.
Deterministic Model
A deterministic model produces the same output for the same input — there is no randomness in the model.
Probabilistic Model
A probabilistic model introduces randomness explicitly. Outputs are described by probability distributions rather than fixed values.
Statistical Model
A statistical model is a probabilistic model whose parameters are unknown and estimated from data. The process of finding parameter values that best explain the data is called model fitting.
For example, in simple linear regression we assume the data-generating process is:
\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{Normal}(0, \sigma^2). \]
The parameters \(\beta_0, \beta_1, \sigma^2\) are unknown and estimated from the observed data \((x_1, y_1), \ldots, (x_n, y_n)\).
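As a concrete illustration (not part of the lecture code), here is a minimal sketch of fitting this model by least squares with numpy. The true parameter values, the simulated data, and the choice of `np.polyfit` are assumptions made for the example:

```python
import numpy as np

# Simulate data from the assumed process (true beta0 = 1, beta1 = 2, sigma = 0.5)
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.size)

# Least-squares estimates: polyfit returns the slope first, then the intercept
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)

# Estimate sigma^2 from the residuals (n - 2 degrees of freedom)
resid = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.sum(resid**2) / (x.size - 2)
print(beta0_hat, beta1_hat, sigma2_hat)
```

The printed estimates should land close to the true values 1, 2, and 0.25 used in the simulation.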
Machine Learning Model
A machine learning model learns patterns directly from data using optimization algorithms. These models are often highly flexible and prioritise predictive performance over interpretability.
| | Statistical Models | Machine Learning Models |
|---|---|---|
| Primary goal | Inference & understanding | Prediction |
| Interpretability | High | Often low |
| Assumptions | Explicit | Implicit |
| Sample size | Can work with small \(n\) | Often requires large \(n\) |
| Uncertainty | Quantified | Often not |
Sample Space and Events
The sample space \(\Omega\) is the set of all possible outcomes of a random experiment. An event is any subset \(A \subseteq \Omega\) (i.e. a collection of outcomes).
A probability measure \(\mathbb{P}\) assigns a number in \([0, 1]\) to each event and satisfies the Kolmogorov axioms:

1. \(\mathbb{P}(A) \geq 0\) for every event \(A\) (non-negativity);
2. \(\mathbb{P}(\Omega) = 1\) (normalization);
3. for any countable sequence of disjoint events \(A_1, A_2, \ldots\), \(\mathbb{P}\bigl(\bigcup_i A_i\bigr) = \sum_i \mathbb{P}(A_i)\) (countable additivity).

Don’t worry too much about these definitions; you will not need to write proofs in this class!
Random Variable
A random variable \(X\) is a function \(X \colon \Omega \to \mathbb{R}\) that maps each outcome in the sample space to a real number.
Probability Mass Function (PMF)
The PMF of a discrete random variable \(X\) is: \[p(x) = \mathbb{P}(X = x), \quad \text{for all } x \text{ in the support of } X.\]
Define the random variable \(X\) as the result of rolling a fair six-sided die, so \(p(x) = 1/6\) for each \(x \in \{1, \ldots, 6\}\).
We will often need to work with random variables in Python:
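One common choice (an assumption here; the lecture may use a different library) is the frozen distributions in `scipy.stats`. `randint(1, 7)` gives the discrete uniform law of a fair die, since scipy excludes the upper endpoint:

```python
from scipy.stats import randint

# X ~ discrete uniform on {1, ..., 6}: a fair six-sided die
X = randint(1, 7)

print(X.pmf(3))                        # P(X = 3) = 1/6
print(X.cdf(4))                        # P(X <= 4) = 4/6
print(X.rvs(size=10, random_state=0))  # ten simulated rolls
```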
Probability Density Function (PDF)
The PDF of a continuous random variable \(X\) is a function \(f(x) \geq 0\) such that: \[P(a \leq X \leq b) = \int_a^b f(x)\, dx.\]
Cumulative Distribution Function (CDF)
The CDF of a random variable \(X\) is: \[F(x) = P(X \leq x), \quad x \in \mathbb{R}.\]
\[P(a < X \leq b) = F(b) - F(a).\]
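This identity is how interval probabilities are computed in practice. A quick check for a standard normal, using `scipy.stats.norm` (an illustrative choice, not prescribed by the lecture):

```python
from scipy.stats import norm
from scipy.integrate import quad

# P(-1 < X <= 1) for X ~ Normal(0, 1), via the CDF identity
a, b = -1, 1
print(norm.cdf(b) - norm.cdf(a))   # ≈ 0.6827

# The same probability by integrating the PDF directly
print(quad(norm.pdf, a, b)[0])     # ≈ 0.6827
```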
Expectation
The expectation (or mean) of a random variable \(X\) is: \[\mathbb{E}[X] = \begin{cases} \displaystyle\sum_x x \cdot p(x) & \text{(discrete)} \\[6pt] \displaystyle\int_{-\infty}^{\infty} x \cdot f(x)\, dx & \text{(continuous).} \end{cases}\]
Variance
The variance of a random variable \(X\) is: \[\text{Var}(X) = \mathbb{E}\!\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \bigl(\mathbb{E}[X]\bigr)^2.\]
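For the fair-die example above, both quantities follow directly from the PMF; here is a short numpy check (illustrative):

```python
import numpy as np

# Fair die: support {1, ..., 6}, each outcome with probability 1/6
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

EX = np.sum(x * p)       # E[X] = 3.5
EX2 = np.sum(x**2 * p)   # E[X^2]
VarX = EX2 - EX**2       # Var(X) = E[X^2] - (E[X])^2 ≈ 2.917
print(EX, VarX)
```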
Conditional Probability
The conditional probability of event \(A\) given event \(B\) (with \(P(B) > 0\)) is: \[P(A \mid B) = \frac{P(A \cap B)}{P(B)}.\]
\[P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A).\]
Bayes’ Theorem
\[P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.\]
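A standard illustration is diagnostic testing (the numbers below are made up for the example): let \(A\) be "has the disease" and \(B\) be "tests positive".

```python
# Hypothetical diagnostic-test numbers, chosen for illustration
P_A = 0.01              # prevalence P(A)
P_B_given_A = 0.95      # sensitivity P(B | A)
P_B_given_notA = 0.05   # false-positive rate P(B | A^c)

# Law of total probability gives the denominator P(B)
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)

# Bayes' theorem: P(A | B)
P_A_given_B = P_B_given_A * P_A / P_B
print(P_A_given_B)      # ≈ 0.161
```

Even with a positive test, the probability of disease is only about 16%, because the disease is rare.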
Independence
Events \(A\) and \(B\) are independent if: \[P(A \cap B) = P(A)\, P(B), \quad \text{equivalently} \quad P(A \mid B) = P(A).\]
Random variables \(X\) and \(Y\) are independent if knowing the value of one provides no information about the other.
Conditional Expectation
The conditional expectation of \(Y\) given \(X = x\) is: \[\mathbb{E}[Y \mid X = x] = \begin{cases} \displaystyle\sum_y y \cdot P(Y = y \mid X = x) & \text{(discrete)} \\[6pt] \displaystyle\int_{-\infty}^{\infty} y \cdot f_{Y \mid X}(y \mid x)\, dy & \text{(continuous).} \end{cases}\]
Given \(X\), any function of \(X\) behaves like a constant and can be pulled out:
\[\mathbb{E}[aX + bY \mid X] = aX + b\,\mathbb{E}[Y \mid X].\]
The tower property (law of total expectation) states:
\[\mathbb{E}[Y] = \mathbb{E}\!\bigl[\mathbb{E}[Y \mid X]\bigr].\]
Averaging the conditional expectations over the distribution of \(X\) recovers the unconditional mean.
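A quick simulation check of the tower property (the Bernoulli/normal setup here is hypothetical, chosen only to make the conditional means easy to read off):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# X ~ Bernoulli(0.3); given X = x, Y ~ Normal(mean = 2x, sd = 1)
X = rng.binomial(1, 0.3, size=n)
Y = rng.normal(2 * X, 1)

# E[Y | X = x] = 2x, so E[E[Y | X]] = 2 * 0.3 = 0.6
print(Y.mean())                            # ≈ 0.6, matching the tower property
print(Y[X == 0].mean(), Y[X == 1].mean())  # ≈ 0 and ≈ 2
```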
Joint Distribution
The joint distribution of \((X, Y)\) describes the probability of all pairs of outcomes simultaneously.
The marginal distribution of \(X\) is obtained by integrating (or summing) out \(Y\):
\[f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy.\]
The conditional density of \(Y\) given \(X = x\) is:
\[f_{Y \mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}, \quad f_X(x) > 0.\]
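In the discrete case the same operations are just row sums and renormalization. A small example (the joint table below is made up for illustration):

```python
import numpy as np

# Hypothetical joint PMF of (X, Y): X in {0, 1} (rows), Y in {0, 1, 2} (cols)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

# Marginal of X: sum out Y (sum each row across columns)
p_X = joint.sum(axis=1)           # [0.40, 0.60]

# Conditional PMF of Y given X = 1: renormalize the X = 1 row
p_Y_given_X1 = joint[1] / p_X[1]  # sums to 1
print(p_X, p_Y_given_X1)
```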
Binomial Distribution
If \(X\) counts the number of successes in \(n\) independent Bernoulli trials, each with success probability \(p\), then \(X \sim \text{Binomial}(n, p)\) with PMF: \[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n.\]
\[\mathbb{E}[X] = np, \qquad \text{Var}(X) = np(1-p).\]
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Binomial PMFs for three (n, p) settings, with the mean np marked
fig, axes = plt.subplots(1, 3, figsize=(14, 4), sharey=False)
params = [(10, 0.3), (10, 0.5), (20, 0.7)]
for ax, (n, p) in zip(axes, params):
    k = np.arange(0, n + 1)
    ax.bar(k, binom.pmf(k, n, p), color='steelblue', edgecolor='white', width=0.6)
    ax.axvline(n * p, color='crimson', linestyle='--', label=f'Mean = {n*p:.1f}')
    ax.set_title(f'Binomial(n={n}, p={p})')
    ax.set_xlabel('k'); ax.set_ylabel('P(X = k)')
    ax.legend(fontsize=9)
plt.tight_layout()
plt.show()
```

Gaussian (Normal) Distribution
A random variable \(X \sim \text{Normal}(\mu, \sigma^2)\) has PDF: \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}.\]
\[\mathbb{E}[X] = \mu, \qquad \text{Var}(X) = \sigma^2.\]
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: varying the mean and standard deviation
x = np.linspace(-7, 7, 500)
for mu, sigma, color, label in [(0, 1, 'steelblue', 'μ=0, σ=1'),
                                (0, 2, 'crimson', 'μ=0, σ=2'),
                                (2, 1, 'seagreen', 'μ=2, σ=1')]:
    axes[0].plot(x, norm.pdf(x, mu, sigma), color=color, lw=2, label=label)
axes[0].set_title('Normal Distributions')
axes[0].set_xlabel('x'); axes[0].set_ylabel('f(x)')
axes[0].legend()

# Right: 68-95-99.7 rule, shading ±1σ, ±2σ, ±3σ under the standard normal
x2 = np.linspace(-4, 4, 500)
axes[1].plot(x2, norm.pdf(x2), 'k', lw=2)
for n_sig, color, label in [(3, '#d0e8ff', '±3σ (99.7%)'),
                            (2, '#85b9f5', '±2σ (95%)'),
                            (1, '#2f6fbd', '±1σ (68%)')]:
    xf = np.linspace(-n_sig, n_sig, 300)
    axes[1].fill_between(xf, norm.pdf(xf), alpha=0.65, color=color, label=label)
axes[1].set_title('68-95-99.7 Rule')
axes[1].set_xlabel('x'); axes[1].set_ylabel('f(x)')
axes[1].legend()
plt.tight_layout()
plt.show()
```

Multivariate Gaussian Distribution
A random vector \(\mathbf{X} = (X_1, \ldots, X_d)^\top \sim \text{Normal}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) has PDF: \[f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),\] where \(\boldsymbol{\mu} \in \mathbb{R}^d\) is the mean vector and \(\boldsymbol{\Sigma} \in \mathbb{R}^{d \times d}\) is the covariance matrix (symmetric, positive semi-definite).
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Bivariate normal densities under three covariance structures
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
configs = [
    {'title': 'Independent\n(Σ = I)',             'cov': [[1, 0], [0, 1]]},
    {'title': 'Positive correlation\n(ρ = 0.8)',  'cov': [[1, 0.8], [0.8, 1]]},
    {'title': 'Negative correlation\n(ρ = −0.8)', 'cov': [[1, -0.8], [-0.8, 1]]},
]

# Evaluate each density on a grid and draw filled contours
x1, x2 = np.mgrid[-3:3:0.05, -3:3:0.05]
pos = np.dstack((x1, x2))
for ax, cfg in zip(axes, configs):
    rv = multivariate_normal(mean=[0, 0], cov=cfg['cov'])
    ax.contourf(x1, x2, rv.pdf(pos), levels=15, cmap='Blues')
    ax.set_title(cfg['title'])
    ax.set_xlabel('$X_1$'); ax.set_ylabel('$X_2$')
plt.tight_layout()
plt.show()
```

| Concept | Formula | Role in Modelling |
|---|---|---|
| PMF / PDF | \(p(x)\), \(f(x)\) | Describes the distribution of \(X\) |
| CDF | \(F(x) = P(X \leq x)\) | Computes probabilities |
| Expectation | \(\mathbb{E}[X]\) | Population mean |
| Variance | \(\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}X)^2]\) | Population spread |
| Conditional Prob. | \(P(A \mid B) = P(A\cap B)/P(B)\) | Updating beliefs |
| Conditional Exp. | \(\mathbb{E}[Y \mid X]\) | Regression function |
| Joint / Marginal | \(f(x,y)\), \(f_X(x)\) | Multivariate models |