1) Why Math is Important in ML
Machine learning sits on three mathematical pillars: linear algebra (how data and models are represented), probability (how uncertainty is modeled), and statistics (how we summarize data and draw conclusions from samples). The sections below walk through each pillar with intuition and short Python examples.
2) Linear Algebra for ML
Linear algebra is how we represent datasets (matrices), features (vectors), and model transformations (matrix multiplication). It’s everywhere—from linear regression to deep neural nets.
Key Concepts
- Scalars, Vectors, Matrices, Tensors
- Matrix Operations: addition, transpose, matrix multiplication
- Dot product & cosine similarity (similarity of vectors)
- Determinant & inverse (solving linear systems)
- Eigenvalues/Eigenvectors & SVD (PCA, dimensionality reduction)
Detailed Explanation (with intuition & Python)
Scalars, Vectors, Matrices
A scalar is a single number. A vector is a 1D array (features of one sample). A matrix is 2D (rows = samples, cols = features). Tensors are higher-dim arrays (images, sequences).
import numpy as np
scalar = 5                     # a single number
vector = np.array([2, 5, 8])   # features of one sample
matrix = np.array([            # rows = samples, cols = features (e.g., Age, Sex_male, Survived)
    [22, 1, 0],
    [38, 1, 1],
    [26, 0, 0],
    [35, 1, 1]
])
print("Scalar:", scalar)
print("Vector:", vector)
print("Matrix:\n", matrix)
Matrix Multiplication & Dot Product
The dot product measures alignment/similarity of two vectors. In ML, predictions often compute
y = w · x (weights dot features). Matrix multiplication stacks many dot products.
# Linear model prediction for one sample
x = np.array([22, 1, 0])        # features, e.g. [Age, Sex_male, Survived] (first row of the matrix)
w = np.array([0.03, 0.7, 0.0])  # weights learned by the model
y_hat = np.dot(w, x)            # model score: w · x
print("Prediction score:", y_hat)
# Batch prediction: X @ W scores many samples at once
X = matrix[:, :2]               # first two columns (Age, Sex_male) as features
W = np.array([[0.03], [0.7]])   # weights as a column vector
scores = X @ W                  # matrix multiply (same as np.matmul)
print("Scores:\n", scores)
Cosine Similarity (Vector Similarity)
Cosine similarity is used in recommendation systems and NLP; values near 1 mean the vectors point in nearly the same direction. In the example below, v is exactly 2·u, so the similarity is 1.0.
u = np.array([1, 2, 3])
v = np.array([2, 4, 6])
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print("Cosine similarity:", cos_sim)
Eigenvalues & Eigenvectors (PCA intuition)
Eigenvectors give principal directions of variance; eigenvalues tell how much variance lies along them. PCA projects data onto top eigenvectors to reduce dimensions with minimal information loss.
# Covariance matrix and eigen-decomposition
X = np.array([[22, 7.25],
              [38, 71.28],
              [26, 7.93],
              [35, 53.10]], dtype=float)   # columns: [Age, Fare]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
cov = np.cov(X.T)                          # 2x2 covariance matrix
vals, vecs = np.linalg.eig(cov)            # eigenvalues / eigenvectors (columns of vecs)
print("Eigenvalues:", vals)
print("Eigenvectors:\n", vecs)
3) Probability for ML
Probability models uncertainty. You’ll see it in Naive Bayes, Bayesian inference, and understanding error/noise.
Key Concepts
- Random variables & distributions (Uniform, Normal, Bernoulli, Binomial, Poisson)
- Conditional probability & Bayes’ theorem
- Expectation (mean) & variance
Detailed Explanation (with intuition & Python)
Distributions (quick tour)
- Normal: continuous, bell-curve (heights, measurement noise)
- Bernoulli: single yes/no trial (clicked ad or not)
- Binomial: # of successes in n Bernoulli trials (k clicks in n shows)
- Poisson: event counts in fixed time/space (support tickets/hour)
import numpy as np
from math import comb
from scipy.stats import bernoulli, binom, poisson
# Normal(0, 1): sample mean ≈ 0 and std ≈ 1 as the sample size grows
samples = np.random.normal(loc=0, scale=1, size=10000)
print("Mean≈", round(samples.mean(), 3), "Std≈", round(samples.std(), 3))
# Bernoulli trials (p=0.3): ten independent yes/no outcomes
print("Bernoulli:", bernoulli.rvs(p=0.3, size=10))
# Binomial: P(X = 5) when n=10, p=0.5 (PMF formula, cross-checked with SciPy)
n, p, k = 10, 0.5, 5
pmf_5 = comb(n, k) * (p**k) * ((1 - p)**(n - k))
print("Binomial P(X=5):", pmf_5)
print("SciPy check:", binom.pmf(k, n, p))
# Poisson counts with lambda=3 (e.g., support tickets per hour)
print("Poisson:", poisson.rvs(mu=3, size=5))
Conditional Probability & Bayes' Theorem
P(A|B) = P(B|A)·P(A) / P(B). In spam filtering, A=spam, B=“contains ‘win’”.
# Simple Bayes example (toy numbers)
P_spam = 0.2
P_word_given_spam = 0.9
P_word = 0.4
P_spam_given_word = (P_word_given_spam * P_spam) / P_word
print("P(spam | word):", round(P_spam_given_word, 3))
4) Statistics for ML
Statistics summarizes data (descriptive) and lets us reason about populations from samples (inferential). You’ll use it for EDA, feature selection, and model evaluation.
Key Concepts
- Descriptive: mean, median, mode, variance, std, quantiles
- Correlation vs covariance (and why corr ≠ causation)
- Inferential: hypothesis tests, p-values, confidence intervals
Detailed Explanation (with intuition & Python)
Descriptive Stats
import numpy as np
import pandas as pd
from scipy import stats
data = [7, 8, 5, 6, 9, 10, 6, 7, 8]
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data, keepdims=True)[0][0])
print("Variance:", np.var(data, ddof=0)) # population variance
print("Std Dev:", np.std(data, ddof=0)) # population std
Correlation & Covariance
df = pd.DataFrame({
    "Study_Hours": [2, 3, 4, 5, 6],
    "Exam_Score": [50, 60, 65, 70, 80]
})
print("Covariance:\n", df.cov())
print("Correlation:\n", df.corr())
Hypothesis Testing (one-sample t-test)
Test if the mean differs from 7 (e.g., average satisfaction score).
t_stat, p_value = stats.ttest_1samp(data, popmean=7)
print("t-stat:", round(t_stat,3), "p-value:", round(p_value,4))
# If p < 0.05, we reject H0 (mean == 7) at 5% significance.
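Confidence Intervals
The Key Concepts list also mentions confidence intervals: a range that, at a chosen confidence level, is likely to contain the true population mean. A minimal sketch on the same data, using a t-based 95% interval:
# 95% confidence interval for the mean of `data` (t-distribution, df = n - 1)
ci_low, ci_high = stats.t.interval(0.95, len(data) - 1,
                                   loc=np.mean(data), scale=stats.sem(data))
print("95% CI for the mean:", (round(ci_low, 2), round(ci_high, 2)))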
5) Quick Reference
| Topic | What | When it matters | Python Hint |
|---|---|---|---|
| Dot Product | Similarity / linear score | Linear/logistic regression, NN layers | np.dot(w, x) |
| Cosine Similarity | Angle-based similarity | NLP embeddings, recommendations | u·v / (||u||·||v||) |
| Bayes' Theorem | Update belief w/ evidence | Naive Bayes, spam filters | (P(B|A)P(A))/P(B) |
| Correlation | Linear association ∈ [−1,1] | Feature screening, EDA | df.corr() |
| Variance/Std | Spread of data | Outliers, normalization | np.var, np.std |
| PCA | Reduce dimensions | Speed, visualization | sklearn.decomposition.PCA |
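The last row of the table points to scikit-learn's PCA, which packages the eigen-decomposition from Section 2. A minimal sketch, assuming scikit-learn is installed and reusing the small [Age, Fare] array from earlier:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[22, 7.25], [38, 71.28], [26, 7.93], [35, 53.10]], dtype=float)
pca = PCA(n_components=1)        # keep only the top principal component
X_1d = pca.fit_transform(X)      # centers the data, then projects it
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced data:\n", X_1d)
Note that PCA only centers the data by default; standardize first (e.g., with sklearn.preprocessing.StandardScaler) when features are on very different scales, as Age and Fare are here.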
6) What’s Next (and where this math shows up)
- Feature Engineering: PCA (eigenvectors), scaling (means/std), encodings.
- Linear Models: dot products, gradients (linear algebra everywhere).
- Evaluation: statistical tests, confidence intervals, correlation analyses.