1) Why Math is Important in ML
Machine learning sits on three mathematical pillars: linear algebra (how data and models are represented), probability (how uncertainty is modeled), and statistics (how we summarize data and draw conclusions from samples). The sections below walk through each pillar with intuition and short Python examples.
2) Linear Algebra for ML
Linear algebra is how we represent datasets (matrices), features (vectors), and model transformations (matrix multiplication). It’s everywhere—from linear regression to deep neural nets.
Key Concepts
- Scalars, Vectors, Matrices, Tensors
- Matrix Operations: addition, transpose, matrix multiplication
- Dot product & cosine similarity (similarity of vectors)
- Determinant & inverse (solving linear systems)
- Eigenvalues/Eigenvectors & SVD (PCA, dimensionality reduction)
Detailed Explanation (with intuition & Python)
Scalars, Vectors, Matrices
A scalar is a single number. A vector is a 1D array (features of one sample). A matrix is 2D (rows = samples, cols = features). Tensors are higher-dim arrays (images, sequences).
import numpy as np
scalar = 5                     # a single number
vector = np.array([2, 5, 8])   # features of one sample
matrix = np.array([            # rows = samples, cols = features (e.g., Age, Sex_male, Survived)
    [22, 1, 0],
    [38, 1, 1],
    [26, 0, 0],
    [35, 1, 1]
])
print("Scalar:", scalar)
print("Vector:", vector)
print("Matrix:\n", matrix)
Matrix Multiplication & Dot Product
The dot product measures alignment/similarity of two vectors. In ML, predictions often compute
y = w · x (weights dot features). Matrix multiplication stacks many dot products.
# Linear model prediction for one sample
x = np.array([22, 1, 0])        # features, e.g. [Age, Sex_male, Survived] (first row of the matrix)
w = np.array([0.03, 0.7, 0.0])  # weights learned by the model
y_hat = np.dot(w, x)            # model score: w · x
print("Prediction score:", y_hat)
# Batch prediction: X @ W scores many samples at once
X = matrix[:, :2]               # first two columns (Age, Sex_male) as features
W = np.array([[0.03], [0.7]])   # weights as a column vector
scores = X @ W                  # matrix multiply (same as np.matmul)
print("Scores:\n", scores)
Cosine Similarity (Vector Similarity)
Cosine similarity is used in recommendation systems and NLP; values near 1 mean the vectors point in nearly the same direction. In the example below, v is exactly 2·u, so the similarity is 1.0.
u = np.array([1, 2, 3])
v = np.array([2, 4, 6])
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print("Cosine similarity:", cos_sim)
Eigenvalues & Eigenvectors (PCA intuition)
Eigenvectors give principal directions of variance; eigenvalues tell how much variance lies along them. PCA projects data onto top eigenvectors to reduce dimensions with minimal information loss.
# Covariance matrix and eigen-decomposition
X = np.array([[22, 7.25],
              [38, 71.28],
              [26, 7.93],
              [35, 53.10]], dtype=float)   # columns: [Age, Fare]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
cov = np.cov(X.T)                          # 2x2 covariance matrix
vals, vecs = np.linalg.eig(cov)            # eigenvalues / eigenvectors (columns of vecs)
print("Eigenvalues:", vals)
print("Eigenvectors:\n", vecs)
3) Probability for ML
Probability models uncertainty. You’ll see it in Naive Bayes, Bayesian inference, and understanding error/noise.
Key Concepts
- Random variables & distributions (Uniform, Normal, Bernoulli, Binomial, Poisson)
- Conditional probability & Bayes’ theorem
- Expectation (mean) & variance
Detailed Explanation (with intuition & Python)
Distributions (quick tour)
- Normal: continuous, bell-curve (heights, measurement noise)
- Bernoulli: single yes/no trial (clicked ad or not)
- Binomial: # of successes in n Bernoulli trials (k clicks in n shows)
- Poisson: event counts in fixed time/space (support tickets/hour)
import numpy as np
from math import comb
from scipy.stats import bernoulli, binom, poisson
# Normal(0, 1): sample mean ≈ 0 and std ≈ 1 as the sample size grows
samples = np.random.normal(loc=0, scale=1, size=10000)
print("Mean≈", round(samples.mean(), 3), "Std≈", round(samples.std(), 3))
# Bernoulli trials (p=0.3): ten independent yes/no outcomes
print("Bernoulli:", bernoulli.rvs(p=0.3, size=10))
# Binomial: P(X = 5) when n=10, p=0.5 (PMF formula, cross-checked with SciPy)
n, p, k = 10, 0.5, 5
pmf_5 = comb(n, k) * (p**k) * ((1 - p)**(n - k))
print("Binomial P(X=5):", pmf_5)
print("SciPy check:", binom.pmf(k, n, p))
# Poisson counts with lambda=3 (e.g., support tickets per hour)
print("Poisson:", poisson.rvs(mu=3, size=5))
Conditional Probability & Bayes' Theorem
P(A|B) = P(B|A)·P(A) / P(B). In spam filtering, A=spam, B=“contains ‘win’”.
# Simple Bayes example (toy numbers)
P_spam = 0.2
P_word_given_spam = 0.9
P_word = 0.4
P_spam_given_word = (P_word_given_spam * P_spam) / P_word
print("P(spam | word):", round(P_spam_given_word, 3))
4) Statistics for ML
Statistics summarizes data (descriptive) and lets us reason about populations from samples (inferential). You’ll use it for EDA, feature selection, and model evaluation.
Key Concepts
- Descriptive: mean, median, mode, variance, std, quantiles
- Correlation vs covariance (and why corr ≠ causation)
- Inferential: hypothesis tests, p-values, confidence intervals
Detailed Explanation (with intuition & Python)
Descriptive Stats
import numpy as np
import pandas as pd
from scipy import stats
data = [7, 8, 5, 6, 9, 10, 6, 7, 8]
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data, keepdims=True)[0][0])
print("Variance:", np.var(data, ddof=0)) # population variance
print("Std Dev:", np.std(data, ddof=0)) # population std
Correlation & Covariance
df = pd.DataFrame({
    "Study_Hours": [2, 3, 4, 5, 6],
    "Exam_Score": [50, 60, 65, 70, 80]
})
print("Covariance:\n", df.cov())
print("Correlation:\n", df.corr())
Hypothesis Testing (one-sample t-test)
Test if the mean differs from 7 (e.g., average satisfaction score).
t_stat, p_value = stats.ttest_1samp(data, popmean=7)
print("t-stat:", round(t_stat,3), "p-value:", round(p_value,4))
# If p < 0.05, we reject H0 (mean == 7) at 5% significance.
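Confidence Intervals
The Key Concepts list also mentions confidence intervals: a range that, at a chosen confidence level, is likely to contain the true population mean. A minimal sketch on the same data, using a t-based 95% interval:
# 95% confidence interval for the mean of `data` (t-distribution, df = n - 1)
ci_low, ci_high = stats.t.interval(0.95, len(data) - 1,
                                   loc=np.mean(data), scale=stats.sem(data))
print("95% CI for the mean:", (round(ci_low, 2), round(ci_high, 2)))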
5) Quick Reference
| Topic | What | When it matters | Python Hint |
|---|---|---|---|
| Dot Product | Similarity / linear score | Linear/logistic regression, NN layers | np.dot(w, x) |
| Cosine Similarity | Angle-based similarity | NLP embeddings, recommendations | u·v / (||u||·||v||) |
| Bayes' Theorem | Update belief w/ evidence | Naive Bayes, spam filters | (P(B|A)P(A))/P(B) |
| Correlation | Linear association ∈ [−1,1] | Feature screening, EDA | df.corr() |
| Variance/Std | Spread of data | Outliers, normalization | np.var, np.std |
| PCA | Reduce dimensions | Speed, visualization | sklearn.decomposition.PCA |
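The last row of the table points to scikit-learn's PCA, which packages the eigen-decomposition from Section 2. A minimal sketch, assuming scikit-learn is installed and reusing the small [Age, Fare] array from earlier:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[22, 7.25], [38, 71.28], [26, 7.93], [35, 53.10]], dtype=float)
pca = PCA(n_components=1)        # keep only the top principal component
X_1d = pca.fit_transform(X)      # centers the data, then projects it
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced data:\n", X_1d)
Note that PCA only centers the data by default; standardize first (e.g., with sklearn.preprocessing.StandardScaler) when features are on very different scales, as Age and Fare are here.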
6) What’s Next (and where this math shows up)
- Feature Engineering: PCA (eigenvectors), scaling (means/std), encodings.
- Linear Models: dot products, gradients (linear algebra everywhere).
- Evaluation: statistical tests, confidence intervals, correlation analyses.