04 - Math for Machine Learning

Module 1 – Math Essentials for Machine Learning

Focused, practical math: Linear Algebra, Probability, and Statistics with hands-on Python.

Why this matters: ML models turn data into predictions using math. This module gives you the tools you’ll use in preprocessing, modeling, and evaluation.

1) Why Math is Important in ML

Machine learning is applied math: linear algebra represents data and model transformations, probability models uncertainty and noise, and statistics summarizes data and lets you validate results from samples. The sections below take each area in turn, pairing the intuition with runnable Python.

2) Linear Algebra for ML

Linear algebra is how we represent datasets (matrices), features (vectors), and model transformations (matrix multiplication). It’s everywhere—from linear regression to deep neural nets.

Key Concepts

  • Scalars, Vectors, Matrices, Tensors
  • Matrix Operations: addition, transpose, matrix multiplication
  • Dot product & cosine similarity (similarity of vectors)
  • Determinant & inverse (solving linear systems)
  • Eigenvalues/Eigenvectors & SVD (PCA, dimensionality reduction)

Detailed Explanation (with intuition & Python)

Scalars, Vectors, Matrices

A scalar is a single number. A vector is a 1D array (features of one sample). A matrix is 2D (rows = samples, cols = features). Tensors are higher-dim arrays (images, sequences).

import numpy as np

scalar = 5
vector = np.array([2, 5, 8])
matrix = np.array([
    [22, 1, 0],
    [38, 1, 1],
    [26, 0, 0],
    [35, 1, 1]
])

print("Scalar:", scalar)
print("Vector:", vector)
print("Matrix:\\n", matrix)

Matrix Multiplication & Dot Product

The dot product measures alignment/similarity of two vectors. In ML, predictions often compute y = w · x (weights dot features). Matrix multiplication stacks many dot products.

# Linear model prediction for one sample
x = np.array([22, 1, 0])       # features, e.g., [Age, Sex_male, Pclass_3]
w = np.array([0.03, 0.7, 0.0]) # weights learned by the model
y_hat = np.dot(w, x)           # model score
print("Prediction score:", y_hat)

# Batch prediction: XW for many samples
X = matrix[:, :2]             # take first 2 columns as features
W = np.array([[0.03], [0.7]]) # weights as a column vector
scores = X @ W                # matrix multiply (same as np.matmul)
print("Scores:\\n", scores)

Cosine Similarity (Vector Similarity)

Used in recommendation/NLP. Values near 1 mean “very similar direction”.

u = np.array([1, 2, 3])
v = np.array([2, 4, 6])
cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print("Cosine similarity:", cos_sim)

Eigenvalues & Eigenvectors (PCA intuition)

Eigenvectors give principal directions of variance; eigenvalues tell how much variance lies along them. PCA projects data onto top eigenvectors to reduce dimensions with minimal information loss.

# Covariance matrix and eigen-decomposition
X = np.array([[22, 7.25],
              [38, 71.28],
              [26, 7.93],
              [35, 53.10]], dtype=float)  # [Age, Fare]
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize
cov = np.cov(X.T)
vals, vecs = np.linalg.eig(cov)
print("Eigenvalues:", vals)
print("Eigenvectors:\\n", vecs)

ML Context: Every forward pass in a neural network is a chain of matrix multiplies. PCA (built on eigenvectors/values) is used to reduce features and speed up training.

3) Probability for ML

Probability models uncertainty. You’ll see it in Naive Bayes, Bayesian inference, and understanding error/noise.

Key Concepts

  • Random variables & distributions (Uniform, Normal, Bernoulli, Binomial, Poisson)
  • Conditional probability & Bayes’ theorem
  • Expectation (mean) & variance

Detailed Explanation (with intuition & Python)

Distributions (quick tour)

  • Normal: continuous, bell-curve (heights, measurement noise)
  • Bernoulli: single yes/no trial (clicked ad or not)
  • Binomial: # of successes in n Bernoulli trials (k clicks in n shows)
  • Poisson: event counts in fixed time/space (support tickets/hour)

import numpy as np
from math import comb
from scipy.stats import bernoulli, poisson

# Normal(0,1): sample mean ≈ 0 and std ≈ 1 as n grows
samples = np.random.normal(loc=0, scale=1, size=10000)
print("Mean≈", round(samples.mean(), 3), "Std≈", round(samples.std(), 3))

# Bernoulli trials (p=0.3): ten 0/1 outcomes
print("Bernoulli:", bernoulli.rvs(p=0.3, size=10))

# Binomial: P(X = 5) when n=10, p=0.5, via the PMF formula
n, p, k = 10, 0.5, 5
pmf_5 = comb(n, k) * (p**k) * ((1-p)**(n-k))
print("Binomial P(X=5):", pmf_5)

# Poisson samples with lambda=3 (e.g., tickets per hour)
print("Poisson:", poisson.rvs(mu=3, size=5))

Conditional Probability & Bayes' Theorem

P(A|B) = P(B|A)·P(A) / P(B). In spam filtering, A=spam, B=“contains ‘win’”.

# Simple Bayes example (toy numbers)
P_spam = 0.2
P_word_given_spam = 0.9
P_word = 0.4
P_spam_given_word = (P_word_given_spam * P_spam) / P_word
print("P(spam | word):", round(P_spam_given_word, 3))

ML Context: Naive Bayes assumes conditional independence between features and uses Bayes’ rule to compute class probabilities efficiently.
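
A toy sketch of that independence assumption with two words and made-up probabilities:

P_spam, P_ham = 0.2, 0.8
P_w_spam = [0.9, 0.6]   # P(word_i | spam)
P_w_ham  = [0.2, 0.3]   # P(word_i | not spam)
score_spam = P_spam * P_w_spam[0] * P_w_spam[1]   # multiply because of independence
score_ham  = P_ham  * P_w_ham[0]  * P_w_ham[1]
print("P(spam | w1, w2):", round(score_spam / (score_spam + score_ham), 3))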

4) Statistics for ML

Statistics summarizes data (descriptive) and lets us reason about populations from samples (inferential). You’ll use it for EDA, feature selection, and model evaluation.

Key Concepts

  • Descriptive: mean, median, mode, variance, std, quantiles
  • Correlation vs covariance (and why corr ≠ causation)
  • Inferential: hypothesis tests, p-values, confidence intervals

Detailed Explanation (with intuition & Python)

Descriptive Stats

import numpy as np
import pandas as pd
from scipy import stats

data = [7, 8, 5, 6, 9, 10, 6, 7, 8]

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data, keepdims=True)[0][0])
print("Variance:", np.var(data, ddof=0))     # population variance
print("Std Dev:", np.std(data, ddof=0))      # population std

Correlation & Covariance

df = pd.DataFrame({
    "Study_Hours": [2, 3, 4, 5, 6],
    "Exam_Score":  [50, 60, 65, 70, 80]
})
print("Covariance:\\n", df.cov())
print("Correlation:\\n", df.corr())

Hypothesis Testing (one-sample t-test)

Test if the mean differs from 7 (e.g., average satisfaction score).

t_stat, p_value = stats.ttest_1samp(data, popmean=7)
print("t-stat:", round(t_stat,3), "p-value:", round(p_value,4))
# If p < 0.05, we reject H0 (mean == 7) at 5% significance.
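
The key concepts also list confidence intervals; a sketch of a 95% CI for the mean using the t-distribution on the same data:

mean = np.mean(data)
sem = stats.sem(data)   # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=sem)
print("95% CI for the mean:", (round(lo, 2), round(hi, 2)))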

ML Context: Use correlation to detect multicollinearity before regression. Use hypothesis tests to compare model/feature effects or A/B test performance.

5) Quick Reference

Topic              | What                          | When it matters                        | Python Hint
Dot Product        | Similarity / linear score     | Linear/logistic regression, NN layers  | np.dot(w, x)
Cosine Similarity  | Angle-based similarity        | NLP embeddings, recommendations        | u·v / (||u||·||v||)
Bayes' Theorem     | Update belief w/ evidence     | Naive Bayes, spam filters              | (P(B|A)·P(A)) / P(B)
Correlation        | Linear association ∈ [−1, 1]  | Feature screening, EDA                 | df.corr()
Variance/Std       | Spread of data                | Outliers, normalization                | np.var, np.std
PCA                | Reduce dimensions             | Speed, visualization                   | sklearn.decomposition.PCA

6) What’s Next (and where this math shows up)

  • Feature Engineering: PCA (eigenvectors), scaling (means/std), encodings.
  • Linear Models: dot products, gradients (linear algebra everywhere).
  • Evaluation: statistical tests, confidence intervals, correlation analyses.
© 2025 Machine Learning Course | Module 1 – Math Essentials