
Module 3: Advanced Machine Learning Techniques

Ensemble Methods • Support Vector Machines • Feature Engineering & Feature Selection

Prerequisites

  • Module 1 (Math essentials) and Module 2 (Supervised learning) fundamentals
  • Comfort with Python, NumPy, Pandas, Matplotlib / Seaborn
  • Experience using scikit-learn (train/test split, basic model API)
Goal: after this module, learners will understand ensemble strategies (bagging, boosting, random forests), how SVMs work and when to use them, and practical feature engineering and selection techniques for improving model performance.

1. Ensemble Methods

Ensemble methods combine multiple base models to improve stability and predictive performance. We'll cover Bagging, Boosting, and Random Forests.

1.1 Bagging (Bootstrap Aggregating)

Concept: train multiple models on different bootstrap samples (sampling with replacement) and aggregate predictions.

Why it helps: reduces variance (averaging many high-variance learners makes predictions more stable).

Workflow

  1. Create many bootstrap samples from the training data.
  2. Train a base learner (often decision trees) on each sample.
  3. Aggregate predictions: average for regression, majority vote for classification.

Python example (BaggingClassifier)

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named base_estimator in scikit-learn < 1.2
    n_estimators=50,
    random_state=42
)
bagging.fit(X_train, y_train)
y_pred = bagging.predict(X_test)
print("Bagging accuracy:", accuracy_score(y_test, y_pred))

Real-life use cases

  • Medical diagnostics where you want more stable predictions than a single tree
  • Ensemble for high-variance models on noisy datasets

1.2 Boosting

Concept: build models sequentially; each new model focuses on mistakes of the previous ones.

Effect: reduces bias and often yields very high accuracy, but can overfit if not regularized.

Common boosting algorithms

  • AdaBoost — reweights samples iteratively based on errors
  • Gradient Boosting (GBM) — fits new models to the residuals (negative gradients); see the sketch after the AdaBoost example
  • XGBoost / LightGBM / CatBoost — optimized, faster, often higher accuracy in practice

Python example (AdaBoost)

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

adaboost = AdaBoostClassifier(n_estimators=100, random_state=42)
adaboost.fit(X_train, y_train)
y_pred = adaboost.predict(X_test)
print("AdaBoost accuracy:", accuracy_score(y_test, y_pred))

When to use boosting?

  • When you need better predictive performance and have enough data
  • Common in tabular data competitions and production models (XGBoost / LightGBM) — see the histogram-based sketch below
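
XGBoost and LightGBM are third-party libraries, but scikit-learn ships a histogram-based booster in the same spirit, HistGradientBoostingClassifier. A minimal sketch, reusing the synthetic split from the AdaBoost example (settings are illustrative):

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Bins continuous features into histograms for fast training on larger datasets
hgb = HistGradientBoostingClassifier(max_iter=200, learning_rate=0.1, random_state=42)
hgb.fit(X_train, y_train)
print("HistGradientBoosting accuracy:", accuracy_score(y_test, hgb.predict(X_test)))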

1.3 Random Forests

Concept: an ensemble of decision trees trained on bootstrap samples where each split considers a random subset of features.

Why it helps: reduces variance (bagging) and decorrelates trees via feature randomness, giving robust performance and built-in feature importance.

Python example (RandomForestClassifier)

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred))

Notes & tips

  • Random Forests are strong off-the-shelf models for many problems
  • They handle numerical features and encoded categoricals well; missing values may need imputation first, depending on your scikit-learn version
  • Use feature_importances_ to inspect important predictors (see the sketch below)
Practical tip: If you need extremely fast training on large datasets, consider LightGBM or XGBoost (they implement boosting efficiently). For interpretability, Random Forests and shallow trees are easier to inspect.
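
A minimal sketch of inspecting feature_importances_, assuming the fitted rf model and data from the Random Forest example above (the generic feature names are placeholders):

# Impurity-based importances from the fitted forest (they sum to 1.0)
importances = rf.feature_importances_
feature_names = [f"feature_{i}" for i in range(len(importances))]  # placeholder names
top5 = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)[:5]
for name, score in top5:
    print(f"{name}: {score:.3f}")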

2. Support Vector Machines (SVM)

SVMs find a hyperplane that best separates classes by maximizing the margin between them. They work well in high-dimensional spaces and with clear margin separation.

2.1 Intuition & math (high level)

For a linear decision boundary w·x + b = 0, the margin width is 2 / ||w||. SVMs maximize this margin by minimizing ||w|| (equivalently ||w||² / 2), and the soft-margin formulation tolerates some misclassified points, with the trade-off controlled by the parameter C (larger C penalizes errors more heavily).
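
To make the 2 / ||w|| formula concrete, here is a minimal sketch that fits a linear SVM on a toy two-cluster problem and computes the margin width from the learned weights (the dataset and C value are illustrative):

import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated clusters so a wide margin exists
X_toy, y_toy = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

linear_svm = SVC(kernel='linear', C=1.0)
linear_svm.fit(X_toy, y_toy)

w = linear_svm.coef_[0]  # weight vector of the separating hyperplane
print("Margin width:", 2 / np.linalg.norm(w))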

2.2 Kernels

When data are not linearly separable in input space, kernels map inputs into a higher-dimensional space:

  • Linear — when data are linearly separable
  • RBF (Gaussian) — popular default; handles curved boundaries
  • Polynomial — for polynomial decision boundaries

2.3 Python example (SVM)

from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3, random_state=42)

svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
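
The example above uses only the RBF kernel. A quick sketch comparing the kernels listed in 2.2 on the same wine split (default-ish settings, so the numbers are only indicative; scaling is wrapped in a pipeline because SVMs are sensitive to feature scale):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

for kernel in ['linear', 'rbf', 'poly']:
    # Keep scaling inside the pipeline so it is fit on the training data only
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0, gamma='scale'))
    clf.fit(X_train, y_train)
    print(kernel, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))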

2.4 When to use SVM

  • Small-to-medium sized datasets with clear margins or high-dimensional feature spaces
  • Problems where outliers are not dominant (SVMs can be sensitive to noise)
  • Text classification with TF-IDF vectors (often effective)
Practical note: SVMs with RBF kernel require careful tuning of C and gamma (or use an automated search like GridSearchCV).
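
A minimal GridSearchCV sketch for tuning C and gamma on the wine split above (the grid values are illustrative starting points, not recommendations):

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('scale', StandardScaler()), ('svm', SVC(kernel='rbf'))])
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': ['scale', 0.01, 0.1, 1],
}

# 5-fold cross-validated search over the C / gamma grid
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("Best CV accuracy:", search.best_score_)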

3. Feature Engineering & Feature Selection

Transforming raw data into features that better represent the underlying problem can substantially improve model performance. Feature selection simplifies models, reduces overfitting and speeds up training.

3.1 Feature Engineering techniques

  • Encoding categorical variables: One-Hot, Label Encoding, Ordinal Encoding, Target Encoding (careful with leakage)
  • Scaling numeric features: StandardScaler, MinMaxScaler, RobustScaler
  • Create interaction features: multiplication or concatenation of features
  • Polynomial / basis expansions for non-linear models
  • Time features: extract hour/day/month; cyclic (sin/cos) encoding for time-of-day — see the sketch after the encoding example
  • Text features: TF-IDF, embeddings

Python example: encoding and scaling

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    'Color': ['Red','Blue','Green','Blue'],
    'Size': [10, 20, 30, 25]
})

# One-hot encode Color
ohe = OneHotEncoder(sparse_output=False)
encoded = ohe.fit_transform(df[['Color']])
encoded_df = pd.DataFrame(encoded, columns=ohe.get_feature_names_out(['Color']))

# Scale Size
scaler = StandardScaler()
scaled_size = scaler.fit_transform(df[['Size']])

print("Encoded:\n", encoded_df)
print("Scaled Size:\n", scaled_size)

3.2 Feature Selection techniques

Three broad families:

  • Filter methods — independent of model (correlation threshold, chi-square, mutual information)
  • Wrapper methods — use a model to evaluate subsets (RFE)
  • Embedded methods — selection occurs during model training (Lasso, tree-based importance); see the SelectFromModel sketch after the RFE example

Python example: Recursive Feature Elimination (RFE)

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=500)
rfe = RFE(model, n_features_to_select=5)
rfe.fit(X, y)  # X,y from your dataset
print("Selected features mask:", rfe.support_)
print("Feature ranking:", rfe.ranking_)

Notes on feature selection

  • Always perform selection inside cross-validation to avoid selection bias
  • Tree-based models give good feature importance estimates (but can be biased for high-cardinality categorical features)
  • L1 regularization (Lasso) can zero out features — useful for high-dimensional sparse data
Practical workflow suggestion:
  1. Start with domain-aware feature engineering (dates, aggregations, group stats)
  2. Apply simple filters (remove constant or near-constant features) — see the VarianceThreshold sketch after this list
  3. Use embedded or wrapper methods inside CV to refine selected features
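
A minimal sketch of step 2 using VarianceThreshold to drop constant and near-constant columns (the toy frame and threshold are illustrative):

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

df_feats = pd.DataFrame({
    'constant': [1, 1, 1, 1, 1],         # zero variance
    'almost_constant': [0, 0, 0, 0, 1],  # variance ~0.16
    'useful': [3, 7, 1, 9, 4],
})

# Remove columns whose variance falls below the threshold
vt = VarianceThreshold(threshold=0.2)
vt.fit(df_feats)
print("Kept columns:", df_feats.columns[vt.get_support()].tolist())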

4. Summary Table

Technique | Type | Metrics Used | Key Advantages / Typical Uses
Bagging | Ensemble | Accuracy, RMSE, R² | Reduces variance; use with high-variance base learners (e.g., trees)
Boosting (AdaBoost, GBM, XGBoost) | Ensemble | Accuracy, F1-score, AUC | Reduces bias; very powerful for tabular data; careful tuning required
Random Forest | Ensemble | Accuracy, Feature Importance | Robust, handles large feature sets, built-in importance
Support Vector Machine (SVM) | Classifier / Regressor | Accuracy, Precision/Recall, F1, AUC | Effective in high-dimensional spaces; kernel trick handles non-linearities
Feature Engineering | Preprocessing | Depends on downstream model metrics | Transforms raw data into predictive signals; essential step in pipelines
Feature Selection (RFE, Lasso, tree-based) | Preprocessing / Embedded | Model performance metrics (AUC, R², RMSE) | Reduces dimensionality, improves generalization and speed

5. Final notes & best practices

  • Always start with a simple baseline model (e.g., logistic regression / small tree) before moving to complex ensembles.
  • Carefully split data into train / validation / test; use cross-validation for robust estimates.
  • Scale features when required (SVM, KNN, linear models with regularization).
  • Be mindful of data leakage — never use target information when creating features for train/test splits.
  • Use feature importance and model-agnostic explainability tools (SHAP, LIME) to interpret ensemble models.
© 2025 Machine Learning Course | Module 3 — Advanced Techniques