1) Introduction to Deep Learning
Deep learning is about teaching computers to learn from examples—similar to how we learn. Neural networks with many layers can spot patterns in images, text, audio, and more.
- Why now? Lots of data, fast hardware (GPUs/TPUs), and friendly libraries (TensorFlow, PyTorch).
- Used in: Face ID, self-driving cars, chatbots, medical imaging, recommendation systems.
2) Neural Network Basics
2.1 Building Blocks (Neurons & Layers)
A neuron takes inputs, multiplies by weights, adds a bias, then applies an activation function to decide the output. Layers of neurons form a network.
```python
# A tiny "neuron" demo in Python
import numpy as np

x = np.array([0.6, 0.2, 0.9])   # inputs
w = np.array([0.5, -0.3, 1.2])  # weights
b = 0.1                         # bias

z = np.dot(x, w) + b            # weighted sum
a = max(0, z)                   # ReLU activation: f(z) = max(0, z)
print("Output:", a)
```
2.2 Activation Functions (Why we need them)
| Activation | Where used | What it does |
|---|---|---|
| ReLU | Hidden layers | Fast, simple; helps avoid vanishing gradients |
| Sigmoid | Binary outputs | Squashes to 0–1 (good for probabilities) |
| Softmax | Multi-class outputs | Turns scores into probabilities that sum to 1 |
| Tanh | Hidden layers (sometimes) | Outputs −1 to 1 (zero-centered) |
| Leaky ReLU | Hidden layers | Like ReLU but keeps small negative slope (prevents “dying” ReLUs) |
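To make the table concrete, here is a minimal NumPy sketch of these activations applied to a made-up vector of pre-activation scores (the values are arbitrary and only for illustration):

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # example pre-activation scores

relu = np.maximum(0, z)                      # ReLU: max(0, z)
leaky_relu = np.where(z > 0, z, 0.01 * z)    # Leaky ReLU: small slope for z < 0
sigmoid = 1 / (1 + np.exp(-z))               # squashes to (0, 1)
tanh = np.tanh(z)                            # squashes to (-1, 1), zero-centered
softmax = np.exp(z) / np.exp(z).sum()        # probabilities summing to 1

print(relu, sigmoid.round(3), softmax.round(3), sep="\n")
```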
3) Feedforward Neural Networks (FNN)
Information flows forward: input → hidden layers → output. Great for tabular data (rows/columns).
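As a minimal sketch, a small FNN for a hypothetical tabular dataset with 10 input features and a binary label might look like this in Keras (layer sizes are arbitrary choices, not tuned values):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical tabular data: 10 input features, 1 binary label.
model = Sequential([
    Dense(32, activation='relu', input_shape=(10,)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```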
4) Convolutional Neural Networks (CNNs)
CNNs are specialized for images. They learn to detect edges, shapes, textures, and objects.
How a CNN works
- Convolution: small filters slide across the image to detect patterns.
- Activation (e.g., ReLU): add non-linearity.
- Pooling: reduce size while keeping important info (e.g., max pooling).
- Dense layers: finalize decision (e.g., “cat” vs “dog”).
Quick CNN (Keras) Example
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid')  # binary classification (e.g., cat vs. dog)
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```
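Training then comes down to a `model.fit` call. The arrays below are random placeholders standing in for real 64×64 RGB images and 0/1 labels, just to show the shapes involved:

```python
import numpy as np

# Placeholder data: 100 random 64x64 RGB "images" with 0/1 labels.
X_train = np.random.rand(100, 64, 64, 3)
y_train = np.random.randint(0, 2, size=(100,))

model.fit(X_train, y_train, epochs=5, batch_size=16, validation_split=0.2)
```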
5) RNNs, LSTMs, and GRUs (for sequences)
RNNs handle ordered data (text, audio, time series). They “remember” previous steps while processing the next.
Challenge: Basic RNNs forget long-range info (vanishing gradients). LSTMs and GRUs fix this using “gates”.
- LSTM: Strong long-term memory via input/forget/output gates (great for complex language tasks).
- GRU: Simpler and faster than LSTM; often performs similarly.
Quick LSTM Example
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Input shape: (timesteps, features). Example: 30 days of 1 feature each.
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(30, 1)),
    LSTM(64),
    Dense(1)  # e.g., predict the next value
])
model.compile(optimizer='adam', loss='mse')
```
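To illustrate the GRU point above, the same sketch can be written with GRU layers swapped in; everything else (input shape, loss) stays the same:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

model = Sequential([
    GRU(64, return_sequences=True, input_shape=(30, 1)),
    GRU(64),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
```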
6) Activation Functions (recap)
Use ReLU/Leaky ReLU in hidden layers, Sigmoid for binary outputs, and Softmax for multi-class outputs.
7) Loss Functions (how wrong am I?)
| Problem | Loss | Why |
|---|---|---|
| Regression | MSE (Mean Squared Error) | Penalizes larger errors more (squared) |
| Binary classification | Binary Cross-Entropy | Works with probabilities (0–1) |
| Multi-class classification | Categorical Cross-Entropy | Softmax probabilities across classes |
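A small worked example helps here. The numbers below are made up, but they show how MSE and binary cross-entropy are computed from targets and predictions:

```python
import numpy as np

# Regression: MSE penalizes the 1.5-unit miss much more than the 0.5-unit miss.
y_true_reg = np.array([3.0, 5.0, 2.5])
y_pred_reg = np.array([2.5, 5.0, 4.0])
mse = np.mean((y_true_reg - y_pred_reg) ** 2)   # (0.25 + 0 + 2.25) / 3 ≈ 0.833

# Binary classification: cross-entropy compares true labels with predicted probabilities.
y_true_clf = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.6])
bce = -np.mean(y_true_clf * np.log(p) + (1 - y_true_clf) * np.log(1 - p))

print(f"MSE: {mse:.3f}  BCE: {bce:.3f}")
```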
8) Optimizers (how to learn)
| Optimizer | Good for | Notes |
|---|---|---|
| SGD | Small/simple problems | May need momentum; straightforward |
| Adam | Most projects | Adaptive learning rate; common default |
| RMSprop | RNNs/sequences | Handles non-stationary objectives well |
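In Keras you can pass the optimizer as a string (as in the earlier examples) or configure it explicitly. A brief sketch; the learning rates shown are the usual defaults, not tuned values:

```python
from tensorflow.keras.optimizers import SGD, Adam, RMSprop

sgd = SGD(learning_rate=0.01, momentum=0.9)   # classic SGD with momentum
adam = Adam(learning_rate=0.001)              # adaptive; a common default
rmsprop = RMSprop(learning_rate=0.001)        # often paired with RNNs

# model.compile(optimizer=adam, loss='mse')   # pass the object instead of the string 'adam'
```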
9) Evaluation Metrics (did I do well?)
Regression
- MSE: Average squared error.
- RMSE: √MSE (back to original units).
- R²: Proportion of variance explained (1.0 is perfect).
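These are all one-liners with scikit-learn; the targets and predictions below are made up, just to show the calls:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up regression targets and predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 3.0, 6.5])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, R²: {r2:.3f}")
```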
Classification
- Confusion Matrix: TP, TN, FP, FN.
- Accuracy: (TP + TN) / Total.
- Precision: Of predicted positives, how many were correct? (TP / (TP+FP))
- Recall: Of actual positives, how many did we catch? (TP / (TP+FN))
- F1: Harmonic mean of precision and recall (balances the two).
Mini Example: Confusion Matrix
```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # actual labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # model predictions

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))
```
10) Frameworks: TensorFlow & PyTorch
TensorFlow
- Great for production and scaling.
- High-level API (tf.keras) is beginner friendly.

```python
import tensorflow as tf
print(tf.__version__)
```

PyTorch
- Popular in research; dynamic graphs and easy debugging.

```python
import torch
print(torch.__version__)
```
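As a tiny taste of the PyTorch style (not tied to any particular task), the sketch below runs a random batch through a single linear layer; operations execute eagerly, which is what makes debugging straightforward:

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)   # 3 input features -> 1 output
x = torch.rand(5, 3)      # batch of 5 random samples
y = layer(x)              # forward pass runs immediately (define-by-run)
print(y.shape)            # torch.Size([5, 1])
```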
11) Summary Table
| Model Type | Best For | Key Metrics |
|---|---|---|
| FNN | Tabular data | MSE/RMSE (reg), Accuracy/F1 (clf) |
| CNN | Images & spatial data | Accuracy, F1, IoU (for detection/segm) |
| RNN | Sequences (text, audio) | Accuracy/F1, Perplexity (NLP) |
| LSTM/GRU | Long dependencies | Accuracy/F1, RMSE (time series) |
12) Try It Yourself
- Build a simple CNN for MNIST digits (handwritten 0–9).
- Train an LSTM to predict the next word in a sentence.
- Compare ReLU vs. Sigmoid on a small tabular dataset.