
Day 8: Data Handling with Pandas, NumPy, and Visualization


Step 1: Sample Dataset

The examples use the following sales data, saved as sales_data.csv (the same table could also be stored as an Excel file):

OrderID  Customer  Product   Quantity  Price  Date
101      Alice     Laptop    2         1200   2025-08-01
102      Bob       Mouse     5         25     2025-08-02
103      Charlie   Keyboard  3         45     2025-08-03
104      Alice     Monitor   1         300    2025-08-04
105      Bob       Headset   2         80     2025-08-05
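
The later steps read a file named sales_data.csv. If you do not have that file yet, the following minimal sketch writes one from the table above. The variable names `sales` and `check` are illustrative, not part of the lesson.

```python
import pandas as pd

# Rows copied from the sample table above.
sales = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset"],
    "Quantity": [2, 5, 3, 1, 2],
    "Price": [1200, 25, 45, 300, 80],
    "Date": ["2025-08-01", "2025-08-02", "2025-08-03",
             "2025-08-04", "2025-08-05"],
})

# index=False keeps the DataFrame's row index out of the file.
sales.to_csv("sales_data.csv", index=False)

# Read the file back to confirm it has 5 rows and 6 columns.
check = pd.read_csv("sales_data.csv")
print(check.shape)
```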

Step 2: Loading Data with Pandas

import pandas as pd # Import pandas library

# Load CSV file
df = pd.read_csv("sales_data.csv")

# Display first 5 rows to check data
print(df.head())

# Check data types of each column
print(df.dtypes)
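
By default, read_csv loads the Date column as plain text. As a small sketch, passing `parse_dates` asks pandas to convert it to a datetime column instead. The snippet reads from an in-memory string (`csv_text`, a made-up name holding two rows of the sample data) so it runs without a file on disk.

```python
import io
import pandas as pd

# Two rows of the sample data, held in a string instead of a file.
csv_text = """OrderID,Customer,Product,Quantity,Price,Date
101,Alice,Laptop,2,1200,2025-08-01
102,Bob,Mouse,5,25,2025-08-02
"""

# io.StringIO makes the string behave like an open file for read_csv.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Date is now a datetime column, so date accessors such as .dt work.
print(df_dates.dtypes)
print(df_dates["Date"].dt.day_name().tolist())
```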

Step 3: Inspecting & Basic Commands

# Display first 3 rows (print() is needed when running as a script)
print(df.head(3))

# Display last 3 rows
print(df.tail(3))

# Get number of rows and columns
print(df.shape)

# Display column names
print(df.columns)

# Get summary statistics for numerical columns
print(df.describe())

Step 4: Data Cleaning

# Check for missing values
print(df.isnull().sum()) # Count missing values per column

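# Fill missing Quantity with 0 and assign the result back
# (fillna(inplace=True) on a selected column may not update df in recent pandas versions)
df['Quantity'] = df['Quantity'].fillna(0)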

# Remove duplicate rows
df.drop_duplicates(inplace=True)

# Rename column 'Price' to 'UnitPrice'
df.rename(columns={'Price':'UnitPrice'}, inplace=True)

Step 5: Data Manipulation

# Filter rows for Alice
alice_orders = df[df['Customer']=='Alice']
print(alice_orders)

# Add new column: TotalPrice = Quantity * UnitPrice
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']
print(df)

# Group by Customer and sum TotalPrice
customer_total = df.groupby('Customer')['TotalPrice'].sum()
print(customer_total)
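
groupby can also compute several statistics per customer in one call. The following sketch rebuilds the relevant columns of the sample data and uses named aggregation. The output column names (`n_orders`, `total_sales`, `avg_order`) are chosen here for illustration.

```python
import pandas as pd

# Relevant columns of the sample data, after renaming Price to UnitPrice.
orders = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Quantity": [2, 5, 3, 1, 2],
    "UnitPrice": [1200, 25, 45, 300, 80],
})
orders["TotalPrice"] = orders["Quantity"] * orders["UnitPrice"]

# Named aggregation: new_column=(source_column, function).
summary = (
    orders.groupby("Customer")
          .agg(
              n_orders=("TotalPrice", "size"),
              total_sales=("TotalPrice", "sum"),
              avg_order=("TotalPrice", "mean"),
          )
          .sort_values("total_sales", ascending=False)
)
print(summary)
```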

Step 6: NumPy Integration

import numpy as np

# Convert Quantity column to numpy array for fast computation
quantities = df['Quantity'].to_numpy()
print(quantities)

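# Calculate mean and standard deviation
# Note: np.std defaults to the population formula (ddof=0),
# while df.describe() reports the sample standard deviation (ddof=1)
print("Mean Quantity:", np.mean(quantities))
print("Std Dev Quantity:", np.std(quantities))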
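
NumPy arrays support element-wise arithmetic, and np.std accepts a `ddof` argument that switches between the population and sample formulas. The sketch below shows both, with the Quantity and Price values from the sample table typed in directly as arrays.

```python
import numpy as np

# Values copied from the Quantity and Price columns of the sample table.
quantities = np.array([2, 5, 3, 1, 2])
unit_prices = np.array([1200, 25, 45, 300, 80])

# Element-wise multiplication: one total per order, no loop needed.
totals = quantities * unit_prices
print(totals.tolist())  # [2400, 125, 135, 300, 160]

# ddof=0 (the default) divides by n; ddof=1 divides by n - 1.
pop_std = np.std(quantities)
sample_std = np.std(quantities, ddof=1)
print("Population std:", pop_std)
print("Sample std:", sample_std)
```

The sample value is the one that matches the `std` row printed by df.describe().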

Step 7: Visualization with Matplotlib

import matplotlib.pyplot as plt

# Plot TotalPrice per Customer
customer_total.plot(kind='bar', color='skyblue') # Bar chart
plt.title("Total Sales per Customer") # Add title
plt.xlabel("Customer") # X-axis label
plt.ylabel("Total Sales") # Y-axis label
plt.show()

Graph Output:

Bar chart of total sales per customer
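
plt.show() opens a window, which is not available on a server or in a batch script. As one possible alternative, the sketch below saves the same bar chart to a PNG file. The per-customer totals are entered by hand from the sample data, the file name is arbitrary, and the "Agg" backend is selected on the assumption that no display is present.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before pyplot is imported
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

# Totals per customer, computed from the sample data (Quantity * UnitPrice).
customer_total = pd.Series({"Alice": 2700, "Bob": 285, "Charlie": 135})

# Build the chart on an explicit Figure/Axes pair.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(customer_total.index, customer_total.values, color="skyblue")
ax.set_title("Total Sales per Customer")
ax.set_xlabel("Customer")
ax.set_ylabel("Total Sales")
fig.tight_layout()

# Write the figure to disk instead of showing it.
out_path = Path("total_sales_per_customer.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)  # release the figure's memory
print("Saved:", out_path.exists())
```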

Step 8: Visualization with Seaborn

import seaborn as sns

# Scatter plot Quantity vs UnitPrice by Customer
sns.scatterplot(data=df, x='Quantity', y='UnitPrice', hue='Customer', s=100)
plt.title("Quantity vs UnitPrice")
plt.show()

# Correlation heatmap (numeric columns only; text columns such as Customer cannot be correlated)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()

Graph Outputs:

  • Scatter plot of Quantity vs UnitPrice
  • Correlation heatmap
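
The heatmap is a colored view of a correlation matrix. To see the numbers behind it, the following sketch computes that matrix with pandas alone on a few columns of the sample data. Selecting numeric columns with select_dtypes is one way to exclude text columns, and `df_demo` is a stand-in name for the lesson's DataFrame.

```python
import pandas as pd

# A few columns of the sample data; Customer is text, the rest are numeric.
df_demo = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Quantity": [2, 5, 3, 1, 2],
    "UnitPrice": [1200, 25, 45, 300, 80],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = df_demo.select_dtypes(include="number")
corr = numeric.corr()

# Each cell is between -1 and 1; the diagonal is always 1.
print(corr.round(2))
```

With only five rows, the values are illustrative and should not be read as a real relationship between quantity and price.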

Cheat Sheet: Pandas, NumPy & Visualization Commands

  • Loading: pd.read_csv(), pd.read_excel()
  • Inspect: df.head(), df.tail(), df.shape, df.info(), df.describe()
  • Clean: isnull(), fillna(), dropna(), drop_duplicates(), rename()
  • Filter & Manipulate: df[df['Col']==val], df['NewCol']=df['A']*df['B'], groupby()
  • NumPy: np.array(), to_numpy(), np.mean(), np.std()
  • Matplotlib: plot(), bar(), scatter(), show()
  • Seaborn: scatterplot(), heatmap(), barplot()
✔ End of Day 8: You have now worked through hands-on examples of loading, cleaning, manipulating, and visualizing data with Pandas, NumPy, Matplotlib, and Seaborn.