Step 1: Sample Dataset
We will use the following sales data (CSV/Excel format) for examples:
| OrderID | Customer | Product | Quantity | Price | Date |
|---|---|---|---|---|---|
| 101 | Alice | Laptop | 2 | 1200 | 2025-08-01 |
| 102 | Bob | Mouse | 5 | 25 | 2025-08-02 |
| 103 | Charlie | Keyboard | 3 | 45 | 2025-08-03 |
| 104 | Alice | Monitor | 1 | 300 | 2025-08-04 |
| 105 | Bob | Headset | 2 | 80 | 2025-08-05 |
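To follow along without a prepared file, the sample rows can be rebuilt in memory and written to disk. This is only a minimal sketch: the file name sales_data.csv is chosen to match the one loaded in Step 2, and the variable name sample_df is an assumption made for illustration.

```python
import pandas as pd

# Rows copied from the sample sales table
data = {
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset"],
    "Quantity": [2, 5, 3, 1, 2],
    "Price": [1200, 25, 45, 300, 80],
    "Date": ["2025-08-01", "2025-08-02", "2025-08-03",
             "2025-08-04", "2025-08-05"],
}
sample_df = pd.DataFrame(data)

# index=False keeps the file to exactly the six table columns
sample_df.to_csv("sales_data.csv", index=False)
print(sample_df)
```

Writing the file once means every later step can start from pd.read_csv("sales_data.csv") unchanged.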
Step 2: Loading Data with Pandas
import pandas as pd # Import pandas library
# Load CSV file
df = pd.read_csv("sales_data.csv")
# Display first 5 rows to check data
print(df.head())
# Check data types of each column
print(df.dtypes)
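By default read_csv loads the Date column as text rather than as dates. If date arithmetic or time-based grouping is needed, the parse_dates parameter can convert it while loading. The sketch below reads the same rows from an in-memory string so that it runs on its own; csv_text and dated_df are illustrative names standing in for the real file and DataFrame.

```python
import io
import pandas as pd

# Stand-in for sales_data.csv so the example is self-contained
csv_text = """OrderID,Customer,Product,Quantity,Price,Date
101,Alice,Laptop,2,1200,2025-08-01
102,Bob,Mouse,5,25,2025-08-02
103,Charlie,Keyboard,3,45,2025-08-03
104,Alice,Monitor,1,300,2025-08-04
105,Bob,Headset,2,80,2025-08-05
"""

# parse_dates converts the listed columns to datetime64 while loading
dated_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(dated_df.dtypes)

# Datetime columns expose the .dt accessor, e.g. the weekday of each order
print(dated_df["Date"].dt.day_name())
```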
Step 3: Inspecting & Basic Commands
# Display first 3 rows (print is needed when running as a script)
print(df.head(3))
# Display last 3 rows
print(df.tail(3))
# Get number of rows and columns
print(df.shape)
# Display column names
print(df.columns)
# Get summary statistics for numerical columns
print(df.describe())
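Two further inspection calls are often useful at this stage: info(), which the cheat sheet lists, and value_counts() for a quick look at a categorical column. A small sketch on the sample rows, rebuilt inline so the block runs on its own (orders_per_customer is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105],
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset"],
    "Quantity": [2, 5, 3, 1, 2],
    "Price": [1200, 25, 45, 300, 80],
})

# Column names, non-null counts, dtypes and memory use in one report
df.info()

# How many orders each customer placed
orders_per_customer = df["Customer"].value_counts()
print(orders_per_customer)

# describe() skips text columns unless asked to include them
print(df.describe(include="all"))
```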
Step 4: Data Cleaning
# Check for missing values
print(df.isnull().sum()) # Count missing values per column
# Fill missing Quantity with 0 (assign back; inplace=True on a selected column is unreliable in recent pandas)
df['Quantity'] = df['Quantity'].fillna(0)
# Remove duplicate rows
df.drop_duplicates(inplace=True)
# Rename column 'Price' to 'UnitPrice'
df.rename(columns={'Price':'UnitPrice'}, inplace=True)
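The sample table is already clean, so the commands above have nothing visible to fix. To see them act, the sketch below uses a deliberately damaged copy of the data. The dirty rows (one missing Quantity, one duplicated order) are hypothetical, invented only for this illustration, and assigning results back to a variable is just one common style.

```python
import numpy as np
import pandas as pd

# Hypothetical "dirty" variant of the sample data: order 102 has no
# Quantity and order 105 was entered twice
dirty = pd.DataFrame({
    "OrderID": [101, 102, 103, 104, 105, 105],
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob", "Bob"],
    "Quantity": [2, np.nan, 3, 1, 2, 2],
    "Price": [1200, 25, 45, 300, 80, 80],
})

print(dirty.isnull().sum())  # Quantity shows 1 missing value

clean = dirty.copy()
# Fill the gap, then restore the integer dtype that NaN had forced to float
clean["Quantity"] = clean["Quantity"].fillna(0).astype(int)
# Drop the repeated row and renumber the index
clean = clean.drop_duplicates().reset_index(drop=True)
clean = clean.rename(columns={"Price": "UnitPrice"})

print(clean)
```

Whether 0 is the right fill value depends on what a missing quantity means in the real data; dropna() is the alternative when such rows should be discarded instead.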
Step 5: Data Manipulation
# Filter rows for Alice
alice_orders = df[df['Customer']=='Alice']
print(alice_orders)
# Add new column: TotalPrice = Quantity * UnitPrice
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']
print(df)
# Group by Customer and sum TotalPrice
customer_total = df.groupby('Customer')['TotalPrice'].sum()
print(customer_total)
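When more than one statistic per customer is wanted, groupby can be combined with agg. The sketch below uses named aggregation on the sample rows; the output column names (orders, total_spent, avg_order) are illustrative choices, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Quantity": [2, 5, 3, 1, 2],
    "UnitPrice": [1200, 25, 45, 300, 80],
})
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# Named aggregation: each keyword becomes an output column
summary = df.groupby("Customer").agg(
    orders=("TotalPrice", "count"),
    total_spent=("TotalPrice", "sum"),
    avg_order=("TotalPrice", "mean"),
)

# Largest spenders first
summary = summary.sort_values("total_spent", ascending=False)
print(summary)
```

For the sample data this gives totals of 2700 for Alice, 285 for Bob and 135 for Charlie.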
Step 6: NumPy Integration
import numpy as np
# Convert Quantity column to numpy array for fast computation
quantities = df['Quantity'].to_numpy()
print(quantities)
# Calculate mean and standard deviation (np.std uses the population formula, ddof=0, by default)
print("Mean Quantity:", np.mean(quantities))
print("Std Dev Quantity:", np.std(quantities))
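One detail worth knowing when mixing the two libraries: np.std divides by n by default (population standard deviation), while the pandas .std() method divides by n - 1 (sample standard deviation), so the two can disagree on the same column. A short sketch using the sample quantities; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

quantity = pd.Series([2, 5, 3, 1, 2], name="Quantity")
values = quantity.to_numpy()

pop_std = np.std(values)             # ddof=0: divide by n
sample_std = np.std(values, ddof=1)  # ddof=1: divide by n - 1
pandas_std = quantity.std()          # pandas uses ddof=1 by default

print("Population std:", round(pop_std, 4))
print("Sample std:    ", round(sample_std, 4))
print("pandas .std(): ", round(pandas_std, 4))

# Element-wise arithmetic on arrays needs no explicit loop
unit_price = np.array([1200, 25, 45, 300, 80])
order_totals = values * unit_price
print("Order totals:", order_totals)
```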
Step 7: Visualization with Matplotlib
import matplotlib.pyplot as plt
# Plot TotalPrice per Customer
customer_total.plot(kind='bar', color='skyblue') # Bar chart
plt.title("Total Sales per Customer") # Add title
plt.xlabel("Customer") # X-axis label
plt.ylabel("Total Sales") # Y-axis label
plt.show()
Graph Output: bar chart with one bar per customer (Alice, Bob, Charlie) showing total sales.
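When the script runs somewhere without a display (a server or a scheduled job), the chart can be written to an image file instead of shown in a window. The sketch below uses Matplotlib's object-oriented interface for the same bar chart. The per-customer totals are typed in from the sample data, and the output file name is a hypothetical choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd

# Per-customer totals computed from the sample table
customer_total = pd.Series(
    {"Alice": 2700, "Bob": 285, "Charlie": 135}, name="TotalPrice"
)

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(customer_total.index), customer_total.to_numpy(),
              color="skyblue")
ax.bar_label(bars)  # write each bar's value above it (Matplotlib 3.4+)
ax.set_title("Total Sales per Customer")
ax.set_xlabel("Customer")
ax.set_ylabel("Total Sales")

fig.tight_layout()
fig.savefig("total_sales_per_customer.png", dpi=150)  # hypothetical file name
plt.close(fig)  # free the figure once it has been saved
```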
Step 8: Visualization with Seaborn
import seaborn as sns
# Scatter plot Quantity vs UnitPrice by Customer
sns.scatterplot(data=df, x='Quantity', y='UnitPrice', hue='Customer', s=100)
plt.title("Quantity vs UnitPrice")
plt.show()
# Correlation heatmap (numeric columns only; text columns such as Customer cannot be correlated)
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
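Graph Outputs: scatter plot of Quantity vs UnitPrice colored by customer, and a correlation heatmap of the numeric columns.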
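A correlation matrix is only defined for numeric columns, so it can help to select them explicitly before plotting. The sketch below does that with select_dtypes and pins the color scale to the full -1 to 1 range so that colors are comparable between charts. The output file name is hypothetical, and with only five rows the coefficients are illustrative rather than statistically meaningful.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Customer": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Quantity": [2, 5, 3, 1, 2],
    "UnitPrice": [1200, 25, 45, 300, 80],
})
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

# Keep only numeric columns before computing pairwise correlations
corr = df.select_dtypes(include="number").corr()
print(corr)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Heatmap")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # hypothetical file name
plt.close(fig)
```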
Cheat Sheet: Pandas, NumPy & Visualization Commands
- Loading: pd.read_csv(), pd.read_excel()
- Inspect: df.head(), df.tail(), df.shape, df.info(), df.describe()
- Clean: isnull(), fillna(), dropna(), drop_duplicates(), rename()
- Filter & Manipulate: df[df['Col']==val], df['NewCol']=df['A']*df['B'], groupby()
- NumPy: np.array(), to_numpy(), np.mean(), np.std()
- Matplotlib: plot(), bar(), scatter(), show()
- Seaborn: scatterplot(), heatmap(), barplot()
✔ End of Day 8 – You now understand Pandas, NumPy, Matplotlib & Seaborn with hands-on examples, data cleaning, manipulation, and visualization.