Pandas Tutorial: Data Analysis & Manipulation with Python

Zaheer Ahmad · Mar 26, 2026 · 5 min read

Python

Introduction

Welcome to this Pandas Tutorial: Data Analysis & Manipulation with Python — a complete guide designed especially for Pakistani students stepping into the world of data science.

If you’ve ever worked with Excel sheets, you already understand rows, columns, and tables. Now imagine doing all of that faster, more reliably, and with automation using Python. That’s exactly what pandas offers.

Pandas is one of the most powerful Python libraries used for:

Data analysis
Data cleaning
Data manipulation
Data visualization preparation

Whether you're analyzing student results in Lahore, sales data in Karachi, or survey responses in Islamabad, pandas helps you turn raw data into meaningful insights.

Why should Pakistani students learn pandas?

High demand in data science and AI jobs
In demand for freelance data work on platforms such as Fiverr and Upwork
Essential for fields like finance, healthcare, and business analytics
Works well with other tools like NumPy and Matplotlib

Prerequisites

Before starting this pandas DataFrame guide, you should have:

Basic knowledge of Python (variables, loops, functions)
Understanding of lists and dictionaries
Python 3 installed on your system, plus pandas itself (pip install pandas)
Familiarity with CSV/Excel files (helpful but not required)

Optional but recommended:

Basic understanding of NumPy
Jupyter Notebook or VS Code for coding

Core Concepts & Explanation

DataFrames & Series (The Foundation of Pandas)

In pandas, the two main data structures are:

Series → One-dimensional data (like a column)
DataFrame → Two-dimensional table (rows + columns)

Example:

import pandas as pd

data = {
    "Name": ["Ali", "Fatima", "Ahmad"],
    "Marks": [85, 90, 78]
}

df = pd.DataFrame(data)
print(df)

Explanation:

import pandas as pd → Imports pandas library
data → Python dictionary containing student data
pd.DataFrame(data) → Converts dictionary into table format
print(df) → Displays the data

Output:

     Name  Marks
0     Ali     85
1  Fatima     90
2   Ahmad     78
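The list above defines two structures, but the example builds only a DataFrame. Here is a minimal sketch of a Series, using the same made-up student data. It also shows that selecting a single DataFrame column gives you a Series:

```python
import pandas as pd

# A Series is a single labelled column of values
marks = pd.Series([85, 90, 78], name="Marks")
print(marks)

# Selecting one column of a DataFrame returns a Series
df = pd.DataFrame({
    "Name": ["Ali", "Fatima", "Ahmad"],
    "Marks": [85, 90, 78],
})
column = df["Marks"]
print(isinstance(column, pd.Series))  # True
print(column.max())                   # 90
```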

Reading & Exploring Data

Pandas can load data from CSV files, Excel workbooks, SQL databases, and more (for example with pd.read_csv(), pd.read_excel(), and pd.read_sql()).

Example:

import pandas as pd

df = pd.read_csv("students.csv")
print(df.head())

Explanation:

pd.read_csv() → Reads CSV file
"students.csv" → File name
df.head() → Shows first 5 rows

Other useful functions:

df.info() → Data types and structure
df.describe() → Statistical summary
df.shape → (rows, columns) tuple; it is an attribute, so it takes no parentheses
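Since students.csv may not exist on your machine, this self-contained sketch reads the same kind of data from an in-memory string with io.StringIO. The CSV contents are made up for illustration. It then runs the exploration functions listed above:

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for students.csv
csv_text = """Name,City,Marks
Ali,Lahore,85
Fatima,Karachi,92
Ahmad,Islamabad,76
Sara,Lahore,88
"""

# read_csv accepts any file-like object, not only a file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows (up to 5)
df.info()             # column names, dtypes, non-null counts (prints directly)
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
print(df.shape)       # (4, 3) -> 4 rows, 3 columns
```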

Data Cleaning & Manipulation

Real-world data is messy. Pandas helps clean it.

Handling Missing Values:

df = df.dropna()

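Removes every row that contains at least one missing value (NaN). dropna() returns a new DataFrame, which is why the result is assigned back to df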

Filling Missing Values:

df["Marks"] = df["Marks"].fillna(0)

Replaces missing values with 0

Renaming Columns:

df.rename(columns={"Marks": "Score"}, inplace=True)

Renames the Marks column to Score; inplace=True modifies df directly instead of returning a new DataFrame
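The three cleaning steps above can be tried together on a small made-up DataFrame with gaps, where np.nan marks a missing value. The DataFrame is copied before filling so the original stays available for comparison:

```python
import numpy as np
import pandas as pd

# Made-up data with two missing marks
df = pd.DataFrame({
    "Name": ["Ali", "Fatima", "Ahmad", "Sara"],
    "Marks": [85, np.nan, 78, np.nan],
})

# dropna() returns a new DataFrame without the rows that contain NaN
dropped = df.dropna()
print(dropped)  # only Ali and Ahmad remain

# fillna(0) replaces the gaps instead of removing rows
filled = df.copy()
filled["Marks"] = filled["Marks"].fillna(0)

# rename() with inplace=True modifies filled itself
filled.rename(columns={"Marks": "Score"}, inplace=True)
print(filled)
```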

Filtering & Selection

high_scores = df[df["Marks"] > 80]

Selects students with marks > 80
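Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because of Python's operator precedence. Here is a short sketch with made-up data. It also uses .loc to keep only the columns you want:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Fatima", "Ahmad", "Sara"],
    "City": ["Lahore", "Karachi", "Islamabad", "Lahore"],
    "Marks": [85, 92, 76, 88],
})

# Both conditions must hold: use &, with each condition in parentheses
lahore_high = df[(df["City"] == "Lahore") & (df["Marks"] > 80)]

# Either condition may hold: use |
karachi_or_low = df[(df["City"] == "Karachi") | (df["Marks"] < 80)]

# .loc[row_condition, columns] filters rows and selects columns at once
names_only = df.loc[df["Marks"] > 80, ["Name", "Marks"]]

print(lahore_high)
print(karachi_or_low)
print(names_only)
```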

GroupBy Operations

df.groupby("City")["Marks"].mean()

Groups data by city and calculates average marks
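The one-line groupby above assumes df has a City column. Here is a self-contained sketch with made-up data. It also uses .agg() to compute several statistics per city in one call:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Fatima", "Ahmad", "Sara"],
    "City": ["Lahore", "Karachi", "Islamabad", "Lahore"],
    "Marks": [85, 92, 76, 88],
})

# One aggregation: average marks per city (a Series indexed by City)
avg_by_city = df.groupby("City")["Marks"].mean()

# Several aggregations at once with .agg(): returns a DataFrame
summary = df.groupby("City")["Marks"].agg(["mean", "max", "count"])

print(avg_by_city)
print(summary)
```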

Merging DataFrames

merged = pd.merge(df1, df2, on="ID")

Combines two tables using the shared ID column; by default (how="inner") only rows whose ID appears in both tables are kept
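The merge line above uses df1 and df2 without defining them. Here is a minimal sketch with two hypothetical tables, student details and exam results, that share an ID column. It compares the default inner join with a left join:

```python
import pandas as pd

# Hypothetical tables that share an ID column
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["Ali", "Fatima", "Ahmad"],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 4],
    "Marks": [85, 90, 70],
})

# Default how="inner": keep only IDs present in both tables (1 and 2)
merged = pd.merge(df1, df2, on="ID")

# how="left": keep every row of df1; Marks is NaN where df2 has no match
merged_left = pd.merge(df1, df2, on="ID", how="left")

print(merged)
print(merged_left)
```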

Practical Code Examples

Example 1: Student Result Analysis (Pakistan Context)

import pandas as pd

# Step 1: Create dataset
data = {
    "Name": ["Ali", "Fatima", "Ahmad", "Sara"],
    "City": ["Lahore", "Karachi", "Islamabad", "Lahore"],
    "Marks": [85, 92, 76, 88]
}

# Step 2: Convert to DataFrame
df = pd.DataFrame(data)

# Step 3: Show first rows
print(df.head())

# Step 4: Calculate average marks
avg = df["Marks"].mean()
print("Average Marks:", avg)

# Step 5: Filter top students
top_students = df[df["Marks"] > 85]
print(top_students)

Line-by-line Explanation:

data → Dictionary storing student info
pd.DataFrame(data) → Converts data into table
df.head() → Displays first rows
df["Marks"].mean() → Calculates average
df[df["Marks"] > 85] → Filters top students
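A natural next step for Example 1 is to add a derived column and rank the students. This minimal sketch assumes a pass mark of 80, which is an illustrative threshold and not an official one:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Fatima", "Ahmad", "Sara"],
    "City": ["Lahore", "Karachi", "Islamabad", "Lahore"],
    "Marks": [85, 92, 76, 88],
})

PASS_MARK = 80  # hypothetical threshold, for illustration only

# New boolean column computed from an existing one
df["Passed"] = df["Marks"] >= PASS_MARK

# Sort from highest to lowest marks (returns a new DataFrame)
ranked = df.sort_values("Marks", ascending=False)
print(ranked)
```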

Example 2: Real-World Application (Sales Data in PKR)

import pandas as pd

# Step 1: Sales dataset
data = {
    "Product": ["Mobile", "Laptop", "Tablet", "Mobile"],
    "City": ["Karachi", "Lahore", "Islamabad", "Karachi"],
    "Sales": [50000, 120000, 30000, 70000]
}

# Step 2: Create DataFrame
df = pd.DataFrame(data)

# Step 3: Total sales per city
city_sales = df.groupby("City")["Sales"].sum()
print(city_sales)

# Step 4: Total sales per product
product_sales = df.groupby("Product")["Sales"].sum()
print(product_sales)

Explanation:

groupby("City") → Groups sales by city
.sum() → Adds total sales
Useful for business analytics in Pakistan
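Grouping by more than one column gives one total per combination, and sorting makes the totals easier to read. A sketch on the same made-up sales data follows. Note that Karachi and Lahore tie at 120,000 PKR, so their relative order after sorting is not guaranteed:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Mobile", "Laptop", "Tablet", "Mobile"],
    "City": ["Karachi", "Lahore", "Islamabad", "Karachi"],
    "Sales": [50000, 120000, 30000, 70000],
})

# Group by two columns: one total per (City, Product) pair
by_city_product = df.groupby(["City", "Product"])["Sales"].sum()
print(by_city_product)

# City totals sorted from highest to lowest
city_sales = df.groupby("City")["Sales"].sum().sort_values(ascending=False)
print(city_sales)
```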

Common Mistakes & How to Avoid Them

Mistake 1: Not Handling Missing Data

Problem:

df["Marks"].mean()

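pandas skips missing values (NaN) by default (skipna=True), so the average is computed only from the marks that exist. No error or warning is raised, so you may not notice how much data is missing, or that a missing mark should perhaps have counted as 0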

Fix:

df["Marks"] = df["Marks"].fillna(df["Marks"].mean())

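Replaces missing values with the column average (mean imputation). Check how many values are missing first with df["Marks"].isna().sum(), and fill with 0 instead if a missing mark really means the student scored nothing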
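To see what pandas actually does with gaps, this short sketch uses made-up numbers. It compares the default mean, the mean with missing marks counted as 0, and the mean-imputation fix:

```python
import numpy as np
import pandas as pd

# Made-up marks with one missing value
df = pd.DataFrame({"Marks": [80, np.nan, 90]})

print(df["Marks"].isna().sum())      # 1 -> always check this first
print(df["Marks"].mean())            # 85.0: NaN is skipped (skipna=True)
print(df["Marks"].fillna(0).mean())  # about 56.67: missing counted as 0

# Mean imputation: the gap is filled and the average stays 85.0
df["Marks"] = df["Marks"].fillna(df["Marks"].mean())
print(df["Marks"].tolist())          # [80.0, 85.0, 90.0]
```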

Mistake 2: Forgetting `inplace=True`

Problem:

df.drop("Marks", axis=1)

Returns a new DataFrame; the original df is left unchanged

Fix:

df.drop("Marks", axis=1, inplace=True)

Modifies df directly. Reassigning the result, df = df.drop(columns="Marks"), works just as well and is often preferred
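This small sketch with made-up data shows that drop() without inplace=True leaves df untouched, and that reassigning the result is enough to keep the change:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ali", "Sara"], "Marks": [85, 88]})

# drop() without inplace returns a NEW DataFrame; df itself is unchanged
df.drop("Marks", axis=1)
print(df.columns.tolist())  # ['Name', 'Marks']

# Reassigning the result is the usual alternative to inplace=True
df = df.drop(columns="Marks")
print(df.columns.tolist())  # ['Name']
```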

Mistake 3: Wrong Column Names

Problem:

df["marks"]

Raises a KeyError if the actual column is "Marks", because column names are case-sensitive

Fix:

print(df.columns)

Prints the exact column names so you can match their spelling, capitalization, and any stray spaces
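Column names copied from spreadsheets often contain stray spaces or mixed case. A common approach, sketched here with hypothetical messy headers, is to normalise all names once with the .str methods on df.columns. Any remaining KeyError can then be caught with a helpful message:

```python
import pandas as pd

# Hypothetical messy headers, e.g. from a hand-made Excel export
df = pd.DataFrame({" Marks ": [85, 90], "Student Name": ["Ali", "Fatima"]})
print(df.columns.tolist())  # [' Marks ', 'Student Name']

# Normalise: strip spaces, lowercase, replace inner spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
print(df.columns.tolist())  # ['marks', 'student_name']

# Selecting a column that does not exist raises KeyError
try:
    df["Marks"]
except KeyError:
    print("No column named 'Marks'; available:", df.columns.tolist())
```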

Practice Exercises

Exercise 1: Filter High Salary Employees

Problem:

Create a DataFrame with employee names and salaries. Show employees earning more than 50,000 PKR.

Solution:

import pandas as pd

data = {
    "Name": ["Ali", "Sara", "Ahmad"],
    "Salary": [40000, 60000, 55000]
}

df = pd.DataFrame(data)

high_salary = df[df["Salary"] > 50000]
print(high_salary)

Explanation:

Creates employee dataset
Filters salary > 50,000
Displays result

Exercise 2: Average Marks by City

Problem:

Find average marks of students grouped by city.

Solution:

import pandas as pd

data = {
    "City": ["Lahore", "Karachi", "Lahore"],
    "Marks": [80, 90, 70]
}

df = pd.DataFrame(data)

avg_marks = df.groupby("City")["Marks"].mean()
print(avg_marks)

Explanation:

Groups data by city
Calculates average marks

Frequently Asked Questions

What is pandas in Python?

Pandas is a Python library used for data analysis and manipulation. It provides powerful tools like DataFrames to handle structured data easily.

How do I install pandas?

You can install pandas using pip: pip install pandas. It works in Jupyter Notebook, VS Code, and other Python environments.
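After installing, you can confirm that Python finds pandas by importing it and printing its version. The exact version number depends on what pip installed:

```python
import pandas as pd

# If this import fails with ModuleNotFoundError, pandas is not installed
# for the Python interpreter you are running
print(pd.__version__)
```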

What is a DataFrame in pandas?

A DataFrame is a table-like structure with rows and columns. It is the most commonly used data structure in pandas for analyzing data.

How do I read a CSV file in pandas?

Use pd.read_csv("file.csv") to load a CSV file into a DataFrame. You can then explore it using functions like .head() and .info().

Why is pandas important for data science?

Pandas simplifies data cleaning, analysis, and transformation. It is widely used in real-world projects, making it essential for data science careers.

Summary & Key Takeaways

Pandas is a powerful library for data analysis in Python
DataFrames are the core structure used in pandas
You can easily clean, filter, and manipulate data
Functions like groupby, merge, and dropna are essential
Real-world applications include business analytics and student data analysis
Mastering pandas opens doors to data science and freelancing opportunities

Next Steps & Related Tutorials

Now that you’ve completed this pandas DataFrame guide, continue your learning with:

Learn array operations in our NumPy Tutorial
Strengthen basics with the Python Tutorial
Explore data visualization using Matplotlib
Dive into machine learning with Scikit-learn

These tutorials on theiqra.edu.pk will help you become a complete data science professional 🚀
