Data Analysis with Pandas
Imagine you’re playing with a big box of LEGO blocks. Each block has a label like “color,” “shape,” or “size,” and you want to sort them, count how many you have, or even build something cool! Pandas is like a magical LEGO sorting machine for data—it helps you organize, analyze, and play with your data.
What is Pandas?
Pandas is a Python library that makes handling data easy. It’s great for:
- Reading and writing data from files like Excel, CSV, or databases.
- Cleaning messy data (fixing missing values, renaming columns).
- Analyzing data (counting, grouping, finding patterns).
- Visualizing data (Pandas has basic plotting built in, but it relies on Matplotlib and works best alongside it).
Getting Started
Before we begin, you need to install Pandas. Run this in your terminal:
pip install pandas
Now, let’s explore how to use Pandas step by step.
1. Importing Pandas
import pandas as pd
Here, we’re giving Pandas the nickname pd. It’s shorter and easier to type!
2. Working with Data
Imagine you have a school report with names, ages, and grades. You can create this data in Pandas like this:
Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"]
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Grade
0 Alice 10 A
1 Bob 12 B
2 Charlie 11 A
Here’s what happened:
- The data dictionary holds the information.
- pd.DataFrame(data) turns it into a neat table.
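To see how the dictionary maps onto the table, a small sketch like this can help. It rebuilds the same school report and looks at its parts; the variable names column_names and ages are just picked for this illustration.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
}
df = pd.DataFrame(data)

# Each dictionary key became a column name.
column_names = list(df.columns)
print(column_names)

# Selecting a single column by name gives back its values.
ages = df["Age"]
print(ages.tolist())  # [10, 12, 11]

# The 0, 1, 2 on the left of the printed table are the default row labels (the index).
print(list(df.index))
```

So each key is a column, each list is that column's values, and Pandas numbers the rows for you.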
3. Reading Data from a File
If you have a CSV file (students.csv) like this:
Name,Age,Grade
Alice,10,A
Bob,12,B
Charlie,11,A
You can load it:
df = pd.read_csv("students.csv")
print(df)
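If you don't have students.csv on disk yet, you can still try the loading step. The sketch below assumes the same three rows and uses io.StringIO as a stand-in for the file, since pd.read_csv also accepts file-like objects.

```python
import io

import pandas as pd

# The same text that would be inside students.csv.
csv_text = """Name,Age,Grade
Alice,10,A
Bob,12,B
Charlie,11,A
"""

# io.StringIO makes the text behave like an open file (a stand-in for the real file).
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# read_csv used the first line as column names and read Age as whole numbers.
print(df.dtypes)
```

With a real file, you would pass the file name instead, exactly as in the example above.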
4. Exploring the Data
Look at the first few rows:
print(df.head())
Find the number of rows and columns:
print(df.shape) # (rows, columns)
Get information about the data:
df.info()  # info() prints its report itself and returns None, so no print() is needed
See summary statistics:
print(df.describe())
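Here is a small sketch of what those exploring tools give back, using the school report from earlier as sample data. The names rows, cols, and summary are only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# shape is a tuple, so it can be unpacked into two variables.
rows, cols = df.shape
print(rows, cols)  # 3 3

# On a table with mixed columns, describe() summarizes only the numeric ones by default.
summary = df.describe()
print(summary)

# Individual statistics can be picked out by row label and column name.
print(summary.loc["mean", "Age"])  # 11.0
```

That is why Name and Grade don't show up in the summary: they hold text, not numbers.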
5. Analyzing the Data
Filter data (e.g., students older than 10):
older_students = df[df["Age"] > 10]
print(older_students)
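Filters can also be combined. As a sketch on the same sample data, this keeps students who are older than 10 and also have grade A. Each condition needs its own parentheses, and Pandas uses & for "and" (| for "or") instead of the Python words. The name older_a_students is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# df["Age"] > 10 produces one True/False value per row.
is_older = df["Age"] > 10
print(is_older.tolist())  # [False, True, True]

# Combine two conditions; only rows where both are True are kept.
older_a_students = df[(df["Age"] > 10) & (df["Grade"] == "A")]
print(older_a_students)
```

Only Charlie passes both tests: Bob is older than 10 but has a B, and Alice has an A but is exactly 10.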
Add a new column:
df["Passed"] = df["Grade"] != "F"
print(df)
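New columns can hold numbers too, not just True or False. A minimal sketch on the same sample data: the column name Age_Next_Year is invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# The + 1 is applied to every row at once; no loop is needed.
df["Age_Next_Year"] = df["Age"] + 1
print(df)
print(df["Age_Next_Year"].tolist())  # [11, 13, 12]
```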
Group data (e.g., count how many students got each grade):
grade_counts = df["Grade"].value_counts()
print(grade_counts)
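Counting is one kind of grouping. For other summaries per group, groupby is the usual tool. The sketch below, on the same sample data, finds the average age for each grade; avg_age_by_grade is just a name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# How many students got each grade.
grade_counts = df["Grade"].value_counts()
print(grade_counts)

# Put rows with the same grade together, then average the Age of each group.
avg_age_by_grade = df.groupby("Grade")["Age"].mean()
print(avg_age_by_grade)
```

Alice (10) and Charlie (11) share grade A, so that group averages 10.5, while Bob is alone in grade B at 12.0.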
6. Cleaning the Data
Handle missing values:
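df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [10, 12, None],
    "Grade": ["A", "B", None]
})
# Fill missing text values with a placeholder, column by column.
# Age is left alone so it stays numeric; filling it with "Unknown" would mix text into a number column.
df = df.fillna({"Name": "Unknown", "Grade": "Unknown"})
print(df)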
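Before filling anything, it often helps to check how much is missing. This sketch uses the same table with gaps and shows two other options: counting missing values and dropping incomplete rows. The names missing_per_column and complete_rows are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [10, 12, None],
    "Grade": ["A", "B", None],
})

# isna() marks every missing cell as True; sum() then counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# dropna() returns a new table without the rows that have missing values.
complete_rows = df.dropna()
print(complete_rows)
```

Whether to fill or drop depends on your data: dropping is simple, but it throws away the whole row.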
7. Saving Data
You can save your cleaned or analyzed data:
df.to_csv("cleaned_students.csv", index=False)
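To check that saving worked, you can read the file straight back in. The sketch below writes to a temporary folder (an assumption made so the example cleans up after itself) and shows what index=False does: the row numbers 0, 1, 2 are not written to the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# A temporary folder is used here so no file is left behind afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_students.csv")
    df.to_csv(path, index=False)

    # With index=False, the first line holds only the column names.
    with open(path) as f:
        header = f.readline().strip()
    print(header)  # Name,Age,Grade

    # Load the saved file again to confirm the data survived the round trip.
    reloaded = pd.read_csv(path)

print(reloaded)
```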
Practice Questions
1. Create a DataFrame
Make a DataFrame with data about fruits:
- Name: [“Apple”, “Banana”, “Cherry”]
- Color: [“Red”, “Yellow”, “Red”]
- Weight: [150, 120, 10]
2. Filter Red Fruits
Filter and display only the red fruits.
3. Add a Column
Add a column Is_Heavy that says True if the weight is more than 100, otherwise False.
Solutions
1. Create a DataFrame
fruits = {
    "Name": ["Apple", "Banana", "Cherry"],
    "Color": ["Red", "Yellow", "Red"],
    "Weight": [150, 120, 10]
}
fruit_df = pd.DataFrame(fruits)
print(fruit_df)
2. Filter Red Fruits
red_fruits = fruit_df[fruit_df["Color"] == "Red"]
print(red_fruits)
3. Add a Column
fruit_df["Is_Heavy"] = fruit_df["Weight"] > 100
print(fruit_df)
Pandas is like a superhero that saves you from boring, repetitive data tasks. Once you learn it, you can handle data faster and smarter! So keep practicing, and soon you’ll be a data master. 💪