Data Analysis with Pandas

Imagine you’re playing with a big box of LEGO blocks. Each block has properties like color, shape, and size, and you want to sort the blocks, count how many you have, or even build something cool! Pandas is like a magical LEGO sorting machine for data: it helps you organize, analyze, and explore your data.


What is Pandas?

Pandas is a Python library that makes handling data easy. It’s great for:

  • Reading and writing data from files like Excel, CSV, or databases.
  • Cleaning messy data (fixing missing values, renaming columns).
  • Analyzing data (counting, grouping, finding patterns).
  • Visualizing data (its built-in plotting relies on Matplotlib, so the two work well together).

Getting Started

Before we begin, you need to install Pandas. Run this in your terminal:

pip install pandas

Now, let’s explore how to use Pandas step by step.


1. Importing Pandas

import pandas as pd




Here, we’re giving Pandas a nickname pd. It’s shorter and easier to type!


2. Working with Data

Imagine you have a school report with names, ages, and grades. You can create this data in Pandas like this:

Create a DataFrame

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"]
}

df = pd.DataFrame(data)
print(df)




Output:

      Name  Age Grade
0    Alice   10     A
1      Bob   12     B
2  Charlie   11     A

Here’s what happened:

  • The data dictionary holds the information.
  • pd.DataFrame(data) turns it into a neat table.
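
To peek inside the table, a small sketch like the one below may help. It rebuilds the same DataFrame and looks at the column names, one column, and one row. The variable names (columns, ages, first_row) are just for illustration.

```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
}
df = pd.DataFrame(data)

# Column labels come from the dictionary keys
columns = list(df.columns)
print(columns)

# Selecting one column returns a Series
ages = df["Age"]
print(ages.tolist())

# .loc picks a row by its index label (0, 1, 2 by default)
first_row = df.loc[0]
print(first_row["Name"])
```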

3. Reading Data from a File

If you have a CSV file (students.csv) like this:

Name,Age,Grade
Alice,10,A
Bob,12,B
Charlie,11,A

You can load it:

df = pd.read_csv("students.csv")
print(df)
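
If you don't have students.csv on disk yet, you can still try read_csv. The sketch below assumes the file contents shown above and feeds them in through io.StringIO, which behaves like an open file. The csv_text variable is only a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for students.csv so the example runs without a file on disk
csv_text = """Name,Age,Grade
Alice,10,A
Bob,12,B
Charlie,11,A
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)
```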


4. Exploring the Data

Look at the first few rows:

print(df.head())




Find the number of rows and columns:

print(df.shape)  # (rows, columns)




Get information about the data:

df.info()  # prints its report directly; wrapping it in print() would also print "None"




See summary statistics:

print(df.describe())
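
By default, describe() only summarizes numeric columns, so Name and Grade are left out. As a rough sketch using the same student data, passing include="all" brings the text columns into the summary too.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# Numeric columns only: count, mean, std, min, quartiles, max
numeric_summary = df.describe()
print(numeric_summary)

# All columns: text columns get count, unique, top, and freq
full_summary = df.describe(include="all")
print(full_summary)
```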





5. Analyzing the Data

Filter data (e.g., students older than 10):

older_students = df[df["Age"] > 10]
print(older_students)
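
Filters can also be combined. Here is a minimal sketch, assuming the same student data: use & for "and" and | for "or", and wrap each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# Both conditions must be true: older than 10 AND grade A
older_with_a = df[(df["Age"] > 10) & (df["Grade"] == "A")]
print(older_with_a)

# Either condition may be true: older than 11 OR grade A
older_or_a = df[(df["Age"] > 11) | (df["Grade"] == "A")]
print(older_or_a)
```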




Add a new column:

df["Passed"] = df["Grade"] != "F"
print(df)
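
New columns can be calculated from existing ones. The sketch below uses two made-up column names, Age_Next_Year and Is_Top_Grade, to show one numeric and one True/False column built from the same student data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# Arithmetic applies to every row of the column at once
df["Age_Next_Year"] = df["Age"] + 1

# A comparison produces a True/False column
df["Is_Top_Grade"] = df["Grade"] == "A"

print(df)
```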




Group data (e.g., count grades):

grade_counts = df["Grade"].value_counts()
print(grade_counts)
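
value_counts() is handy for counting. When you want other calculations per group, groupby is one option. This sketch assumes the same student data and works out the average age for each grade.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

# Split the rows by Grade, then average the Age column in each group
mean_age_by_grade = df.groupby("Grade")["Age"].mean()
print(mean_age_by_grade)

# size() counts the rows in each group
students_per_grade = df.groupby("Grade").size()
print(students_per_grade)
```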





6. Cleaning the Data

Handle missing values:

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [10, 12, None],
    "Grade": ["A", "B", None]
})

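# Fill missing text with "Unknown" and missing ages with the average age,
# so the Age column stays numeric
df = df.fillna({"Name": "Unknown", "Age": df["Age"].mean(), "Grade": "Unknown"})
print(df)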
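
Filling is not the only way to clean. As a sketch with the same messy data, you might first count the missing values, then drop incomplete rows, and rename a column along the way. The new column name Letter_Grade is only an example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [10, 12, None],
    "Grade": ["A", "B", None],
})

# Count missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Keep only the rows that have no missing values
complete_rows = df.dropna()

# Rename a column; rename() returns a new DataFrame
renamed = complete_rows.rename(columns={"Grade": "Letter_Grade"})
print(renamed)
```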



7. Saving Data

You can save your cleaned or analyzed data:

df.to_csv("cleaned_students.csv", index=False)
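
To check that the file was saved the way you expect, you can read it straight back. The sketch below writes to a temporary folder so it leaves nothing behind; in your own project you would likely use a normal path instead.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [10, 12, 11],
    "Grade": ["A", "B", "A"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_students.csv")

    # index=False leaves out the 0, 1, 2 row labels
    df.to_csv(path, index=False)

    # Load the file again to confirm what was written
    reloaded = pd.read_csv(path)

print(reloaded)
```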


Practice Questions

1. Create a DataFrame

Make a DataFrame with data about fruits:

  • Name: [“Apple”, “Banana”, “Cherry”]
  • Color: [“Red”, “Yellow”, “Red”]
  • Weight: [150, 120, 10]

2. Filter Red Fruits

Filter and display only the red fruits.

3. Add a Column

Add a column Is_Heavy that says True if the weight is more than 100, otherwise False.


Solutions

1. Create a DataFrame

fruits = {
    "Name": ["Apple", "Banana", "Cherry"],
    "Color": ["Red", "Yellow", "Red"],
    "Weight": [150, 120, 10]
}

fruit_df = pd.DataFrame(fruits)
print(fruit_df)




2. Filter Red Fruits

red_fruits = fruit_df[fruit_df["Color"] == "Red"]
print(red_fruits)




3. Add a Column

fruit_df["Is_Heavy"] = fruit_df["Weight"] > 100
print(fruit_df)



Pandas is like a superhero that saves you from boring, repetitive data tasks. Once you learn it, you can handle data faster and smarter! So keep practicing, and soon you’ll be a data master. 💪