Correlated Subqueries: Solving Puzzles Inside Puzzles!
Hey everyone! Imagine you have a big box of LEGOs, and inside that big box, you have several smaller boxes. Now, imagine you need to find a specific red brick in each of those smaller boxes. To do that, you have to open each small box one by one and look inside.
Well, in the world of databases and MySQL, we have something similar called correlated subqueries. It’s like having a small question (the subquery) inside a bigger question (the main query), and the answer to the small question depends on something in the bigger question!
Think of it this way:
Main Query (The Big Question): “Tell me about all the students who scored higher than the average score of students in their own class.”
Subquery (The Small Question Inside): “What is the average score of this specific student’s class?”
See how the small question (“average score”) needs to know which class we’re talking about from the big question (“this specific student”)? That’s the “correlation” – they are linked!
Let’s See Some Action (Query Examples!)
Imagine we have two tables:
students
table:
student_id | student_name | class_id |
---|---|---|
1 | Alice | 101 |
2 | Bob | 102 |
3 | Charlie | 101 |
4 | David | 102 |
scores
table:
score_id | student_id | subject | marks |
---|---|---|---|
10 | 1 | Math | 85 |
11 | 1 | Science | 92 |
12 | 2 | Math | 78 |
13 | 2 | Science | 88 |
14 | 3 | Math | 90 |
15 | 3 | Science | 86 |
16 | 4 | Math | 82 |
17 | 4 | Science | 95 |
Now, let’s find the students who scored higher than the average score in their class for Math:
SELECT
s.student_name,
sc.marks
FROM
students s
JOIN
scores sc ON s.student_id = sc.student_id
WHERE
sc.subject = 'Math' AND sc.marks > (
SELECT
AVG(marks)
FROM
scores sc2
WHERE
sc2.subject = 'Math' AND sc2.student_id IN (SELECT student_id FROM students WHERE class_id = s.class_id)
);
Whoa, that looks a bit long! Let’s break it down:
SELECT s.student_name, sc.marks FROM students s JOIN scores sc ON s.student_id = sc.student_id WHERE sc.subject = 'Math'
: This part is like saying, “Give me the names and Math scores of all students.”AND sc.marks > (...)
: This adds a condition: “But only if their Math score is greater than…”(SELECT AVG(marks) FROM scores sc2 WHERE sc2.subject = 'Math' AND sc2.student_id IN (SELECT student_id FROM students WHERE class_id = s.class_id))
: This is our correlated subquery!SELECT AVG(marks) FROM scores sc2 WHERE sc2.subject = 'Math'
: This part calculates the average Math score.AND sc2.student_id IN (SELECT student_id FROM students WHERE class_id = s.class_id)
: Here’s the magic! For each student in the main query (s
), this inner query finds all thestudent_id
s belonging to the sameclass_id
as that student. Then, it calculates the average Math score only for those students in that specific class.
So, for Alice (class 101), the subquery will calculate the average Math score for students in class 101 (Alice and Charlie). Then, it will check if Alice’s Math score is higher than that average. This happens for Bob, Charlie, and David too!
Another Example: Finding Students with More Than One Score
Let’s find students who have more than one entry in the scores
table:
SELECT
s.student_name
FROM
students s
WHERE
(SELECT COUNT(*) FROM scores WHERE student_id = s.student_id) > 1;
Here, for each student in the students
table (s
), the subquery counts how many entries exist for that student_id
in the scores
table. If the count is greater than 1, that student’s name is selected.
Practice Time!
- Write a query to find all students whose Science score is below the average Science score of students in their class.
- Write a query to find all classes where at least one student scored 100 in any subject. (Hint: You might need to correlate based on
class_id
).
Interview Corner!
Interviewer: What is a correlated subquery, and how does it differ from a non-correlated subquery?
You (the Expert!): A correlated subquery is a subquery that depends on the outer query for its values. It executes once for each row processed by the outer query. In contrast, a non-correlated subquery executes only once and its result is used by the outer query for all rows. Think of a correlated subquery as asking a question that changes slightly depending on the current item you’re looking at in the main question. A non-correlated subquery asks a fixed question first, and then the main question uses that single answer.
Interviewer: Can you give an example of when you might use a correlated subquery?
You: Absolutely! A common use case is when you need to compare a row’s value with some aggregate (like an average, maximum, or minimum) that is specific to a group that the current row belongs to. For example, finding customers whose order amount is greater than their average order amount, or, as we saw, students whose score is higher than the average score in their class.
Key Takeaway
Correlated subqueries are powerful tools when you need to perform row-by-row comparisons based on related data within your tables. They might seem a bit tricky at first, but once you understand the idea of the inner query depending on the outer query, you’ll be solving database puzzles like a pro!
Keep practicing, and you’ll master them in no time! Happy querying!