SQL Interview Questions and Answers (2026) Interview Questions

BACKEND INTERVIEW PREPARATION

SQL Interview Questions and Answers

Master 100 SQL interview questions covering joins, subqueries, indexes, window functions & tricky queries asked in Java backend developer interviews at Amazon, Google, Infosys & TCS for 2026.

⏳ 60 min read 📝 100 Q&As 🎯 Beginner to Advanced

⚡ Quick Reference

Join types	INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF
WHERE vs HAVING	WHERE filters rows before grouping; HAVING filters groups after aggregation
DELETE vs TRUNCATE vs DROP	DELETE: DML, row-by-row, rollback-able; TRUNCATE: DDL, deallocates pages; DROP: removes table + structure
UNION vs UNION ALL	UNION removes duplicates (sort/hash cost); UNION ALL keeps all rows (faster)
Logical execution order	FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
PRIMARY KEY vs UNIQUE	PK: one per table, no NULLs; UNIQUE: multiple allowed, NULLs allowed
Clustered index	Defines physical row order; only one per table (non-clustered: many)
ACID	Atomicity, Consistency, Isolation, Durability — transaction guarantees

SQL Query Logical Execution Order

FROM

→

WHERE

→

GROUP BY

→

HAVING

→

SELECT

→

ORDER BY

→

LIMIT

SQL Basics & DDL/DML/DCL

Q1. What is SQL and what are its main sublanguages?

A: SQL (Structured Query Language) is the standard language for defining, manipulating, and querying relational databases. Its sublanguages: DDL (Data Definition — CREATE, ALTER, DROP, TRUNCATE), DML (Data Manipulation — INSERT, UPDATE, DELETE), DQL (Data Query — SELECT), DCL (Data Control — GRANT, REVOKE), and TCL (Transaction Control — COMMIT, ROLLBACK, SAVEPOINT).

Q2. What is the difference between DDL and DML?

A: DDL defines or changes database structure (tables, indexes, schemas) and in most databases auto-commits — it cannot be rolled back. DML changes the data inside tables, runs within transactions, and can be rolled back. DDL is metadata-level; DML is row-level.

-- DDL: structure
CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), salary DECIMAL(10,2));
ALTER TABLE employees ADD COLUMN dept_id INT;

-- DML: data
INSERT INTO employees (id, name, salary) VALUES (1, 'Raj', 85000);
UPDATE employees SET salary = 90000 WHERE id = 1;

Q3. What is the difference between DELETE, TRUNCATE, and DROP?

A: DELETE is DML: removes rows one by one (optionally filtered by WHERE), fires triggers, is fully logged, and can be rolled back. TRUNCATE is DDL: deallocates data pages, removes all rows fast, resets identity counters, cannot use WHERE, and typically cannot be rolled back (auto-commits in MySQL/Oracle). DROP removes the table itself — data, structure, indexes, and constraints.

DELETE FROM orders WHERE status = 'CANCELLED';  -- selective, rollback-able
TRUNCATE TABLE orders;                          -- all rows, fast, resets AUTO_INCREMENT
DROP TABLE orders;                              -- table is gone entirely

Q4. What is DCL? Explain GRANT and REVOKE.

A: DCL (Data Control Language) manages access permissions. GRANT gives a user privileges on database objects; REVOKE takes them back. Privileges include SELECT, INSERT, UPDATE, DELETE, EXECUTE, and can be granted with WITH GRANT OPTION to allow re-granting.

GRANT SELECT, INSERT ON hr.employees TO 'app_user'@'%';
REVOKE INSERT ON hr.employees FROM 'app_user'@'%';

Q5. What are TCL commands? Explain COMMIT, ROLLBACK, and SAVEPOINT.

A: TCL (Transaction Control Language) manages transactions. COMMIT makes all changes since the transaction began permanent. ROLLBACK undoes them. SAVEPOINT marks a point within a transaction you can roll back to without aborting the whole transaction.

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
SAVEPOINT after_debit;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- something went wrong with the credit:
ROLLBACK TO after_debit;
COMMIT;  -- debit persists, credit was undone

Q6. What are ACID properties?

A: Atomicity: a transaction executes fully or not at all. Consistency: a transaction moves the database from one valid state to another, honoring all constraints. Isolation: concurrent transactions do not see each other's intermediate states (controlled by isolation levels). Durability: once committed, changes survive crashes (via write-ahead logs). Interviewers often follow up with isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE.

Q7. What is the difference between SQL and NoSQL databases?

A: SQL databases (MySQL, PostgreSQL, Oracle) store structured data in tables with fixed schemas, support joins, and guarantee ACID transactions. NoSQL databases (MongoDB, Cassandra, Redis) use flexible models (document, key-value, column-family, graph), scale horizontally more easily, and often relax consistency (BASE) for availability and partition tolerance. Choose SQL for relational integrity and complex queries; NoSQL for massive scale, flexible schemas, or specific access patterns.

Q8. What is a database schema?

A: A schema is the logical container and blueprint of a database: table definitions, columns, data types, constraints, views, indexes, and relationships. In PostgreSQL/SQL Server, a schema is also a namespace inside a database (e.g., hr.employees); in MySQL, schema and database are synonymous.

Q9. What is the difference between CHAR and VARCHAR?

A: CHAR(n) is fixed-length: always stores n characters, right-padding with spaces — good for fixed-size codes (country codes, flags). VARCHAR(n) is variable-length: stores only the actual characters plus a 1-2 byte length prefix — good for names, emails. CHAR can be marginally faster for fixed-width data; VARCHAR saves space for variable data.

Q10. What is normalization? Explain 1NF, 2NF, and 3NF.

A: Normalization organizes tables to reduce redundancy and update anomalies. 1NF: atomic column values, no repeating groups. 2NF: 1NF + no partial dependency (every non-key column depends on the whole composite key). 3NF: 2NF + no transitive dependency (non-key columns depend only on the key, not on other non-key columns). Denormalization deliberately reintroduces redundancy for read performance — common in reporting schemas.

SELECT & Filtering

Q11. What is the syntactic order of clauses in a SELECT statement?

A: You must write clauses in this order: SELECT, FROM, JOIN...ON, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET. Writing them out of order is a syntax error. Note this differs from the logical execution order, which starts at FROM.

SELECT dept_id, AVG(salary) AS avg_sal
FROM employees
WHERE hire_date >= '2024-01-01'
GROUP BY dept_id
HAVING AVG(salary) > 60000
ORDER BY avg_sal DESC
LIMIT 5;

Q12. What is the logical order of execution of a SQL query?

A: FROM (and JOINs) → WHERE → GROUP BY → HAVING → SELECT (expressions and aliases evaluated) → DISTINCT → ORDER BY → LIMIT/OFFSET. This explains classic gotchas: you cannot use a SELECT alias in WHERE (WHERE runs first), but you usually can in ORDER BY (it runs after SELECT). Window functions are computed at the SELECT stage, so you cannot filter them in WHERE.

SELECT salary * 12 AS annual
FROM employees
-- WHERE annual > 1000000;   -- ERROR: alias not visible in WHERE
WHERE salary * 12 > 1000000  -- correct
ORDER BY annual DESC;        -- OK: ORDER BY runs after SELECT

Q13. What is the difference between WHERE and HAVING?

A: WHERE filters individual rows before grouping and cannot reference aggregate functions. HAVING filters groups after aggregation and exists precisely to apply conditions on aggregates. Performance tip: push every non-aggregate condition into WHERE so fewer rows reach the GROUP BY.

SELECT dept_id, COUNT(*) AS headcount
FROM employees
WHERE status = 'ACTIVE'        -- row filter (before grouping)
GROUP BY dept_id
HAVING COUNT(*) > 10;          -- group filter (after aggregation)

Q14. What does the DISTINCT keyword do?

A: DISTINCT removes duplicate rows from the result set, comparing the full combination of selected columns. It requires a sort or hash operation, so it has a cost on large result sets. SELECT DISTINCT a, b returns unique (a, b) pairs — not distinct a and distinct b independently. COUNT(DISTINCT col) counts unique non-NULL values.

Q15. How does the LIKE operator work? What do % and _ mean?

A: LIKE performs pattern matching: % matches zero or more characters, _ matches exactly one character. To match a literal % or _, escape it with ESCAPE. A leading wildcard ('%abc') prevents index range scans — important for performance questions.

SELECT * FROM employees WHERE name LIKE 'Ra%';    -- starts with Ra (index-friendly)
SELECT * FROM employees WHERE name LIKE '_aj';    -- 3 chars ending in aj
SELECT * FROM docs WHERE title LIKE '%50\%%' ESCAPE '\';  -- contains literal "50%"

Q16. What is the difference between BETWEEN and IN?

A: BETWEEN tests a continuous inclusive range (x BETWEEN 10 AND 20 means x >= 10 AND x <= 20). IN tests membership in a discrete list of values and is shorthand for a chain of ORed equalities. Watch out: with dates, BETWEEN '2026-01-01' AND '2026-01-31' on a DATETIME column misses times after midnight on Jan 31 — prefer >= '2026-01-01' AND < '2026-02-01'.

SELECT * FROM orders WHERE amount BETWEEN 100 AND 500;
SELECT * FROM orders WHERE status IN ('NEW', 'PAID', 'SHIPPED');

Q17. How do you check for NULL values, and why does column = NULL not work?

A: NULL means "unknown", and any comparison with NULL (=, <>, <) yields UNKNOWN, not TRUE — so the row is filtered out. Use IS NULL and IS NOT NULL. This three-valued logic (TRUE/FALSE/UNKNOWN) is one of the most common SQL interview traps, especially combined with NOT IN.

SELECT * FROM employees WHERE manager_id = NULL;    -- returns 0 rows, always!
SELECT * FROM employees WHERE manager_id IS NULL;   -- correct
SELECT * FROM employees WHERE manager_id IS NOT NULL;

Q18. What is COALESCE and how does it differ from IFNULL/NVL?

A: COALESCE(a, b, c, ...) is ANSI standard and returns the first non-NULL argument from any number of arguments. IFNULL(a, b) (MySQL) and NVL(a, b) (Oracle) are two-argument vendor variants. Prefer COALESCE for portability. NULLIF(a, b) is the inverse trick: returns NULL if a = b — handy to avoid division by zero.

SELECT name, COALESCE(mobile, office_phone, 'N/A') AS contact FROM employees;
SELECT total / NULLIF(quantity, 0) AS unit_price FROM order_items;  -- no div-by-zero

Q19. How does ORDER BY work with multiple columns and NULLs?

A: ORDER BY a DESC, b ASC sorts by a descending, breaking ties by b ascending. NULL placement varies: PostgreSQL/Oracle treat NULLs as largest by default (last in ASC) and support NULLS FIRST/LAST; MySQL treats NULLs as smallest (first in ASC) and needs a workaround like ORDER BY col IS NULL, col.

SELECT name, dept_id, salary
FROM employees
ORDER BY dept_id ASC, salary DESC;

-- PostgreSQL: push NULL salaries to the end explicitly
SELECT name, salary FROM employees ORDER BY salary DESC NULLS LAST;

Q20. How do you implement pagination in SQL?

A: Use LIMIT/OFFSET (MySQL/PostgreSQL) or OFFSET...FETCH (SQL Server/Oracle 12c+). OFFSET-based pagination degrades on deep pages because the database still scans and discards skipped rows — keyset (seek) pagination using a WHERE on the last-seen key is O(1) per page and preferred for infinite scroll APIs.

-- Offset pagination: page 3, 20 per page (slow for large offsets)
SELECT id, name FROM products ORDER BY id LIMIT 20 OFFSET 40;

-- Keyset pagination: fast, uses index seek
SELECT id, name FROM products WHERE id > 12345 ORDER BY id LIMIT 20;

Joins

Q21. What are the different types of joins in SQL?

A: INNER JOIN: only matching rows from both tables. LEFT (OUTER) JOIN: all rows from the left table + matches (NULLs where no match). RIGHT (OUTER) JOIN: all rows from the right table + matches. FULL OUTER JOIN: all rows from both sides. CROSS JOIN: Cartesian product (every row with every row). SELF JOIN: a table joined to itself using aliases.

Q22. Explain INNER JOIN with an example.

A: INNER JOIN returns only rows where the join condition matches in both tables. Employees without a department and departments without employees are excluded. It is the default when you write just JOIN.

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

Q23. What is the difference between LEFT JOIN and RIGHT JOIN?

A: LEFT JOIN preserves every row of the left (first) table, filling unmatched right-side columns with NULL. RIGHT JOIN is the mirror image — it preserves every row of the right table. Any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping table order, and most teams standardize on LEFT JOIN for readability.

-- All employees, with department name if any
SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;

-- Equivalent to: departments RIGHT JOIN employees

Q24. What is a FULL OUTER JOIN and how do you emulate it in MySQL?

A: FULL OUTER JOIN returns all rows from both tables, matching where possible and padding with NULLs otherwise. PostgreSQL, SQL Server, and Oracle support it directly; MySQL (before 8.4's lack of it persists) does not, so you emulate it with LEFT JOIN UNION a right-anti-join.

-- MySQL emulation of FULL OUTER JOIN
SELECT e.name, d.dept_name
FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
UNION
SELECT e.name, d.dept_name
FROM employees e RIGHT JOIN departments d ON e.dept_id = d.id;

Q25. What is a CROSS JOIN and when would you use it?

A: CROSS JOIN produces the Cartesian product: every row of table A paired with every row of table B (m × n rows). Legitimate uses: generating combinations (all sizes × all colors), calendar/date scaffolding, and test data generation. An accidental cross join (forgotten ON condition in old comma syntax) is a classic production bug.

SELECT s.size, c.color
FROM sizes s
CROSS JOIN colors c;   -- 4 sizes x 5 colors = 20 rows

Q26. What is a SELF JOIN? Give a real example.

A: A self join joins a table to itself using two aliases, typically to compare rows within the same table — the canonical case is an employees table where manager_id references the same table's id.

SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;  -- LEFT keeps the CEO (no manager)

Q27. In an outer join, what is the difference between putting a condition in ON vs WHERE?

A: In a LEFT JOIN, a condition in ON only restricts which right-side rows match — left rows are still kept (with NULLs). The same condition in WHERE filters the final result, silently converting the LEFT JOIN into an INNER JOIN when it references right-table columns. This is a favorite senior-level interview question.

-- All customers, with only their 2026 orders attached:
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id AND o.year = 2026;

-- WRONG for that goal: drops customers with no 2026 orders
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.year = 2026;   -- behaves like INNER JOIN

Q28. What is a NATURAL JOIN and why is it discouraged?

A: NATURAL JOIN automatically joins on all columns with identical names in both tables. It is fragile: adding a coincidentally same-named column (e.g., created_at) silently changes the join semantics. The USING (col) clause is a safer middle ground, and an explicit ON clause is the most maintainable and universally preferred.

Q29. What is a non-equi join?

A: A join whose condition uses operators other than equality — <, >, BETWEEN, <>. Common uses: mapping values to ranges (tax slabs, grade bands) and finding pairs of rows. Non-equi joins usually cannot use hash joins, so they can be expensive on large tables.

SELECT e.name, e.salary, g.grade
FROM employees e
JOIN salary_grades g
  ON e.salary BETWEEN g.min_salary AND g.max_salary;

Q30. How do you find rows in one table that have no match in another?

A: Three idioms: LEFT JOIN + IS NULL check, NOT EXISTS (generally safest and often fastest), and NOT IN (dangerous: returns zero rows if the subquery yields any NULL). Prefer NOT EXISTS in interviews and production.

-- Customers who never placed an order
SELECT c.* FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;

-- Equivalent, NULL-safe:
SELECT c.* FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

Subqueries & CTEs

Q31. What is a subquery and what types exist?

A: A subquery is a query nested inside another query. Types by result shape: scalar (single value — usable anywhere an expression is), row (single row of columns), column/table (multiple rows — used with IN, EXISTS, or in FROM as a derived table). Types by dependency: non-correlated (runs once, independent) and correlated (references the outer query, conceptually runs per outer row).

Q32. What is the difference between a correlated and a non-correlated subquery?

A: A non-correlated subquery is independent of the outer query and can be evaluated once. A correlated subquery references columns from the outer query, so it is logically re-evaluated for each outer row — potentially expensive, though optimizers often rewrite them as joins.

-- Non-correlated: inner query runs once
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated: compares each employee to their own department's average
SELECT e.name FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary)
                  FROM employees e2
                  WHERE e2.dept_id = e.dept_id);

Q33. What is the difference between EXISTS and IN? What is the NOT IN NULL trap?

A: IN compares a value against the subquery's result list; EXISTS just checks whether the subquery returns any row (short-circuits on the first match). The trap: if the subquery under NOT IN returns even one NULL, the whole predicate becomes UNKNOWN and the outer query returns zero rows. NOT EXISTS does not have this problem — prefer it.

-- If any orders.customer_id is NULL, this returns NO rows at all:
SELECT * FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);

-- NULL-safe version:
SELECT * FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

Q34. What is a CTE (Common Table Expression)?

A: A CTE, introduced with WITH, is a named temporary result set scoped to a single statement. Benefits: readability (break complex logic into named steps), reuse of the same subresult multiple times in one query, and enabling recursion. Unlike a view it isn't stored; unlike a temp table it lives only for one statement.

WITH dept_stats AS (
  SELECT dept_id, AVG(salary) AS avg_sal
  FROM employees
  GROUP BY dept_id
)
SELECT e.name, e.salary, ds.avg_sal
FROM employees e
JOIN dept_stats ds ON ds.dept_id = e.dept_id
WHERE e.salary > ds.avg_sal;

Q35. What is a recursive CTE? Show how to walk an org hierarchy.

A: A recursive CTE references itself: an anchor member produces the starting rows, and a recursive member repeatedly joins back to the CTE until no new rows are produced. Classic uses: org charts, category trees, bill-of-materials, and generating number/date series.

WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS level
  FROM employees WHERE manager_id IS NULL      -- anchor: the CEO
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.level + 1
  FROM employees e
  JOIN org o ON e.manager_id = o.id            -- recursive step
)
SELECT * FROM org ORDER BY level, name;

Q36. What is a derived table?

A: A derived table is a subquery in the FROM clause that acts like an inline, unnamed table for the duration of the query. Most databases require it to have an alias. CTEs are usually preferred for readability, but derived tables are equivalent for single use.

SELECT t.dept_id, t.avg_sal
FROM (SELECT dept_id, AVG(salary) AS avg_sal
      FROM employees GROUP BY dept_id) AS t   -- alias required
WHERE t.avg_sal > 60000;

Q37. Can you use a subquery in the SELECT list?

A: Yes — a scalar subquery in SELECT must return at most one row and one column per outer row (a runtime error occurs if it returns more than one row). It is convenient but runs per row when correlated; a JOIN with GROUP BY often performs better for large results.

SELECT c.name,
       (SELECT COUNT(*) FROM orders o
        WHERE o.customer_id = c.id) AS order_count
FROM customers c;

Q38. What do the ANY and ALL operators do?

A: They compare a value against a set: x > ANY (subquery) is true if x exceeds at least one value (i.e., x > the minimum); x > ALL (subquery) is true only if x exceeds every value (i.e., x > the maximum). = ANY is equivalent to IN; <> ALL is equivalent to NOT IN (with the same NULL caveat).

-- Employees paid more than EVERY employee in dept 30
SELECT name FROM employees
WHERE salary > ALL (SELECT salary FROM employees WHERE dept_id = 30);

Q39. When do you choose a CTE vs a temp table vs a view?

A: CTE: single-statement scope, best for readability and recursion; may be re-evaluated if referenced multiple times (database-dependent; PostgreSQL 12+ can inline or materialize). Temp table: persists for the session, can be indexed and analyzed — best for large intermediate results reused across multiple statements. View: a stored, reusable query definition shared across sessions and applications; no data stored (unless materialized).

Q40. How do you UPDATE or DELETE using data from another table?

A: Use an UPDATE with a JOIN (MySQL/SQL Server) or UPDATE ... FROM (PostgreSQL), or a correlated subquery (portable). The same applies to DELETE. MySQL quirk: you cannot directly reference the target table in a subquery of its own UPDATE/DELETE without wrapping it in a derived table.

-- MySQL: give a 10% raise to everyone in Engineering
UPDATE employees e
JOIN departments d ON d.id = e.dept_id
SET e.salary = e.salary * 1.10
WHERE d.dept_name = 'Engineering';

-- PostgreSQL equivalent
UPDATE employees e SET salary = salary * 1.10
FROM departments d
WHERE d.id = e.dept_id AND d.dept_name = 'Engineering';

Aggregates & GROUP BY

Q41. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?

A: COUNT(*) counts all rows, including rows with NULLs. COUNT(column) counts rows where that column is NOT NULL. COUNT(DISTINCT column) counts unique non-NULL values. On an empty table, all return 0 — unlike other aggregates, which return NULL.

SELECT COUNT(*)              AS all_rows,      -- 1000
       COUNT(manager_id)     AS with_manager,  -- 999 (CEO's is NULL)
       COUNT(DISTINCT dept_id) AS departments  -- 12
FROM employees;

Q42. What are the rules of GROUP BY regarding the SELECT list?

A: Every column in SELECT must either appear in GROUP BY or be inside an aggregate function. Standard SQL and MySQL's default ONLY_FULL_GROUP_BY mode enforce this; legacy MySQL silently picked an arbitrary value for ungrouped columns — a notorious source of bugs. If you need a non-grouped column (like an employee name alongside a department max), use a window function or join back.

-- Valid
SELECT dept_id, MAX(salary) FROM employees GROUP BY dept_id;

-- Invalid under ONLY_FULL_GROUP_BY: name is neither grouped nor aggregated
-- SELECT dept_id, name, MAX(salary) FROM employees GROUP BY dept_id;

Q43. Write a query using HAVING to find departments whose average salary exceeds 75,000.

A: HAVING is applied after aggregation, so it can reference aggregates directly or via aliases in some databases (MySQL allows the alias; standard SQL requires repeating the expression).

SELECT dept_id, AVG(salary) AS avg_sal, COUNT(*) AS headcount
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 75000
ORDER BY avg_sal DESC;

Q44. How do aggregate functions treat NULL values?

A: All aggregates except COUNT(*) ignore NULLs. Subtle consequence: AVG(col) divides by the count of non-NULL values, not total rows — so AVG(col) and SUM(col)/COUNT(*) differ when NULLs exist. If NULL should mean zero, wrap it: AVG(COALESCE(col, 0)). SUM over an empty set or all-NULL column returns NULL, not 0.

Q45. Can you GROUP BY multiple columns? What does it mean?

A: Yes — GROUP BY a, b creates one group per distinct (a, b) combination, like a two-level breakdown. Order of columns in GROUP BY does not change the result (unlike ORDER BY).

SELECT dept_id, job_title, COUNT(*) AS n, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept_id, job_title;   -- one row per (dept, title) pair

Q46. What are ROLLUP, CUBE, and GROUPING SETS?

A: Extensions of GROUP BY that add subtotal rows in one pass. ROLLUP(a, b) produces groups (a,b), (a), and grand total — hierarchical subtotals. CUBE(a, b) produces all combinations: (a,b), (a), (b), and total. GROUPING SETS lets you specify exactly which groupings you want. Subtotal rows show NULL in the rolled-up column; use GROUPING() to distinguish real NULLs from subtotal NULLs.

SELECT dept_id, job_title, SUM(salary) AS total
FROM employees
GROUP BY ROLLUP(dept_id, job_title);
-- rows: per (dept, title), per dept subtotal, grand total

Q47. What is conditional aggregation?

A: Combining CASE expressions inside aggregates to compute several filtered metrics in one scan — the standard way to pivot rows into columns without vendor PIVOT syntax.

SELECT dept_id,
       SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS women,
       SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS men,
       AVG(CASE WHEN hire_year = 2026 THEN salary END) AS avg_new_hire_sal
FROM employees
GROUP BY dept_id;

Q48. What is the FILTER clause for aggregates?

A: PostgreSQL (and standard SQL:2003) supports agg(...) FILTER (WHERE condition) — a cleaner equivalent of the CASE-inside-aggregate pattern. MySQL does not support FILTER, so the CASE idiom remains the portable answer.

SELECT dept_id,
       COUNT(*) FILTER (WHERE status = 'ACTIVE')   AS active,
       COUNT(*) FILTER (WHERE status = 'RESIGNED') AS resigned
FROM employees
GROUP BY dept_id;

Q49. Do MIN and MAX work on strings and dates?

A: Yes. For dates, MIN/MAX return earliest/latest — the common way to find first and last order dates. For strings, they compare by collation order (dictionary order under the column's collation), so MAX(name) returns the alphabetically last name. They ignore NULLs like other aggregates.

Q50. Write a query to find departments with more than 5 employees hired since 2024.

A: Combine a WHERE row filter with GROUP BY and HAVING — this tests understanding of the WHERE-then-HAVING pipeline.

SELECT d.dept_name, COUNT(*) AS hires
FROM employees e
JOIN departments d ON d.id = e.dept_id
WHERE e.hire_date >= '2024-01-01'
GROUP BY d.dept_name
HAVING COUNT(*) > 5;

Set Operations

Q51. What is the difference between UNION and UNION ALL?

A: Both stack result sets vertically. UNION removes duplicate rows (requires a sort or hash — extra cost). UNION ALL keeps everything, including duplicates, and is faster. Rule of thumb: use UNION ALL unless you specifically need deduplication — a frequent performance-review finding.

SELECT city FROM customers
UNION            -- unique cities across both
SELECT city FROM suppliers;

SELECT city FROM customers
UNION ALL        -- all rows, duplicates kept, no sort cost
SELECT city FROM suppliers;

Q52. What does INTERSECT do?

A: INTERSECT returns only the rows that appear in both result sets, with duplicates removed. Supported by PostgreSQL, SQL Server, Oracle, and MySQL 8.0.31+.

-- Customers who are also suppliers (by email)
SELECT email FROM customers
INTERSECT
SELECT email FROM suppliers;

Q53. What does EXCEPT (MINUS in Oracle) do?

A: EXCEPT returns distinct rows from the first query that do not appear in the second — set difference. Oracle calls it MINUS. Note that unlike UNION, EXCEPT is not commutative: A EXCEPT B differs from B EXCEPT A.

-- Emails of customers who never subscribed to the newsletter
SELECT email FROM customers
EXCEPT
SELECT email FROM newsletter_subscribers;

Q54. What are the rules for combining queries with set operators?

A: Each SELECT must return the same number of columns, in the same order, with compatible (implicitly convertible) data types. Column names in the final result come from the first SELECT. NULLs are considered equal to each other for deduplication purposes in UNION/INTERSECT/EXCEPT — unlike in WHERE comparisons.

Q55. How do you emulate INTERSECT and EXCEPT in older MySQL versions?

A: Before MySQL 8.0.31, use INNER JOIN or IN/EXISTS for INTERSECT, and LEFT JOIN ... IS NULL or NOT EXISTS for EXCEPT, adding DISTINCT to match set semantics.

-- INTERSECT emulation
SELECT DISTINCT c.email FROM customers c
INNER JOIN suppliers s ON s.email = c.email;

-- EXCEPT emulation
SELECT DISTINCT c.email FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM suppliers s WHERE s.email = c.email);

Q56. Where does ORDER BY go in a UNION query?

A: A single ORDER BY at the very end, applying to the combined result. You cannot ORDER BY individual branches (except inside a parenthesized subquery with LIMIT). To sort by source, add a literal discriminator column.

SELECT name, 'customer' AS src FROM customers
UNION ALL
SELECT name, 'supplier' AS src FROM suppliers
ORDER BY src, name;   -- one ORDER BY for the whole result

Q57. When should you use a set operation instead of a JOIN?

A: JOINs combine tables horizontally (adding columns, matching rows by keys); set operations combine vertically (stacking rows of the same shape). Use UNION ALL to merge similar rows from different tables (e.g., current + archived orders); use JOIN to enrich rows with related data. INTERSECT/EXCEPT compare whole rows, which is cleaner than multi-column join conditions when checking full-row existence.

Q58. How do set operations handle duplicates within a single input?

A: UNION, INTERSECT, and EXCEPT all deduplicate the final output — even duplicates that come from a single input query. Only the ALL variants (UNION ALL; INTERSECT ALL / EXCEPT ALL in PostgreSQL) preserve duplicate cardinality, where INTERSECT ALL keeps the minimum count of each row across inputs and EXCEPT ALL subtracts counts.

Constraints & Keys

Q59. What types of constraints does SQL support?

A: PRIMARY KEY (unique + not null row identifier), FOREIGN KEY (referential integrity to another table), UNIQUE (no duplicate values), NOT NULL (value required), CHECK (boolean condition on column values), and DEFAULT (value used when none supplied). Constraints are enforced by the database itself, making them the last line of defense against bad data regardless of application bugs.

Q60. What is the difference between a PRIMARY KEY and a UNIQUE constraint?

A: Both enforce uniqueness via an index. Differences: a table can have only one primary key but many unique constraints; primary key columns are implicitly NOT NULL, while unique columns allow NULLs (MySQL/PostgreSQL allow multiple NULLs since NULL != NULL; SQL Server allows only one). In InnoDB, the primary key is also the clustered index that orders the table physically.

Q61. What is a FOREIGN KEY and what referential actions exist?

A: A foreign key ensures a child column's value exists in the parent table's referenced key. On parent UPDATE/DELETE you can specify: CASCADE (propagate the change), SET NULL, SET DEFAULT, RESTRICT/NO ACTION (reject if children exist). Use CASCADE deliberately — an innocent parent delete can wipe large child subtrees.

CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  CONSTRAINT fk_cust FOREIGN KEY (customer_id)
    REFERENCES customers(id)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
);

Q62. What is a composite key?

A: A key made of two or more columns whose combination must be unique — common in junction tables for many-to-many relationships. Column order matters for the underlying index (leftmost prefix rule applies to lookups).

CREATE TABLE enrollments (
  student_id INT,
  course_id  INT,
  enrolled_on DATE NOT NULL,
  PRIMARY KEY (student_id, course_id)   -- composite key
);

Q63. Define candidate key, super key, and alternate key.

A: A super key is any set of columns that uniquely identifies a row (may contain extra columns). A candidate key is a minimal super key — no column can be removed without losing uniqueness. One candidate key is chosen as the primary key; the remaining candidate keys are alternate keys (typically enforced with UNIQUE constraints, e.g., email when id is the PK).

Q64. What is a CHECK constraint?

A: A CHECK constraint enforces a boolean condition on each row at INSERT/UPDATE time. MySQL only began enforcing CHECK in 8.0.16 — before that it parsed but ignored them, a well-known gotcha.

CREATE TABLE employees (
  id INT PRIMARY KEY,
  salary DECIMAL(10,2) CHECK (salary > 0),
  hire_date DATE,
  CONSTRAINT chk_dates CHECK (hire_date >= '2000-01-01')
);

Q65. How do NOT NULL and DEFAULT work together?

A: NOT NULL rejects explicit and implicit NULLs; DEFAULT supplies a value when the column is omitted from the INSERT. Together they guarantee a sensible value is always present. Note: INSERT ... VALUES (NULL) still fails on a NOT NULL DEFAULT column — DEFAULT only applies when the column is omitted (or with the DEFAULT keyword).

CREATE TABLE audit_log (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  action VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO audit_log (action) VALUES ('LOGIN');  -- created_at auto-filled

Q66. What is the difference between a surrogate key and a natural key?

A: A natural key is a real-world attribute that is inherently unique (email, ISBN, national ID). A surrogate key is a meaningless generated identifier (auto-increment integer, UUID). Surrogates are preferred in practice: they never change (natural attributes do — people change emails), are compact for joins and indexes, and decouple the schema from business rules. Keep the natural key as a UNIQUE constraint.

Q67. How do AUTO_INCREMENT, IDENTITY, and sequences differ?

A: All generate sequential surrogate keys. MySQL: AUTO_INCREMENT column attribute. SQL Server: IDENTITY(seed, increment). PostgreSQL/Oracle: standalone SEQUENCE objects (PostgreSQL wraps them via GENERATED ... AS IDENTITY or legacy SERIAL). Sequences can be shared across tables and fetched independently. All can leave gaps after rollbacks — never assume gap-free IDs.

-- MySQL
CREATE TABLE t1 (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(50));
-- PostgreSQL
CREATE TABLE t2 (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name TEXT);
-- Oracle / PostgreSQL sequence
CREATE SEQUENCE order_seq START WITH 1000 INCREMENT BY 1;

Q68. Can a primary key be NULL? Can a foreign key be NULL?

A: A primary key can never be NULL — NOT NULL is part of its definition (every part of a composite PK too). A foreign key can be NULL unless you add NOT NULL — a NULL FK means "no parent assigned" (e.g., an employee with no manager) and is not validated against the parent table.

Views, Indexes & Stored Procedures

Q69. What is a view and why use one?

A: A view is a stored, named SELECT query that behaves like a virtual table — no data is stored. Uses: simplifying complex joins for consumers, security (expose only certain columns/rows and grant access to the view, not the tables), and providing a stable interface while underlying tables evolve.

CREATE VIEW active_employee_summary AS
SELECT e.id, e.name, d.dept_name, e.salary
FROM employees e
JOIN departments d ON d.id = e.dept_id
WHERE e.status = 'ACTIVE';

SELECT * FROM active_employee_summary WHERE salary > 80000;

Q70. Can you INSERT/UPDATE through a view?

A: Only through updatable views — generally single-table views without DISTINCT, GROUP BY, aggregates, UNION, or window functions. WITH CHECK OPTION prevents writes that would make the row invisible to the view (e.g., updating status so it no longer matches the view's WHERE). Multi-table views are usually read-only (some databases allow it via INSTEAD OF triggers).

Q71. What is a materialized view and how does it differ from a regular view?

A: A materialized view physically stores the query result and must be refreshed (on demand, on schedule, or incrementally) — reads are fast but data can be stale. A regular view re-executes its query every time — always fresh but no performance gain. PostgreSQL and Oracle support them natively; MySQL requires manual emulation with tables + triggers/jobs. Classic trade-off question: freshness vs read latency.

-- PostgreSQL
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, SUM(amount) AS total
FROM orders GROUP BY order_date;

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;

Q72. What is an index and how does a B-tree index work?

A: An index is an auxiliary sorted structure that lets the database find rows without scanning the whole table. B-tree (technically B+ tree) indexes keep keys sorted in a balanced tree: lookups, range scans, and ORDER BY on the key run in O(log n) page reads, with leaf pages linked for fast range traversal. Writes pay the cost: every INSERT/UPDATE/DELETE must also maintain each index.

CREATE INDEX idx_emp_dept ON employees(dept_id);
CREATE UNIQUE INDEX idx_emp_email ON employees(email);
DROP INDEX idx_emp_dept ON employees;   -- MySQL syntax

Q73. What is the difference between a clustered and a non-clustered index?

A: A clustered index defines the physical storage order of the table — the leaf level is the row data — so there can be only one (InnoDB always clusters by primary key). A non-clustered (secondary) index is a separate structure whose leaves hold the indexed key plus a row locator (the PK in InnoDB, RID in SQL Server); a lookup may require an extra hop to fetch the full row. This is why huge primary keys make all secondary indexes bigger in InnoDB.

Q74. What is a composite index and the leftmost prefix rule?

A: A composite index covers multiple columns in a defined order, e.g., (last_name, first_name, dob). The B-tree can only be seeked using a leftmost prefix: queries filtering on last_name, or last_name + first_name, use the index; filtering only on first_name cannot seek it. Once a range condition is used on a column, columns to its right can't be used for seeking. Column order should follow equality-first, range-last, and most selective usage patterns.

CREATE INDEX idx_name_dob ON employees(last_name, first_name, dob);

-- Uses the index (leftmost prefix):
SELECT * FROM employees WHERE last_name = 'Kumar';
SELECT * FROM employees WHERE last_name = 'Kumar' AND first_name = 'Raj';

-- Cannot seek the index (skips leading column):
SELECT * FROM employees WHERE first_name = 'Raj';

Q75. What is a covering index?

A: An index that contains every column a query needs, so the database answers entirely from the index without touching the table ("index-only scan" / "Using index" in MySQL EXPLAIN). Dramatically reduces I/O for hot queries. SQL Server has INCLUDE columns for this; in MySQL you simply add the columns to the index key.

CREATE INDEX idx_cover ON orders(customer_id, order_date, amount);

-- Fully served from the index — no table access:
SELECT order_date, amount
FROM orders
WHERE customer_id = 42;

Q76. When will the optimizer NOT use an index?

A: Common cases: a function or expression wraps the indexed column (WHERE YEAR(created_at) = 2026); a leading wildcard (LIKE '%son'); an implicit type conversion (comparing a VARCHAR column to a number); low selectivity where a full scan is cheaper; OR conditions across different columns; and stale statistics. Fix by rewriting predicates to be sargable or adding function-based/expression indexes.

-- Index on created_at is IGNORED:
SELECT * FROM orders WHERE YEAR(created_at) = 2026;

-- Sargable rewrite — index range scan:
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

Q77. What is the difference between a stored procedure and a function?

A: A stored procedure is invoked with CALL, may return zero or many result sets, supports OUT parameters, and can run transaction control and DML freely. A function must return exactly one value (or a table in some databases), is callable inside SQL expressions (SELECT/WHERE), and is typically restricted from side effects. Use functions for reusable computations, procedures for multi-step business operations.

DELIMITER //
CREATE PROCEDURE give_raise(IN emp_id INT, IN pct DECIMAL(5,2))
BEGIN
  UPDATE employees
  SET salary = salary * (1 + pct / 100)
  WHERE id = emp_id;
END //
DELIMITER ;

CALL give_raise(101, 8.5);

Q78. What is a trigger and when should you avoid one?

A: A trigger is a procedure that fires automatically BEFORE/AFTER an INSERT, UPDATE, or DELETE on a table — used for auditing, derived columns, and enforcing rules that constraints can't express. Avoid them for core business logic: they are invisible to application developers, complicate debugging, can cascade, and add latency to every write. Prefer application-level logic or constraints where possible.

CREATE TRIGGER trg_salary_audit
AFTER UPDATE ON employees
FOR EACH ROW
INSERT INTO salary_audit(emp_id, old_salary, new_salary, changed_at)
VALUES (OLD.id, OLD.salary, NEW.salary, NOW());

Q79. How do you analyze a slow query?

A: Run EXPLAIN (or EXPLAIN ANALYZE for actual runtimes) to inspect the execution plan: access type (full scan vs index/range/ref), join order, estimated vs actual rows, and sort/temporary-table usage. Red flags: type=ALL (full scan) on big tables, Using filesort, Using temporary, and huge row estimates. Then fix with better indexes, sargable predicates, rewritten joins, or updated statistics.

EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2026-01-01'
GROUP BY c.name;

Q80. What are the downsides of adding too many indexes?

A: Every index must be updated on each INSERT/UPDATE/DELETE, slowing writes and increasing lock/latch contention; indexes consume disk and buffer-pool memory (evicting hot data pages); and more indexes give the optimizer more choices, occasionally leading to worse plans. Guideline: index to match actual query patterns, monitor unused indexes (e.g., sys.schema_unused_indexes in MySQL), and drop the dead weight.

Window Functions

Q81. What is a window function and how does it differ from GROUP BY?

A: A window function computes a value across a set of rows related to the current row (OVER (...)) while keeping every row in the output. GROUP BY collapses rows into one per group. Window functions let you show detail rows alongside aggregates — e.g., each employee's salary next to their department average — which GROUP BY alone cannot do.

SELECT name, dept_id, salary,
       AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg,
       salary - AVG(salary) OVER (PARTITION BY dept_id) AS diff_from_avg
FROM employees;   -- one row per employee, no collapsing

Q82. What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()?

A: All number rows by an ORDER BY within the window. ROW_NUMBER() assigns unique sequential numbers (ties broken arbitrarily). RANK() gives ties the same rank and then skips numbers (1,2,2,4). DENSE_RANK() gives ties the same rank with no gaps (1,2,2,3). Choosing between them is the crux of "Nth highest" problems: DENSE_RANK treats equal salaries as one level.

SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,   -- 1,2,3,4
       RANK()       OVER (ORDER BY salary DESC) AS rnk,  -- 1,2,2,4
       DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk  -- 1,2,2,3
FROM employees;

Q83. What does PARTITION BY do in a window function?

A: PARTITION BY splits the rows into independent windows — the function restarts for each partition, like a per-group GROUP BY that doesn't collapse rows. Omitting it makes the whole result set one window. It's how you express "per department", "per customer", "per month" rankings and aggregates.

SELECT name, dept_id, salary,
       RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rank_in_dept
FROM employees;   -- ranking restarts at 1 for each department

Q84. What do LAG() and LEAD() do?

A: LAG(col, n, default) reads a value from n rows before the current row in the window's order; LEAD() reads ahead. They're the standard tools for row-to-row comparisons: month-over-month growth, time between events, detecting gaps — without self joins.

SELECT month, revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM monthly_sales;

Q85. How do you compute a running total?

A: Use SUM as a window function with ORDER BY inside OVER. With ORDER BY present, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which accumulates from the start of the partition to the current row (including peers with equal sort keys).

SELECT order_date, amount,
       SUM(amount) OVER (ORDER BY order_date) AS running_total,
       SUM(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date) AS running_per_customer
FROM orders;

Q86. What is a window frame (ROWS BETWEEN)?

A: The frame clause narrows the window to a sliding subset of the partition relative to the current row: ROWS BETWEEN x PRECEDING AND y FOLLOWING. ROWS counts physical rows; RANGE groups peers by value; GROUPS counts peer groups. Frames power moving averages and sliding sums.

SELECT order_date, amount,
       SUM(amount) OVER (ORDER BY order_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS sum_7_rows
FROM daily_totals;

Q87. What does NTILE(n) do?

A: NTILE(n) distributes the ordered rows of a partition into n roughly equal buckets, numbered 1..n — used for quartiles, deciles, and percentile-band reporting. When rows don't divide evenly, earlier buckets get one extra row.

SELECT name, salary,
       NTILE(4) OVER (ORDER BY salary DESC) AS salary_quartile
FROM employees;   -- quartile 1 = top 25% earners

Q88. Why does LAST_VALUE() often return the "wrong" result?

A: Because with ORDER BY, the default frame ends at CURRENT ROW — so LAST_VALUE sees only rows up to the current one and returns the current row's value. Fix by widening the frame to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. FIRST_VALUE is unaffected since the default frame starts at the partition beginning.

SELECT name, salary,
       LAST_VALUE(name) OVER (ORDER BY salary
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS top_earner
FROM employees;   -- without the frame clause this would just echo each row's name

Q89. Can you filter on a window function in the WHERE clause?

A: No — window functions are evaluated at the SELECT stage, after WHERE (and after HAVING). To filter on a window result, wrap the query in a CTE or derived table and filter in the outer query. (Some engines like DuckDB add a QUALIFY clause for exactly this.)

WITH ranked AS (
  SELECT name, dept_id, salary,
         ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
  FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;   -- top 3 earners per department

Q90. How do you compute a 7-day moving average?

A: Use AVG with a ROWS frame over date-ordered rows. If dates can be missing, either densify with a calendar table first or use a RANGE frame with an interval (PostgreSQL supports RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW).

SELECT order_date, daily_total,
       AVG(daily_total) OVER (ORDER BY order_date
                              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7d
FROM daily_totals
ORDER BY order_date;

Tricky Interview Queries

Q91. Write a query to find the second highest salary.

A: Multiple accepted approaches; know at least two. The subquery version handles duplicates and returns NULL when no second salary exists (which LIMIT/OFFSET does not — it returns an empty set unless wrapped).

-- 1) MAX below MAX — duplicate-safe, returns NULL if absent
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- 2) DISTINCT + LIMIT/OFFSET (wrap to return NULL when missing)
SELECT (SELECT DISTINCT salary FROM employees
        ORDER BY salary DESC LIMIT 1 OFFSET 1) AS second_highest;

Q92. Generalize it: find the Nth highest salary.

A: The clean modern answer uses DENSE_RANK, which handles duplicate salaries correctly (equal salaries count as one level). LIMIT/OFFSET with DISTINCT also works where LIMIT is supported.

WITH ranked AS (
  SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
  FROM employees
)
SELECT DISTINCT salary FROM ranked WHERE drnk = 4;   -- N = 4

Q93. How do you find duplicate rows in a table?

A: Group by the columns that define "duplicate" and keep groups with COUNT > 1. To see the full duplicate rows (not just the keys), join back or use COUNT as a window function.

-- Which emails are duplicated, and how many times
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Full duplicate rows via window function
SELECT * FROM (
  SELECT u.*, COUNT(*) OVER (PARTITION BY email) AS cnt FROM users u
) t WHERE cnt > 1;

Q94. How do you delete duplicates while keeping one copy?

A: Keep the row with the smallest id per duplicate group. Portable idioms: self-join delete (MySQL) or ROW_NUMBER + delete (SQL Server/PostgreSQL). Afterward, add a UNIQUE constraint so duplicates can't return.

-- MySQL: keep the lowest id per email
DELETE u1 FROM users u1
JOIN users u2
  ON u1.email = u2.email AND u1.id > u2.id;

-- PostgreSQL/SQL Server: window-function approach
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM users
  ) t WHERE rn > 1
);

Q95. Find the highest-paid employee in each department (with name).

A: You can't just GROUP BY dept and select the name (ungrouped column). Use a window function, or join back to a per-department MAX. Note the window version returns all ties; decide with the interviewer whether ties should be included (RANK) or one picked (ROW_NUMBER).

-- Window function (includes ties)
SELECT name, dept_id, salary FROM (
  SELECT e.*, RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS r
  FROM employees e
) t WHERE r = 1;

-- Join-back (classic pre-window answer)
SELECT e.name, e.dept_id, e.salary
FROM employees e
JOIN (SELECT dept_id, MAX(salary) AS max_sal
      FROM employees GROUP BY dept_id) m
  ON m.dept_id = e.dept_id AND m.max_sal = e.salary;

Q96. Find employees who earn more than their manager.

A: A self join on the manager relationship, comparing salaries across the two aliases — a LeetCode classic (Employees Earning More Than Their Managers).

SELECT e.name AS employee, e.salary, m.name AS manager, m.salary AS mgr_salary
FROM employees e
JOIN employees m ON m.id = e.manager_id
WHERE e.salary > m.salary;

Q97. Find numbers that appear at least three times consecutively.

A: The LAG/LEAD (or triple self join) pattern — LeetCode "Consecutive Numbers". Compare each row with the two rows around it in id order.

SELECT DISTINCT num AS consecutive_num
FROM (
  SELECT num,
         LAG(num)  OVER (ORDER BY id) AS prev_num,
         LEAD(num) OVER (ORDER BY id) AS next_num
  FROM logs
) t
WHERE num = prev_num AND num = next_num;

Q98. Swap all 'M' and 'F' values in a gender column with a single statement.

A: A single UPDATE with a CASE expression — no temp table or two-pass update needed, because the CASE evaluates against each row's original value.

UPDATE employees
SET gender = CASE gender
               WHEN 'M' THEN 'F'
               WHEN 'F' THEN 'M'
               ELSE gender
             END;

Q99. How do you pivot rows into columns (monthly sales per product as columns)?

A: Portable answer: conditional aggregation with CASE (SQL Server/Oracle also offer a PIVOT operator). Each output column is an aggregate filtered to one category.

SELECT product_id,
       SUM(CASE WHEN MONTH(order_date) = 1 THEN amount ELSE 0 END) AS jan,
       SUM(CASE WHEN MONTH(order_date) = 2 THEN amount ELSE 0 END) AS feb,
       SUM(CASE WHEN MONTH(order_date) = 3 THEN amount ELSE 0 END) AS mar
FROM orders
WHERE YEAR(order_date) = 2026
GROUP BY product_id;

Q100. Count orders per customer — including customers with zero orders.

A: The trap: an INNER JOIN (or counting in WHERE) silently drops zero-order customers, and COUNT(*) would count the customer row itself as 1. Use LEFT JOIN and count a column from the right table so unmatched rows count as 0.

SELECT c.id, c.name, COUNT(o.id) AS order_count   -- COUNT(o.id), not COUNT(*)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY order_count DESC;