Database Indexing & Performance Tuning Interview Questions and Answers (2026) Interview Questions

BACKEND INTERVIEW PREPARATION

Database Indexing & Performance Tuning

Master 100 database indexing, query optimization and performance tuning interview questions asked at product companies and enterprises for senior Java backend roles in 2026.

⏳ 50 min read 📝 100 Q&As 🎯 Beginner to Advanced

⚡ Quick Reference

Clustered vs Non-Clustered	Clustered sorts & stores table rows by key (1 per table); non-clustered is a separate structure pointing back to rows (many per table)
Covering Index	Index contains every column the query needs — satisfies the query without touching the base table
Composite Left-Prefix Rule	Index on (A,B,C) helps queries filtering on A, or A+B, or A+B+C — but NOT on B or C alone
When Indexes Hurt	Slow down INSERT/UPDATE/DELETE; waste space; optimizer may ignore low-selectivity indexes
EXPLAIN Basics	Shows access type, rows examined, key used — EXPLAIN ANALYZE also runs the query for real timing
Isolation vs Performance	Stricter isolation (SERIALIZABLE) = more locking/lower throughput; READ COMMITTED is the common default
Connection Pooling	Reuses physical DB connections (HikariCP) — avoids costly TCP + auth handshake per request
N+1 Problem	1 query for parent + N queries for each child's lazy association — fix with JOIN FETCH / batch fetching

B+Tree Index Lookup

Root Node

↓

Internal Node

↓

Leaf Nodes (sorted)

↓

Table Rows / RID

Internal Node

↓

Leaf Nodes (sorted)

↓

Table Rows / RID

Index Fundamentals & B+Tree Internals

Q1. What is a database index and why does it speed up reads?

A: An index is a separate, ordered data structure that maps column values to row locations, so the engine can find matching rows without scanning every row (a full table scan). Without an index, a lookup is O(n); with a balanced tree index it becomes roughly O(log n). The trade-off is extra storage and slower writes because the index must be kept in sync with the table.

Q2. Why do most relational databases use B+Trees for indexes instead of plain binary trees?

A: A B+Tree is a balanced, wide (high fan-out) tree where each node holds many keys, matching the disk/page size (typically 4-16 KB). This minimizes tree height and disk I/O — a binary tree with millions of rows would be too tall, requiring many random disk reads. B+Trees also chain leaf nodes together, enabling fast ordered range scans.

Q3. What is the structure of a B+Tree index?

A: A B+Tree has a root node, internal (branch) nodes that only hold routing keys, and leaf nodes that hold the actual indexed values plus a pointer to the row (or the row itself, for a clustered index). Leaf nodes are linked in a doubly linked list, so range queries (BETWEEN, >, ORDER BY) can scan sequentially without revisiting the tree.

Q4. How many disk reads does a B+Tree lookup typically require?

A: Roughly equal to the tree's height — typically 3-4 levels even for tables with hundreds of millions of rows, because each node fans out to hundreds of children. In practice the root and upper internal nodes are cached in the buffer pool/memory, so a warm lookup often costs just 1-2 physical disk reads for the leaf level.

Q5. What is the difference between a B-Tree and a B+Tree?

A: A classic B-Tree stores data (or pointers to data) in internal nodes as well as leaves, and there's no leaf-to-leaf linked list. A B+Tree stores all actual data references only in leaf nodes, keeping internal nodes purely for routing — this makes internal nodes smaller (higher fan-out) and enables efficient sequential range scans via the linked leaves. Nearly all production databases (MySQL InnoDB, PostgreSQL, SQL Server, Oracle) use B+Tree variants.

Q6. What is index selectivity and why does it matter?

A: Selectivity is the ratio of distinct values to total rows for a column. A gender column (2 distinct values) has very low selectivity; an email column (near-unique) has very high selectivity. The optimizer favors indexes with high selectivity because they filter out most rows quickly; a low-selectivity index often triggers a full table scan instead, since scanning may touch fewer total pages than random lookups through the index.

-- selectivity = distinct values / total rows
SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity FROM orders;
-- close to 0   -> low selectivity, index rarely useful alone
-- close to 1   -> high selectivity, index very useful

Q7. What is a heap table and how does it differ from an index-organized table?

A: A heap table stores rows in no particular order (insertion order, roughly) with a separate rowid used by secondary indexes to locate rows — this is PostgreSQL's and Oracle heap-organized default. An index-organized table (IOT), such as MySQL InnoDB's default, physically stores rows sorted by the primary key inside the clustered index itself, so the primary key lookup requires no extra pointer-following step.

Q8. What happens internally when you run CREATE INDEX on a large existing table?

A: The database scans the entire table, extracts the indexed column values with row pointers, sorts them, and builds the B+Tree bottom-up (leaf level first, then internal levels). This is I/O- and CPU-intensive and can lock the table (or use a lighter lock with ONLINE/CONCURRENT options in Postgres/MySQL) for the duration. On very large tables this should be scheduled during low-traffic windows or done with a non-blocking online DDL tool.

-- PostgreSQL: build index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

-- MySQL: online DDL (default algorithm=INPLACE for most index ops)
ALTER TABLE orders ADD INDEX idx_orders_customer_id (customer_id), ALGORITHM=INPLACE, LOCK=NONE;

Q9. Why does an index slow down INSERT, UPDATE, and DELETE operations?

A: Every write to an indexed column must also update every index that includes that column: locate the correct leaf page, insert/remove the key in sorted order, and possibly split or merge B+Tree pages if they overflow or underflow. A table with 6 indexes means each insert does up to 7 structural writes (1 table + 6 indexes) instead of 1, plus additional write-ahead-log/redo overhead.

Q10. What is a page split in a B+Tree and why is it expensive?

A: When a leaf page is full and a new key must be inserted in sorted position, the engine allocates a new page, moves roughly half the entries to it, and updates the parent node's routing pointers — this can cascade upward if the parent also overflows. Page splits cause fragmentation (pages become half-full, wasting space and increasing I/O) and are one reason sequential, ever-increasing keys (like an auto-increment PK) are cheaper to index than random keys (like a UUID), which cause splits scattered across the whole tree.

-- Random UUID PK: inserts hit random leaf pages -> constant splits, poor cache locality
CREATE TABLE events (id UUID PRIMARY KEY, payload JSON);

-- Sequential PK: inserts append to the rightmost leaf -> few splits, cache-friendly
CREATE TABLE events (id BIGINT AUTO_INCREMENT PRIMARY KEY, payload JSON);

Index Types: Clustered, Non-Clustered, Covering, Composite, Partial & Function-Based

Q11. What is a clustered index?

A: A clustered index determines the physical storage order of table rows — the leaf pages of the B+Tree ARE the actual table data. Because the data can only be sorted one way, a table can have at most one clustered index, typically built on the primary key. Range scans on the clustered key are extremely fast since matching rows are physically adjacent.

Q12. What is a non-clustered (secondary) index?

A: A non-clustered index is a separate B+Tree whose leaf nodes store the indexed column value plus a pointer back to the row — a row ID in a heap table, or the clustered key value in InnoDB. A table can have many non-clustered indexes. Looking up via a non-clustered index that isn't covering requires a second step ("bookmark lookup" or "key lookup") to fetch the full row.

-- InnoDB: secondary index leaf stores the PRIMARY KEY value, not a physical row pointer
CREATE TABLE orders (
  id BIGINT PRIMARY KEY,          -- clustered index
  customer_id BIGINT,
  status VARCHAR(20),
  INDEX idx_customer (customer_id)  -- secondary index; leaf stores (customer_id -> id)
);
-- Lookup by customer_id: 1) find id in secondary index, 2) find full row in clustered index by id

Q13. What is a covering index?

A: A covering index contains every column referenced by a query (in SELECT, WHERE, ORDER BY, GROUP BY), so the engine can answer the query entirely from the index without a lookup into the base table — often shown as "Using index" in MySQL's EXPLAIN. This eliminates the extra I/O of the bookmark lookup and can turn a slow query into a fast one on large tables.

-- Query only needs customer_id and status
SELECT status FROM orders WHERE customer_id = 42;

-- Covering index includes both columns -> no trip to the base table
CREATE INDEX idx_customer_status ON orders(customer_id, status);

Q14. What is a composite (multi-column) index and how does column order matter?

A: A composite index is built on two or more columns concatenated in a defined order, e.g. (last_name, first_name). Column order determines which query patterns benefit: put the most selective / most commonly-filtered-equality column first, and range-filtered or sort columns after. Order also drives the left-prefix rule described below.

Q15. What is the left-prefix rule for composite indexes?

A: An index on (A, B, C) can be used for predicates on A alone, A+B, or A+B+C, because the B+Tree is sorted first by A, then by B within each A, then by C within each B. It CANNOT be used efficiently to filter on B alone or C alone, since those values aren't contiguous in the tree without also constraining A. This is the single most common composite-index misunderstanding in interviews.

CREATE INDEX idx_a_b_c ON t(a, b, c);

SELECT * FROM t WHERE a = 1;                 -- uses index (prefix: a)
SELECT * FROM t WHERE a = 1 AND b = 2;       -- uses index (prefix: a,b)
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3; -- uses index fully
SELECT * FROM t WHERE b = 2;                 -- CANNOT use index effectively
SELECT * FROM t WHERE c = 3;                 -- CANNOT use index effectively

Q16. What is a partial (filtered) index?

A: A partial index only indexes rows matching a WHERE condition, making it much smaller and faster to scan/maintain than a full index — ideal for columns where you almost always query a small subset, like "active" orders out of a mostly-archived table. Supported natively in PostgreSQL and SQL Server; MySQL lacks true partial indexes but can approximate with a generated column.

-- PostgreSQL: index only the small "pending" subset of a huge orders table
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'PENDING';

-- Query must match the same predicate to use the partial index
SELECT * FROM orders WHERE status = 'PENDING' ORDER BY created_at DESC LIMIT 20;

Q17. What is a function-based (expression) index?

A: A function-based index stores the result of a function or expression applied to column(s), rather than the raw column value — necessary because a plain index on email cannot be used by a query filtering on LOWER(email). The optimizer can only use the index when the query expression matches the indexed expression exactly (or is equivalent after normalization).

-- PostgreSQL / Oracle expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'raj@example.com';  -- uses the index

SELECT * FROM users WHERE email = 'raj@example.com';         -- does NOT use it

Q18. What is a unique index and how does it differ from a unique constraint?

A: A unique index enforces that no two rows share the same indexed value (NULLs are typically treated as distinct in most databases) and, as a side effect, also speeds up lookups on that column. A unique constraint is the logical/declarative rule; databases usually implement it internally by creating a unique index — so in practice they are two views of the same mechanism.

Q19. What is a bitmap index and when is it used?

A: A bitmap index stores one bit vector per distinct value, with a 1 marking rows that have that value — extremely compact and fast for AND/OR combinations of low-cardinality columns (gender, boolean flags, status codes), common in data warehouses (Oracle, some analytical engines). Bitmap indexes are a poor fit for high-write OLTP tables because updating a bit vector for concurrent writers causes heavy lock contention, unlike the row-level locking of a B+Tree index.

Q20. How do you decide which columns to index on a table?

A: Index columns used in WHERE, JOIN ON, ORDER BY, and GROUP BY clauses of your most frequent/expensive queries; favor high-selectivity columns; prefer covering composite indexes for hot query paths; and avoid indexing columns that are rarely filtered or that change on almost every write (heavy write amplification for little read benefit). Always validate the decision with EXPLAIN and production query statistics rather than guessing — over-indexing is as harmful as under-indexing.

Query Optimization & Execution Plans (EXPLAIN / EXPLAIN ANALYZE)

Q21. What does the EXPLAIN command do?

A: EXPLAIN asks the query optimizer to show the execution plan it would use for a query — access method per table (full scan, index scan, index-only scan), join order and algorithm, estimated rows, and estimated cost — WITHOUT actually running the query. It's the first tool to reach for when a query is slow.

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- MySQL sample columns: id, select_type, table, type, possible_keys, key, rows, Extra

Q22. What is the difference between EXPLAIN and EXPLAIN ANALYZE?

A: Plain EXPLAIN shows the optimizer's *estimated* plan and row counts based on statistics, without executing the query. EXPLAIN ANALYZE actually runs the query and reports real elapsed time and actual row counts per plan node, letting you compare estimated vs. actual and spot bad cardinality estimates. Because it executes the query, be cautious using EXPLAIN ANALYZE on production with INSERT/UPDATE/DELETE (wrap in a transaction and roll back, or use dry-run variants where available).

-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
-- MySQL 8+
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Q23. What does the "type" (access type) column mean in MySQL's EXPLAIN output?

A: It shows how MySQL accesses each table, roughly from best to worst: system/const (single row by primary/unique key), eq_ref (one row per join via unique index), ref (non-unique index lookup), range (index range scan), index (full index scan), ALL (full table scan — usually the red flag to fix). Seeing ALL on a large table in a hot query path is the classic performance-tuning trigger.

Q24. What does "Using filesort" and "Using temporary" in EXPLAIN Extra mean, and why are they concerning?

A: "Using filesort" means MySQL must sort rows outside of an index order (in memory or on disk) rather than reading them pre-sorted from an index — expensive for large result sets. "Using temporary" means MySQL builds a temporary table (often for GROUP BY/DISTINCT with no supporting index) — both are strong hints to add or reorder an index so ORDER BY/GROUP BY can be satisfied directly from index order.

-- Before: no supporting index -> Using filesort
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at DESC;

-- After: composite index matches WHERE + ORDER BY -> avoids filesort
CREATE INDEX idx_customer_created ON orders(customer_id, created_at DESC);

Q25. How do you read a PostgreSQL query plan's cost numbers?

A: PostgreSQL shows cost=startup..total in arbitrary units (not milliseconds) — startup cost is work before the first row can be returned, total cost is work to return all rows. It also shows estimated rows and average width in bytes. Compare estimated rows to EXPLAIN ANALYZE's actual rows; a big mismatch signals stale statistics or a hard-to-estimate predicate (like a correlated subquery).

Q26. What is a sequential scan (Seq Scan) in PostgreSQL and when is it actually the right choice?

A: A Seq Scan reads every page of a table in physical order. It's actually optimal (and chosen deliberately by the planner) when a query returns a large fraction of the table's rows — an index scan with many random-access lookups can be slower than one efficient sequential read. Seeing a Seq Scan is only a problem when the query is selective (returns a small fraction of rows) yet still full-scans, which usually indicates a missing or unusable index.

Q27. What is a query execution plan's join order and why does it matter for performance?

A: For multi-table joins, the optimizer decides which table to read first and how to combine the rest (nested loop, hash join, merge join), based on cardinality estimates. A poor join order — e.g., starting from the largest table and nested-looping into others — can turn a millisecond query into a multi-second one. Ensuring accurate statistics and appropriate indexes on join columns lets the optimizer choose efficient orders/algorithms.

Q28. What is the difference between a nested loop join, hash join, and merge join?

A: Nested loop: for each outer row, scan (or index-lookup) the inner table — efficient when the outer set is small and the inner join column is indexed. Hash join: build an in-memory hash table from the smaller input, then probe it with the other — good for large unsorted inputs without indexes. Merge join: both inputs are sorted on the join key and merged in one pass — efficient when data is already sorted (e.g., via an index).

-- Small outer + indexed inner join column -> optimizer favors nested loop
SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'PENDING';   -- few pending rows, customers.id is PK (indexed)

Q29. What is a query hint and when is it appropriate to use one?

A: A hint forces the optimizer to use a specific index, join type, or join order, overriding its own cost-based decision (e.g., MySQL's USE INDEX/FORCE INDEX, Oracle's /*+ INDEX(...) */, SQL Server's WITH (INDEX(...))). Use hints sparingly and only after confirming with EXPLAIN that the optimizer is consistently making a bad choice (often due to stale statistics or a known optimizer limitation) — hints are brittle and can become wrong as data grows.

SELECT /*+ INDEX(orders idx_customer_created) */ *
FROM orders FORCE INDEX (idx_customer_created)
WHERE customer_id = 42 ORDER BY created_at DESC;

Q30. Why can wrapping a column in a function in the WHERE clause disable index usage?

A: An index is built on the raw column values in sorted order; wrapping the column in a function (YEAR(created_at) = 2026, UPPER(name) = 'RAJ') means the database must evaluate the function on every row to compare — a plain index can't be searched by the transformed value (unless there's a matching function-based index). Rewrite the predicate to keep the raw column on one side of the comparison so the index remains usable.

-- Bad: disables index on created_at
SELECT * FROM orders WHERE YEAR(created_at) = 2026;

-- Good: sargable, uses index range scan
SELECT * FROM orders WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

Statistics & the Query Optimizer

Q31. What are table/column statistics and why does the optimizer need them?

A: Statistics are metadata the database maintains about tables — row counts, distinct value counts per column, value distribution histograms, and index depth/cardinality. The cost-based optimizer uses these to estimate how many rows each step of a plan will produce, which drives every decision: which index to use, which join algorithm, and in what order to join tables. Stale statistics are one of the most common causes of a "suddenly slow" query after data changes significantly.

Q32. How do you refresh statistics, and how often should it happen?

A: PostgreSQL's autovacuum daemon runs ANALYZE automatically based on a percentage of rows changed; you can also run it manually. MySQL's InnoDB recalculates index statistics automatically on certain thresholds, or via ANALYZE TABLE. Run it manually after bulk loads, large deletes, or schema changes, since the automatic thresholds may lag behind a sudden bulk operation.

-- PostgreSQL
ANALYZE orders;
-- MySQL
ANALYZE TABLE orders;

Q33. What is a histogram in the context of query statistics?

A: A histogram approximates the distribution of values in a column by dividing the value range into buckets and recording row counts per bucket, letting the optimizer estimate selectivity for non-uniform data (e.g., 80% of orders have status='COMPLETED', 5% have status='REFUNDED'). Without a histogram, the optimizer often assumes uniform distribution, which can badly misestimate skewed columns and pick a poor plan.

Q34. What is cardinality estimation and why does it sometimes go wrong?

A: Cardinality estimation predicts how many rows a predicate or join will produce. It goes wrong with: correlated columns (optimizer assumes independence between predicates, e.g. city='NYC' AND zip='10001' — treated as if independent, drastically underestimating selectivity), functions/expressions on columns, complex subqueries, and stale statistics after bulk changes. Poor estimates cascade — an error at the bottom of a plan tree can make the whole plan wrong.

Q35. What is a cost-based optimizer vs a rule-based optimizer?

A: A rule-based optimizer (legacy, mostly retired) picks a plan using fixed heuristics ranked by priority, ignoring actual data distribution. A cost-based optimizer (CBO), used by all modern databases, assigns a numeric cost estimate to each candidate plan using statistics (I/O, CPU, memory) and picks the cheapest. CBOs adapt to real data but are only as good as their statistics — which is why keeping stats fresh matters so much.

Q36. What is parameter sniffing and how can it hurt performance?

A: Parameter sniffing occurs when a database caches an execution plan compiled for the first parameter value it saw, then reuses that plan for subsequent calls with very different, less-suitable parameter values — a plan great for a highly selective value can be terrible for a common one. Mitigations: query hints to force recompilation, optimizer hints for a "generic" plan, or splitting into separate statements/stored procedures for very different value distributions.

Q37. Why might the optimizer ignore an existing index even though it looks useful?

A: The optimizer estimates it's cheaper to full-scan when: the predicate matches a large fraction of rows (low selectivity), the table is small enough to fit in a few pages (index overhead not worth it), the index doesn't cover the query and would need many random bookmark lookups, or the query wraps the column in a function/implicit type conversion, making the index unusable. Always verify with EXPLAIN rather than assuming an index is being used.

-- Implicit type conversion silently disables an index
-- customer_id is BIGINT, comparing to a string forces a cast on every row
SELECT * FROM orders WHERE customer_id = '42';   -- may skip the index

-- Correct: match column type exactly
SELECT * FROM orders WHERE customer_id = 42;

Q38. What is plan caching and what is a plan cache invalidation storm?

A: Databases cache compiled execution plans keyed by query text/parameters to avoid re-parsing and re-optimizing identical queries repeatedly. A cache invalidation storm happens when schema changes, statistics updates, or a flood of unique literal-valued queries (instead of parameterized queries) constantly evict/rebuild the cache, burning CPU on repeated optimization. Always use bind parameters (PreparedStatement) instead of string-concatenated literals to maximize plan cache hits.

Q39. What is the difference between logical and physical query optimization?

A: Logical optimization rewrites the query into an equivalent but potentially cheaper form — predicate pushdown, subquery-to-join conversion, constant folding, removing redundant joins — independent of physical storage. Physical optimization then picks concrete access paths and algorithms (which index, which join method, which order) based on statistics and cost estimates. Both stages happen before a single row is read.

Q40. How can you tell the optimizer's estimated rows are wrong, and what do you do about it?

A: Run EXPLAIN ANALYZE and compare the "estimated rows" from the plan against "actual rows" — a large gap (e.g., estimated 10, actual 500,000) means the optimizer's model is off, usually from stale/missing statistics or an unmodelable predicate. Fixes: run ANALYZE/UPDATE STATISTICS, increase histogram bucket count/statistics target for that column, or rewrite the predicate into a form the optimizer can estimate more accurately.

Locking, Blocking & Deadlocks

Q41. What is the difference between a shared lock and an exclusive lock?

A: A shared (S) lock allows multiple concurrent readers on the same row/page but blocks writers; several transactions can hold shared locks on the same resource simultaneously. An exclusive (X) lock is held by a single writer and blocks all other readers and writers on that resource until released. Most write statements (UPDATE/DELETE) acquire exclusive locks on the affected rows.

Q42. What is the difference between row-level locking and table-level locking?

A: Row-level locking (InnoDB, PostgreSQL MVCC) locks only the specific rows a statement touches, allowing high concurrency — other transactions can freely modify unrelated rows. Table-level locking (MyISAM, or explicit LOCK TABLE statements) locks the entire table, blocking all concurrent writers even on unrelated rows — far less scalable and generally avoided in modern OLTP workloads except for rare maintenance operations.

Q43. What is a deadlock and how does the database resolve it?

A: A deadlock occurs when two (or more) transactions each hold a lock the other needs, forming a cycle where neither can proceed — e.g., Tx1 locks row A then waits for row B, while Tx2 locks row B then waits for row A. The database's deadlock detector periodically scans the wait-for graph, finds the cycle, and kills one transaction (the "victim," typically the one with less work done or lowest cost to roll back) with an error, letting the other proceed.

-- Tx1
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- waits for row 2

-- Tx2 (concurrently)
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- waits for row 1 -> DEADLOCK

Q44. How do you prevent deadlocks at the application level?

A: Always acquire locks/update rows in a consistent global order across all transactions (e.g., always update the lower account ID first), keep transactions short, avoid interactive user input inside a transaction, use the lowest isolation level that satisfies correctness, and add retry logic with exponential backoff for the (expected, occasional) deadlock exception rather than treating it as fatal.

// Consistent lock ordering prevents the classic transfer deadlock
Long first = Math.min(fromId, toId);
Long second = Math.max(fromId, toId);
accountRepository.lockForUpdate(first);
accountRepository.lockForUpdate(second);

Q45. What is lock escalation and why can it hurt concurrency?

A: Lock escalation is when the engine converts many fine-grained row locks into a single coarser table (or page) lock to save memory/overhead, typically triggered after a transaction acquires locks beyond a threshold. This dramatically reduces concurrency because other transactions that would only conflict on unrelated rows now get blocked by the coarse lock. Breaking large bulk updates into smaller batches avoids triggering escalation.

Q46. What is blocking (lock waiting) and how do you diagnose it in production?

A: Blocking happens when one transaction holds a lock that another transaction needs, forcing the second to wait — normal and expected up to a point, but a problem when wait chains grow long or a "head blocker" transaction stalls for a long time (e.g., an idle-in-transaction connection). Diagnose with database-specific views: MySQL's performance_schema.data_lock_waits, PostgreSQL's pg_locks joined to pg_stat_activity, or SQL Server's sys.dm_tran_locks.

-- PostgreSQL: find blocking sessions
SELECT blocked.pid AS blocked_pid, blocking.pid AS blocking_pid, blocked.query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks kl ON kl.locktype = bl.locktype AND kl.granted
JOIN pg_stat_activity blocking ON blocking.pid = kl.pid
WHERE bl.pid <> kl.pid;

Q47. What is a "lock timeout" and how does it differ from deadlock detection?

A: A lock timeout aborts a transaction that has waited longer than a configured threshold (e.g., MySQL's innodb_lock_wait_timeout) regardless of whether a true cycle exists — it's a blunt safety net for long waits. Deadlock detection is smarter: it actively looks for a genuine circular wait and resolves it immediately, often faster than any timeout would trigger. Both mechanisms typically run together in production databases.

Q48. What is optimistic locking and how is it typically implemented in a Java/JPA application?

A: Optimistic locking assumes conflicts are rare: it reads a row without locking it, and on update checks that a version number (or timestamp) hasn't changed since it was read — if it has, the update fails and the application retries or reports a conflict. This avoids holding database locks for the whole "think time" of a user request, giving much higher throughput than pessimistic locking for low-contention data.

@Entity
class Account {
    @Id Long id;
    BigDecimal balance;
    @Version Long version;   // JPA optimistic locking column
}
// Generated SQL: UPDATE account SET balance=?, version=version+1
//                WHERE id=? AND version=?  -- 0 rows updated -> OptimisticLockException

Q49. What is pessimistic locking and when is it preferred over optimistic locking?

A: Pessimistic locking acquires a real database lock (SELECT ... FOR UPDATE) at read time, blocking other transactions from modifying (or sometimes reading) that row until the current transaction commits or rolls back. Prefer it when conflicts are frequent (high contention on the same rows, like a shared inventory counter) — retrying failed optimistic updates repeatedly under heavy contention wastes more work than blocking briefly.

@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT i FROM Inventory i WHERE i.productId = :id")
Inventory findForUpdate(@Param("id") Long id);
// Generated SQL: SELECT * FROM inventory WHERE product_id = ? FOR UPDATE

Q50. What is a phantom lock / gap lock and why does InnoDB use it?

A: A gap lock locks the space BETWEEN index records (not the records themselves) to prevent other transactions from inserting new rows into that range — used by InnoDB under REPEATABLE READ to prevent phantom reads (new rows appearing on re-query within the same transaction). Gap locks can cause unexpected blocking on INSERTs that don't touch any existing row, which surprises developers used to purely row-level locking semantics.

Transaction Isolation Levels & Performance Trade-offs

Q51. What are the four standard SQL isolation levels?

A: From weakest to strongest: READ UNCOMMITTED (allows dirty reads), READ COMMITTED (no dirty reads, but non-repeatable reads possible), REPEATABLE READ (no dirty or non-repeatable reads, but phantom reads possible in the strict standard definition), and SERIALIZABLE (fully isolated, transactions behave as if run one at a time). Each stronger level trades concurrency/throughput for stronger consistency guarantees.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or per-connection in JDBC
connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

Q52. What is a dirty read and which isolation level prevents it?

A: A dirty read occurs when a transaction reads data written by another transaction that hasn't committed yet — if that other transaction rolls back, the reader saw data that never really existed. READ UNCOMMITTED is the only level that allows dirty reads; READ COMMITTED and every stronger level prevent them. Most production systems never use READ UNCOMMITTED for this reason.

Q53. What is a non-repeatable read and which isolation level prevents it?

A: A non-repeatable read happens when a transaction reads the same row twice and gets different values because another committed transaction modified it in between. REPEATABLE READ and SERIALIZABLE prevent this by ensuring a transaction sees a consistent snapshot of rows it has already read. READ COMMITTED does NOT prevent it, since each statement sees the latest committed data at the time it runs.

Q54. What is a phantom read and which isolation level prevents it?

A: A phantom read occurs when a transaction re-runs a range query and sees NEW rows that another transaction inserted and committed in between — the original rows didn't change, but the result set's membership did. Standard SQL says only SERIALIZABLE fully prevents phantoms, though InnoDB's REPEATABLE READ also prevents most phantoms in practice via gap locking and MVCC snapshots (a notable deviation from the ANSI SQL standard's strict definition).

Q55. Why is READ COMMITTED the default isolation level in most production systems (Oracle, PostgreSQL, SQL Server)?

A: It offers a practical balance: each statement sees a fresh, consistent snapshot as of its own start (via MVCC), preventing dirty reads while allowing high concurrency, since readers never block writers and writers never block readers. Stricter levels add real overhead (more locking, more version-chain tracking, transaction retries under SERIALIZABLE), which most applications don't need for correctness.

Q56. How does MVCC (Multi-Version Concurrency Control) enable readers to avoid blocking writers?

A: MVCC keeps multiple physical versions of a row, each tagged with the transaction that created it. A reader is given a consistent snapshot (based on transaction IDs / commit timestamps) and reads the version of each row visible as of that snapshot, ignoring uncommitted or newer versions — so it never needs a lock that would conflict with a concurrent writer. PostgreSQL, MySQL InnoDB, and Oracle all rely heavily on MVCC for read scalability.

Q57. What is write skew and why can SERIALIZABLE (or its absence) matter for it?

A: Write skew occurs when two transactions each read overlapping data, then each writes to a DIFFERENT row based on what they read, and the combined effect violates an invariant that neither transaction's individual write would have violated alone (classic example: two on-call doctors each check "is at least one other doctor on call" before going off duty, and both go off duty simultaneously). Only true SERIALIZABLE isolation (or explicit locking/constraints) prevents this — REPEATABLE READ does not, because it only guards single-row consistency, not cross-row invariants.

Q58. What is the performance cost of SERIALIZABLE isolation?

A: Implementations either use heavy locking (predicate locks covering ranges, blocking many concurrent transactions) or optimistic Serializable Snapshot Isolation (PostgreSQL, tracks read/write dependencies and aborts transactions that would create a serialization anomaly, forcing retries). Both approaches reduce throughput under contention — more blocking or more transaction retries — so SERIALIZABLE should be reserved for the specific operations that truly need it, not applied globally.

Q59. How do you choose an appropriate isolation level for a given feature?

A: Default to READ COMMITTED for most CRUD operations. Use REPEATABLE READ (or explicit row locks) when a transaction reads a value and later writes based on it and must not see it change mid-transaction (e.g., inventory decrement). Reserve SERIALIZABLE for operations with genuine cross-row invariants that other levels can't guard (rare, and often better solved with a unique constraint or explicit application-level lock instead).

Q60. What is the difference between isolation level and lock granularity in terms of tuning knobs?

A: Isolation level controls WHAT anomalies are permitted (dirty/non-repeatable/phantom reads) and is set per transaction/connection. Lock granularity controls HOW locks are taken (row vs page vs table) to enforce whatever isolation guarantee is required, and is largely an engine implementation detail, though it's influenced by index design — a query that can't use a selective index locks (or scans) far more rows/pages than one that can, regardless of isolation level chosen.

Connection Pooling (HikariCP) & JDBC Tuning

Q61. Why is connection pooling essential for database performance?

A: Opening a raw database connection involves a TCP handshake, authentication, and session setup — often tens of milliseconds of pure overhead, unacceptable if repeated on every request. A connection pool creates a fixed set of physical connections up front and hands them out/reclaims them per request, so application code pays only the (near-zero) cost of borrowing an already-open connection.

Q62. Why does the HikariCP documentation recommend a small pool size rather than a large one?

A: Database connections are limited by the number of CPU cores and disk I/O available on the DB server — beyond a certain pool size, additional connections just cause more context-switching and lock contention among threads competing for the same finite hardware, actually reducing throughput. HikariCP's own guidance formula is roughly connections = cores * 2 + effective_spindle_count as a starting point, then tune with load testing.

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db:5432/app");
config.setMaximumPoolSize(10);       // not thousands — tune per hardware
config.setMinimumIdle(10);
config.setConnectionTimeout(3000);   // fail fast rather than queue forever
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);      // recycle before DB/LB drops stale conns
HikariDataSource ds = new HikariDataSource(config);

Q63. What does HikariCP's maximumPoolSize actually control, and what happens when it's exhausted?

A: It caps the total number of physical connections the pool will ever open (idle + in-use). When all connections are busy and a new request needs one, the requesting thread blocks (queues) until either a connection is returned or connectionTimeout elapses, at which point it throws a SQLTransientConnectionException. Sizing too low causes queuing/timeouts under load; sizing too high overwhelms the database.

Q64. What is connection leak detection in HikariCP and why is it important?

A: A connection leak happens when application code borrows a connection but never returns it (e.g., missing close() in a finally block or try-with-resources), silently shrinking the effective pool until it's exhausted and everything times out. HikariCP's leakDetectionThreshold logs a stack trace if a connection is held longer than the configured milliseconds, pinpointing the leaking code path before it takes down production.

config.setLeakDetectionThreshold(60000);  // warn if held > 60s

// Always use try-with-resources so the connection is returned even on exception
try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.executeQuery();
}

Q65. What is a PreparedStatement and why does it matter for both security and performance?

A: A PreparedStatement pre-compiles SQL with placeholder parameters, separating code from data — this prevents SQL injection since parameter values are never interpreted as SQL syntax. For performance, it lets the database (and driver-side statement cache) reuse a single cached execution plan across many calls with different parameter values, instead of re-parsing and re-optimizing a brand-new SQL string every time.

// Bad: string concatenation -> SQL injection risk + no plan reuse
String sql = "SELECT * FROM users WHERE email = '" + email + "'";

// Good: PreparedStatement -> safe + plan cached & reused
PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE email = ?");
ps.setString(1, email);
ResultSet rs = ps.executeQuery();

Q66. What is JDBC batch updating and how much can it improve write performance?

A: Batch updating groups multiple INSERT/UPDATE statements into a single network round-trip to the database instead of sending them one at a time — for bulk inserts this can be an order of magnitude faster because network latency, not the database itself, often dominates single-row round-trip cost. JDBC and Hibernate (hibernate.jdbc.batch_size) both support batching.

PreparedStatement ps = conn.prepareStatement("INSERT INTO logs(msg) VALUES (?)");
for (String msg : messages) {
    ps.setString(1, msg);
    ps.addBatch();
}
int[] results = ps.executeBatch();   // one round trip for the whole batch

Q67. What is fetch size in JDBC and how does it affect memory and performance?

A: Fetch size controls how many rows the driver retrieves per network round-trip when streaming a large ResultSet, rather than pulling the entire result into client memory at once (which is actually the JDBC/MySQL Connector default behavior unless you set streaming mode explicitly). A small fetch size means more round-trips (slower); too large risks high client memory usage for huge result sets. Tune it for reporting/export queries that return millions of rows.

Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(500);   // stream 500 rows at a time instead of loading all
ResultSet rs = stmt.executeQuery("SELECT * FROM huge_table");

Q68. What is the difference between autoCommit true and false, and how does it affect throughput?

A: With autoCommit=true (the JDBC default), every single statement is its own committed transaction — meaning a fsync-to-disk (transaction log flush) after every statement, which is very slow for multi-statement units of work. Setting autoCommit=false and explicitly committing after a batch of related statements groups them into one transaction with one flush, drastically improving throughput for bulk operations.

conn.setAutoCommit(false);
for (Order o : orders) {
    insertOrder(conn, o);
}
conn.commit();   // one flush for the whole batch instead of one per row

Q69. What is the N+1 connection acquisition anti-pattern, and how does connection pool exhaustion cascade under load?

A: If code acquires and releases a connection repeatedly within a loop (instead of once per logical unit of work), it multiplies pool checkout/checkin overhead and increases the chance of contention. Under load, if requests hold connections longer than expected (slow queries, leaks), the pool exhausts, new requests queue and time out, and — because those failed requests often get retried by upstream callers — the retry storm compounds the load on an already-struggling database, a classic cascading failure pattern.

Q70. What database-side setting complements connection pool sizing to protect the database from being overwhelmed?

A: The database's own max_connections (PostgreSQL) or max_connections (MySQL) setting is a hard ceiling; the sum of all application instances' pool sizes should stay comfortably under it, since each raw connection consumes real server memory and each connection is not free even when idle. Many teams add a proxy layer (PgBouncer, ProxySQL) in front of the database to multiplex many application-side logical connections onto fewer physical backend connections.

Caching Strategies: Query Cache, Second-Level Cache & Redis

Q71. Why was MySQL's built-in query cache removed in MySQL 8.0?

A: MySQL's query cache stored the exact result set keyed by the exact SQL text, but ANY write to a referenced table invalidated all cached results for that table — under concurrent write-heavy workloads (the common case in OLTP) this caused constant cache thrashing and a global mutex became a serialization bottleneck, often making it SLOWER than no cache at all. It was deprecated in 5.7 and removed entirely in 8.0 in favor of application-level or proxy-level caching.

Q72. What is Hibernate's first-level cache?

A: The first-level (L1) cache is the Hibernate/JPA Session/EntityManager's built-in, mandatory, per-transaction cache — once an entity is loaded by ID within a session, subsequent lookups for that same ID return the cached instance without hitting the database, and it's automatically cleared when the session closes. It's always on and cannot be disabled, but its scope is limited to a single persistence context.

Q73. What is Hibernate's second-level cache and how does it differ from the first-level cache?

A: The second-level (L2) cache is shared across sessions/transactions within an entire application (or cluster, with a distributed cache provider like Ehcache, Infinispan, or Redis), caching entities and query results beyond a single unit of work. It's opt-in per entity and must be configured carefully — stale L2 cache entries after external/direct database writes are a real correctness risk, so it fits best for read-heavy, rarely-changing reference data.

@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
class Country {
    @Id Long id;
    String name;   // rarely changes -> good L2 cache candidate
}

Q74. What is cache-aside (lazy loading) and how is it implemented with Redis?

A: Cache-aside: application checks the cache first; on a miss, it reads from the database, then populates the cache for next time. It's the most common pattern because the cache only holds what's actually been requested, and the database remains the single source of truth. The trade-off is a "thundering herd" risk if many requests miss simultaneously for the same hot key.

Product getProduct(Long id) {
    String cached = redis.get("product:" + id);
    if (cached != null) return deserialize(cached);       // cache hit

    Product product = productRepository.findById(id);      // cache miss -> DB
    redis.setex("product:" + id, 3600, serialize(product)); // populate with TTL
    return product;
}

Q75. What is cache invalidation and why is it famously called one of the two hardest problems in computer science?

A: Cache invalidation means removing or updating stale cache entries after the underlying data changes — hard because the write path and the cache may be in different processes/services, updates can race with concurrent reads, and forgetting an invalidation path silently serves wrong data for the TTL duration. Common strategies: explicit invalidation on write, short TTLs as a safety net, or a version/tag-based scheme that invalidates whole groups of keys at once.

void updateProduct(Product p) {
    productRepository.save(p);
    redis.del("product:" + p.getId());   // explicit invalidation on write
}

Q76. What is a write-through cache vs a write-behind (write-back) cache?

A: Write-through writes to the cache AND the database synchronously as part of the same operation, keeping them always consistent but paying both write costs on the critical path. Write-behind writes to the cache immediately and asynchronously flushes to the database later (batched), giving very low write latency but risking data loss if the cache node fails before the flush completes — used only when some data loss risk is acceptable for the latency gain.

Q77. What is a thundering herd (cache stampede) and how do you prevent it?

A: A thundering herd happens when a hot cache key expires and many concurrent requests all miss simultaneously, all hammering the database to recompute the same value at once — potentially overwhelming it. Mitigations: a distributed lock so only one request recomputes while others wait or serve stale data briefly, staggered/jittered TTLs so keys don't all expire at the same instant, or probabilistic early expiration/recomputation before the actual TTL.

// Jittered TTL avoids many keys expiring at the exact same second
int baseTtl = 3600;
int jitter = ThreadLocalRandom.current().nextInt(300);
redis.setex(key, baseTtl + jitter, value);

Q78. What is the difference between caching a query result and caching an entity/object?

A: Caching a query result stores the exact rows returned for a specific query+parameters combination — fast for that exact query, but a poor fit for arbitrary later queries and hard to invalidate precisely (any relevant write invalidates the whole cached set). Caching individual entities by ID is more reusable (any query needing that entity can hit the cache) and invalidation is precise (just remove that one ID) — generally the better default for OLTP-style applications.

Q79. Why should you set a TTL (time-to-live) on cache entries even when you have explicit invalidation logic?

A: Explicit invalidation can be missed due to bugs, out-of-band writes (direct SQL, another service, a batch job) that bypass the invalidation code path, or partial failures (write succeeded, invalidation call failed). A TTL acts as a safety net bounding how long any stale data can survive, trading a small amount of eventual consistency for much stronger protection against permanently stale cache entries.

Q80. What kinds of data are poor candidates for caching, and why?

A: Highly volatile data that changes on nearly every read (real-time stock prices, live inventory counts under high contention) gains little from caching and risks frequent staleness; data requiring strict, immediate consistency (financial balances used for a transfer decision) is dangerous to cache without careful invalidation; and low-reuse data (a report requested exactly once) wastes cache memory without ever producing a hit. Good candidates are read-heavy, relatively stable, and reused across many requests.

Partitioning & Sharding

Q81. What is table partitioning and what problem does it solve?

A: Partitioning splits one logical table into multiple physical segments (partitions) based on a partition key, while the database still presents it as a single table to queries. It helps performance and manageability at scale: queries that filter on the partition key can skip entire partitions (partition pruning), and maintenance operations (archiving, dropping old data) can operate on a whole partition instantly instead of a slow row-by-row DELETE.

-- PostgreSQL: range partitioning by month
CREATE TABLE orders (id BIGINT, created_at DATE, amount NUMERIC) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2026_01 PARTITION OF orders
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Q82. What is the difference between range, list, and hash partitioning?

A: Range partitioning assigns rows based on a value falling within a defined range (dates, ID ranges) — great for time-series data and easy pruning/archiving. List partitioning assigns rows based on an explicit list of discrete values (region = 'US', 'EU', 'APAC'). Hash partitioning applies a hash function to the key to distribute rows evenly across a fixed number of partitions — good for spreading load uniformly when there's no natural range/list grouping, at the cost of losing easy range-based pruning.

Q83. What is partition pruning and how does it speed up queries?

A: Partition pruning is the optimizer recognizing, from the query's WHERE clause and the partitioning scheme, that only a subset of partitions could possibly contain matching rows, and skipping the rest entirely at plan time — so a query filtering on a single month against a table partitioned by month only scans that one partition's data and indexes, not the whole table's history. This is one of partitioning's biggest performance wins for time-series and log-style tables.

-- Only orders_2026_02 is scanned; other monthly partitions are pruned
EXPLAIN SELECT * FROM orders WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01';

Q84. What is the difference between vertical partitioning and horizontal partitioning?

A: Horizontal partitioning (what most people mean by "partitioning") splits rows across partitions/tables — every partition has the same columns but different rows. Vertical partitioning splits columns instead — moving rarely-accessed or large columns (e.g., a big TEXT/BLOB) into a separate table joined by a foreign key, so the frequently-queried "hot" columns stay compact and cache-friendly.

Q85. What is database sharding and how does it differ from partitioning?

A: Partitioning splits data within a single database instance (still one server, one point of failure, shared resources). Sharding splits data across multiple independent database instances/servers — each shard is a complete, separate database holding a subset of the overall data, enabling horizontal scale-out beyond what a single machine's CPU/memory/disk can handle. Sharding adds significant complexity: cross-shard joins/transactions are hard or impossible without extra coordination.

Q86. What is a shard key and what makes a good one?

A: The shard key determines which shard a given row lives on (e.g., customer_id % number_of_shards, or a consistent hash of a tenant ID). A good shard key has high cardinality (avoids hot shards), distributes write and read load evenly, and — critically — matches the access pattern of your most common queries, since queries that don't include the shard key must fan out to every shard (a "scatter-gather" query), which is slow and doesn't scale.

// Simple hash-based shard routing
int shardId = Math.abs(tenantId.hashCode()) % NUM_SHARDS;
DataSource shard = shardRoutingDataSource.get(shardId);

Q87. What is a scatter-gather query and why is it a sharding anti-pattern?

A: A scatter-gather query has to be sent to every shard (because it doesn't filter on the shard key), then results are merged/aggregated by the application or a coordinator — latency is bounded by the SLOWEST shard, and the coordination overhead grows with shard count. Frequent scatter-gather queries indicate the shard key doesn't match real query patterns, and often signal the need for a secondary index service, a denormalized read model, or a different sharding strategy.

Q88. What is resharding and why is it operationally difficult?

A: Resharding means changing the number of shards or the shard-assignment scheme (e.g., growing from 4 to 8 shards) and physically moving data to match — it's difficult because it must happen with minimal downtime, keep data consistent during the migration window, and update all routing metadata atomically. Consistent hashing schemes minimize how much data must move on resharding compared to naive modulo-based schemes, where nearly all keys can remap when the shard count changes.

Q89. How do foreign keys and joins work (or not work) across shards?

A: Traditional foreign-key constraints and SQL JOINs generally only work within a single database instance, so cross-shard relationships can't be enforced or joined natively by the database — the application must denormalize related data onto the same shard (co-locate by the same shard key, e.g., keep a customer and all their orders on the same shard), or perform the join in application code across two shard queries, accepting the extra complexity and eventual-consistency risk.

Q90. When should a team consider sharding versus just scaling vertically or adding read replicas?

A: Reach for sharding only after read replicas (for read scale) and vertical scaling (bigger instance) are exhausted or hit diminishing returns — sharding solves write-throughput and total-data-volume limits that a single primary instance simply cannot handle, but it adds substantial application, operational, and consistency complexity. Most systems never need to shard; premature sharding is a common and costly over-engineering mistake.

Diagnosing Slow Queries & the N+1 Problem (Hibernate/JPA)

Q91. What is the N+1 select problem?

A: It happens when an ORM lazily loads a collection or association: fetching N parent rows with 1 query, then triggering 1 additional query PER parent to lazily load each one's related child collection or association — N+1 total queries instead of 1 or 2. It's the single most common Hibernate/JPA performance bug, invisible in development with small datasets but devastating at production scale.

List<Author> authors = authorRepository.findAll();   // 1 query
for (Author a : authors) {
    System.out.println(a.getBooks().size());          // N queries, one per author!
}

Q92. How do you fix the N+1 problem with JOIN FETCH in JPQL?

A: JOIN FETCH tells Hibernate to eagerly load the association in the SAME query using a SQL join, instead of issuing a separate query per parent — collapsing N+1 queries into exactly 1. It's the most direct fix when you know upfront that you'll need the association for every result.

@Query("SELECT a FROM Author a JOIN FETCH a.books WHERE a.active = true")
List<Author> findActiveAuthorsWithBooks();
// 1 SQL query with a JOIN instead of 1 + N

Q93. What is Hibernate's @BatchSize annotation and how does it mitigate N+1 without a full JOIN FETCH?

A: @BatchSize(size = n) tells Hibernate, when lazily loading a collection/entity for multiple parents, to batch the loads together — instead of 1 query per parent, it issues 1 query per BATCH of n parents (e.g., WHERE author_id IN (?, ?, ?, ...)), reducing N+1 to roughly N/batchSize+1 queries. It's a good middle ground when a full eager JOIN FETCH would return a wastefully large duplicated result set.

@Entity
class Author {
    @OneToMany(mappedBy = "author")
    @BatchSize(size = 20)
    List<Book> books;
}
// Generated: SELECT * FROM book WHERE author_id IN (?,?,...,?) -- batches of 20

Q94. What is the difference between FetchType.LAZY and FetchType.EAGER in JPA, and which should be the default?

A: LAZY defers loading an association until it's actually accessed; EAGER loads it immediately as part of the parent's query, every single time, whether needed or not. LAZY should be the default for almost everything (especially collections and to-many relations) — EAGER associations that aren't always needed silently bloat every query that loads the parent entity, even in code paths that never touch the association.

@Entity
class Author {
    @OneToMany(mappedBy = "author", fetch = FetchType.LAZY)  // preferred default
    List<Book> books;

    @ManyToOne(fetch = FetchType.LAZY)   // even @ManyToOne defaults to EAGER — override it!
    Publisher publisher;
}

Q95. What is a LazyInitializationException and why does it commonly occur?

A: It's thrown when code tries to access a lazily-loaded association AFTER the Hibernate session/persistence context that loaded the parent entity has already closed (e.g., in a web layer after the transaction boundary ended) — Hibernate can no longer issue the extra query to fetch it. Fixes: fetch what you need within the transaction (JOIN FETCH, explicit access before the session closes), use DTO projections instead of exposing entities past the transaction boundary, or in limited cases use OSIV (Open Session In View) — generally discouraged because it hides N+1 problems rather than fixing them.

Q96. How do you use Hibernate statistics or SQL logging to detect N+1 problems during development?

A: Enable hibernate.generate_statistics=true and inspect query counts per session, or simply turn on show_sql/format_sql and watch for a suspicious repeating query pattern (same SQL shape, different parameter) in the logs when iterating over a list. Tools like p6spy, datasource-proxy, or Hypersistence Utils' statement inspector make this visible in test suites, and some teams add an automated test asserting a maximum query count for a given code path.

spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=DEBUG
# Watch for: "HHH000117: HQL: ... , time: ..., rows: 1" repeated N times

Q97. What is a DTO projection and why can it outperform loading full entities for read-only use cases?

A: A DTO (Data Transfer Object) projection selects only the specific columns a view/API response actually needs, mapped directly into a lightweight non-entity object — avoiding the overhead of hydrating full entity graphs, lazy-loading proxies, and persistence-context tracking (dirty checking) for data that will only ever be read, never modified. For read-heavy endpoints, this can significantly reduce both query cost and memory/CPU overhead compared to loading and discarding full managed entities.

public interface OrderSummary {
    Long getId();
    BigDecimal getTotal();
}

@Query("SELECT o.id AS id, o.total AS total FROM Order o WHERE o.customerId = :cid")
List<OrderSummary> findSummariesByCustomer(@Param("cid") Long customerId);

Q98. What is a slow query log and how do you use it to prioritize tuning work?

A: The slow query log (MySQL's slow_query_log, PostgreSQL's log_min_duration_statement) records every query exceeding a configured duration threshold, along with execution time and (optionally) rows examined. Aggregating this log (via tools like pt-query-digest or pgBadger) by normalized query pattern reveals which queries cost the MOST cumulative time across all executions — often not the single slowest query, but a moderately-slow query run thousands of times per minute — which is where tuning effort should focus first.

-- PostgreSQL: log any statement taking longer than 200ms
ALTER SYSTEM SET log_min_duration_statement = 200;

-- MySQL: enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.2;

Q99. What is the "SELECT *" anti-pattern and why does it hurt performance beyond readability?

A: SELECT * retrieves every column even when only a few are needed, increasing network payload, preventing covering-index usage (since the index rarely contains ALL columns), and making the result vulnerable to breaking silently when the table's column set changes. Explicitly listing needed columns lets the optimizer consider covering indexes and reduces I/O and serialization cost, especially significant for wide tables with large TEXT/BLOB columns.

-- Bad: fetches large 'description' column even though it's unused, defeats covering index
SELECT * FROM products WHERE category_id = 5;

-- Good: matches a covering index on (category_id, name, price)
SELECT name, price FROM products WHERE category_id = 5;

Q100. What is a holistic checklist for diagnosing and fixing a slow production query end to end?

A: 1) Identify it via the slow query log or APM tooling, aggregated by total cumulative time, not just single-execution duration. 2) Run EXPLAIN ANALYZE to see the real plan and spot full scans, filesorts, or bad cardinality estimates. 3) Check statistics freshness (ANALYZE) if estimated vs actual rows diverge wildly. 4) Add or adjust indexes (composite/covering) to match the query's WHERE/ORDER BY/JOIN columns, respecting the left-prefix rule. 5) Rewrite non-sargable predicates (functions on indexed columns, implicit casts). 6) Check for N+1 patterns if the query originates from an ORM. 7) Consider caching for read-heavy, rarely-changing results. 8) Re-measure with EXPLAIN ANALYZE and load testing before and after to confirm the fix actually helped in production-like conditions, not just on a small local dataset.