MongoDB Interview Questions and Answers (2026)

Master 100 MongoDB interview questions and answers for Java backend developer roles — covering BSON, CRUD, aggregation, indexes, replication, sharding, transactions, schema design and Spring Data MongoDB for 2026 interviews.

⏳ 50 min read 📝 100 Q&As 🎯 Beginner to Advanced

⚡ Quick Reference

BSON	Binary JSON — MongoDB's on-disk/wire storage format; adds types like Date, ObjectId, Binary not in plain JSON
_id / ObjectId	Mandatory primary key per document; ObjectId is a 12-byte value: 4-byte timestamp, 5-byte per-process random value, 3-byte incrementing counter
Replica Set	Group of mongod nodes (1 primary + secondaries) providing high availability via automatic failover
Shard Key	Field(s) used to distribute documents across shards; hard to change after sharding, so the choice is critical for scaling
Aggregation Pipeline	Sequence of stages ($match, $group, $project…) that transform documents like a data-processing pipeline
Index Types	Single field, compound, multikey, text, hashed, TTL, wildcard, partial, sparse
WiredTiger	Default storage engine since MongoDB 3.2 — document-level locking, compression, checkpointing
Transactions	Multi-document ACID transactions supported since MongoDB 4.0 (replica sets) and 4.2 (sharded clusters)

MongoDB Sharded Cluster Architecture

Application / Java Driver
↓
mongos Router
↓
Config Servers (Replica Set): cluster metadata & chunk map
Shard 1 (Replica Set): Primary + 2 Secondaries
Shard 2 (Replica Set): Primary + 2 Secondaries

MongoDB Basics & BSON

Q1. What is MongoDB and how does it differ from a relational database?

A: MongoDB is a document-oriented NoSQL database that stores data as flexible, JSON-like BSON documents grouped into collections instead of rows in tables. Unlike RDBMS, it doesn't enforce a fixed schema, supports horizontal scaling via sharding out of the box, and models relationships through embedding or referencing rather than joins (though $lookup provides join-like behavior in aggregation).

Q2. What is BSON and how does it differ from JSON?

A: BSON (Binary JSON) is the binary-encoded serialization format MongoDB uses to store documents and communicate over the wire. Unlike text-based JSON, BSON is binary, which makes it faster to parse and traverse, and it adds richer types not present in JSON — Date, ObjectId, Binary, Decimal128, Int32/Int64 — plus length prefixes that speed up field skipping.

// JSON representation
{ "name": "Raj", "age": 30, "joined": "2026-01-01" }

// Equivalent BSON document (conceptually) has typed fields:
{ "name": "Raj", "age": Int32(30), "joined": ISODate("2026-01-01T00:00:00Z") }

Q3. What are the key features of MongoDB?

A: Flexible schema (documents in a collection can differ), rich query language with a powerful aggregation framework, horizontal scaling via automatic sharding, high availability via replica sets, native support for geospatial, text and array indexes, ACID multi-document transactions (since 4.0), and drivers for virtually every language including Java.

Q4. What is a document, collection and database in MongoDB?

A: A document is the basic unit of data: a BSON object similar to a JSON object (roughly equivalent to a row). A collection is a group of documents (roughly equivalent to a table), typically holding related data but not required to share a schema. A database is a container for collections; with the WiredTiger storage engine, each collection and each index is stored in its own file on disk.

use jiquestdb
db.createCollection("users")
db.users.insertOne({ name: "Raj", role: "Java Developer" })

Q5. What is the maximum size of a BSON document?

A: The maximum BSON document size is 16 MB. This limit ensures a single document can't consume excessive RAM or bandwidth. If individual items naturally exceed it (e.g., large file content), use GridFS, which splits the binary data into chunks (255 KB each by default) stored as separate, smaller documents.
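
As a rough illustration of how chunking keeps every stored piece far below the 16 MB limit, the sketch below computes how many chunks a file would need. It assumes the GridFS default chunk size of 255 KB; the helper name gridFsChunkCount is hypothetical and not part of any driver API.

```javascript
// Assumption: 255 KB is the GridFS default chunk size; drivers allow overriding it.
const DEFAULT_CHUNK_SIZE_BYTES = 255 * 1024;

// Hypothetical helper: number of chunk documents needed for a file of the given size.
function gridFsChunkCount(fileSizeBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE_BYTES) {
  if (fileSizeBytes < 0 || chunkSizeBytes <= 0) {
    throw new RangeError("file size must be >= 0 and chunk size must be > 0");
  }
  return Math.ceil(fileSizeBytes / chunkSizeBytes);
}

const sixteenMb = 16 * 1024 * 1024;
console.log(gridFsChunkCount(sixteenMb)); // chunks needed for a 16 MB file
```

Each chunk is an ordinary small document, so no single read or write has to move the whole file.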

Q6. What are the common BSON data types?

A: String, Int32, Int64, Double, Decimal128 (for precise decimal math), Boolean, Date, Null, ObjectId, Array, Embedded Document (Object), Binary Data, Regular Expression, and Timestamp (internal use, distinct from Date). Decimal128 is important for financial data where float/double rounding errors are unacceptable.

{
  _id: ObjectId("64fa1c2e5b3c9a0012345678"),
  price: NumberDecimal("19.99"),
  inStock: true,
  tags: ["java", "mongodb"],
  createdAt: ISODate("2026-07-03T09:20:00Z")
}
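
Because the first 4 bytes of an ObjectId hold its creation time in seconds since the Unix epoch, the timestamp can be recovered from the hex string alone. The following is a minimal plain-JavaScript sketch of that decoding; the function names are hypothetical, and in mongosh the built-in ObjectId(...).getTimestamp() does the same job.

```javascript
// Read the creation time embedded in the first 4 bytes of an ObjectId hex string.
function objectIdTimestampSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError("expected a 24-character hex string");
  }
  // First 8 hex characters = 4-byte big-endian seconds since the Unix epoch.
  return parseInt(hex.slice(0, 8), 16);
}

function objectIdToDate(hex) {
  return new Date(objectIdTimestampSeconds(hex) * 1000);
}

console.log(objectIdToDate("64fa1c2e5b3c9a0012345678").toISOString());
```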

Q7. How does MongoDB compare to other NoSQL database categories?

A: MongoDB is a document store, best for semi-structured, hierarchical data queried by varied fields. Contrast this with key-value stores (Redis) optimized for simple lookups by key, wide-column stores (Cassandra) optimized for write-heavy workloads with known query patterns, and graph databases (Neo4j) optimized for traversing relationships. Choose MongoDB when you need a flexible schema plus rich querying and secondary indexes.

Q8. What is mongosh?

A: mongosh is the modern MongoDB Shell: a Node.js-based interactive JavaScript environment for connecting to and administering MongoDB deployments, running CRUD/aggregation commands, and scripting admin tasks. It replaces the legacy mongo shell, which was deprecated in MongoDB 5.0 and removed in 6.0.

Q9. What is the difference between MongoDB Community Server and MongoDB Atlas?

A: Community Server is the free, self-managed, source-available distribution (licensed under the SSPL) that you install and operate yourself. MongoDB Atlas is the fully managed cloud DBaaS (available on AWS, Azure, GCP) that automates provisioning, scaling, backups, patching, and monitoring, and adds features like Atlas Search and Atlas Data Federation. Enterprise Advanced is the paid self-managed edition, adding features like LDAP/Kerberos authentication and auditing.

Q10. What is a namespace in MongoDB?

A: A namespace is the fully qualified name of a collection or index, formed as database.collection (e.g., jiquestdb.users). Up to MongoDB 4.2, namespaces were limited to 120 bytes; from 4.4 (with feature compatibility version 4.4 or later) the limit is 255 bytes for unsharded collections and views, and 235 bytes for sharded collections. The server uses namespaces internally to identify collections and their indexes.

CRUD Operations & Query Operators

Q11. How do you insert documents into a MongoDB collection?

A: Use insertOne() for a single document or insertMany() for multiple. If _id is omitted, MongoDB auto-generates an ObjectId. insertMany() is ordered by default (stops at first error) unless you pass { ordered: false } to continue inserting remaining documents on failure.

db.users.insertOne({ name: "Asha", age: 28 });
db.users.insertMany([
  { name: "Ravi", age: 32 },
  { name: "Neha", age: 26 }
], { ordered: false });

Q12. How do you query documents using find()?

A: find(filter, projection) returns a cursor over matching documents. An empty filter {} matches all documents. Chain .sort(), .limit(), .skip() for pagination. Use findOne() to get a single document directly instead of a cursor.

db.users.find({ age: { $gte: 25 } }, { name: 1, age: 1, _id: 0 })
         .sort({ age: -1 })
         .limit(10);

Q13. What are the comparison query operators?

A: $eq, $ne, $gt, $gte, $lt, $lte compare a field to a value. $in and $nin match against an array of possible values. These are the building blocks of most filter documents and work with any comparable BSON type following BSON's type-ordering rules.

db.products.find({ price: { $gte: 100, $lte: 500 } });
db.products.find({ category: { $in: ["electronics", "books"] } });

Q14. What are the logical query operators?

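A: $and, $or, $nor, $not combine multiple conditions. An implicit AND applies when you list multiple fields in one filter object, and several operators on the same field can share one sub-document, e.g. { total: { $gt: 0, $lt: 10000 } }. Explicit $and is needed only when the same key would otherwise appear twice in the filter object (for example, two separate $or clauses), because the later key would overwrite the earlier one; it can also add clarity in complex nested logic.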

db.orders.find({
  $or: [ { status: "pending" }, { status: "processing" } ],
  $and: [ { total: { $gt: 0 } }, { total: { $lt: 10000 } } ]
});

Q15. What are element query operators ($exists, $type)?

A: $exists tests whether a field is present (or absent) in the document, regardless of its value — useful for sparse/optional fields in a schema-flexible model. $type filters by BSON type, useful when a field's type is inconsistent across documents (a common real-world data-quality issue).

db.users.find({ middleName: { $exists: false } });
db.users.find({ age: { $type: "int" } });

Q16. What are array query operators ($all, $elemMatch, $size)?

A: $all matches arrays containing ALL specified values (order irrelevant). $elemMatch matches documents where at least one array element satisfies multiple conditions simultaneously (without it, different conditions could match different elements). $size matches arrays of an exact length.

db.students.find({ scores: { $elemMatch: { subject: "Math", value: { $gte: 90 } } } });
db.students.find({ tags: { $all: ["java", "mongodb"] } });
db.students.find({ scores: { $size: 3 } });
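
To make the $elemMatch distinction concrete, here is a small plain-JavaScript sketch that mimics both matching rules on an in-memory document. It only imitates the semantics described above and is not how the server evaluates queries; the function names are hypothetical.

```javascript
const student = {
  scores: [
    { subject: "Math", value: 70 },
    { subject: "Art", value: 95 }
  ]
};

// Like { scores: { $elemMatch: { subject: "Math", value: { $gte: 90 } } } }:
// one single array element must satisfy both conditions.
function matchesWithElemMatch(doc) {
  return doc.scores.some(s => s.subject === "Math" && s.value >= 90);
}

// Like { "scores.subject": "Math", "scores.value": { $gte: 90 } }:
// each condition may be satisfied by a different element.
function matchesWithoutElemMatch(doc) {
  return doc.scores.some(s => s.subject === "Math") &&
         doc.scores.some(s => s.value >= 90);
}

console.log(matchesWithElemMatch(student), matchesWithoutElemMatch(student));
```

The student above has a Math score and a score of at least 90, but not in the same element, so only the second form matches.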

Q17. How do you update documents (updateOne, updateMany, replaceOne)?

A: updateOne() modifies the first matching document and updateMany() modifies all matches; both apply update operators such as $set. replaceOne() replaces the ENTIRE document (except _id) with a new one, so any fields not in the replacement document are removed.

db.users.updateOne({ _id: id }, { $set: { status: "active" } });
db.users.updateMany({ status: "inactive" }, { $set: { archived: true } });
db.users.replaceOne({ _id: id }, { name: "Raj", age: 31 });

Q18. What are common update operators ($set, $unset, $inc, $push, $pull, $addToSet)?

A: $set sets/overwrites a field, $unset removes a field, $inc atomically increments/decrements a number, $push appends to an array (optionally with $each/$sort/$slice), $pull removes matching array elements, $addToSet adds only if not already present (set semantics).

db.products.updateOne({ _id: id }, {
  $inc: { views: 1 },
  $push: { history: { action: "viewed", at: new Date() } },
  $addToSet: { tags: "trending" }
});

Q19. How do you delete documents in MongoDB?

A: deleteOne(filter) removes the first matching document; deleteMany(filter) removes all matches. Passing {} to deleteMany removes every document in the collection but does NOT drop the collection or its indexes — use drop() for that.

db.sessions.deleteMany({ expiresAt: { $lt: new Date() } });

Q20. What is upsert in MongoDB?

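A: Upsert (the { upsert: true } option on update operations) updates the matching document if one exists; otherwise it inserts a new document built from the filter and the update. It's ideal for idempotent "create-or-update" logic and avoids a separate find-then-insert round trip. Concurrent upserts with the same filter can still insert duplicates unless a unique index covers the filter fields (the _id filter in the example below is always unique).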

db.counters.updateOne(
  { _id: "orderId" },
  { $inc: { seq: 1 } },
  { upsert: true }
);

Q21. What is findOneAndUpdate() and when do you use it over updateOne()?

A: findOneAndUpdate() atomically finds a document and updates it in a single operation, returning either the original or the updated document (controlled via returnDocument). Use it when you need the resulting/previous document value atomically — e.g., dequeuing a job, generating a sequence — where a separate find + update would be racy.

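import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import org.bson.Document;
Document job = collection.findOneAndUpdate(
  eq("status", "queued"),
  set("status", "processing"),
  new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER)
);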

Q22. Why is the legacy save() method deprecated, and what should you use instead?

A: The old save() method (present in earlier shell/driver versions) ambiguously inserted or replaced a whole document based on _id presence, with inconsistent behavior across drivers. Modern MongoDB drivers and shell recommend explicit insertOne/insertMany for creation and updateOne/replaceOne with upsert for create-or-update — it's clearer about intent and safer.

Aggregation Pipeline

Q23. What is the MongoDB aggregation pipeline?

A: The aggregation pipeline is a framework for data processing where documents pass through a sequence of stages ($match, $group, $project, $sort, $lookup, etc.), each transforming the stream of documents. It's MongoDB's equivalent of SQL's GROUP BY/JOIN/HAVING combined, and is far more powerful than simple find() queries for analytics-style reporting.

Q24. What does the $match stage do and why should it appear early?

A: $match filters documents, similar to a WHERE clause, and can use an index just like find(). Placing $match as early as possible in the pipeline reduces the number of documents flowing into later, more expensive stages ($group, $lookup, $sort) — a key aggregation performance practice.

db.orders.aggregate([
  { $match: { status: "completed", createdAt: { $gte: ISODate("2026-01-01") } } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
]);

Q25. What does the $group stage do?

A: $group groups documents by an _id expression (like SQL GROUP BY) and computes accumulator values per group — sums, averages, counts, min/max, or arrays of values. Grouping by null aggregates across the entire collection into one result document.

db.sales.aggregate([
  { $group: { _id: "$region", totalSales: { $sum: "$amount" }, avgOrder: { $avg: "$amount" }, count: { $sum: 1 } } }
]);

Q26. What does the $project stage do?

A: $project reshapes documents: including, excluding, renaming, or computing new fields via expressions. It supports full aggregation expressions, letting you derive computed fields (e.g., concatenation, arithmetic, date formatting) within the pipeline. The projection argument of find() was limited to including/excluding fields and a few projection operators before MongoDB 4.4; from 4.4 onward it accepts aggregation expressions as well.

db.users.aggregate([
  { $project: { fullName: { $concat: ["$firstName", " ", "$lastName"] }, ageGroup: { $cond: [{ $gte: ["$age", 18] }, "adult", "minor"] } } }
]);

Q27. What do $sort, $limit and $skip do in a pipeline?

A: $sort orders documents (1 ascending, -1 descending); it can use an index if placed near the start. $limit caps the number of documents passed downstream. $skip discards the first N documents — useful for pagination but inefficient for deep pagination (prefer range-based/cursor pagination for large skip values).

db.products.aggregate([
  { $sort: { price: -1 } },
  { $skip: 20 },
  { $limit: 10 }
]);
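
The range-based alternative to deep $skip can be sketched with an in-memory array standing in for a collection sorted by _id. This is only an illustration of the idea; pageBySkip and pageAfter are hypothetical helper names, and on a real collection the second approach corresponds to something like find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(n).

```javascript
// In-memory stand-in for a collection, already sorted by _id.
const products = Array.from({ length: 50 }, (_, i) => ({ _id: i + 1, name: `item-${i + 1}` }));

// Offset pagination: everything before `skip` is walked over and thrown away.
function pageBySkip(docs, skip, limit) {
  return docs.slice(skip, skip + limit);
}

// Range-based pagination: resume right after the last _id the client saw.
function pageAfter(docs, lastSeenId, limit) {
  return docs
    .filter(d => lastSeenId === null || d._id > lastSeenId)
    .slice(0, limit);
}

const firstPage = pageAfter(products, null, 10);
const secondPage = pageAfter(products, firstPage[firstPage.length - 1]._id, 10);
console.log(secondPage.map(d => d._id));
```

Both approaches return the same page here; the difference on a real deployment is that the range filter can seek directly into an index instead of scanning and discarding the skipped documents.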

Q28. What does $lookup do and how does it emulate a JOIN?

A: $lookup performs a left outer join to another collection in the same database, matching a local field to a foreign field and adding the matched documents as an array field. Since MongoDB 3.6, a more flexible form with let and a sub-pipeline supports multiple join conditions (correlated subqueries), uncorrelated subqueries, and additional filtering/aggregation on the joined collection.

db.orders.aggregate([
  { $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } },
  { $unwind: "$customer" }
]);

Q29. What does $unwind do?

A: $unwind deconstructs an array field, outputting one document per array element (all other fields duplicated). It's commonly used before $group to aggregate over array elements, or after $lookup to flatten a joined array into a single embedded object when a 1:1 relationship is expected.

db.orders.aggregate([
  { $unwind: "$items" },
  { $group: { _id: "$items.productId", totalQty: { $sum: "$items.qty" } } }
]);

Q30. What are accumulator operators used with $group?

A: $sum, $avg, $min, $max, $first, $last, $push (collect values into an array), $addToSet (collect distinct values). Some of these ($sum, $avg, $min, $max) can also be used as expressions over array fields inside $project/$addFields without a preceding $group, and since MongoDB 5.0 the $setWindowFields stage applies accumulators over windows of documents without collapsing them into groups.

Q31. What does the $facet stage do?

A: $facet runs multiple independent aggregation sub-pipelines within a single stage against the same input documents, returning all results in one document — perfect for building a search results page that needs both the paginated items AND facet counts (e.g., counts per category) in one round trip.

db.products.aggregate([
  { $match: { category: "electronics" } },
  { $facet: {
      results: [ { $skip: 0 }, { $limit: 10 } ],
      totalCount: [ { $count: "count" } ],
      byBrand: [ { $group: { _id: "$brand", count: { $sum: 1 } } } ]
  }}
]);

Q32. What do $bucket and $bucketAuto do?

A: $bucket groups documents into buckets defined by explicit boundaries on an expression (e.g., price ranges you specify). $bucketAuto automatically determines bucket boundaries to distribute documents evenly across N buckets. Both are useful for building histograms directly in the database.

db.products.aggregate([
  { $bucket: { groupBy: "$price", boundaries: [0, 50, 100, 500, 1000], default: "1000+", output: { count: { $sum: 1 } } } }
]);
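
The boundary rule behind $bucket (lower bound inclusive, upper bound exclusive, everything else in the default bucket) is easy to get wrong at the edges. The sketch below mimics that rule in plain JavaScript; bucketFor is a hypothetical helper for illustration, not a MongoDB API.

```javascript
// Returns the bucket id for a value: the lower boundary of the matching range,
// or the default bucket when the value falls outside all boundaries.
function bucketFor(value, boundaries, defaultBucket) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound is inclusive, upper bound is exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i];
    }
  }
  return defaultBucket;
}

const boundaries = [0, 50, 100, 500, 1000];
for (const price of [49.99, 50, 999, 1000]) {
  console.log(price, "->", bucketFor(price, boundaries, "1000+"));
}
```

Note that with these boundaries a negative price would also land in the default bucket, so a label like "1000+" is only accurate if prices are never below 0.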

Q33. What is the difference between the aggregation pipeline and MapReduce?

A: The aggregation pipeline is declarative, expressed as composable stages, executed natively in C++ and highly optimized (it can use indexes). MapReduce (deprecated since MongoDB 5.0 in favor of aggregation) requires writing JavaScript map/reduce functions executed in a slower JS engine. Most MapReduce jobs can be rewritten with $group and other stages, falling back to $accumulator/$function for custom JavaScript logic, and the rewrite is typically much faster.

Q34. How do you optimize aggregation pipeline performance?

A: Put $match/$sort as early as possible so they can use indexes; project only needed fields early with $project to reduce document size flowing downstream; avoid unnecessary $unwind on huge arrays; use allowDiskUse: true for large sorts/groups exceeding the 100 MB per-stage memory limit (from MongoDB 6.0, such stages spill to disk by default); and inspect the plan with .explain("executionStats") on the aggregate call.

db.orders.aggregate(pipeline, { allowDiskUse: true }).explain("executionStats");

Indexes (Single, Compound, Multikey, Text, TTL)

Q35. What is an index in MongoDB and why is it needed?

A: An index is a separate on-disk data structure (a B-tree by default) that stores a sorted subset of field values with pointers back to documents, allowing the query planner to avoid scanning every document (a "collection scan"). Without a supporting index, queries on large collections perform a full COLLSCAN, which is slow and I/O-heavy.

Q36. What is a single field index?

A: A single field index is built on one field, in ascending (1) or descending (-1) order. Every collection automatically has a single field index on _id. Single field indexes support equality, range, and sort operations on that field efficiently.

db.users.createIndex({ email: 1 }, { unique: true });

Q37. What is a compound index and what is the "index prefix" rule?

A: A compound index covers multiple fields in a specified order, e.g. { status: 1, createdAt: -1 }. It can support queries on the leading field(s) alone or the full combination — this is the "prefix rule": a compound index on (A, B, C) supports queries on (A), (A, B) and (A, B, C), but NOT efficiently on (B) or (C) alone. Field order should match equality fields first, then sort fields, then range fields (ESR rule).

db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 });
// Supports: {customerId}, {customerId, status}, {customerId, status, createdAt}
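
The prefix rule can be expressed as a tiny check: walk the index fields in order and stop at the first one the query does not filter on. The sketch below is a deliberately simplified model (it ignores sort order, range bounds, and other planner details), and usablePrefixLength is a hypothetical helper, not something the server exposes.

```javascript
// Returns how many leading index fields a query can use (0 = no usable prefix).
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let count = 0;
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // a gap ends the usable prefix
    count++;
  }
  return count;
}

const orderIndex = ["customerId", "status", "createdAt"];
console.log(usablePrefixLength(orderIndex, ["customerId", "status"])); // uses 2 fields
console.log(usablePrefixLength(orderIndex, ["status"]));               // no prefix
```

The order of fields inside the query document does not matter; only the order in the index definition does.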

Q38. What is a multikey index?

A: A multikey index is automatically created when you index a field that holds an array — MongoDB creates a separate index entry for each array element. This lets you efficiently query for a value contained anywhere in the array. Limitation: you cannot create a compound multikey index with more than one array field indexed simultaneously.

db.products.createIndex({ tags: 1 });
db.products.find({ tags: "java" }); // uses multikey index

Q39. What is a text index and how do you use it for search?

A: A text index supports full-text search across string fields — tokenizing, removing stop words, and stemming (language-aware). A collection can have only ONE text index, though it may span multiple fields. Use $text with $search in a query, and sort by textScore for relevance ranking. For production-grade search, Atlas Search (Lucene-based) is recommended over the basic text index.

db.articles.createIndex({ title: "text", body: "text" });
db.articles.find(
  { $text: { $search: "mongodb indexing" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } });

Q40. What is a TTL (Time-To-Live) index?

A: A TTL index automatically deletes documents after a specified number of seconds past a Date field's value — ideal for session data, logs, verification tokens, or caches that should expire. A background thread runs roughly every 60 seconds to remove expired documents, so exact deletion timing is not guaranteed to the second.

db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
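
The expiry rule itself is simple arithmetic: a document becomes eligible for deletion once the indexed date plus expireAfterSeconds has passed. A minimal sketch of that check follows; isExpired is a hypothetical helper, and actual removal still waits for the background task, so a document may outlive its expiry time for a short while.

```javascript
// Assumption: mirrors the rule "indexed date + expireAfterSeconds has passed".
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Documents whose indexed field is not a date are never expired by a TTL index.
  if (!(value instanceof Date)) return false;
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const session = { _id: "s1", createdAt: new Date("2026-01-01T00:00:00Z") };
console.log(isExpired(session, "createdAt", 3600, new Date("2026-01-01T00:30:00Z")));
console.log(isExpired(session, "createdAt", 3600, new Date("2026-01-01T01:30:00Z")));
```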

Q41. What is a unique index?

A: A unique index enforces that indexed field values are distinct across the collection (like a UNIQUE constraint in SQL). Attempting to insert/update a document that would create a duplicate fails with a duplicate key error (E11000). For compound unique indexes, the combination of fields must be unique, not each field individually. A missing indexed field is indexed as null, so only ONE document may omit the field; to allow several documents without it, make the index sparse or use a partialFilterExpression that excludes them.

db.users.createIndex({ email: 1 }, { unique: true });

Q42. What is a partial index?

A: A partial index only indexes documents that satisfy a specified filter expression, reducing index size and write overhead for fields that are only queried in a subset of documents (e.g., only "active" orders). It's more flexible than a sparse index because the filter can be any valid query expression, not just field existence.

db.orders.createIndex(
  { customerId: 1 },
  { partialFilterExpression: { status: "active" } }
);

Q43. What is a sparse index?

A: A sparse index only contains entries for documents that HAVE the indexed field (skips documents missing it entirely). Useful for optional fields to keep the index smaller. A sparse+unique index allows multiple documents to omit the field without violating uniqueness (unlike a plain unique index, where multiple missing fields would collide as null). Partial indexes are generally preferred in modern MongoDB as they're more flexible.

db.users.createIndex({ phoneNumber: 1 }, { sparse: true, unique: true });

Q44. What is a hashed index and where is it used?

A: A hashed index stores hashes of a field's value rather than the value itself, giving an even, random distribution of index keys. It supports equality queries efficiently but NOT range queries. Its primary use case is as a hashed shard key, which spreads writes evenly across shards and avoids hotspotting from monotonically increasing keys.

db.events.createIndex({ userId: "hashed" });
sh.shardCollection("appdb.events", { userId: "hashed" });

Q45. How do you use explain() to check whether a query uses an index?

A: Call .explain("executionStats") on a find or aggregate and inspect winningPlan. An IXSCAN stage (often nested as the inputStage of a FETCH) means an index was used; COLLSCAN means a full collection scan happened (usually bad for large collections). Also compare totalDocsExamined with nReturned: a large ratio indicates the index isn't selective enough.

db.orders.find({ customerId: "C123" }).explain("executionStats");
// winningPlan contains "IXSCAN" (good) vs "COLLSCAN" (needs an index)

Q46. What is index intersection in MongoDB?

A: Index intersection lets MongoDB use two or more single-field indexes together to satisfy a query, intersecting the resulting sets of document pointers, instead of requiring one compound index that covers all filter fields. It's a fallback the planner may choose, but a well-designed compound index tailored to the query pattern is usually more efficient than relying on intersection.

Replication & Replica Sets

Q47. What is a replica set in MongoDB?

A: A replica set is a group of mongod instances maintaining the same data set: one primary that accepts all writes, and one or more secondaries that replicate the primary's oplog asynchronously. Replica sets provide high availability (automatic failover) and can offload read traffic to secondaries. Production deployments should have at least three voting members (ideally all data-bearing) so that a majority survives the loss of one node.

Q48. What is the role of the primary versus secondaries in a replica set?

A: The primary is the only member that accepts write operations by default; it records every write in its oplog (operations log). Secondaries continuously replicate and apply oplog entries to stay in sync, and can serve reads if the client's read preference allows it. If the primary becomes unavailable, an eligible secondary is elected as the new primary.

Q49. How does a replica set election work?

A: When the primary becomes unreachable (heartbeat timeout, default 10s), eligible secondaries hold an election using the Raft-derived consensus protocol. Members vote based on factors including data freshness (highest oplog timestamp preferred) and configured priority. A member needs a majority of votes to become primary; if no majority is reachable (network partition), no primary is elected and the set becomes read-only.
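
The majority requirement is plain arithmetic, and working through it shows why odd member counts are preferred. A minimal sketch, with hypothetical helper names:

```javascript
// Votes needed to win an election among the given number of voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary can be elected only if the reachable voters form a majority.
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majorityOf(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majorityOf(n)}, tolerates ${n - majorityOf(n)} failure(s)`);
}
```

A 4-member set needs 3 votes and therefore tolerates only one failure, the same as a 3-member set, which is why adding a fourth voter buys no extra fault tolerance.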

Q50. What is an arbiter in a replica set?

A: An arbiter is a mongod process that participates in elections (votes) but holds no data and cannot become primary. It's used to achieve an odd number of voting members cheaply when adding another full data-bearing node isn't justified. Arbiters are generally discouraged in modern deployments in favor of additional data-bearing nodes, since they can complicate write-concern majority calculations.

Q51. What is the oplog?

A: The oplog (operations log) is a special capped collection (local.oplog.rs) that records every write operation in idempotent form. The primary appends to it as it applies writes; each secondary copies the entries into its own oplog and applies the same operations to replicate state. Oplog size determines the replication "window": how far behind a secondary can fall before it needs a full resync.

rs.printReplicationInfo();  // shows oplog size and time window
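
"Idempotent form" means an entry can be applied more than once without changing the outcome. The sketch below illustrates the idea with a relative increment being logged as the resulting absolute value. The entry shape and helper names are hypothetical simplifications; real oplog entries use a different internal format.

```javascript
// Hypothetical, simplified entry: log the resulting value, not the relative change.
function toLogEntry(currentDoc, field, incrementBy) {
  const newValue = (currentDoc[field] ?? 0) + incrementBy;
  return { _id: currentDoc._id, set: { [field]: newValue } };
}

// Applying an entry overwrites the field, so replaying it is harmless.
function applyEntry(doc, entry) {
  return { ...doc, ...entry.set };
}

const primaryDoc = { _id: 1, views: 10 };
const entry = toLogEntry(primaryDoc, "views", 1); // client sent $inc: { views: 1 }

let secondaryDoc = { _id: 1, views: 10 };
secondaryDoc = applyEntry(secondaryDoc, entry);
secondaryDoc = applyEntry(secondaryDoc, entry); // replayed, e.g. after a restart
console.log(secondaryDoc.views);
```

Had the log stored the increment itself, the replay would have counted the view twice.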

Q52. What are read preference modes?

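A: primary (default, always read from primary), primaryPreferred, secondary, secondaryPreferred, and nearest (lowest network latency member). Reading from secondaries can reduce load on the primary but risks returning stale data due to replication lag. Causally consistent sessions (used with read concern "majority" and write concern "majority") let a client read its own writes from a secondary; read concern "majority" alone guarantees only that the data won't be rolled back, not that it is the latest.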

MongoCollection<Document> coll = db.getCollection("reports")
    .withReadPreference(ReadPreference.secondaryPreferred());

Q53. What are write concern levels?

A: Write concern controls how many replica set members must acknowledge a write before it's considered successful: w: 1 (primary only), w: "majority" (a majority of voting members; recommended for durability and the default in most configurations since MongoDB 5.0), w: 0 (fire-and-forget, no acknowledgment). A j: true option additionally requires the write to be journaled to disk.

collection.withWriteConcern(WriteConcern.MAJORITY).insertOne(doc);

Q54. What is read concern in MongoDB?

A: Read concern controls the consistency/isolation guarantee of data returned by a read: local (default, may return data that could later be rolled back), available, majority (only returns data acknowledged by a majority — won't be rolled back), linearizable, and snapshot (used with transactions for a consistent point-in-time view).

Q55. What happens during failover in a replica set?

A: When the primary goes down, remaining members detect the loss of heartbeats, trigger an election, and promote a new primary (usually within seconds). Any in-flight writes not yet replicated to a majority may be rolled back once the old primary rejoins as a secondary. Drivers with retryable writes automatically retry the write against the new primary, minimizing application-visible errors.

Q56. What is replication lag and how do you monitor it?

A: Replication lag is the delay between a write being applied on the primary and being applied on a secondary. High lag risks stale reads from secondaries and a larger data-loss window on failover. Monitor with rs.printSecondaryReplicationInfo(), the replSetGetStatus command's optimeDate per member, or Atlas/Ops Manager replication lag charts and alerts.

rs.printSecondaryReplicationInfo();

Sharding

Q57. What is sharding in MongoDB and why is it needed?

A: Sharding is MongoDB's method of horizontal scaling — distributing a collection's data across multiple servers (shards) so no single server needs to hold the entire data set or handle the entire query/write load. It's needed when data volume or throughput exceeds what a single replica set (vertical scaling) can handle economically.

Q58. What is a shard key and how do you choose a good one?

A: The shard key is the field (or fields) used to partition a collection's documents across shards; it is chosen at shard time and is largely immutable afterward. A good shard key has high cardinality (many distinct values), even distribution of both data and query load, and ideally matches your common query patterns to enable targeted queries. Poor choices (low cardinality, monotonically increasing) cause hotspotting and unbalanced chunks.

sh.shardCollection("appdb.orders", { customerId: "hashed" });

Q59. What is the role of mongos?

A: mongos is the query router that sits between client applications and the sharded cluster. It has no persistent data of its own; it caches cluster metadata from the config servers to route each query/write to the correct shard(s), and merges results from multiple shards when needed (e.g., for a sorted scatter-gather query).

Q60. What are config servers?

A: Config servers store the sharded cluster's metadata — which chunks live on which shards, cluster settings, and authentication data. Since MongoDB 3.4+, config servers must themselves be deployed as a replica set (CSRS) for high availability. If config servers are unavailable, the cluster metadata can't be updated (existing routing may still work from mongos cache, but chunk migrations and DDL fail).

Q61. What is a chunk and how does chunk splitting work?

A: A chunk is a contiguous range of shard key values, the basic unit of data movement in a sharded cluster. The default chunk size was 64 MB through MongoDB 5.x and is 128 MB from 6.0. Historically an auto-splitter split any chunk that grew beyond this threshold into two smaller chunks; from MongoDB 6.0.3/6.1 chunks are no longer split automatically on growth, and the balancer splits them as needed when migrating data.

Q62. What is the balancer in a sharded cluster?

A: The balancer is a background process (running on the primary of the config server replica set since MongoDB 3.4) that migrates data between shards to keep the distribution even, preventing any one shard from holding a disproportionate share. Older versions balanced by number of chunks per shard; recent versions (6.0.3 and later) balance by data size. Migrations happen in the background and can be restricted to specific time windows to avoid impacting peak traffic.

sh.status();               // view shard distribution
sh.setBalancerState(true); // enable balancer

Q63. What is the difference between ranged sharding and hashed sharding?

A: Ranged sharding partitions data by contiguous ranges of shard key values, which is efficient for range queries on the shard key but risks hotspotting if keys are monotonically increasing (e.g., timestamps, auto-increment IDs) — all new writes land on one shard. Hashed sharding indexes a hash of the key, distributing writes evenly, but range queries on the original field become scatter-gather across all shards.
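
The hotspotting difference can be simulated in a few lines. The sketch below routes 100 monotonically increasing ids to 4 shards, once by range and once by hash. It is only a toy model: the range boundaries are made up, and the hash is a stand-in multiplicative hash, NOT the function MongoDB uses for hashed indexes.

```javascript
const SHARD_COUNT = 4;

// Ranged: shard i owns keys below upperBounds[i]; the last shard owns the rest.
function rangedShard(key, upperBounds) {
  const i = upperBounds.findIndex(bound => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Stand-in hash (Knuth multiplicative), not MongoDB's real hashing.
// Math.imul keeps the low 32 bits of the product; >>> 30 takes the top 2 bits (0..3).
function hashedShard(key) {
  return Math.imul(key, 2654435761) >>> 30;
}

function countPerShard(keys, route) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[route(key)]++;
  return counts;
}

// New, ever-increasing ids (think auto-increment or timestamp-like keys).
const newIds = Array.from({ length: 100 }, (_, i) => 5000 + i);
const rangedCounts = countPerShard(newIds, k => rangedShard(k, [1000, 2000, 3000]));
const hashedCounts = countPerShard(newIds, hashedShard);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

With ranges, every new id falls past the last boundary and lands on the final shard; with hashing, neighbouring ids scatter across shards, which is also why a range query on the original field has to ask all of them.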

Q64. What is zone sharding (tag-aware sharding)?

A: Zone sharding lets you associate ranges of shard key values with specific shards (zones) — for example, pinning documents for European customers to shards physically located in the EU for data residency/GDPR compliance, or isolating hot/recent data on faster hardware. It's configured with sh.addShardToZone() and sh.updateZoneKeyRange().

Q65. What are jumbo chunks and hotspotting, and how do you avoid them?

A: A jumbo chunk is a chunk that exceeds the size threshold but cannot be split further (usually because too many documents share the exact same shard key value) or cannot be migrated — this creates an imbalance the balancer can't fix. Hotspotting happens when a shard key causes disproportionate traffic on one shard (e.g., a monotonically increasing key sends all recent writes to the last chunk/shard). Avoid both by choosing a high-cardinality, well-distributed compound shard key, potentially hashed.

Q66. How do you shard an existing (already populated) collection?

A: Enable sharding on the database with sh.enableSharding("dbname") (no longer required from MongoDB 6.0), ensure a supporting index exists on the intended shard key, then run sh.shardCollection("dbname.collection", { key: 1 }). MongoDB then distributes existing data into chunks across shards. Changing a shard key later is costly: MongoDB 4.4+ can refine a key by adding suffix fields (refineCollectionShardKey), and 5.0+ can reshard a collection onto a new key (reshardCollection), so validate the choice against production query patterns before sharding.

sh.enableSharding("appdb");
// create the supporting index in the same database that is being sharded
db.getSiblingDB("appdb").orders.createIndex({ customerId: 1 });
sh.shardCollection("appdb.orders", { customerId: 1 });

Transactions & Consistency

Q67. What are multi-document transactions and when were they introduced?

A: Multi-document ACID transactions let you group multiple read/write operations across one or more documents (and since 4.2, across shards) into a single atomic unit: all succeed or all roll back. They were introduced in MongoDB 4.0 for replica sets and extended to sharded clusters in 4.2. Before this, atomicity was guaranteed only at the single-document level.

Q68. How do you start a session and transaction using the Java driver?

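A: Obtain a ClientSession from the MongoClient, call startTransaction(), perform operations passing the session as the first argument, then call commitTransaction(), or abortTransaction() on error. Always wrap the work in try/catch and use try-with-resources for the session to ensure cleanup. The driver also offers session.withTransaction(...), a callback API that retries the transaction on transient errors, which is usually preferable to hand-written retry logic.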

try (ClientSession session = mongoClient.startSession()) {
    session.startTransaction();
    try {
        accounts.updateOne(session, eq("_id", "A"), inc("balance", -100));
        accounts.updateOne(session, eq("_id", "B"), inc("balance", 100));
        session.commitTransaction();
    } catch (Exception e) {
        session.abortTransaction();
        throw e;
    }
}

Q69. What is causal consistency in MongoDB?

A: Causal consistency guarantees that within a client session, operations are observed in cause-and-effect order: a read that follows a write in the same session reflects that write, even when the read is routed to a secondary. Drivers enable it by default on client sessions, but the guarantee holds only when the operations use read concern "majority" and write concern "majority". It builds on the same client sessions that multi-document transactions use.

Q70. How does MongoDB's ACID model differ from a traditional RDBMS?

A: MongoDB has always provided ACID guarantees at the single-document level, even for documents with nested arrays and sub-documents (which often eliminates the need for cross-document transactions if you model well). Since 4.0 it extends full ACID to multi-document and multi-collection transactions on replica sets, and since 4.2 to multi-shard transactions. These are functionally similar to RDBMS transactions, though best practice is still to keep transactions short-lived and minimize cross-shard transactions for performance.

Q71. What are the limitations of transactions in MongoDB?

A: Transactions have a default 60-second lifetime limit (transactionLifetimeLimitSeconds), add latency and overhead versus single-document writes, and are more expensive on sharded clusters, where multiple shards must coordinate through an internal two-phase commit. Before MongoDB 4.4, collections and indexes had to exist before the transaction started; 4.4+ can create them inside a transaction as long as it is not a cross-shard write transaction. Overuse of transactions instead of good embedding-based schema design is a common anti-pattern.

Q72. What is the two-phase commit application pattern used before MongoDB 4.0?

A: Before native transactions existed, developers simulated multi-document atomicity with a manual two-phase commit pattern: a coordinator document tracks a transaction's "pending" state, each affected document records the transaction ID and a pending change, then a second pass confirms/rolls back based on the coordinator's final state. This pattern is largely obsolete now that native transactions exist, but still appears in interviews as a design pattern question.
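The pattern is easier to follow as a small state machine. The sketch below is an in-memory illustration under stated assumptions: the class, field, and state names are hypothetical, maps stand in for collections, and the recovery policy (roll back a transfer stuck in PENDING, finish one stuck in APPLIED) is one reasonable choice rather than the only one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TwoPhaseCommitSketch {
    enum State { INITIAL, PENDING, APPLIED, DONE, CANCELED }

    static final class Account {
        long balance;
        final Set<String> pendingTxns = new HashSet<>(); // ids of in-flight transfers
        Account(long balance) { this.balance = balance; }
    }

    // Plays the role of the coordinator document.
    static final class Transfer {
        final String id;
        final String from;
        final String to;
        final long amount;
        State state = State.INITIAL;
        Transfer(String id, String from, String to, long amount) {
            this.id = id; this.from = from; this.to = to; this.amount = amount;
        }
    }

    // Idempotent apply: the recorded transfer id prevents a double application on retry.
    static void applyChange(Account account, String txnId, long delta) {
        if (account.pendingTxns.add(txnId)) {
            account.balance += delta;
        }
    }

    // Reverses a change only if this account actually recorded the transfer.
    static void undoChange(Account account, String txnId, long delta) {
        if (account.pendingTxns.remove(txnId)) {
            account.balance -= delta;
        }
    }

    static void transfer(Map<String, Account> accounts, Transfer t) {
        t.state = State.PENDING;                             // phase 1: record intent
        applyChange(accounts.get(t.from), t.id, -t.amount);
        applyChange(accounts.get(t.to), t.id, t.amount);
        t.state = State.APPLIED;
        finish(accounts, t);                                 // phase 2: confirm
    }

    static void finish(Map<String, Account> accounts, Transfer t) {
        accounts.get(t.from).pendingTxns.remove(t.id);
        accounts.get(t.to).pendingTxns.remove(t.id);
        t.state = State.DONE;
    }

    // Run after a crash: decide from the coordinator state what to do.
    static void recover(Map<String, Account> accounts, Transfer t) {
        if (t.state == State.PENDING) {
            undoChange(accounts.get(t.from), t.id, -t.amount);
            undoChange(accounts.get(t.to), t.id, t.amount);
            t.state = State.CANCELED;
        } else if (t.state == State.APPLIED) {
            finish(accounts, t);
        }
    }

    public static void main(String[] args) {
        Map<String, Account> accounts = new HashMap<>();
        accounts.put("A", new Account(500));
        accounts.put("B", new Account(100));
        Transfer t = new Transfer("t1", "A", "B", 100);
        transfer(accounts, t);
        System.out.println(t.state + " A=" + accounts.get("A").balance + " B=" + accounts.get("B").balance);
    }
}
```

The transfer id stored on each account is what makes retries and rollbacks safe: a step that already ran is detected and skipped instead of being applied twice.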

Q73. How does write concern "majority" relate to transaction durability?

A: A transaction's commit uses the transaction-level write concern, falling back to the session default and then the client default. With write concern "majority" (the implicit default since MongoDB 5.0 for most deployments), the transaction is only considered committed once a majority of replica set members acknowledge it, ensuring the change survives a primary failover. Using a weaker write concern for transactions is not recommended, as it undermines the durability guarantee transactions are meant to provide.

Q74. What is read concern "snapshot" and how is it used with transactions?

A: Read concern "snapshot" gives a transaction a consistent point-in-time view of data across all its reads, as if the data doesn't change during the transaction (comparable to snapshot isolation in SQL databases, which is weaker than SERIALIZABLE). The snapshot is guaranteed to contain majority-committed data only when the transaction commits with write concern "majority", so the two are typically paired to get both consistent reads and durable commits.

TransactionOptions txnOptions = TransactionOptions.builder()
    .readConcern(ReadConcern.SNAPSHOT)
    .writeConcern(WriteConcern.MAJORITY)
    .build();
session.startTransaction(txnOptions);

Schema Design & Data Modeling Patterns

Q75. Is MongoDB "schema-less"? How should you think about schema design?

A: MongoDB is schema-flexible, not schema-free — documents in a collection CAN have different shapes, but a well-designed application still enforces a consistent, intentional schema at the application or validation layer. Unlike RDBMS normalization-first design, MongoDB schema design starts from your application's query patterns ("design for your queries") and then decides how to structure documents to serve those queries efficiently.

Q76. When should you embed data versus reference it in another collection?

A: Embed when data is accessed together, has a "contains" relationship, doesn't grow unboundedly, and doesn't need independent querying (e.g., an order's line items). Reference (store an ObjectId and use $lookup or separate queries) when data is large, shared across many parent documents, changes independently, grows without bound, or would push a document toward the 16MB limit.

// Embedding (good: 1-to-few, accessed together)
{ _id: 1, name: "Order#1", items: [{ sku: "A1", qty: 2 }, { sku: "B2", qty: 1 }] }

// Referencing (good: 1-to-many/unbounded, independent access)
{ _id: 1, name: "Order#1", customerId: ObjectId("...") }

Q77. How do you model a one-to-many relationship in MongoDB?

A: For "one-to-few" (a blog post with a handful of comments), embed an array of sub-documents directly. For "one-to-many" or "one-to-squillions" (a customer with millions of log events), store a reference to the parent's _id on each child document instead — the reverse of embedding — and query children by that foreign key with a supporting index.

// One-to-squillions: child references parent
db.events.insertOne({ customerId: ObjectId("..."), type: "login", at: new Date() });
db.events.createIndex({ customerId: 1, at: -1 });

Q78. How do you model a many-to-many relationship in MongoDB?

A: Store an array of referenced IDs on one or both sides (e.g., a Student document with a courseIds array, and/or a Course document with a studentIds array), then use $lookup or application-level batch fetching to join. For very large many-to-many relationships, a dedicated "join collection" (similar to a relational join table) with compound indexes may scale better than large arrays on either side.

Q79. What is denormalization in MongoDB and what tradeoffs does it involve?

A: Denormalization duplicates data (e.g., embedding a customer's name inside every order document) to avoid extra lookups/joins at read time, trading storage space and update complexity (must update every copy) for faster, simpler reads. It's appropriate for data that rarely changes (or where slight staleness is acceptable) and is read far more often than it's written.

Q80. What is the "subset" pattern?

A: The subset pattern embeds only the most frequently accessed subset of a large related array (e.g., the 10 most recent reviews on a product document) while storing the full data set in a separate collection. This keeps the main document small and fast to load for the common case, while allowing a follow-up query for the rarely needed full list.
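As a rough illustration of the bookkeeping involved, the sketch below keeps a capped "most recent" list next to the full list. The class name, the cap of 3, and the use of in-memory collections in place of the product document and the reviews collection are all assumptions for the example. In MongoDB itself the trimming is typically done in the update, for example with $push combined with $each and $slice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SubsetPatternSketch {
    static final int MAX_EMBEDDED = 3; // illustrative cap; the text's example uses 10

    // Stands in for the array embedded in the product document (newest first).
    final Deque<String> recentReviews = new ArrayDeque<>();

    // Stands in for the separate reviews collection holding everything.
    final List<String> allReviews = new ArrayList<>();

    void addReview(String review) {
        allReviews.add(review);            // full history always goes to the big collection
        recentReviews.addFirst(review);    // newest review goes to the front of the subset
        while (recentReviews.size() > MAX_EMBEDDED) {
            recentReviews.removeLast();    // drop the oldest embedded review
        }
    }

    public static void main(String[] args) {
        SubsetPatternSketch product = new SubsetPatternSketch();
        for (int i = 1; i <= 5; i++) {
            product.addReview("review-" + i);
        }
        System.out.println("embedded: " + product.recentReviews);
        System.out.println("total stored: " + product.allReviews.size());
    }
}
```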

Q81. What is the "bucket" pattern and where is it used?

A: The bucket pattern groups many small, related time-series-like readings (e.g., one sensor's readings for an hour) into a single "bucket" document with an array of measurements, rather than one document per reading. This reduces index size and document count dramatically. MongoDB 5.0+ introduced native Time Series collections that implement this bucketing automatically and transparently.

db.createCollection("sensorReadings", {
  timeseries: { timeField: "timestamp", metaField: "sensorId", granularity: "minutes" }
});
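To make the manual version of the pattern concrete, here is a minimal in-memory sketch that groups readings into one bucket per sensor per hour. The class names and the string bucket key are hypothetical; a real implementation would store each bucket as a document and append measurements with an update.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketPatternSketch {
    static final class Reading {
        final String sensorId;
        final Instant at;
        final double value;

        Reading(String sensorId, Instant at, double value) {
            this.sensorId = sensorId;
            this.at = at;
            this.value = value;
        }
    }

    // One bucket per sensor per hour: the key is the sensor id plus the hour start.
    static String bucketKey(String sensorId, Instant at) {
        return sensorId + "@" + at.truncatedTo(ChronoUnit.HOURS);
    }

    // Groups individual readings into bucket "documents" (key -> measurements array).
    static Map<String, List<Double>> bucketize(List<Reading> readings) {
        Map<String, List<Double>> buckets = new TreeMap<>();
        for (Reading r : readings) {
            buckets.computeIfAbsent(bucketKey(r.sensorId, r.at), k -> new ArrayList<>()).add(r.value);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
            new Reading("s1", Instant.parse("2026-01-01T10:05:00Z"), 21.5),
            new Reading("s1", Instant.parse("2026-01-01T10:40:00Z"), 21.9),
            new Reading("s1", Instant.parse("2026-01-01T11:01:00Z"), 22.4));
        bucketize(readings).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Three readings collapse into two bucket documents here; at one reading per second the reduction would be 3,600 documents down to one per hour.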

Q82. What is the "extended reference" pattern?

A: The extended reference pattern embeds only the frequently accessed fields of a referenced document (e.g., a customer's name and shipping city inside an order) alongside the full reference (customerId), instead of embedding the entire referenced document or doing a full $lookup for every read. It reduces joins for common access patterns while keeping the source of truth in one place.

Q83. What is schema validation with $jsonSchema?

A: MongoDB supports optional document validation rules attached to a collection using the $jsonSchema operator (based on the JSON Schema standard), enforcing required fields, types, and constraints on insert/update. This gives you RDBMS-like guardrails while keeping flexibility. For example, validationLevel: "moderate" validates inserts and updates to documents that already satisfy the rules, but does not check updates to pre-existing documents that are already invalid.

db.createCollection("users", {
  validator: { $jsonSchema: {
    bsonType: "object",
    required: ["email", "age"],
    properties: {
      email: { bsonType: "string", pattern: "^.+@.+$" },
      age: { bsonType: "int", minimum: 0 }
    }
  }}
});

Q84. What are common MongoDB schema anti-patterns?

A: Unbounded array growth (embedding a growing list forever, eventually hitting 16MB or degrading write performance), massive numbers of collections (e.g., one per user or tenant, which multiplies WiredTiger files, file handles, and cache overhead), unnecessary indexes, overusing $lookup joins as if MongoDB were relational (often a sign of separating data that is accessed together), bloated documents with rarely used fields loaded on every read, inconsistent field naming or casing across documents, and case-insensitive queries without a matching case-insensitive index. MongoDB's official schema design anti-patterns guidance covers most of these.

Performance & Profiling

Q85. How do you use explain() to analyze and improve query performance?

A: db.collection.find(filter).explain("executionStats") reveals the winning plan, whether an index was used (IXSCAN vs COLLSCAN), totalDocsExamined, totalKeysExamined, and execution time in milliseconds. A healthy query should have totalDocsExamined close to nReturned; a large gap suggests you need a better (more selective) index.

db.orders.find({ status: "shipped" }).explain("executionStats");

Q86. What is the database profiler and how do you enable it?

A: The profiler records detailed information about database operations (queries, writes, commands) into the system.profile capped collection, useful for finding slow operations in production. Profiling level 0 is off, 1 logs only slow operations (above slowms, default 100ms), 2 logs everything (heavy overhead, use only for short debugging windows).

db.setProfilingLevel(1, { slowms: 50 });
db.system.profile.find().sort({ ts: -1 }).limit(5);

Q87. What is a covered query?

A: A covered query is one where all the fields in the query filter AND all fields returned in the projection are present in the same index — so MongoDB can satisfy the entire query from the index alone, without fetching the actual documents. This is significantly faster, especially for large documents. Exclude _id in the projection (unless it's part of the index) since it's returned by default.

db.users.createIndex({ email: 1, name: 1 });
db.users.find({ email: "raj@example.com" }, { email: 1, name: 1, _id: 0 }); // covered
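The core rule can be expressed as a set containment check. The helper below is a deliberately simplified sketch with hypothetical names: it only checks that every filtered and returned field is in the index, and ignores the additional conditions MongoDB applies (for instance, restrictions involving array fields).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CoveredQueryCheck {
    // returnedFields must list every field the projection returns,
    // including "_id" unless the projection excludes it with _id: 0.
    static boolean isCovered(List<String> indexFields, List<String> filterFields, List<String> returnedFields) {
        Set<String> indexed = new HashSet<>(indexFields);
        return indexed.containsAll(filterFields) && indexed.containsAll(returnedFields);
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("email", "name");
        List<String> filter = Arrays.asList("email");
        // Projection { email: 1, name: 1, _id: 0 }
        System.out.println(isCovered(index, filter, Arrays.asList("email", "name")));
        // Projection { email: 1, name: 1 } returns _id implicitly, which the index lacks
        System.out.println(isCovered(index, filter, Arrays.asList("email", "name", "_id")));
    }
}
```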

Q88. What is the WiredTiger cache and how does the "working set" affect performance?

A: WiredTiger maintains an in-memory cache (default: the larger of 50% of (RAM minus 1GB) or 256MB) holding frequently accessed data and indexes. The "working set" is the subset of data and indexes actively used by your application's queries. If the working set fits in the cache, reads are fast (in-memory); if it exceeds cache size, MongoDB must read from disk far more often, causing significant slowdowns.
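As a quick worked example of the default sizing rule, the sketch below computes the larger of 50% of (RAM minus 1GB) and 256MB. The class and method names are illustrative, and the calculation ignores container memory limits and any explicit cache size configured by the operator.

```java
class WiredTigerCacheDefault {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Default internal cache size: max(50% of (RAM - 1 GB), 256 MB).
    static long defaultCacheBytes(long ramBytes) {
        long halfOfRamMinusOneGb = (ramBytes - GB) / 2;
        return Math.max(halfOfRamMinusOneGb, 256L * MB);
    }

    public static void main(String[] args) {
        long[] ramSizesGb = {1, 4, 16, 64};
        for (long ramGb : ramSizesGb) {
            long cacheMb = defaultCacheBytes(ramGb * GB) / MB;
            System.out.println(ramGb + " GB RAM -> " + cacheMb + " MB cache");
        }
    }
}
```

On a 16GB host this works out to 7.5GB of cache, which is a useful number to compare against the combined size of your hot data and indexes.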

Q89. What are common causes of slow queries in MongoDB?

A: Missing or wrong indexes (leading to COLLSCAN), unselective indexes that examine many keys or documents for few results (a high totalDocsExamined to nReturned ratio), large sort/group operations without index support (spilling to disk), unbounded array growth causing large document reads, inefficient $lookup joins across large collections without indexes on the foreign field, and a working set larger than available RAM causing disk thrashing.

Q90. What is connection pooling in the MongoDB Java driver?

A: The Java driver's MongoClient maintains an internal connection pool per server, reusing TCP connections instead of opening a new one per operation. Configure pool size via MongoClientSettings.applyToConnectionPoolSettings() (maxSize, minSize). A single MongoClient instance should be created once and shared/injected across your application (e.g., as a Spring bean) rather than recreated per request.

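MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString(uri)) // uri: your MongoDB connection string
    // applied after the connection string so these values override any pool options in the URI
    .applyToConnectionPoolSettings(b -> b.maxSize(100).minSize(10))
    .build();
MongoClient client = MongoClients.create(settings);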

Q91. What tools are used to monitor MongoDB performance?

A: mongostat gives a live per-second summary of operations, memory, and connections for the mongod or mongos instance it connects to. mongotop shows time spent reading/writing per collection. db.currentOp() lists in-progress operations (useful to spot long-running queries to kill). For production, MongoDB Atlas / Ops Manager dashboards, Prometheus exporters, and the profiler give richer historical visibility.

mongostat --host localhost:27017
mongotop --host localhost:27017

Q92. How do large documents or arrays impact performance?

A: Large documents consume more I/O and network bandwidth per read/write even when only a few fields are needed (unless the query is covered or uses a projection). Growing arrays cause WiredTiger to rewrite the entire document on each update (documents aren't updated in-place at the storage layer beyond certain optimizations), increasing write amplification and fragmentation. Keep documents reasonably sized and prefer the subset/bucket patterns for unbounded growth.

Spring Data MongoDB & Java Driver Integration

Q93. How do you connect to MongoDB using the native Java driver?

A: Use MongoClients.create(connectionString) to get a thread-safe MongoClient (create once, reuse for the app's lifetime), then obtain a MongoDatabase and MongoCollection<Document> from it. The driver handles connection pooling, server discovery/monitoring, and automatic failover to the current primary internally.

MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017");
MongoDatabase db = mongoClient.getDatabase("jiquestdb");
MongoCollection<Document> users = db.getCollection("users");
users.insertOne(new Document("name", "Raj").append("age", 30));

Q94. What is Spring Data MongoDB and what does MongoRepository provide?

A: Spring Data MongoDB is Spring's abstraction over the MongoDB Java driver, providing object-document mapping (ODM), template-based operations, and repository interfaces. MongoRepository<T, ID> builds on the CrudRepository and PagingAndSortingRepository contracts (their List-returning variants in Spring Data 3.x) plus QueryByExampleExecutor, giving you save/find/delete/paginate methods for free, plus derived query methods generated from method names.

public interface UserRepository extends MongoRepository<User, String> {
    List<User> findByStatusAndAgeGreaterThan(String status, int age);
    Optional<User> findByEmail(String email);
}

Q95. How do you map a Java class to a MongoDB document using annotations?

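A: @Document(collection = "...") marks a class as a MongoDB entity, @Id marks the primary key field (mapped to _id), and @Field("name") customizes the stored field name. @Indexed and @CompoundIndex declare indexes that Spring Data can create on startup; since Spring Data MongoDB 3.0 this is off by default and must be enabled explicitly (for example with spring.data.mongodb.auto-index-creation=true in Spring Boot).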

@Document(collection = "users")
public class User {
    @Id
    private String id;

    @Indexed(unique = true)
    @Field("email_address")
    private String email;

    private int age;
    // getters/setters
}

Q96. How do you write custom queries using @Query and MongoTemplate?

A: @Query("{ 'field': ?0 }") on a repository method lets you write raw MongoDB-style JSON queries with positional parameter placeholders when derived query methods aren't expressive enough. MongoTemplate provides a fully programmatic, fluent API (Query/Criteria builders, with field names passed as strings) for complex dynamic queries, updates, and aggregations that can't be expressed declaratively.

@Query("{ 'status': ?0, 'age': { $gte: ?1 } }")
List<User> findActiveUsersOlderThan(String status, int age);

// MongoTemplate equivalent
Query query = new Query(Criteria.where("status").is("active").and("age").gte(18));
List<User> users = mongoTemplate.find(query, User.class);

Q97. How do you perform aggregation with Spring Data MongoDB?

A: Build an Aggregation object by chaining AggregationOperation stages (Aggregation.match(), .group(), .sort(), .lookup(), etc.) and execute it via mongoTemplate.aggregate(), mapping results to a target class. This gives you a fluent Java API with typed result mapping instead of hand-written raw pipeline JSON.

Aggregation agg = Aggregation.newAggregation(
    Aggregation.match(Criteria.where("status").is("completed")),
    Aggregation.group("customerId").sum("amount").as("total")
);
AggregationResults<Document> results = mongoTemplate.aggregate(agg, "orders", Document.class);

Q98. How do you use multi-document transactions in Spring Data MongoDB?

A: Enable transaction management by registering a MongoTransactionManager bean, then annotate a service method with @Transactional — Spring binds a ClientSession to the thread and commits/rolls back automatically around the method boundary. The target MongoDB deployment must be a replica set or sharded cluster; standalone instances don't support transactions.

@Bean
MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
    return new MongoTransactionManager(dbFactory);
}

@Transactional
public void transferFunds(String fromId, String toId, BigDecimal amount) {
    accountRepo.debit(fromId, amount);
    accountRepo.credit(toId, amount);
}

Q99. What is the difference between MongoRepository and MongoTemplate — when do you use each?

A: MongoRepository offers declarative, low-boilerplate CRUD and derived queries — ideal for straightforward, predictable access patterns. MongoTemplate offers full programmatic control for dynamic queries built at runtime, complex aggregations, bulk operations, and advanced options (read preference, collation) not easily expressed as a repository method name. Many real projects use both: repositories for simple cases, MongoTemplate injected for complex ones.

Q100. What are best practices for using MongoDB with Spring Boot in production?

A: Reuse a single MongoClient/connection pool via Spring's auto-configuration rather than creating new clients; define indexes explicitly (via @Indexed or migration scripts) rather than relying on auto-index-creation in production; use DTOs/projections to avoid over-fetching large documents; enable retryable writes/reads; set sensible connectTimeoutMS/socketTimeoutMS; use write concern "majority" for critical writes; and monitor via Spring Boot Actuator's MongoDB health indicator combined with Atlas/Ops Manager metrics.

spring.data.mongodb.uri=mongodb+srv://user:pass@cluster0.mongodb.net/jiquestdb?retryWrites=true&w=majority
management.health.mongo.enabled=true