
MongoDB Interview Questions: Complete Guide for 2026

Introduction to MongoDB Interview Preparation

MongoDB has become one of the most sought-after skills in the database and DevOps landscape, with companies across industries seeking professionals who can design, implement, and maintain NoSQL database solutions. As organizations increasingly adopt document-oriented databases for their scalability, flexibility, and performance advantages, demonstrating MongoDB expertise in interviews has become crucial for database administrators, backend developers, full-stack engineers, and DevOps professionals pursuing rewarding career opportunities.

MongoDB interviews assess multiple dimensions of knowledge including fundamental NoSQL concepts, document modeling strategies, query optimization techniques, replication and sharding architectures, security implementations, and performance tuning practices. Interviewers evaluate not just theoretical knowledge but practical problem-solving abilities, architectural decision-making skills, and understanding of real-world MongoDB deployment scenarios. Strong candidates demonstrate hands-on experience through specific examples, articulate trade-offs between different approaches, and show awareness of MongoDB best practices.

This comprehensive guide compiles essential MongoDB interview questions spanning basic concepts through advanced scenarios, organized by difficulty level and topic area. Each question includes detailed explanations, practical examples, code snippets where applicable, and insights into what interviewers are assessing. Whether you’re preparing for your first database role or targeting senior architect positions, this resource provides the knowledge foundation and confidence needed to excel in MongoDB technical interviews.

Basic MongoDB Interview Questions

What is MongoDB and what are its key features?

MongoDB is an open-source, document-oriented NoSQL database designed for scalability, high performance, and developer productivity. Unlike traditional relational databases with rigid table structures, MongoDB stores data in flexible JSON-like documents (BSON format) enabling schema evolution and natural data representation aligned with modern programming paradigms.

Key Features:

Document-Oriented Storage: MongoDB stores data in BSON (Binary JSON) documents containing field-value pairs. Documents can have nested structures, arrays, and varying schemas within the same collection. This flexibility eliminates complex object-relational mapping and enables rapid development.

Schema Flexibility: Collections don’t enforce uniform document structure. Documents can have different fields, data types, and nested structures. This dynamic schema supports agile development where requirements evolve without database migration overhead.

High Performance: MongoDB provides high read/write throughput through efficient indexing, document-level concurrency control, and the WiredTiger storage engine's in-memory caching. The aggregation framework enables complex data processing operations within the database rather than in application code.

Horizontal Scalability: Native sharding distributes data across multiple servers enabling horizontal scaling. As data and traffic grow, organizations add more servers to the cluster rather than upgrading to larger single servers.

High Availability: Replica sets provide automatic failover and data redundancy. If the primary node fails, the replica set automatically elects a new primary, typically within seconds, ensuring continuous availability.

Rich Query Language: MongoDB Query Language supports filtering, sorting, projections, aggregations, and geospatial queries. The expressive query syntax enables complex data retrieval without SQL limitations.

Indexing: Comprehensive indexing support including single-field, compound, multikey (array), text, geospatial, and hash indexes dramatically improve query performance.

Aggregation Framework: Powerful pipeline-based aggregation enables complex data processing, transformations, and analytics within the database.

GridFS: Specification for storing files exceeding 16MB document size limit by dividing files into chunks.

Interviewers ask this foundational question to assess your understanding of MongoDB’s fundamental value proposition and whether you can articulate why organizations choose MongoDB over relational alternatives.

Explain the difference between SQL and NoSQL databases.

Understanding the distinction between SQL (relational) and NoSQL databases is fundamental for making appropriate technology choices.

SQL (Relational) Databases:

Structure: Data organized in tables with rows and columns. Rigid schema requires predefined structure before inserting data. Tables are linked through foreign key relationships.

Schema: Fixed schema requires ALTER TABLE operations for modifications. All rows in a table must conform to defined structure.

Scalability: Primarily vertical scaling (bigger servers). Horizontal scaling (sharding) possible but complex and not native to most relational databases.

Transactions: ACID (Atomicity, Consistency, Isolation, Durability) transactions ensure data integrity across multiple tables. Strong consistency guarantees.

Query Language: Structured Query Language (SQL) standardized across vendors with minor variations. Mature, powerful, with decades of optimization.

Use Cases: Complex transactions, multi-entity relationships, reporting and analytics requiring complex JOINs, applications with stable, well-defined schemas.

Examples: MySQL, PostgreSQL, Oracle, SQL Server

NoSQL Databases:

Structure: Various models including document (MongoDB), key-value (Redis), column-family (Cassandra), graph (Neo4j). Flexible schemas accommodate varying structures.

Schema: Dynamic or schema-less. Documents can have different structures in the same collection. Schema evolves naturally with application requirements.

Scalability: Designed for horizontal scaling. Native sharding and partitioning distribute data across commodity servers.

Transactions: Historically focused on eventual consistency, though modern NoSQL databases (including MongoDB) support multi-document ACID transactions. Often trade consistency for availability and partition tolerance (CAP theorem).

Query Language: Database-specific query languages or APIs. MongoDB uses rich query language, while others use different approaches.

Use Cases: Rapid development with evolving schemas, hierarchical data, high-volume data requiring horizontal scaling, real-time applications, content management, IoT data.

Examples: MongoDB (document), Cassandra (column-family), Redis (key-value), Neo4j (graph)

Key Differences Summary:

Aspect        | SQL                    | NoSQL
------------- | ---------------------- | -----------------------------
Schema        | Fixed, predefined      | Flexible, dynamic
Scaling       | Vertical (primarily)   | Horizontal (native)
Data Model    | Tables, rows, columns  | Documents, key-values, etc.
Relationships | JOINs, foreign keys    | Embedding or referencing
Transactions  | Strong ACID guarantees | Eventual consistency (mostly)
Query         | SQL (standardized)     | Database-specific

This question assesses whether you understand fundamental database paradigms and can explain trade-offs informing technology selection decisions.

What is a document in MongoDB?

A document is MongoDB’s fundamental data unit, analogous to a row in relational databases but with significant differences in structure and flexibility.

Document Structure:

MongoDB documents are BSON (Binary JSON) objects containing field-value pairs:

json
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "firstName": "Alice",
  "lastName": "Johnson",
  "email": "alice.johnson@example.com",
  "age": 28,
  "address": {
    "street": "123 Main Street",
    "city": "San Francisco",
    "state": "CA",
    "zipCode": "94102"
  },
  "interests": ["photography", "hiking", "cooking"],
  "registeredDate": ISODate("2024-01-15T09:30:00Z"),
  "isActive": true
}

Key Characteristics:

Field-Value Pairs: Documents consist of fields (keys) and their associated values. Field names are strings, while values can be various data types.

Nested Documents: Documents can contain embedded sub-documents (like the address object above), enabling hierarchical data representation without JOIN operations.

Arrays: Fields can contain arrays of values or arrays of documents, supporting one-to-many relationships within single documents.

Rich Data Types: BSON supports types including strings, integers, doubles, decimals, booleans, dates, timestamps, ObjectIds, arrays, embedded documents, binary data, and more.

_id Field: Every document requires a unique _id field serving as the primary key. If one is not provided during insertion, MongoDB automatically generates an ObjectId.

Maximum Size: Documents are limited to a 16MB maximum size, preventing excessive memory usage. Larger files should use GridFS.
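
The 16MB limit is the reason GridFS splits large files into chunks (255 KiB each by default); the resulting chunk count is a simple ceiling division. A minimal sketch, where `gridFsChunkCount` is an illustrative helper, not a driver API:

```javascript
// Chunks needed to store a file in GridFS, given the default
// 255 KiB (261120-byte) chunk size. Illustrative helper, not a driver API.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

function gridFsChunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileSizeBytes / chunkSize);
}

// A 20 MB video exceeds the 16 MB document limit; GridFS would store it as:
console.log(gridFsChunkCount(20 * 1024 * 1024)); // 81
```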

Schema Flexibility: Documents in the same collection can have completely different fields and structures, though best practices suggest documents within collections should serve similar purposes.

Advantages Over Relational Rows:

No Impedance Mismatch: Document structure naturally maps to objects in programming languages (JavaScript objects, Python dictionaries, Java POJOs), eliminating complex ORM configurations.

Denormalization Benefits: Related data can be embedded within single documents, retrievable with single query rather than multiple JOINs.

Natural Representation: Complex data structures like hierarchies, trees, and graphs represented more naturally than in relational tables.

Atomic Operations: Updates to single documents are atomic, ensuring consistency for embedded data without distributed transactions.

This question tests your understanding of MongoDB’s core data structure and ability to contrast it with relational database concepts.

What is BSON and how does it differ from JSON?

BSON (Binary JSON) is MongoDB’s binary-encoded serialization format for storing documents and making remote procedure calls. While conceptually similar to JSON, BSON provides additional capabilities essential for database functionality.

BSON Characteristics:

Binary Format: BSON encodes documents in binary rather than text, providing:

  • Efficiency: More compact storage for most data types
  • Performance: Faster parsing and traversal
  • Space Optimization: Reduced storage requirements and network transmission overhead

Extended Data Types: BSON supports types unavailable in standard JSON:

javascript
// BSON-specific types
{
  "date": ISODate("2024-01-15T10:30:00Z"),        // Native date type
  "objectId": ObjectId("507f1f77bcf86cd799439011"), // Unique identifier
  "decimal": NumberDecimal("99.99"),               // Precise decimal
  "binary": BinData(0, "base64encodeddata"),      // Binary data
  "timestamp": Timestamp(1424023920, 1),          // Internal timestamp
  "regex": /pattern/i,                            // Regular expression
  "int32": NumberInt(32),                         // 32-bit integer
  "int64": NumberLong(64),                        // 64-bit integer
  "minKey": MinKey(),                             // Minimum value
  "maxKey": MaxKey()                              // Maximum value
}

Traversability: BSON documents encode length information enabling efficient document and field navigation without parsing entire documents. This improves query performance when accessing specific fields.

Ordered Fields: BSON preserves field order within documents (though application code shouldn’t depend on this).

JSON vs BSON Comparison:

Feature        | JSON                                                   | BSON
-------------- | ------------------------------------------------------ | ----------------------------------------------------
Format         | Text-based                                             | Binary
Parsing Speed  | Slower                                                 | Faster
Size           | Larger for most data                                   | More compact
Data Types     | Limited (string, number, boolean, array, object, null) | Extended (Date, ObjectId, Binary, Decimal128, etc.)
Human Readable | Yes                                                    | No (binary)
Use Case       | Data interchange, APIs                                 | Database storage, internal operations

Why BSON Matters:

Database Operations: BSON’s binary nature and extended types enable efficient database operations including indexing, querying, and storage.

Type Preservation: BSON maintains type information (e.g., 32-bit vs 64-bit integers, Date vs string) ensuring accurate data representation and operations.

Performance: Binary format enables faster encoding/decoding compared to parsing text-based JSON, improving overall database performance.

ObjectId Structure: ObjectId, a BSON-specific type, provides unique identifiers containing:

  • 4-byte timestamp (creation time)
  • 5-byte random value (machine/process)
  • 3-byte counter (incrementing)

This structure ensures uniqueness across distributed systems without coordination while embedding creation timestamp.
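
Because the leading 4 bytes are a big-endian Unix timestamp, the creation time can be recovered from the hex string alone. A small sketch (the `objectIdTimestamp` helper is illustrative; drivers expose this through methods like `getTimestamp()`):

```javascript
// Extract the creation Date from an ObjectId's 24-character hex string.
// The first 8 hex characters encode a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("Invalid ObjectId hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// "507f1f77" = 0x507f1f77 seconds since the Unix epoch (October 2012)
console.log(objectIdTimestamp("507f1f77bcf86cd799439011").toISOString());
```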

Practical Impact:

When working with MongoDB, you interact with JSON in application code (MongoDB drivers automatically convert between JSON and BSON), but understanding BSON helps appreciate MongoDB’s performance characteristics and data type handling.

Interviewers ask this to assess your understanding of MongoDB’s underlying data representation and how it impacts performance and functionality.

Explain collections and databases in MongoDB.

Collections and databases provide organizational structure for MongoDB data, similar to tables and databases in relational systems but with important differences.

Databases:

A database is a container for collections, providing namespace separation and organizational grouping. MongoDB servers can host multiple databases, each with independent:

  • Collections
  • Permissions and access controls
  • File storage on disk
  • Configuration settings

Creating and Using Databases:

javascript
// Switch to or create database (created when first data inserted)
use ecommerce

// Show current database
db

// List all databases
show dbs

// Drop database
db.dropDatabase()

Databases are created implicitly when you first insert data. Empty databases don’t appear in database listings.

Database Naming Conventions:

  • Lowercase recommended
  • Alphanumeric characters
  • No special characters except underscore
  • Maximum 64 characters
  • Case-sensitive on Unix/Linux (not on Windows)

Collections:

Collections are groups of MongoDB documents, analogous to tables in relational databases but without enforced schema uniformity. Collections provide:

Document Grouping: Logical organization of related documents (e.g., users, products, orders)

Namespace: Collections exist within databases, referenced as database.collection (e.g., ecommerce.products)

Indexing Boundary: Indexes are defined at collection level

Query Scope: Queries operate on specific collections

Schema Flexibility: Unlike tables, collections don’t enforce uniform structure across documents

Creating and Managing Collections:

javascript
// Explicitly create collection
db.createCollection("products")

// Create with options
db.createCollection("logs", {
  capped: true,           // Fixed-size collection
  size: 5242880,         // 5MB max size
  max: 5000              // Maximum 5000 documents
})

// Implicitly created on first insert
db.customers.insertOne({ name: "Alice", email: "alice@example.com" })

// List collections
show collections

// Drop collection
db.products.drop()

Collection Types:

Standard Collections: Regular collections with dynamic size and no special constraints.

Capped Collections: Fixed-size collections maintaining insertion order. Once size limit reached, oldest documents are automatically overwritten. Useful for logs or caching.

javascript
db.createCollection("auditLog", {
  capped: true,
  size: 10485760,  // 10MB
  max: 10000       // Maximum documents
})

Time Series Collections: Optimized for time-stamped data like sensor readings, stock prices, or metrics:

javascript
db.createCollection("weather", {
  timeseries: {
    timeField: "timestamp",
    metaField: "location",
    granularity: "hours"
  }
})

Collection Naming Best Practices:

  • Use plural nouns (users, products, orders)
  • Lowercase with underscores for multi-word (user_profiles)
  • Descriptive names indicating contents
  • Avoid starting with system. (reserved for internal collections)

Namespaces:

Full collection reference uses dot notation: database.collection

Example: ecommerce.products refers to products collection in ecommerce database

Understanding namespaces is important for:

  • Database operations and commands
  • Backup and restore procedures
  • Access control configuration
  • Monitoring and logging

System Collections:

MongoDB maintains special system collections prefixed with system.:

  • system.users: User authentication data (stored in the admin database)
  • system.profile: Profiling data when the database profiler is enabled
  • system.indexes and system.namespaces: Index and namespace metadata in legacy versions (removed in modern MongoDB; use db.collection.getIndexes() instead)

This question assesses your understanding of MongoDB’s organizational hierarchy and how it differs from relational database structure.

Intermediate MongoDB Interview Questions

What are the different types of indexes in MongoDB?

Indexes are crucial for query performance in MongoDB. Understanding various index types and their use cases is essential for production deployments.

Single Field Index:

Indexes a single field in ascending (1) or descending (-1) order:

javascript
// Create ascending index on email field
db.users.createIndex({ email: 1 })

// Create descending index on age
db.users.createIndex({ age: -1 })

Single field indexes support queries filtering or sorting on that field. Direction matters only for sorting or compound indexes.


Compound Index:

Indexes multiple fields together, supporting queries on field combinations:

javascript
// Compound index on category and price
db.products.createIndex({ category: 1, price: -1 })

Compound indexes support queries on:

  • All indexed fields combined
  • Left-to-right prefixes (category alone, but NOT price alone)

Index Prefix Rule: For compound index on {a:1, b:1, c:1}, these queries use the index:

  • {a:1}
  • {a:1, b:1}
  • {a:1, b:1, c:1}

But NOT: {b:1} or {c:1} or {b:1, c:1}
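
The prefix rule can be checked mechanically. A sketch in plain JavaScript, where `usesIndexPrefix` is an illustrative helper rather than a MongoDB API (the order of fields within the query itself doesn't matter, since equality predicates can be reordered):

```javascript
// Returns true if the queried fields form a left-to-right prefix of the
// compound index's field list. Illustrative helper, not a MongoDB API.
function usesIndexPrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size !== queryFields.length) return false; // reject duplicates
  // The queried set must exactly match the first N index fields.
  return indexFields.slice(0, wanted.size).every(f => wanted.has(f));
}

const index = ["a", "b", "c"];
console.log(usesIndexPrefix(index, ["a"]));      // true
console.log(usesIndexPrefix(index, ["b", "a"])); // true  (same set as {a, b})
console.log(usesIndexPrefix(index, ["b", "c"])); // false (not a left prefix)
```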

Multikey Index:

Automatically created when indexing fields containing arrays. Each array element is indexed:

javascript
// Index tags array
db.articles.createIndex({ tags: 1 })

// Query using multikey index
db.articles.find({ tags: "mongodb" })  // Finds documents with "mongodb" in tags array

Limitations: Only one array field per compound index.

Text Index:

Enables full-text search across string fields:

javascript
// Create text index on title and content
db.articles.createIndex({ 
  title: "text", 
  content: "text" 
})

// Search for text
db.articles.find({ $text: { $search: "mongodb tutorial" } })

Text indexes support:

  • Word stemming and stop words
  • Multiple languages
  • Case-insensitive search
  • Phrase search with quotes

Geospatial Indexes:

Support location-based queries:

2dsphere (spherical geometry for Earth-like surfaces):

javascript
db.stores.createIndex({ location: "2dsphere" })

// Find stores near coordinates
db.stores.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-122.4194, 37.7749] },
      $maxDistance: 5000  // meters
    }
  }
})
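
The distance math behind 2dsphere queries is spherical; a haversine sketch shows the kind of great-circle computation involved (illustrative only, not MongoDB internals; note the [longitude, latitude] ordering, as in GeoJSON):

```javascript
// Haversine great-circle distance in meters between two [lon, lat] points,
// the kind of spherical computation 2dsphere queries rely on (sketch only).
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = d => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Distance from San Francisco to Oakland (roughly 13 km):
console.log(haversineMeters([-122.4194, 37.7749], [-122.2712, 37.8044]));
```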

2d (flat Cartesian plane):

javascript
db.places.createIndex({ coordinates: "2d" })

Hashed Index:

Hashes field value enabling hash-based sharding:

javascript
db.users.createIndex({ userId: "hashed" })

Used primarily for shard key in hashed sharding. Not suitable for range queries.

Index Properties:

Unique Index: Enforces field uniqueness:

javascript
db.users.createIndex({ email: 1 }, { unique: true })

Partial Index: Indexes only documents matching filter:

javascript
db.orders.createIndex(
  { status: 1, orderDate: 1 },
  { partialFilterExpression: { status: { $eq: "active" } } }
)

Reduces index size and maintenance for filtered queries.

Sparse Index: Indexes only documents containing the indexed field:

javascript
db.users.createIndex({ phoneNumber: 1 }, { sparse: true })

Useful when field is optional.

TTL Index: Automatically deletes documents after specified time:

javascript
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }  // Delete after 1 hour
)

Works only on Date fields. Single-field indexes only.

Wildcard Index: Indexes all fields or fields matching pattern:

javascript
// Index all fields
db.products.createIndex({ "$**": 1 })

// Index all fields within userMetadata
db.users.createIndex({ "userMetadata.$**": 1 })

Useful for flexible schemas with unpredictable fields.

This question tests your understanding of MongoDB’s comprehensive indexing capabilities and ability to select appropriate index types for specific scenarios.

Explain MongoDB replication and replica sets.

Replication provides data redundancy and high availability, making it essential for production MongoDB deployments.

Replica Set Fundamentals:

A replica set is a group of MongoDB servers maintaining identical data copies. Replica sets provide:

  • High availability: Automatic failover if primary fails
  • Data redundancy: Multiple copies protect against hardware failure
  • Read scaling: Distribute read operations across members
  • Disaster recovery: Backup without impacting primary

Replica Set Architecture:

Primary Node:

  • Receives all write operations
  • Handles reads by default (configurable)
  • Replicates operations to secondaries via oplog
  • Only one primary per replica set

Secondary Nodes:

  • Replicate data from primary asynchronously
  • Can serve read operations (with read preference configuration)
  • Participate in primary elections
  • Maintain complete data copies

Arbiter (optional):

  • Participates in elections but doesn’t store data
  • Provides voting member without storage overhead
  • Used to create odd number of voting members
  • Not recommended for production (prefer data-bearing secondaries)

Minimum Recommended Configuration:

  • Three-member set: Primary + 2 Secondaries
  • Or: Primary + Secondary + Arbiter (less preferred)

Automatic Failover:

When primary becomes unavailable:

  1. Secondaries detect primary failure (typically within 12 seconds)
  2. Eligible secondaries hold election
  3. Member with most votes becomes new primary
  4. Clients automatically connect to new primary (with proper driver configuration)
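
Elections require a majority of voting members, which is why odd-sized sets are recommended. A quick calculation (the `majority` helper is illustrative; MongoDB computes this internally):

```javascript
// Votes needed to elect a primary among N voting members.
// Illustrative helper; MongoDB computes this internally.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

console.log(majority(3)); // 2 — a 3-member set tolerates 1 failure
console.log(majority(5)); // 3 — a 5-member set tolerates 2 failures
console.log(majority(4)); // 3 — an even-sized set adds no extra fault tolerance
```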

Replica Set Configuration:

javascript
// Initialize replica set
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongodb0.example.com:27017" },
    { _id: 1, host: "mongodb1.example.com:27017" },
    { _id: 2, host: "mongodb2.example.com:27017" }
  ]
})

// Check replica set status
rs.status()

// Add member
rs.add("mongodb3.example.com:27017")

// Remove member
rs.remove("mongodb3.example.com:27017")

// Step down primary (force election)
rs.stepDown()

Member Configuration Options:

Priority: Determines election preference (0-1000):

javascript
// Higher priority member preferred for primary
rs.add({ 
  host: "mongodb3.example.com:27017",
  priority: 2  // Higher than default 1
})

// Priority 0 prevents member from becoming primary
rs.add({
  host: "analytics.example.com:27017",
  priority: 0  // Never becomes primary
})

Hidden Members: Not visible to application, used for analytics or backups:

javascript
rs.add({
  host: "backup.example.com:27017",
  priority: 0,
  hidden: true
})

Delayed Members: Maintain data from earlier point in time (rolling backup):

javascript
rs.add({
  host: "delayed.example.com:27017",
  priority: 0,
  hidden: true,
  secondaryDelaySecs: 3600  // 1 hour behind (named slaveDelay before MongoDB 5.0)
})

Oplog (Operations Log):

Special capped collection storing all write operations:

  • Located in local.oplog.rs collection
  • Secondaries tail oplog to replicate operations
  • Finite size (configurable, typically 5% of free disk)
  • Old entries removed as new operations added

Read Preference:

Controls where read operations are directed:

javascript
// Read from primary (default)
db.products.find().readPref("primary")

// Read from primary if available, else secondary
db.products.find().readPref("primaryPreferred")

// Read from secondary
db.products.find().readPref("secondary")

// Read from secondary if available, else primary
db.products.find().readPref("secondaryPreferred")

// Read from nearest member (lowest latency)
db.products.find().readPref("nearest")

Write Concern:

Specifies acknowledgment level for write operations:

javascript
// Wait for acknowledgment from majority of members
db.orders.insertOne(
  { order: "12345", amount: 100 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Write to primary only (faster but less durable)
db.logs.insertOne(
  { message: "Log entry" },
  { writeConcern: { w: 1 } }
)

This question evaluates understanding of MongoDB’s high availability mechanisms and operational knowledge essential for production deployments.

What is sharding in MongoDB and when should you use it?

Sharding is MongoDB’s horizontal scaling solution, distributing data across multiple machines to handle large datasets and high-throughput operations.

Sharding Fundamentals:

Sharding partitions data across multiple servers (shards), each containing subset of total data. As data and traffic grow, organizations add more shards rather than upgrading single servers indefinitely.

Sharded Cluster Architecture:

Shards:

  • Store actual data (typically replica sets for redundancy)
  • Each shard contains subset of total data
  • Process queries for their data subset

Config Servers:

  • Store cluster metadata and configuration
  • Track which chunks reside on which shards
  • Must be replica set (3 config servers minimum)

Mongos (Query Routers):

  • Route client requests to appropriate shards
  • Aggregate results from multiple shards
  • Application connects to mongos, not shards directly
  • Stateless, can deploy multiple for load balancing

Shard Key:

A critical decision determining data distribution. The shard key is an indexed field (or fields) that divides data into chunks:

javascript
// Enable sharding on database
sh.enableSharding("ecommerce")

// Shard collection by userId
sh.shardCollection(
  "ecommerce.orders",
  { userId: 1 }
)

// Compound shard key
sh.shardCollection(
  "ecommerce.products",
  { category: 1, productId: 1 }
)

Shard Key Characteristics:

Good Shard Keys:

  • High cardinality (many distinct values)
  • Even distribution across shard key values
  • Query patterns align with shard key (enables targeted queries)
  • Write-friendly (avoids hot spots)

Examples of Good Shard Keys:

  • UserId (for user-specific queries)
  • Timestamp + UserId (for time-series data)
  • Hashed field (for even distribution)

Poor Shard Keys:

  • Low cardinality (country code, boolean)
  • Monotonically increasing values (timestamp alone causes write hot spot)
  • Uneven distribution

Example:

javascript
// Poor: timestamp only (all writes go to one shard)
sh.shardCollection("logs.events", { timestamp: 1 })

// Better: hashed timestamp
sh.shardCollection("logs.events", { timestamp: "hashed" })

// Best: compound with high-cardinality prefix
sh.shardCollection("logs.events", { userId: 1, timestamp: 1 })

Sharding Strategies:

Range-based Sharding: Divides data by shard key ranges:

  • Chunk1: {userId: MinKey} → {userId: 1000}
  • Chunk2: {userId: 1000} → {userId: 2000}
  • Supports range queries efficiently
  • Risk of uneven distribution if data skewed
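
Range-based routing can be pictured as a lookup over sorted chunk boundaries, a simplification of what mongos does with config-server metadata (the `routeToShard` helper and chunk layout are illustrative):

```javascript
// Simplified range-based chunk routing: each chunk covers [min, max).
// Real mongos consults config-server metadata; this is only a sketch.
const chunks = [
  { min: -Infinity, max: 1000,     shard: "shardA" },
  { min: 1000,      max: 2000,     shard: "shardB" },
  { min: 2000,      max: Infinity, shard: "shardC" }
];

function routeToShard(shardKeyValue) {
  const chunk = chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

console.log(routeToShard(42));   // "shardA"
console.log(routeToShard(1000)); // "shardB" (lower bound is inclusive)
console.log(routeToShard(5000)); // "shardC"
```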

Hashed Sharding: Distributes data by hashing shard key:

javascript
sh.shardCollection("users.profiles", { _id: "hashed" })

  • Ensures even distribution
  • Does not support range queries on shard key
  • Good for random access patterns
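
The effect of hashed sharding can be illustrated with a toy hash-mod scheme; MongoDB actually computes a 64-bit hash of the BSON value, so this is only a sketch with illustrative helpers:

```javascript
// Toy hashed sharding: hash the key, then take it modulo the shard count.
// MongoDB's real hashed index uses a 64-bit hash; this just shows the idea.
function toyHash(str) {
  let h = 0;
  for (const ch of String(str)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep within unsigned 32 bits
  }
  return h;
}

function shardFor(key, numShards) {
  return "shard" + (toyHash(key) % numShards);
}

// Nearby keys scatter across shards instead of landing on one:
console.log(shardFor("user1000", 3));
console.log(shardFor("user1001", 3));
```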

Zone Sharding: Associates data ranges with specific shards:

javascript
// Create zones
sh.addShardToZone("shard1", "US-East")
sh.addShardToZone("shard2", "US-West")

// Assign ranges to zones
sh.updateZoneKeyRange(
  "users.profiles",
  { region: "east" },
  { region: "east" },
  "US-East"
)

Enables data locality (geographic distribution).

When to Use Sharding:

Consider sharding when:

  • Dataset exceeds single server storage capacity
  • Working set exceeds available RAM
  • Write/read throughput exceeds single server capacity
  • Geographic distribution required for data locality
  • Application needs horizontal scalability

Avoid premature sharding:

  • Adds operational complexity
  • Shard key choice is hard to change (live resharding requires MongoDB 5.0+ and is expensive)
  • Start with replica sets, add sharding when needed
  • Typical threshold: 2-5TB of data or significant performance degradation

Sharding Trade-offs:

Benefits:

  • Horizontal scalability for data and throughput
  • No single point of failure (with properly configured shards)
  • Geographic distribution capabilities
  • Cost-effective scaling with commodity hardware

Drawbacks:

  • Increased complexity (more components to manage)
  • Queries not using shard key require scatter-gather
  • Transactions across shards have performance implications
  • Backup and maintenance more complex
  • Shard key selection critical and hard to change

This question assesses your understanding of MongoDB’s scalability mechanisms and ability to make architectural decisions about when and how to implement sharding.

How do you perform aggregation in MongoDB?

The aggregation framework provides powerful data processing and analysis capabilities within MongoDB.

Aggregation Pipeline:

Aggregation processes documents through stages, each transforming data progressively:

javascript
db.orders.aggregate([
  // Stage 1: Filter documents
  { $match: { status: "completed" } },
  
  // Stage 2: Group and calculate
  { $group: {
    _id: "$customerId",
    totalSpent: { $sum: "$amount" },
    orderCount: { $sum: 1 },
    avgOrderValue: { $avg: "$amount" }
  }},
  
  // Stage 3: Filter aggregated results
  { $match: { totalSpent: { $gt: 1000 } } },
  
  // Stage 4: Sort results
  { $sort: { totalSpent: -1 } },
  
  // Stage 5: Limit results
  { $limit: 10 }
])
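
To see what the first stages compute, the same logic can be mimicked over a plain JavaScript array (a sketch with sample data, not how MongoDB actually executes pipelines):

```javascript
// Plain-JS mimic of the $match → $group → $match → $sort stages above,
// using a small in-memory sample (not how MongoDB executes pipelines).
const orders = [
  { customerId: "c1", status: "completed", amount: 600 },
  { customerId: "c1", status: "completed", amount: 700 },
  { customerId: "c2", status: "completed", amount: 200 },
  { customerId: "c2", status: "pending",   amount: 900 }
];

const grouped = {};
for (const o of orders.filter(o => o.status === "completed")) { // $match
  let g = grouped[o.customerId];
  if (!g) g = grouped[o.customerId] = { _id: o.customerId, totalSpent: 0, orderCount: 0 };
  g.totalSpent += o.amount; // $sum: "$amount"
  g.orderCount += 1;        // $sum: 1
}

const results = Object.values(grouped)
  .filter(g => g.totalSpent > 1000)             // $match on aggregated values
  .sort((a, b) => b.totalSpent - a.totalSpent); // $sort

console.log(results); // [ { _id: 'c1', totalSpent: 1300, orderCount: 2 } ]
```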

Common Pipeline Stages:

$match: Filters documents (like WHERE clause):

javascript
{ $match: { 
  orderDate: { 
    $gte: ISODate("2024-01-01"),
    $lt: ISODate("2025-01-01")
  },
  status: "completed"
}}

Place $match early in pipeline to reduce processed documents.

$group: Groups documents by expression and computes aggregates:

javascript
{ $group: {
  _id: { 
    year: { $year: "$orderDate" },
    month: { $month: "$orderDate" }
  },
  revenue: { $sum: { $multiply: ["$quantity", "$price"] } },
  avgQuantity: { $avg: "$quantity" },
  totalOrders: { $sum: 1 },
  maxOrder: { $max: "$amount" },
  minOrder: { $min: "$amount" }
}}

$project: Reshapes documents, includes/excludes fields:

javascript
{ $project: {
  _id: 0,  // Exclude _id
  customerName: 1,  // Include customerName
  orderTotal: { $multiply: ["$quantity", "$price"] },
  discountedPrice: { 
    $subtract: ["$price", { $multiply: ["$price", "$discount"] }]
  },
  month: { $month: "$orderDate" }
}}

$lookup: Performs left outer join with another collection:

javascript
{ $lookup: {
  from: "products",           // Collection to join
  localField: "productId",    // Field in current collection
  foreignField: "_id",        // Field in products collection
  as: "productDetails"        // Output array field
}}

$unwind: Deconstructs array field into separate documents:

javascript
// Before unwind
{ _id: 1, items: ["A", "B", "C"] }

// After $unwind: "$items"
{ _id: 1, items: "A" }
{ _id: 1, items: "B" }
{ _id: 1, items: "C" }

$sort: Orders documents:

javascript
{ $sort: { revenue: -1, customerName: 1 } }

$limit and $skip: Pagination:

javascript
{ $skip: 20 },  // Skip first 20
{ $limit: 10 }  // Return next 10
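
$skip plus $limit maps directly onto array slicing; a sketch of fetching page 3 with 10 results per page (the `paginate` helper is illustrative, not a MongoDB API):

```javascript
// $skip + $limit as array slicing: page numbers are 1-based.
// Illustrative helper, not a MongoDB API.
function paginate(docs, page, pageSize) {
  return docs.slice((page - 1) * pageSize, page * pageSize);
}

const ids = Array.from({ length: 35 }, (_, i) => i + 1);
console.log(paginate(ids, 3, 10)); // [21, 22, ..., 30]
console.log(paginate(ids, 4, 10)); // [31, 32, 33, 34, 35] — last, partial page
```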

$addFields: Adds new fields to documents:

javascript
{ $addFields: {
  fullName: { $concat: ["$firstName", " ", "$lastName"] },
  isVIP: { $gte: ["$totalSpent", 10000] }
}}

$facet: Multi-faceted aggregation in single stage:

javascript
{ $facet: {
  "categoryCounts": [
    { $group: { _id: "$category", count: { $sum: 1 } } }
  ],
  "priceStats": [
    { $group: {
      _id: null,
      avgPrice: { $avg: "$price" },
      minPrice: { $min: "$price" },
      maxPrice: { $max: "$price" }
    }}
  ]
}}

This question tests your command of the aggregation framework and your ability to translate analytical requirements into pipeline stages.
