Aggregation builder

Note

This feature is introduced in version 1.2

The aggregation framework provides an easy way to process records and return computed results. The aggregation builder helps to build complex aggregation pipelines.

Creating an Aggregation Builder

You can easily create a new Aggregation\Builder object with the DocumentManager::createAggregationBuilder() method:

<?php

$builder = $dm->createAggregationBuilder(\Documents\User::class);

The first argument indicates the document for which you want to create the builder.

Adding pipeline stages

To add a pipeline stage to the builder, call the corresponding method on the builder object:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->match()
        ->field('purchaseDate')
        ->gte($from)
        ->lt($to)
        ->field('user')
        ->references($user)
    ->group()
        ->field('id')
        ->expression('$user')
        ->field('numPurchases')
        ->sum(1)
        ->field('amount')
        ->sum('$amount');

Just like the query builder, the aggregation builder takes care of converting DateTime objects into MongoDate objects.

Nesting expressions

You can create more complex aggregation stages by using the expr() method in the aggregation builder.

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->match()
        ->field('purchaseDate')
        ->gte($from)
        ->lt($to)
        ->field('user')
        ->references($user)
    ->group()
        ->field('id')
        ->expression(
            $builder->expr()
                ->field('month')
                ->month('purchaseDate')
                ->field('year')
                ->year('purchaseDate')
        )
        ->field('numPurchases')
        ->sum(1)
        ->field('amount')
        ->sum('$amount');

This aggregation would group all purchases by their month and year by projecting those values into an embedded object for the id field. For example:

{
    _id: {
        month: 1,
        year: 2016
    },
    numPurchases: 1,
    amount: 27.89
}

Executing an aggregation pipeline

You can execute a pipeline using the execute() method. This will run the aggregation pipeline and return a cursor for you to iterate over the results:

<?php

$builder = $dm->createAggregationBuilder(\Documents\User::class);
$result = $builder->execute();

If you instead want to look at the built aggregation pipeline, call the Builder::getPipeline() method.

Hydration

By default, aggregation results are returned as PHP arrays. This is because the result of an aggregation pipeline may look completely different from the source document. In order to get hydrated aggregation results, you first have to map a QueryResultDocument. These are written like regular mapped documents, but they can’t be persisted to the database.

  • PHP
    <?php
    
    namespace Documents;
    
    /** @QueryResultDocument */
    class UserPurchases
    {
        /** @ReferenceOne(targetDocument="User", name="_id") */
        private $user;
    
        /** @Field(type="int") */
        private $numPurchases;
    
        /** @Field(type="float") */
        private $amount;
    }
    
  • XML
    <?xml version="1.0" encoding="UTF-8"?>
    <doctrine-mongo-mapping xmlns="http://doctrine-project.org/schemas/odm/doctrine-mongo-mapping"
                      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                      xsi:schemaLocation="http://doctrine-project.org/schemas/odm/doctrine-mongo-mapping
                      http://doctrine-project.org/schemas/odm/doctrine-mongo-mapping.xsd">
        <query-result-document name="Documents\UserPurchases">
            <field fieldName="numPurchases" type="int" />
            <field fieldName="amount" type="float" />
            <reference-one field="user" target-document="Documents\User" name="_id" />
        </query-result-document>
    </doctrine-mongo-mapping>
    
  • YAML
    Documents\User:
      type: queryResultDocument
      fields:
        user:
          name: _id
          targetDocument: Documents\User
        numPurchases:
          type: int
        amount:
          type: float
    

Once you have mapped the document, use the hydrate() method to tell the aggregation builder about this document:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->hydrate(\Documents\UserPurchases::class)
    ->match()
        ->field('purchaseDate')
        ->gte($from)
        ->lt($to)
        ->field('user')
        ->references($user)
    ->group()
        ->field('id')
        ->expression('$user')
        ->field('numPurchases')
        ->sum(1)
        ->field('amount')
        ->sum('$amount');

When you run the queries, all results will be returned as instances of the specified document.

Note

Query result documents can use all features regular documents can use: you can map embedded documents, define references, and even use discriminators to get different result documents according to the aggregation result.

Aggregation pipeline stages

MongoDB provides the following aggregation pipeline stages:

Note

The $lookup, $sample and $indexStats stages were added in MongoDB 3.2.

$project

The $project stage lets you reshape the current document or define a completely new one:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->project()
        ->excludeIdField()
        ->includeFields(['purchaseDate', 'user'])
        ->field('purchaseYear')
        ->year('$purchaseDate');

$match

The $match stage lets you filter documents according to certain criteria. It works just like the query builder:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->match()
        ->field('purchaseDate')
        ->gte($from)
        ->lt($to)
        ->field('user')
        ->references($user);

You can also use fields defined in previous stages:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->project()
        ->excludeIdField()
        ->includeFields(['purchaseDate', 'user'])
        ->field('purchaseYear')
        ->year('$purchaseDate')
    ->match()
        ->field('purchaseYear')
        ->equals(2016);

$group

The $group stage is used to do calculations based on previously matched documents:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->match()
        ->field('user')
        ->references($user)
    ->group()
        ->field('id')
        ->expression(
            $builder->expr()
                ->field('month')
                ->month('purchaseDate')
                ->field('year')
                ->year('purchaseDate')
        )
        ->field('numPurchases')
        ->sum(1)
        ->field('amount')
        ->sum('$amount');

$lookup

Note

The $lookup stage was introduced in MongoDB 3.2. Using it on older servers will result in an error.

The $lookup stage is used to fetch documents from different collections in pipeline stages. Take the following relationship for example:

<?php

/**
 * @ReferenceMany(
 *     targetDocument="Documents\Item",
 *     cascade="all",
 *     storeAs="id"
 * )
 */
private $items;
<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->lookup('items')
        ->alias('items');

The resulting array will contain all matched item documents in an array. This has to be considered when looking up one-to-one relationships:

<?php

/**
 * @ReferenceOne(
 *     targetDocument="Documents\Item",
 *     cascade="all",
 *     storeAs="id"
 * )
 */
private $items;
<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->lookup('user')
        ->alias('user')
    ->unwind('$user');

MongoDB will always return an array, even if the lookup only returned a single document. Thus, when looking up one-to-one references the result must be flattened using the $unwind operator.

Note

Due to a limitation in MongoDB, the $lookup stage can only be used with references that are stored as ID (see example above). References stored with a DbRef object can’t be used. To use references in a $lookup stage, store the references as ID. This is explained in the Reference mapping chapter..

You can also configure your lookup manually if you don’t have it mapped in your document:

<?php

$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->lookup('unmappedCollection')
        ->localField('_id')
        ->foreignField('userId')
        ->alias('items');

$unwind

The $unwind stage flattens an array in a document, returning a copy for each item. Take this sample document:

{
    _id: {
        month: 1,
        year: 2016
    },
    purchaseDates: [
        '2016-01-07',
        '2016-03-10',
        '2016-06-25'
    ]
}

To flatten the purchaseDates array, we would apply the following pipeline stage:

$builder = $dm->createAggregationBuilder(\Documents\User::class);
$builder->unwind('$purchaseDates');

The stage would return three documents, each containing a single purchase date:

{
    _id: {
        month: 1,
        year: 2016
    },
    purchaseDates: '2016-01-07'
},
{
    _id: {
        month: 1,
        year: 2016
    },
    purchaseDates: '2016-03-10'
},
{
    _id: {
        month: 1,
        year: 2016
    },
    purchaseDates: '2016-06-25'
}

$redact

The redact stage can be used to restrict the contents of the documents based on information stored in the documents themselves. You can read more about the $redact stage in the MongoDB documentation.

The following example taken from the official documentation checks the level field on all document levels and evaluates it to grant or deny access:

{
    _id: 1,
    level: 1,
    acct_id: "xyz123",
    cc: {
        level: 5,
        type: "yy",
        num: 000000000000,
        exp_date: ISODate("2015-11-01T00:00:00.000Z"),
        billing_addr: {
            level: 5,
            addr1: "123 ABC Street",
            city: "Some City"
        },
        shipping_addr: [
            {
                level: 3,
                addr1: "987 XYZ Ave",
                city: "Some City"
            },
            {
                level: 3,
                addr1: "PO Box 0123",
                city: "Some City"
            }
        ]
    },
    status: "A"
}
$builder = $dm->createAggregationBuilder(\Documents\Orders::class);
$builder
    ->redact()
        ->cond(
            $builder->expr()->gte('$$level', 5),
            '$$PRUNE',
            '$$DESCEND'
        )

$sort, $limit and $skip

The $sort, $limit and $skip stages behave like the corresponding query options, allowing you to control the order and subset of results returned by the aggregation pipeline.

$sample

The sample stage can be used to randomly select a subset of documents in the aggregation pipeline. It behaves like the $limit stage, but instead of returning the first n documents it returns n random documents.

$geoNear

The $geoNear stage finds and outputs documents in order of nearest to farthest from a specified point.

$builder = $this->dm->createAggregationBuilder(\Documents\City::class);
$builder
    ->geoNear(120, 40)
    ->spherical(true)
    ->distanceField('distance')
    // Convert radians to kilometers (use 3963.192 for miles)
    ->distanceMultiplier(6378.137);

Note

The $geoNear stage must be the first stage in the pipeline and the collection must contain a single geospatial index. You must include the distanceField option for the stage to work.

$out

The $out stage is used to store the result of the aggregation pipeline in a collection instead of returning an iterable cursor of results. This must be the last stage in an aggregation pipeline.

If the collection specified by the $out operation already exists, then upon completion of the aggregation, the existing collection is atomically replaced. Any indexes that existed on the collection are left intact. If the aggregation fails, the $out operation does not remove the data from an existing collection.

Note

The aggregation pipeline will fail to complete if the result would violate any unique index constraints, including those on the id field.

$indexStats

The $indexStats stage returns statistics regarding the use of each index for the collection. More information can be found in the official Documentation

Fork me on GitHub