Monday 25 June 2018

MongoDB - II

In the last blog we took a look arrays in MongoDB. In this post, we continue with arrays in MongoDB on the topics of querying arrays and then projecting arrays. For all the work in this post, we will be using the same version as in the last two posts: MongoDB version v3.6.5. We will refer to the same merchandise database and products collection that we created in the last post. Running a find against Mongo shell returns below result:

>  db.products.find()
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 400, "item" : "Hat", "sizes" : "M" }
{ "_id" : 500, "item" : "Wrist band" }
{ "_id" : 600, "item" : "Sweat band", "sizes" : null }
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }

Let us find only records that have an array for sizes field:

> db.products.find( { sizes: { $type: "array" } })
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }

{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : { sizes: { $type: "array" } } } ] )

Note that this even returns records having no element like the document with id, 700. To remove such documents, we can use elemMatch as shown below:

> db.products.find( {sizes: {$elemMatch: {$exists: true} } })
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : {sizes: {$elemMatch: {$exists: true} } } } ] )

Let us now try to find documents having "S" in sizes array:

> db.products.find( { sizes: "S" } )

{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : { sizes: "S" } } ] )

To return documents having either a "S" or "X":

> db.products.find( { $or: [ { sizes: "S" }, { sizes: "X" } ] } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }

{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : { $or: [ { sizes: "S" }, { sizes: "X" } ] } } ] )

To return documents having an exact match of contents in the same order in arrays:

> db.products.find( { sizes: ["M"] } )
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
> db.products.find( { sizes: ["S","M","L"] } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }

The corresponding aggregation pipelines:

> db.products.aggregate( [ { $match : { sizes: ["M"] } } ] )
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
> db.products.aggregate( [ { $match : { sizes: ["S","M","L"] } } ] )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }

To return documents having arrays with content in any order:

> db.products.find( { sizes: { $all: ["M"] } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 400, "item" : "Hat", "sizes" : "M" }
> db.products.find( { sizes: { $all: ["S", "L"] } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }

Note that even though sizes in id :400 is not an array, it is still returned. To return only records with arrays, we can add a check as we shall see later. The corresponding aggregation pipelines:

db.products.aggregate( [ { $match : { sizes: { $all: ["M"] } } } ] )

db.products.aggregate( [ { $match : { sizes: { $all: ["S", "L"] } } } ] )

To add multiple conditions on the contents of arrays:

> db.products.find( { sizes: { $gt: "A", $lt: "O" } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 400, "item" : "Hat", "sizes" : "M" }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : { sizes: { $gt: "A", $lt: "O" } } } ] )

If the conditions are for one item only in the array, then, use elemMatch:

> db.products.find( { sizes: { $elemMatch: { $gt: "L", $lt: "S" } } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }

The corresponding aggregation pipeline is:

db.products.aggregate( [ { $match : { sizes: { $elemMatch: { $gt: "L", $lt: "S" } } } } ] )

If we wish to query on an index position, then, we can use dot to indicate the position:

> db.products.find( { "sizes.0": { $gt: "R" } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
> db.products.find( { "sizes.0": { $gt: "S" } } )
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }

The corresponding aggregation pipeline:

db.products.aggregate( [ { $match : { "sizes.0": { $gt: "R" } }  } ] )

db.products.aggregate( [ { $match : { "sizes.0": { $gt: "S" } }  } ] )

We can query an array based on the size of array as shown below:

> db.products.find( { "sizes": { $size: 0 } } )
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }
>
> db.products.find( { "sizes": { $size: 1 } } )
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
>
> db.products.find( { "sizes": { $size: 3 } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }

The corresponding aggregation pipeline:

db.products.aggregate( [ { $match : { "sizes": { $size: 0 } }  } ] )

db.products.aggregate( [ { $match : { "sizes": { $size: 1 } }  } ] )

db.products.aggregate( [ { $match : { "sizes": { $size: 3 } }  } ] )

Lastly, on the querying side, we take a look at regex that can be used for querying on the contents of arrays:

> db.products.find( { "sizes":  { $regex: /^S/ } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
>
> db.products.find( { "sizes":  { $regex: /^M/ } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 400, "item" : "Hat", "sizes" : "M" }
>
> db.products.find( { "sizes":  { $regex: /L$/ } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
>
> db.products.find( { "sizes":  { $regex: /M/ } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 400, "item" : "Hat", "sizes" : "M" }
>
> db.products.find( { "sizes":  { $regex: /X.L/ } } )
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
>
> db.products.find( { "sizes":  { $regex: /X..L/ } } )
>

Note that the last query did not return any result as there is no element in any array that starts with X and end in L and has exactly two characters in between them. The corresponding aggregation pipeline:

db.products.aggregate( [ { $match : { sizes: { $regex: /^S/ } } } ] )

db.products.aggregate( [ { $match : { sizes: { $regex: /^M/ } } } ] )

db.products.aggregate( [ { $match : { sizes: { $regex: /L$/ } } } ] )

db.products.aggregate( [ { $match : { sizes: { $regex: /M/ } } } ] )

db.products.aggregate( [ { $match : { sizes: { $regex: /X.L/ } } } ] )

db.products.aggregate( [ { $match : { sizes: { $regex: /X..L/ } } } ] )

Now, we take a look at projection of arrays. $ operator returns the first matched element of the array that matches the query and is used as array_name.$ as shown below:

db.products.find( {},{ "sizes.$": 1 } )

But this results in error:

> db.products.find( {},{ "sizes.$": 1 } )
Error: error: {
        "ok" : 0,
        "errmsg" : "Positional projection 'sizes.$' does not match the query document.",
        "code" : 2,
        "codeName" : "BadValue"
}

This is only to be expected as there are some documents that are not arrays or contain array with no elements. So, we select only the records that are arrays and then apply this operator as shown:

> db.products.find( {sizes: {$elemMatch: {$exists: true} } })
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
> db.products.find( {sizes: {$elemMatch: {$exists: true} } },{ "sizes.$": 1 } )
{ "_id" : 100, "sizes" : [ "S" ] }
{ "_id" : 200, "sizes" : [ "X" ] }
{ "_id" : 300, "sizes" : [ "M" ] }

Note that we can combine the $ operator with the query is the first argument of find() as shown below:

> db.products.find( { $and: [ { sizes: { $type: "array" } }, { "sizes": { $gt: "O" } } ] } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
> db.products.find( { $and: [ { sizes: { $type: "array" } }, { "sizes": { $gt: "O" } } ] },{ "sizes.$": 1 } )
{ "_id" : 100, "sizes" : [ "S" ] }
{ "_id" : 200, "sizes" : [ "X" ] }

The $elemMatch in project also returns the first element of the arrays as $ operator but has the advantage of taking arguments as shown below:

> db.products.find( {sizes: {$elemMatch: {$exists: true} } })
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
> db.products.find( {sizes: {$elemMatch: {$exists: true} } },{ sizes: { $elemMatch: { $gt: "O" } } } )
{ "_id" : 100, "sizes" : [ "S" ] }
{ "_id" : 200, "sizes" : [ "X" ] }
{ "_id" : 300 }

Note that $elemMatch is used both in the query and projection arguments of find(). Also, in the case of  $elemMatch in projection returns the _id when the array does not match the query. The last operator is $slice that returns records by position. $slice takes two arguments, the first is the position to start and second one indicates the number of elements:

> db.products.find( { sizes: { $type: "array" } })
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }
>
> db.products.find( { sizes: { $type: "array" } },{ item : 1 ,  sizes: { $slice: [ 0, 1 ] } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }
>
> db.products.find( { sizes: { $type: "array" } },{ item : 1 ,  sizes: { $slice: [ 0, 2 ] } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "S", "M" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "X", "XL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ "M" ] }
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }
>
> db.products.find( { sizes: { $type: "array" } },{ item : 1 ,  sizes: { $slice: [ 1, 2 ] } } )
{ "_id" : 100, "item" : "Pullover", "sizes" : [ "M", "L" ] }
{ "_id" : 200, "item" : "T-shirt", "sizes" : [ "XL", "XXL" ] }
{ "_id" : 300, "item" : "Bermuda Shorts", "sizes" : [ ] }
{ "_id" : 700, "item" : "Cap", "sizes" : [ ] }

The corresponding aggregate pipeline:

db.products.aggregate( [ { $match : { sizes: { $type: "array" } } } ] )

db.products.aggregate( [ { $match : { sizes: { $type: "array" } } }, { $project : { item : 1 , sizes : { "$slice": [ "$sizes", 0, 1 ] } } }  ] )

db.products.aggregate( [ { $match : { sizes: { $type: "array" } } }, { $project : { item : 1 , sizes : { "$slice": [ "$sizes", 0, 2 ] } } }  ] )

db.products.aggregate( [ { $match : { sizes: { $type: "array" } } }, { $project : { item : 1 , sizes : { "$slice": [ "$sizes", 1, 2 ] } } }  ] )

This concludes the projection of arrays