"Thus, putting the sort key first is generally a good strategy when you're using a limit so MongoDB can stop scanning the index after a couple of matches."
On page 89, the author gives an example showing the power of combining "sort key first" with a limit, but the example is a bit misleading.
First, the conclusion is not rigorous, even though the author hedged with the word "generally". The conclusion and the example rest on one condition: that the cursor collects enough results very early in the scan.
Suppose we have a collection with 1,000,000 documents.
{"x":1, "y":1}
{"x":2, "y":2}
...
{"x":1000000, "y":1000000}
Create the collection with:
for(var i=1;i<=1000000;i++){db.col.insert({x:i,y:i});}
Add two compound indexes, {x:1,y:1} and {y:1,x:1}, as sketched below.
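A minimal way to create them (assuming the legacy mongo shell that produced the v1 explain output below; newer shells spell this createIndex()):

> db.col.ensureIndex({x:1,y:1})
> db.col.ensureIndex({y:1,x:1})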
> db.col.getIndexes()
[
    { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.col", "name" : "_id_" },
    { "v" : 1, "key" : { "x" : 1, "y" : 1 }, "ns" : "test.col", "name" : "x_1_y_1" },
    { "v" : 1, "key" : { "y" : 1, "x" : 1 }, "ns" : "test.col", "name" : "y_1_x_1" }
]
Now let's fetch documents matching a range on y, sorted by x, with a limit of 500.
After pre-heating the data (so the working set is in memory), let's run some tests.
Case 1: all qualifying documents are located at the beginning of the collection.

Query #1: db.col.find({"y":{$lt:1000}}).sort({x:1}).limit(500).hint({x:1,y:1})
> db.col.find({"y":{$lt:1000}}).sort({x:1}).limit(500).hint({x:1,y:1}).explain() { "cursor" : "BtreeCursor x_1_y_1", "isMultiKey" : false, "n" : 500, "nscannedObjects" : 500, "nscanned" : 500, "nscannedObjectsAllPlans" : 500, "nscannedAllPlans" : 500, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 1, "indexBounds" : { "x" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ], "y" : [ [ -1.7976931348623157e+308, 1000 ] ] }, "server" : "feifan-server:27017" }2, db.col.find({"y":{$lt:1000}}).sort({x:1}).limit(500).hint({y:1,x:1})
> db.col.find({"y":{$lt:1000}}).sort({x:1}).limit(500).hint({y:1,x:1}).explain() { "cursor" : "BtreeCursor y_1_x_1", "isMultiKey" : false, "n" : 500, "nscannedObjects" : 999, "nscanned" : 999, "nscannedObjectsAllPlans" : 999, "nscannedAllPlans" : 999, "scanAndOrder" : true, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 9, "indexBounds" : { "y" : [ [ -1.7976931348623157e+308, 1000 ] ], "x" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ] }, "server" : "feifan-server:27017" }#1 query is faster than #2 a bit
Query #1 followed the "sort key first" strategy: it walked the x_1_y_1 index in x order (note the unbounded x index bounds), filtering on y along the way. Because every early entry matched, the cursor had collected 500 qualifying documents, already in sorted order, before it reached the 501st entry. Task done, with no in-memory sort (scanAndOrder: false).
Query #2 used its index for the match: a tight range scan on y_1_x_1 fetched the 999 documents with y less than 1000, then those 999 documents were sorted in memory (scanAndOrder: true) and the first 500 returned.
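A quick count confirms the numbers above: with x = y running from 1 to 1,000,000, y < 1000 matches exactly 999 documents, which is precisely the nscanned of query #2.

> db.col.find({"y":{$lt:1000}}).count()
999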
Case 2: all qualifying documents are located at the tail of the collection.

Query #3: db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(500).hint({x:1,y:1})
> db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(500).hint({x:1,y:1}).explain() { "cursor" : "BtreeCursor x_1_y_1", "isMultiKey" : false, "n" : 500, "nscannedObjects" : 500, "nscanned" : 859591, "nscannedObjectsAllPlans" : 500, "nscannedAllPlans" : 859591, "scanAndOrder" : false, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 3477, "indexBounds" : { "x" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ], "y" : [ [ 900000, 1.7976931348623157e+308 ] ] }, "server" : "feifan-server:27017" }4, db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(500).hint({y:1,x:1})
> db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(500).hint({y:1,x:1}).explain() { "cursor" : "BtreeCursor y_1_x_1", "isMultiKey" : false, "n" : 500, "nscannedObjects" : 100000, "nscanned" : 100000, "nscannedObjectsAllPlans" : 100000, "nscannedAllPlans" : 100000, "scanAndOrder" : true, "indexOnly" : false, "nYields" : 0, "nChunkSkips" : 0, "millis" : 470, "indexBounds" : { "y" : [ [ 900000, 1.7976931348623157e+308 ] ], "x" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ] }, "server" : "feifan-server:27017" }#3 query followed the strategy "putting sort key first", but it was much slower then #4.
Query #3 walked the index in x order as #1 did, but unfortunately all qualifying documents sit far from the beginning: with x = y, the first match is roughly the 900,001st index entry, so the cursor found nothing for the better part of a million entries before it finally collected its 500 qualifying results.
Query #4 first fetched its 100,000 matches rapidly through the y index range, then sorted them in memory and returned the first 500.
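A quick check (a hypothetical shell session over the same data) shows why the x-order walk is so expensive here: the first qualifying document sits about 900,000 entries deep in the x_1_y_1 index.

> db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(1)
{ "_id" : ObjectId("..."), "x" : 900001, "y" : 900001 }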
Analysis:
Query time of #1 = a lucky, short index scan + zero sort time.
Query time of #2 = a small b-tree range scan + an in-memory sort.
Query time of #3 = an unlucky, very long index scan + zero sort time.
Query time of #4 = a small b-tree range scan + an in-memory sort.
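The trade-off can be modeled in a few lines of plain JavaScript. This is a minimal sketch, not server code; planSortKeyFirst and planMatchKeyFirst are hypothetical names standing in for the two index choices, and docs is assumed to be the collection already in x order (i.e., the {x:1,y:1} index).

// Plan A ({x:1,y:1}): walk entries in sort order, filter, stop at the limit.
function planSortKeyFirst(docs, matches, limit) {
    var out = [], nscanned = 0;
    for (var i = 0; i < docs.length && out.length < limit; i++) {
        nscanned++;                          // one index entry examined
        if (matches(docs[i])) out.push(docs[i]);
    }
    return { results: out, nscanned: nscanned };  // cheap only if matches come early
}

// Plan B ({y:1,x:1}): range-scan all matches, then sort them in memory.
function planMatchKeyFirst(docs, matches, limit) {
    var hits = docs.filter(matches);         // nscanned equals the number of matches
    hits.sort(function(a, b) { return a.x - b.x; });  // the scanAndOrder step
    return { results: hits.slice(0, limit), nscanned: hits.length };
}

Fed the Case 2 predicate, plan A scans roughly 900,000 entries before returning, while plan B scans 100,000 and pays one in-memory sort, which is the same shape as explain outputs #3 and #4 above.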
The book's conclusion does not fit 100% of cases: run explain() on a MongoDB query before trusting the "sort key first" strategy.
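For anyone on MongoDB 3.0 or later (an assumption about your deployment; the flat explain format shown above was removed there), the equivalent check is:

> db.col.find({"y":{$gt:900000}}).sort({x:1}).limit(500).explain("executionStats")

and the fields to watch are executionStats.totalKeysExamined and executionStats.executionTimeMillis, the counterparts of nscanned and millis.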