Get last documents with a distinct criteria

Situation

I'm having trouble coming up with a good way to do a certain MongoDb query. First, here's what kind of query I want to do. Assume a simple database which logs entry and exit events (and possibly other actions, doesn't matter) by electronic card swipe. So there's a collection called swipelog with simple documents which look like this:

{
    _id: ObjectId("524ab4790a4c0e402200052c")
    name: "John Doe", 
    action: "entry",
    timestamp: ISODate("2013-10-01T1:32:12.112Z") 
}

Now I want to list names and their last entry times (and any other fields I may want, but example below uses just these two fields).

Current solution

Here is what I have now, as a "one-liner" for MongoDb JavaScript console:

db.swipelog.distinct('name')
           .forEach( function(name) {

    db.swipelog.find( { name: name, action:"entry" } )
               .sort( { $natural:-1 } )
               .limit(1)
               .forEach( function(entry) {

        printjson( [ entry.name, entry.timestamp ] )
    }) 
})

Which prints something like:

[ "John Doe", ISODate("2013-10-01T1:32:12.112Z")]
[ "Jane Deo", ISODate("2013-10-01T1:36:12.112Z")]
...

Question

I think above has the obvious scaling problem. If there are a hundred names, then 1+100 queries will be made to the database. So what is a good/correct way to get "last timestamp of every distinct name" ? Changing database structure or adding some collections is ok, if it makes this easier.

Answers:

Answer

You can use aggregation framework to achieve this:

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$group:
                  {_id:'$name',
                   first:
                         {$max:'$timestamp'}
                  }
             }
          ])

If you likely to include other fields in the results, you can use the $first operator

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$sort:
                  {name:1, timestamp:-1}
             },
             {$group:
                  {_id:'$name',
                   timestamp: {$first:'$timestamp'},
                   otherField: {$first:'$otherField'},
                  }
             }
          ])
Answer

This answer should be a comment on attish's answer above, but I don't have sufficient rep here to comment

Keep in mind that the aggregation framework cannot return more than 16MB of data. If you have a very large number of users, you may run into this limitation on your production system.

MongoDB 2.6 adds new features to the aggregation framework to deal with this:

  • db.collection.aggregateCursor() (temporary name) is identical to db.collection.aggregate() except that it returns a cursor instead of a document. This avoids the 16MB limitation
  • $out is a new pipeline phase that directs the pipeline's output to a collection. This allows you to run aggregation jobs
  • $sort has been improved to remove its RAM limitations and increase speed

If query performance is more important than data age, you could schedule a regular aggregate command that stores its results in collection like db.last_swipe, then have your application simply query db.last_swipe for the relevant user.

Conclusion: I agree that attish has the right approach. However, you may run into trouble scaling it on the current MongoDB release and should look into Mongo 2.6.

Tags

Recent Questions

Top Questions

Home Tags Terms of Service Privacy Policy DMCA Contact Us

©2020 All rights reserved.