MongoDB, PHP, Yii Framework

MongoYii – A Yii MongoDB ORM

I have made a MongoDB ORM for the Yii framework. It attempts to conform to Yii’s own standards as much as possible and is named MongoYii.

You can find it and its documentation here: https://github.com/Sammaye/MongoYii

MongoDB

MongoDB Paging using Ranged Queries (Avoiding Skip())

Paging in MongoDB has always raised a lot of questions from developers and DBAs alike. In this article I will attempt to deal with one of life’s greatest dilemmas and talk about pagination in MongoDB.

What is wrong with skip()?

Nothing. skip() is an awesome and necessary function for 90% of paging cases; however, there are certain times where it becomes slow and cumbersome for the server to work with.

The first consideration when using skip is to understand exactly what is required to commit a skip of x documents. In order to do a skip, MongoDB must effectively do the work server-side that you would normally do client-side: reading each record and stepping over it until it reaches the right one. It normally does this via the _id or $natural index.

That being said, skip will work wonders on small result sets, so you must be aware at all times whether avoiding it is actually micro-optimisation for your queries. I personally cannot tell you whether you are micro-optimising. Deprecating skip() in your application should only be done after massive testing; I personally have found that skip can work effectively into the 100’s of thousands of rows, maybe up to a million.

Believing that avoiding skip will solve all your problems is bullshit.

I can no longer live with skip(); what are my choices?

The solution to using skip is, instead, to range on the index. There are two operators in MongoDB which help with this: $min and $max (exposed on the cursor as min() and max()).

Essentially, these operators tell MongoDB to only take a portion of the index into account when you go to query. As an example, imagine I wish to range over 20 million records and my index is {_id: 1}:

var c = db.c.find().limit(page_size).sort({_id: 1});
c.min({_id: the_id_of_the_last_document_from_the_previous_page}); // min() takes an index key document, not a bare value

As you can see, I range on the index without using range operators. This means that range operators, such as $gte, remain free for the query itself, allowing this method to cover all scenarios.
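
As a minimal shell sketch of the same ranging idea (here using $gt directly, which trades away that separation but reads more simply; db.c and page_size are hypothetical):

// Page 1: a plain sort + limit on the {_id: 1} index
var page = db.c.find().sort({_id: 1}).limit(page_size).toArray();

// Remember the _id of the last document on this page
var last_id = page[page.length - 1]._id;

// Page 2: range on the index instead of skipping
var next_page = db.c.find({_id: {$gt: last_id}}).sort({_id: 1}).limit(page_size).toArray();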

The hard part is making the index. There are many considerations to take into account with this type of querying, sharding being an obvious one. You want an index that covers the speed of your queries but is not so complex that it cannot be split off into chunks. The ideal solution is one where you page from 1,000,000 to 1,000,020 using only a single node in the shard set.

Let’s take an example; say I am paginating a Facebook friends list. I might have an index of {user_id: 1, mutual: 1, mutual_friends_count: 1, date_created: 1}.

There are two ways querying this index can happen:

  1. Without any complex operators to show all friends
  2. With a complex $gte to accommodate showing only recent friends

In this case it might be better to make the index {user_id: 1, date_created: 1, mutual: 1, mutual_friends_count: 1} so you can house a user’s friend list per shard while having the detail data sorted by time. That way you can easily range on the data within that shard while confining your query to that shard only. This would then be a fast query that should return almost as fast as it would on a single server.
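
To make that concrete, here is a shell sketch of that index and the “recent friends” query (the friends collection name and the user_id and since values are hypothetical):

db.friends.ensureIndex({ user_id: 1, date_created: 1, mutual: 1, mutual_friends_count: 1 });

// Targets one shard via user_id, then ranges on date_created within that shard
db.friends.find({ user_id: user_id, date_created: { $gte: since } }).sort({ date_created: 1 }).limit(20);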

So, the biggest problem with range pagination is the index. If you can perfect the index then you can easily make these queries scale.

But… What about skipping huge sections of my data?

Now we come to the nasty bit. If skipping huge amounts of data were that easy, don’t you think MongoDB would already do it for you?

To skip huge sections you need to understand that it isn’t just the DB that has to do the work; the application has to do something as well, just as the many sites which deal with log data or other people’s drunk nights out, like Cloudflare and Facebook, do.

Facebook timelines and many other large paging systems now accommodate a large skip via the interface. Notice that year filter on the right-hand side of the new timeline? Well, that’s how they do it! They simply narrow down the index via the year you pick. Simple, clean, and more effective and user friendly than typing a random page number into the address bar.

Cloudflare, on their statistics page, gives you a huge summary graph, allowing you to pick out specific ranges with your mouse by highlighting parts of the graph. Again, this works a lot like Facebook’s approach, whittling the results down to a pre-defined set, or range, of documents.
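
In query terms, that interface step simply becomes a range predicate which the paging then operates inside. A rough shell sketch (db.posts, user_id, the ts field and page_size are hypothetical):

// Narrow to the pre-defined range first (the “year filter”)...
var start = new Date(2012, 0, 1), end = new Date(2013, 0, 1);

// ...then page within it using the ranged techniques above
db.posts.find({ user_id: user_id, ts: { $gte: start, $lt: end } }).sort({ ts: -1 }).limit(page_size);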

In fact if you look at 99% of all applications, whether web based or not, that have to deal with huge amounts of data (like Apache logs) you will realise they do not implement normal paging.

So the first and most important thing to understand about how to do large-scale paging is that you can no longer solve your problems with just a bit of HTML and a skip(); you need an interface.

Just look at how other programs/sites do it and you will realise how you should do it.

What if I need to pick random records out for paging or just in a normal query?

This is a little off topic but can be used for paging as well.

Sometimes people like to pick random records out of their collection. One option is to use skip() like so:

$cursor = $collection->find();
$cursor->skip(3);
$cursor->next();
$doc = $cursor->current();

And again, this is good for small skips. But what if you need to do this continuously and at random points?

If you were to do this often and at random points it might be better to use an auto-incrementing ID, maintained in another collection with findAndModify to produce an accurate document number (Doc Link).
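
A minimal shell sketch of that counter pattern (the counters collection and the getNextId() helper are hypothetical, not part of MongoDB itself):

// One counter document per sequence; findAndModify makes the increment atomic
function getNextId(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    });
    return ret.seq;
}

db.collection.insert({ ai_id: getNextId("collection"), data: "..." });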

This, however, introduces problems: it is slow (findAndModify takes locks) and you must maintain this ID, especially when deletes occur. Here is a solution using $gte to get the row nearest a randomly picked number. If a row with that exact number still exists it will be returned; if not, the row with the next nearest number will be:

$cursor = $db->collection->find(array('ai_id' => array('$gte' => 403454)))
                         ->sort(array('ai_id' => 1))
                         ->limit(1);
$cursor->next();
$doc = $cursor->current();

Post a comment if you would like further info or see errors.

MongoDB, Web Development

MongoDB Single Key Index or Compound Index

Just got a question on the MongoDB user group asking which type of index over multiple fields would be best, and why. The user basically had three fields:

  • Type
  • CompanyID
  • Date

And was wondering what the difference would be between using a compound index on all three fields and using a single key index per field. The main difference is that MongoDB only supports ONE index per query, so the query optimiser would only be able to use one of your three indexes, rendering the other two useless dead weight in the DB, slowing inserts and using valuable index space.

As such, with that in mind, it is of course best to use a compound index covering all three fields rather than an index on each field.

Also note that queries will be parsed and optimised better if you index in the order you query, so for example:

find({ type: '', companyID: {} }).sort({ Date: -1 })

Would mean a version of an optimal index would be:

ensureIndex({ type: 1, companyID: 1, Date: 1 })
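
You can check that the index is actually being used with explain(); on the older versions this post targets, the output should contain something like the following (the collection name and query values are hypothetical):

db.c.ensureIndex({ type: 1, companyID: 1, Date: 1 });
db.c.find({ type: 'video', companyID: 4 }).sort({ Date: -1 }).explain();
// "cursor" : "BtreeCursor type_1_companyID_1_Date_1 reverse" confirms the compound index served the query
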
MongoDB, PHP

Real-time Statistics in MongoDB

So, how do we do aggregation in a technology with no JOINs? How do companies like Digg replace their RDBMS with MongoDB for aggregation and analytics? It is quite simple, really, there are two main approaches:

  • Pre-aggregation (includes bucket aggregation and other methods)
  • Map Reduce (MR)

Disclaimer: I personally do not advocate the use of Map Reduce except for background processing between aggregating database nodes. Mostly due to speed, but there are other considerations.

The method which I will discuss is pre-aggregation. When picking data straight from a large collection, MongoDB can tend to be much faster than SQL; it is specifically designed to be very fast at doing hundreds of queries a second. DO NOT TRY TO USE MONGODB LIKE AN RDBMS AND GO LOOKING FOR STORED PROCEDURES!

Let’s start. Imagine I have a video site with a load of videos on it. I want to record all hits on videos within certain time ranges, but I also want to store repeated views and unique views so as to understand how many returning viewers a user has over a period of time and how many unique viewers they drew in. I would also like to pre-aggregate stuff like browser type and referrer and find the best referrers to that video over a period of time – “All Time”, “Day”, “Month”, “Year”.

I decide on these “time buckets” by asking myself how I would like to query; the words in brackets name the time bucket that can handle those queries (I’ll explain that part later):

  • I would like to get hourly stats (all time)
  • I would like to get daily stats (all time)
  • I would like to get stats over days, possibly weeks (day)
  • I would like to get stats over months (month)
  • I would like to get stats over years (year)

So, we are sure of our time buckets. Let’s make our collections:

  • stats.all_time
  • stats.day
  • stats.month
  • stats.year

Note: I have combined the unique and recurring views collections using a unique_hit field in the document. If your collections grow to trillions of rows you can split the two.

Now, let’s think about what we would like to store:

  • Unique hit
  • Recurring hit
  • Browser total

Here is an example document I thought up earlier:

{
  _id: {},
  video: {},
  session_id: {},
  hits: 0,
  unique_hit: true, // This denotes if the hit is unique; if false then it is recurring
  browsers: { // This subdocument holds the browsers seen over the period of time; we will $inc these according to what we see
    chrome: 0,
    ie: 0,
    firefox: 0,
    other: 0
  },
  ts: number() // This holds a UNIX TIMESTAMP marking the time bucket of the document
}

Now for the code. A new user comes in and gets assigned a session_id for the lifetime of their browser session (cookie lifetime of 0). We need to find out if this session_id has been here before, so let’s search for this video and session_id in the all_time collection:

// Zeroed bucket timestamps, stored as UNIX timestamps to match the ts field above
var now = new Date();
var day_ts = new Date(now.getFullYear(), now.getMonth(), now.getDate()).getTime() / 1000; // 0 hour of the day
var month_ts = new Date(now.getFullYear(), now.getMonth(), 1).getTime() / 1000; // 0 day of the month
var year_ts = new Date(now.getFullYear(), 0, 1).getTime() / 1000; // year only

var res = db.stats.all_time.findOne({ video: 2394785, session_id: 34857 });

if (res) {
    // This user is not unique so we update the recurring (unique_hit: false) documents for day, month and year.
    // We upsert the documents just in case they are not already there.
    // For the purpose of this test we will say they are using the Chrome browser to access our lovely pages.

    db.stats.day.update({video: 2394785, unique_hit: false, ts: day_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
    db.stats.month.update({video: 2394785, unique_hit: false, ts: month_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
    db.stats.year.update({video: 2394785, unique_hit: false, ts: year_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
} else {

    // Insert a new row into all_time so that we can understand whether this session is unique next time.
    db.stats.all_time.insert({ session_id: 34857, video: 2394785, ts: Math.round(now.getTime() / 1000) });

    db.stats.day.update({video: 2394785, unique_hit: true, ts: day_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
    db.stats.month.update({video: 2394785, unique_hit: true, ts: month_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
    db.stats.year.update({video: 2394785, unique_hit: true, ts: year_ts}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
}

And there you have it! You can now track hits on a video and the browser types. So, for example, if you wish to show all hits on a video for the last 7 days:

var now_ts = Math.round(Date.now() / 1000);
db.stats.day.find({ video: 2394785, ts: {$lte: now_ts, $gte: now_ts - (7 * 24 * 60 * 60)} })

I know what many will say: “urgh, but this means I gotta de-normalise and that’s bad!!!1111” What’s so different in SQL? Yes, the query would be a single one with aggregation etc., but because that query would be so heavy on your server you’ll find that you’d actually be using memcached or big-table designs (no JOINs) to make it go fast anyway, and memcached isn’t real-time, bro’. So, quit your whining and get on it!

Example

I have taken a working version straight out of one of my projects. It will be missing some dependencies but the core logic should be intact:

	/**
	 * Document structure:
	 *
	 * {
	 *   _id: {},
	 *   hits: 0,
	 *   u_hits: 0,
	 *   hours: {
	 *   	1: {v: 5, u: 0},
	 *   	2: {v: 3, u: 9},
	 *   },
	 *   browser: {
	 *   	chrome: 1,
	 *   	ie: 2
	 *   },
	 *   age: {
	 *   	13_16: 0,
	 *   	17_25: 0
	 *   },
	 *   video_comments: 0,
	 *   text_comments: 0,
	 *   video_likes: 0,
	 *   video_dislikes: 0,
	 *   age_total: 0,
	 *   browser_total: 0,
	 *   male: 0,
	 *   female: 0,
	 *   day: 0
	 * }
	 */

And these are the field definitions for the day level document:

  • _id – The unique _id of the document
  • hits – The amount of hits (recurring) for that day
  • u_hits – The unique hits for that day
  • hours – This is a subdocument of hour values (keyed 0-23, as produced by PHP’s date('G')) with another level of detail within each hour. The sub-document within each hour contains a “v” and a “u”: the v is recurring views and the u is unique views.
  • browser – This is basically a subdocument of browser types with the amount used beside each one. This variable is for unique hits only.
  • age – This denotes the age of logged-in users who view this video; it is separated into age groups which are then $inc’d for each individual unique visit
  • video_comments – This denotes the amount of video comments for that day
  • text_comments – This denotes the amount of text comments for that day
  • video_likes – The amount of likes for that day on the video
  • video_dislikes – The amount of dislikes for that day on the video
  • age_total – This is an aggregation field of all age groups summed up in amount
  • browser_total – Same as above but for browsers
  • male – The amount of males who visited this video
  • female – The amount of females who visited this video
  • day – A time stamp of the zero hour of the day

Now, for some of the meaty stuff. Let’s make a function which can manipulate this schema:

// This function would be part of a bigger class, say "Video", which is the object we are collecting stats about; so if you see $this->blah + 1 or whatever, it means I am pre-aggregating the parent variables to make querying stats much easier on things like user video pages.
function recordHit()
{

    if (!is_search_bot($_SERVER['HTTP_USER_AGENT'])) { // Is the user a search bot? If so we don't want to count them

        $user = glue::session()->user;
        $u_brows_key = get_ua_browser();

        $u_age_key = 'u'; // Unknown age
        if ($_SESSION['logged']) { // If the user is a logged in user lets get some details
            ///Other, 13-16, 17-25, 26-35, 36-50, 50+
            $u_age = isset($user->birth_day) && isset($user->birth_month) && isset($user->birth_year) ?
                mktime(0, 0, 0, $user->birth_month, $user->birth_day, $user->birth_year) : 0; // Lets get their birthday
            $u_age_diff = (time() - $u_age) / (60 * 60 * 24 * 365); // Lets find the difference between today and then

            switch (true) { // Lets slap them into the right age group
                case $u_age_diff >= 13 && $u_age_diff < 17:
                    $u_age_key = '13_16';
                    break;
                case $u_age_diff >= 17 && $u_age_diff < 26:
                    $u_age_key = '17_25';
                    break;
                case $u_age_diff >= 26 && $u_age_diff < 36:
                    $u_age_key = '26_35';
                    break;
                case $u_age_diff >= 36 && $u_age_diff <= 50:
                    $u_age_key = '36_50';
                    break;
                case $u_age_diff > 50:
                    $u_age_key = '50_plus';
                    break;
            }
        }

        $is_unique = glue::db()->{video_stats_all}->find(array(
                "sid" => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
                "vid" => $this->_id
            ))->count() <= 0; // Is this view unique?

        $update_doc = array('$inc' => array('hits' => 1)); // Lets always add a recurring hit
        if ($is_unique) {
            $update_doc['$inc']['u_hits'] = 1; // Lets $inc all the document (atomic ops)
            $update_doc['$inc']['age.' . $u_age_key] = 1;
            $update_doc['$inc']['browser.' . $u_brows_key] = 1;

            // These are used to make my life a little easier
            $update_doc['$inc']['age_total'] = 1;
            $update_doc['$inc']['browser_total'] = 1;

            if (glue::session()->user->gender == 'm') {
                $update_doc['$inc']['male'] = 1;
            } elseif (glue::session()->user->gender == 'f') {
                $update_doc['$inc']['female'] = 1;
            }
        }

        $day_update_doc = $update_doc;
        $day_update_doc['$inc']['hours.' . date('G') . '.v'] = 1; // Lets $inc the right hour in the day for this guy
        if ($is_unique): $day_update_doc['$inc']['hours.' . date('G') . '.u'] = 1; endif; // Unique edition of $inc'ing the right hour

        glue::db()->{video_stats_day}->update(array("day" => new MongoDate(mktime(0, 0, 0, date("m"), date("d"), date("Y"))), "vid" => $this->_id), $day_update_doc, array("upsert" => true)); // Lets send down the query!

        if ($is_unique) { // If this view is unique lets place it within an all-views collection so we can check if a view is unique later
            $this->unique_views = $this->unique_views + 1;

            glue::db()->video_stats_all->insert(array(
                'sid' => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
                'vid' => $this->_id,
                'ts' => new MongoDate()
            ));
        }
        $this->views = $this->views + 1; // Lets add pre-aggregation to the parent video object to which this function belongs (in reality you would just add 1 onto whatever parent object you have)
        $this->save();

        // Now lets do some referrers
        $referer = glue::url()->getNormalisedReferer();
        if ($referer) { // The referrer has been normalised from something like https://facebook.com/i.php?534986465mmde040 down to just "facebook" so we won't fill our table with useless referrers.
            glue::db()->{video_referers}->update(array('video_id' => $this->_id, 'referer' => $referer), array('$inc' => array('c' => 1),
                '$set' => array('ts' => new MongoDate())), array('upsert' => true));
        }
    }
}

Now, let’s write the code to actually get something out of this schema. I have written this with Highcharts in mind:

function getStatistics_dateRange($fromTs /* timestamp */, $toTs /* timestamp */)
{
    // Lets set some defaults up for the data to be returned
    $unique_views_range = array();
    $non_unique_views_range = array();

    // These totals make my percentage calcing life easier
    $total_browsers = 0;
    $total_ages = 0;

    $sum_browser = array();
    $sum_ages = array();
    $sum_video_comments = 0;
    $sum_text_comments = 0;
    $sum_video_likes = 0;
    $sum_video_dislikes = 0;

    $sum_males = 0;
    $sum_females = 0;

    // If the time range is invalid lets default to the last 7 days
    if ($fromTs > $toTs) {
        $fromTs = mktime(0, 0, 0, date("m"), date("d") - 7, date("Y"));
        $toTs = time();
    }

    if ($fromTs < strtotime('-4 days', $toTs)) { // If the range spans more than 4 days then lets aggregate by day

        $newts = $fromTs;
        while ($newts <= $toTs) { // This prefills the graph axes with the notation we require
            $unique_views_range[$newts] = 0;
            $non_unique_views_range[$newts] = 0;
            $newts = mktime(0, 0, 0, date('m', $newts), date('d', $newts) + 1, date('Y', $newts));
        }

        foreach (glue::db()->{video_stats_day}->find(array(
            "vid" => $this->_id,
            "day" => array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs)),
        )) as $day) { // Lets get our days!
            // Lets pre-aggregate our stuff
            $non_unique_views_range[$day['day']->sec] = $day['hits'] > 0 ? $day['hits'] : 0;
            $unique_views_range[$day['day']->sec] = $day['u_hits'] > 0 ? $day['u_hits'] : 0;

            $sum_browser = summarise_array_row($day['browser'], $sum_browser);
            $sum_ages = summarise_array_row($day['age'], $sum_ages);

            $total_browsers += (int)$day['browser_total'];
            $total_ages += (int)$day['age_total'];

            $sum_video_comments += (int)$day['video_comments'];
            $sum_text_comments += (int)$day['text_comments'];
            $sum_video_likes += (int)$day['video_likes'];
            $sum_video_dislikes += (int)$day['video_dislikes'];

            $sum_males += (int)$day['male'];
            $sum_females += (int)$day['female'];
        }

    } else { // else with less than 4 days lets aggregate on the hours of those days

        $newts = $fromTs;
        while ($newts < $toTs) { // Lets pre-fill our graph axes with the notation we require
            $newts = $newts + (60 * 60);
            $unique_views_range[$newts] = 0;
            $non_unique_views_range[$newts] = 0;
        }

        foreach (glue::db()->{video_stats_day}->find(array(
            "vid" => $this->_id,
            "day" => array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs)),
        )) as $day) { // Lets get the days
            foreach ($day['hours'] as $k => $v) { // Now for each of the hours in those days mark the views as an hour of that day
                $k = $k + 1;
                $non_unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['v'] > 0 ? $v['v'] : 0;
                $unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['u'] > 0 ? $v['u'] : 0;
            }

            // Lets aggregate all the other stats too
            $sum_browser = summarise_array_row($day['browser'], $sum_browser);
            $sum_ages = summarise_array_row($day['age'], $sum_ages);

            $total_browsers += (int)$day['browser_total'];
            $total_ages += (int)$day['age_total'];

            $sum_video_comments += (int)$day['video_comments'];
            $sum_text_comments += (int)$day['text_comments'];
            $sum_video_likes += (int)$day['video_likes'];
            $sum_video_dislikes += (int)$day['video_dislikes'];

            $sum_males += (int)$day['male'];
            $sum_females += (int)$day['female'];
        }
    }

    // Now lets get the browser crap
    $browsers_highCharts_array = array();
    foreach ($sum_browser as $k => $sum) {
        $u_brows_capt = 'Other'; // Reset for each key so unknown browsers fall back to "Other"
        if ($k == 'ie') {
            $u_brows_capt = "IE";
        } elseif ($k == 'ff') {
            $u_brows_capt = "Firefox";
        } elseif ($k == 'chrome') {
            $u_brows_capt = "Chrome";
        } elseif ($k == 'safari') {
            $u_brows_capt = "Safari";
        } elseif ($k == 'opera') {
            $u_brows_capt = "Opera";
        } elseif ($k == 'netscape') {
            $u_brows_capt = "Netscape";
        }
        $browsers_highCharts_array[] = array($u_brows_capt, ($sum / $total_browsers) * 100);
    }

    // And lets understand our age density
    $ages_highCharts_array = array();
    foreach ($sum_ages as $k => $sum) {
        $u_age_capt = 'Unknown'; // Reset for each key so unknown groups fall back to "Unknown"
        if ($k == '13_16') {
            $u_age_capt = '13-16';
        } elseif ($k == '17_25') {
            $u_age_capt = '17-25';
        } elseif ($k == '26_35') {
            $u_age_capt = '26-35';
        } elseif ($k == '36_50') {
            $u_age_capt = '36-50';
        } elseif ($k == '50_plus') {
            $u_age_capt = '50+';
        }
        $ages_highCharts_array[] = array($u_age_capt, ($sum / $total_ages) * 100);
    }

    if (sizeof($ages_highCharts_array) <= 0) { // Some defaults to stop broken graphs
        $ages_highCharts_array = array(array('None', 100));
    }

    if (sizeof($browsers_highCharts_array) <= 0) { // Some defaults to stop broken graphs
        $browsers_highCharts_array = array(array('None', 100));
    }

    $total_males_females = $sum_males + $sum_females; // Sum of both males and females who visited (vital for pie chart, mm pie)

    // Now lets form the returning array in the format we require!
    return array(
        'hits' => $this->formatHighChart(array(
            "Views" => $non_unique_views_range,
            "Unique Views" => $unique_views_range
        )),
        'browsers' => $browsers_highCharts_array,
        'ages' => $ages_highCharts_array,
        'video_comments' => $sum_video_comments,
        'text_comments' => $sum_text_comments,
        'video_likes' => $sum_video_likes,
        'video_dislikes' => $sum_video_dislikes,
        'males' => $total_males_females > 0 ? number_format(($sum_males / $total_males_females) * 100, 0) : 0,
        'females' => $total_males_females > 0 ? number_format(($sum_females / $total_males_females) * 100, 0) : 0
    );
}

And there you have it: a very basic pre-aggregation example in MongoDB.

If you need more explanation just post a comment!

MongoDB, PHP

MongoDB PHP Session Handler

<?php

use MongoDB\Driver\Query;
use MongoDB\Driver\BulkWrite;

class Session
{
    /**
     * @var string The namespace of the sessions collection, e.g. 'db.sessions'
     */
    public $collection = 'db.sessions';
    public $life_time;

    private $db;


    /**
     * This registers the object as the session save handler (session_set_save_handler), telling PHP which methods to use for each session operation
     * @param MongoDB\Driver\Manager $db The DB object you have from your application
     */
    public function __construct($db)
    {
        // Read the maxlifetime setting from PHP
        $this->life_time = get_cfg_var("session.gc_maxlifetime");

        // Register this object as the session handler
        session_set_save_handler(
            [$this, "open"],
            [$this, "close"],
            [$this, "read"],
            [$this, "write"],
            [$this, "destroy"],
            [$this, "gc"]
        );

        //read in the db object my api uses
        $this->db = $db;
    }

    /**
     * This opens the session for reading or writing
     * @param $save_path
     * @param $session_name
     * @return bool
     */
    public function open($save_path, $session_name)
    {
        global $sess_save_path;
        $sess_save_path = $save_path;
        // Don't need to do anything. Just return TRUE.
        return true;

    }

    /**
     * @return bool
     */
    public function close()
    {
        return true;
    }

    /**
     * This reads your session data from the collection into $_SESSION whilst checking the session has not expired. An unknown or expired session simply returns an empty string.
     * @param $id
     * @return string
     */
    public function read($id)
    {
        // Set empty result
        $data = '';

        // Fetch session data from the selected database
        $time = time();

        $result = $this->db->executeQuery(
            $this->collection,
            new Query(
                [
                    'session_id' => $id,
                    'expires' => ['$gt' => $time]
                ],
                [
                    'limit' => 1
                ]
            )
        )->toArray();

        if ($result) {
            $data = $result[0]->session_data;
        }
        return $data;
    }

    /**
     * Writes all your $_SESSION data to the database
     * @param $id
     * @param $data
     * @return bool
     */
    public function write($id, $data)
    {
        // The session expires life_time seconds from now
        $time = time() + $this->life_time;

        $set = [
            'session_data' => $data,
            'expires' => $time
        ];

        // Tie the session to a user id if one is logged in
        if (isset($_SESSION['uid'])) {
            $set['user_id'] = $_SESSION['uid'];
        }

        $bulk = new BulkWrite();

        // Upsert so that a brand new session is inserted on its first write
        $bulk->update(
            ['session_id' => $id],
            ['$set' => $set],
            ['upsert' => true]
        );

        $this->db->executeBulkWrite($this->collection, $bulk);
        return true;
    }

    /**
     * Destroys the session. Note: due to an old PHP bug you may also have to log the user out manually rather than relying on session_destroy() alone; this is most likely fixed by now but I have never double checked.
     * @param $id
     * @return bool
     */
    public function destroy($id)
    {
        $bulk = new BulkWrite();
        $bulk->delete(['session_id' => $id]);
        $this->db->executeBulkWrite($this->collection, $bulk);
        return true;
    }

    /**
     * This deletes any old sessions which should no longer exist. For this to work reliably you must set garbage collection to run 100% of the time via the session.gc_probability value in your php.ini; set it to 100 for 100%
     * @return integer
     */
    public function gc()
    {
        $bulk = new BulkWrite();
        // expires is stored as a UNIX timestamp (see write()), so compare against one
        $bulk->delete(['expires' => ['$lt' => time()]]);
        return $this->db->executeBulkWrite($this->collection, $bulk)->getDeletedCount();
    }
}
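
Since read() looks sessions up by session_id and expires, and gc() scans on expires, it is worth creating indexes for both up front. A shell sketch, assuming the 'db.sessions' namespace used above:

db.sessions.ensureIndex({ session_id: 1 }, { unique: true }); // one document per session id
db.sessions.ensureIndex({ expires: 1 }); // keeps read() expiry checks and gc() deletes off a collection scan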