MongoDB

MongoDB Paging using Ranged Queries (Avoiding Skip())

This post has been rewritten

Originally I gave only one scenario and only one form of paging, and that paging was open to loopholes. I also did not take sharding into account for certain paging scenarios.

So what is wrong with skip()?

Nothing. Skip is an awesome and necessary function for 90% of paging cases; however, there are certain times where skip becomes slow and cumbersome for the server.

The first consideration with using skip is to understand exactly what is required to commit a skip of x documents. In order to do a skip, MongoDB must effectively do server-side the work you would normally do client-side: reading each record and stepping over it until it reaches the right one.

A skip does not make very effective use of indexes, so it is mostly pointless to put indexes on a collection purely to improve skipping speed.

That being said, skip works wonders on small result sets, so you must always be aware whether applying this post to your queries is actually micro-optimisation. I personally cannot tell you whether you are micro-optimising; dropping skip should only be done after massive testing. Personally, I have found that skip can work effectively into the hundreds of thousands of rows, maybe up to a million.

Believing that avoiding skip will de facto solve all your problems is like believing the same of stored procedures, triggers or certain crazy querying patterns in SQL. It’s all bullshit.

So I can no longer live with skip(); what are my choices?

Well, my furry little friend, that all depends. It depends on what application you are making, what your data is like and what your indexes are like, but I am going to outline the solution: RANGED QUERIES. Yep, I said it, ranged queries. The idea is to first exclude everything before your last-seen document with an operator like $gt or $lt (depending on sort direction) and then use limit() to create the page you want.

Let’s look at an example. We want to make a Facebook feed for what would normally be the user’s homepage when they log in. To begin with we want to get our first page:

var posts = db.posts.find({}).sort({ ts: -1 }).limit(20)

Here I am using the document’s timestamp (ISODate) to sort and range my documents.

Note: the ISODate in MongoDB has a granularity issue: as used here it only goes down to the second, so if you are likely to get multiple documents in the same second then use something else. We will discuss this later.

So now we have the first 20 stories for our feed. When the user scrolls to the bottom of the page we want to fire off another query to get the next page. In order to build the range we need to know the timestamp of the last document (story) on the page:

var last_ts = posts[posts.length - 1]['ts'];

With this last timestamp we then make the query that will get the next lot of stories:

var posts = db.posts.find({ ts: { $lt: last_ts } }).sort({ ts: -1 }).limit(20)

And bingo! Keep repeating steps 1 and 2 and you will have successful paging for large data sets.
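The two steps can be sketched without a live database; here `posts` and `fetchPage` are hypothetical in-memory stand-ins for the collection and the `find().sort().limit()` query:

```javascript
// Hypothetical in-memory stand-in for a posts collection; in real life
// these documents would live in MongoDB and "ts" would be a date.
const posts = Array.from({ length: 55 }, (_, i) => ({ id: i, ts: i }));

// Mimics db.posts.find({ ts: { $lt: lastTs } }).sort({ ts: -1 }).limit(n);
// a null lastTs means "first page, no lower bound yet".
function fetchPage(lastTs, n) {
  return posts
    .filter((p) => lastTs === null || p.ts < lastTs)
    .sort((a, b) => b.ts - a.ts)
    .slice(0, n);
}

// Walk the whole set page by page, ranging off the last ts seen.
const pages = [];
let last = null;
let page;
while ((page = fetchPage(last, 20)).length > 0) {
  pages.push(page);
  last = page[page.length - 1].ts;
}
```

Each iteration touches only the documents past the range bound, which is exactly what lets the real query lean on an index on `ts` instead of counting skipped documents.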

But…what about skipping huge sections of my data?

Now you have come to the gritty nasty bit. If skipping huge amounts of data were that easy don’t you think MongoDB would already do it for you?

To skip huge sections you need to understand that it isn’t just the DB that has to do the work; the application has to do something as well, just as sites that deal with log data or other people’s drunk nights out, like Cloudflare and Facebook, do.

Facebook timelines and many other large paging systems now accommodate a large skip via the interface. Notice how you have that year filter on the right-hand side of the new timeline? Well, that’s how they do it! They simply narrow down the index via the year you pick. Good, isn’t it? Simple, clean and more effective and user friendly than putting a random page number into the address bar.

Cloudflare, on their statistics page, gives you a huge summary graph allowing you to pick out specific ranges by using your mouse to highlight parts of the graph. Again this works a lot like Facebook’s own, whittling it down to a pre-defined set, or range, of documents.

In fact if you look at 99% of all applications, whether web based or not, that have to deal with huge amounts of data (like Apache logs) you will realise they do not implement normal paging.

So the first and most important thing to understand about how to do large scale paging is that you can no longer solve your problems with just a bit of HTML and a skip(), you need an interface.

Just look at how other programs/sites do it and you will realise how you should do it.

What if I need to pick random records out for paging or just in a normal query?

This is a little off topic but can be used for paging as well.

Sometimes people like to pick random records out of their collection. One option is to use skip(), like:

$cursor = $collection->find();
$cursor->skip(3);
$cursor->next();
$doc = $cursor->current();

And again, this is good for small skips. But what if you need to do this continuously and at random points?

If you were to do this often and at random points it might be better to use an incrementing ID, combining another collection with findAndModify to produce an accurate document number ( Doc Link ).

This, however, introduces problems: you must maintain the ID, especially when deletes occur. One method around this is to mark documents as deleted instead of actually deleting them. When you query for the exact document you omit deleted ones and limit() by one, allowing you to get the next closest document to that position, like so:

$cursor = $db->collection->find(array('ai_id' => array('$gte' => 403454), 'deleted' => 0))->limit(1);
$cursor->next();
$doc = $cursor->current();
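A sketch of the counter idea; `counters` and `getNextId` are hypothetical stand-ins for a one-document counters collection bumped atomically with findAndModify:

```javascript
// Hypothetical in-memory counters "collection". In MongoDB this would be
// one document per counter, bumped with findAndModify, { $inc: { seq: 1 } }
// and new: true so the incremented value comes back.
const counters = { doc_id: { seq: 0 } };

function getNextId(name) {
  counters[name].seq += 1; // findAndModify makes this step atomic
  return counters[name].seq;
}

// Each insert stamps the document with an incrementing ai_id; one doc
// is soft-deleted rather than removed, so the numbering stays intact.
const docs = [];
for (let i = 0; i < 5; i++) {
  docs.push({ ai_id: getNextId("doc_id"), deleted: 0 });
}
docs[2].deleted = 1; // "delete" ai_id 3 by marking it

// Mirror of the $gte + deleted: 0 + limit(1) query: the closest
// surviving document at or after position 3.
const picked = docs.find((d) => d.ai_id >= 3 && d.deleted === 0);
```

Because the deleted document keeps its number, the range query still lands on the nearest surviving neighbour instead of shifting every later position.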

How do I choose my range key?

As I said earlier, a timestamp may not always work for you. Choosing the right cardinality and granularity for your range key is very important. A timestamp is not always unique, especially on a Facebook wall: you might have two posts that occur in the same second, and when getting the next page you will simply miss one of them. A better key to use here is _id, with a sort on the timestamp and a compound index on _id and your timestamp field “ts”.
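A small sketch of that tie-break, with illustrative in-memory data standing in for the collection:

```javascript
// Two posts share ts: 100 -- a plain { ts: { $lt: ... } } range would
// silently drop one of them between pages.
const posts = [
  { _id: 1, ts: 100 },
  { _id: 2, ts: 100 },
  { _id: 3, ts: 99 },
  { _id: 4, ts: 98 },
];

// Descending (ts, _id) ordering, like sort({ ts: -1, _id: -1 })
function before(a, b) {
  return a.ts !== b.ts ? a.ts < b.ts : a._id < b._id;
}

function sortDesc(arr) {
  return [...arr].sort((a, b) => (before(a, b) ? 1 : -1));
}

// Next page: everything strictly "before" the last doc seen; in query
// form that is { $or: [ { ts: { $lt: last.ts } },
//                       { ts: last.ts, _id: { $lt: last._id } } ] }
function nextPage(last, n) {
  return sortDesc(posts.filter((p) => before(p, last))).slice(0, n);
}

const page1 = sortDesc(posts).slice(0, 2);
const page2 = nextPage(page1[page1.length - 1], 2);
```

Page 1 ends on one of the two same-second posts, and page 2 still starts at the next document rather than skipping its twin, because _id breaks the tie deterministically.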

So what is the best key in your scenario? Again I cannot answer that from this post, however, I will try and give a few points and hints on how to choose a good one. I will also try and throw some scenarios in.

The first thing is to get a combination THAT WILL be unique to each document. So say you are doing some rated items and you want to sort by specific ratings. You would normally create a compound index and range over it, using it as your sort as well, like:

$db->find(array('type' => 'video', 'liked' => 1, 'reaction' => 'love'))->sort(array('type' => 1, 'liked' => 1, 'reaction' => 1))

Of course I do not recommend that schema but this is just an example of how a unique range key would work.

The second thing to consider is your shard key. This is important because you don’t want to scan every shard for a small range that you might be able to find on one, right? So make sure to take your shard key into consideration. For example, when doing the Facebook feed over an _id range key and a ts sort key, I would make a shard key of _id and ts to make my ranging as fast as possible.

The problem that can be experienced here, as stated in the comments below, is when you do not hit the shard key while ranging. This is still OK, since MongoDB houses index ranges and will still be able to hit specific areas of the collection; however, it will still force a global operation.

You can find a very nice (as @Matt pointed out) pictographic presentation of how queries work in a sharding environment here: http://www.mongodb.org/download/attachments/2097354/how+queries+work+with+sharding.pdf. This will show you how your queries might react on certain shards and under certain conditions.

Now we have covered the absolute basics of how to page without skip() in MongoDB. I am sure there are more parts I have forgotten to add and will probably update this post as time goes on.

Post a comment if you would like further info or see errors.

MongoDB, PHP

Realtime Stats in MongoDB Example (PHP)

Ok so you’ll remember some time ago I talked about realtime statistics in MongoDB, right?

Well here is a working version. This script basically records the basic statistics on videos.

So this time I am going to skip why and how I made my document structure and just show you my document structure:

	/**
	 * Document structure:
	 *
	 * {
	 *   _id: {},
	 *   hits: 0,
	 *   u_hits: 0,
	 *   hours: {
	 *   	1: {v: 5, u: 0},
	 *   	2: {v: 3, u: 9},
	 *   },
	 *   browser: {
	 *   	chrome: 1,
	 *   	ie: 2
	 *   },
	 *   age: {
	 *   	13_16: 0,
	 *   	17_25: 0
	 *   },
	 *   video_comments: 0,
	 *   text_comments: 0,
	 *   video_likes: 0,
	 *   video_dislikes: 0,
	 *   age_total: 0,
	 *   browser_total: 0,
	 *   male: 0,
	 *   female: 0,
	 *   day: 0
	 * }
	 */

This may look very overwhelming at first but if you just take a minute to sit down and comprehend it you’ll realise it is very simple:

  • Every document works only on a day level
  • _id – The unique _id of the document
  • hits – The amount of hits (recurring) for that day
  • u_hits – The unique hits for that day
  • hours – An array keyed by hour of the day (0-23, from PHP’s date('G')) with another level within each hour. The sub-array within each hour contains a “v” and a “u”: v is recurring views and u is unique views.
  • browser – Basically an array of browser types with the amount of uses beside each one. This field counts unique hits only.
  • age – The age of logged-in users who view this video, separated into age groups, each of which is $inc’d on every unique visit
  • video_comments – This denotes the amount of video comments for that day
  • text_comments – This denotes the amount of text comments for that day
  • video_likes – The amount of likes for that day on the video
  • video_dislikes – The amount of dislikes for that day on the video
  • age_total – This is an aggregation field of all age groups summed up in amount
  • browser_total – Same as above but for browsers
  • male – The amount of males who visited this video
  • female – The amount of females who visited this video
  • day – A timestamp of the zero hour of the day

Now for some of the meaty stuff. Let’s make a function which can manipulate this schema.

        // This function would be part of a bigger class, say "Video", which is the object we are collecting stats about. So if you see $this->blah + 1 or similar it means I am pre-aggregating the parent variables to make querying stats much easier on things like user video pages.
	function recordHit(){

		if(!is_search_bot($_SERVER['HTTP_USER_AGENT'])){ // Is the user a search bot? If so we don't want to count them

			$user = glue::session()->user;
			$u_brows_key = get_ua_browser();

	        $u_age_key = 'u'; // Unknown age
	        if($_SESSION['logged']){ // If the user is a logged in user lets get some details
		        ///Other, 13-16, 17-25, 26-35, 36-50, 50+
		        $u_age = isset($user->birth_day) && isset($user->birth_month) && isset($user->birth_year) ?
		        	mktime(0, 0, 0, $user->birth_month, $user->birth_day, $user->birth_year) : 0; // Lets get their birthday
				$u_age_diff = (time() - $u_age)/(60*60*24*365); // Lets find the difference between today and then

				switch(true){ // Lets slap them into the right age group
					case $u_age_diff > 12 && $u_age_diff < 17:
						$u_age_key = '13_16';
						break;
					case $u_age_diff >= 17 && $u_age_diff < 26:
						$u_age_key = '17_25';
						break;
					case $u_age_diff > 25 && $u_age_diff < 36:
						$u_age_key = '26_35';
						break;
					case $u_age_diff > 35 && $u_age_diff < 51:
						$u_age_key = '36_50';
						break;
					case $u_age_diff > 50:
						$u_age_key = '50_plus';
						break;
				}
	        }

			$is_unique = glue::db()->video_stats_all->find(array(
				"sid" => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
				"vid" => $this->_id
			))->count() <= 0; // Is this view unique?

			$update_doc = array( '$inc' => array( 'hits' => 1 ) ); // Lets always add a recurring hit
			if($is_unique){
				$update_doc['$inc']['u_hits'] = 1; // Lets $inc all the document (atomic ops)
				$update_doc['$inc']['age.'.$u_age_key] = 1;
				$update_doc['$inc']['browser.'.$u_brows_key] = 1;

				// These are used to make my life a little easier
				$update_doc['$inc']['age_total'] = 1;
				$update_doc['$inc']['browser_total'] = 1;

				if(glue::session()->user->gender == 'm'){
					$update_doc['$inc']['male'] = 1;
				}elseif(glue::session()->user->gender == 'f'){
					$update_doc['$inc']['female'] = 1;
				}
			}

			$day_update_doc = $update_doc;
			$day_update_doc['$inc']['hours.'.date('G').'.v'] = 1; // Lets $inc the right hour in the day for this guy
			if($is_unique): $day_update_doc['$inc']['hours.'.date('G').'.u'] = 1; endif; // Unique edition of $inc'ing the right hour

			glue::db()->video_stats_day->update(array("day"=>new MongoDate(mktime(0, 0, 0, date("m"), date("d"), date("Y"))), "vid"=>$this->_id), $day_update_doc, array("upsert"=>true)); // Let's send down the query!

			if($is_unique){ // If this view is unique let's place it within an all-views collection so we can check uniqueness later
				$this->unique_views = $this->unique_views+1;

				glue::db()->video_stats_all->insert(array(
					'sid' => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
					'vid' => $this->_id,
					'ts' => new MongoDate()
				));
			}
			$this->views = $this->views+1; // Pre-aggregate onto the parent video object to which this function belongs (in reality you would add 1 onto whatever parent object you have)
			$this->save();

			// Now lets do some referers
			$referer = glue::url()->getNormalisedReferer();
			if($referer){ // The referrer has been normalised from something like https://facebook.com/i.php?534986465mmde040 down to just facebook, so we won't fill our table with useless referrers.
				glue::db()->video_referers->update(array('video_id' => $this->_id, 'referer' => $referer), array('$inc' => array('c' => 1),
					'$set' => array('ts' => new MongoDate())), array('upsert' => true));
			}
		}
	}

Now let’s write the code to actually get something out of this schema. I have written this with Highcharts in mind. I won’t claim this function is easy; it was a bastard to get working exactly the way I needed, but here goes.

	function getStatistics_dateRange($fromTs /* timestamp */, $toTs /* timestamp */){

                // Lets set some defaults up for the data to be returned
		$unique_views_range = array();
		$non_unique_views_range = array();

		// These totals make my percentage calcing life easier
		$total_browsers = 0;
		$total_ages = 0;

		$sum_browser = array();
		$sum_ages = array();
		$sum_video_comments = 0;
		$sum_text_comments = 0;
		$sum_video_likes = 0;
		$sum_video_dislikes = 0;

		$sum_males = 0;
		$sum_females = 0;

                // If the time range is invalid let's default to the last 7 days
		if($fromTs > $toTs){
			$fromTs = mktime(0, 0, 0, date("m"), date("d")-7, date("Y"));
			$toTs = time();
		}

		if($fromTs < strtotime('-4 days', $toTs)){ // If this is more than 4 days ago then lets aggregate on day

		  	$newts = $fromTs;
			while ($newts <= $toTs) { // This prefills the graph axis' with the notation we require
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
				$newts = mktime(0,0,0, date('m', $newts), date('d', $newts)+1, date('Y', $newts));
			}

			foreach(glue::db()->video_stats_day->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get our days!
				// Lets pre-aggregate our stuff
				$non_unique_views_range[$day['day']->sec] = $day['hits'] > 0 ? $day['hits'] : 0;
				$unique_views_range[$day['day']->sec] = $day['u_hits'] > 0 ? $day['u_hits'] : 0;

				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['male'];
				$sum_females += (int)$day['female'];
			}

		}else{ // else with less than 4 days lets aggregate on hours of those days

			$newts = $fromTs;
			while($newts < $toTs){ // Lets pre-fill our graph axis' to the notation we require
				$newts = $newts+(60*60);
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
			}

			foreach(glue::db()->video_stats_day->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get the days
				foreach($day['hours'] as $k => $v){ // Now for each of the hours in those days mark the views as an hour of that day
					$k = $k+1;
					$non_unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['v'] > 0 ? $v['v'] : 0;
					$unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['u'] > 0 ? $v['u'] : 0;
				}

                                // Lets aggregate all the other stats too
				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['male'];
				$sum_females += (int)$day['female'];
			}
		}

		// Now lets get the browser crap
		$browsers_highCharts_array = array();
		$u_brows_capt = 'Other';
		foreach($sum_browser as $k => $sum){
			if($k =='ie'){
				$u_brows_capt = "IE";
			}elseif($k == 'ff'){
				$u_brows_capt = "Firefox";
			}elseif($k == 'chrome'){
				$u_brows_capt = "Chrome";
			}elseif($k == 'safari'){
				$u_brows_capt = "Safari";
			}elseif($k == 'opera'){
				$u_brows_capt = "Opera";
			}elseif($k == 'netscape'){
				$u_brows_capt = "Netscape";
			}
			$browsers_highCharts_array[] = array($u_brows_capt, ($sum/$total_browsers)*100);
		}

                // And lets understand our age density
		$ages_highCharts_array = array();
		$u_age_capt = 'Unknown';
		foreach($sum_ages as $k => $sum){
			if($k == '13_16'){
				$u_age_capt = '13-16';
			}elseif($k == '17_25'){
				$u_age_capt = '17-25';
			}elseif($k == '26_35'){
				$u_age_capt = '26-35';
			}elseif($k == '36_50'){
				$u_age_capt = '36-50';
			}elseif($k == '50_plus'){
				$u_age_capt = '50+';
			}
			$ages_highCharts_array[] = array($u_age_capt, ($sum/$total_ages)*100);
		}

		if(sizeof($ages_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$ages_highCharts_array = array(array('None', 100));
		}

		if(sizeof($browsers_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$browsers_highCharts_array = array(array('None', 100));
		}

		$total_males_females = $sum_males+$sum_females; // Sum of both males and females who visited (vital for pie chart, mm pie)

                // Now lets form the returning array in the format we require!
		return array(
			'hits' => $this->formatHighChart(array(
				"Views"=>$non_unique_views_range,
				"Unique Views"=>$unique_views_range
			)),
			'browsers' => $browsers_highCharts_array,
			'ages' => $ages_highCharts_array,
			'video_comments' => $sum_video_comments,
			'text_comments' => $sum_text_comments,
			'video_likes' => $sum_video_likes,
			'video_dislikes' => $sum_video_dislikes,
			'males' => $sum_males > 0 ? number_format(($sum_males/$total_males_females)*100, 0) : 0,
			'females' => $sum_females > 0 ? number_format(($sum_females/$total_males_females)*100, 0) : 0
		);
	}

And there you have it: a very basic aggregation example in MongoDB.

If you have any questions or need clarification or notice errors etc just post a comment below.

Enjoy and happy MongoDB’ing

MongoDB, Web Development

MongoDB Single Key Index or Compound Index

Just got a question on the MongoDB user group asking which type of index over multiple fields would be best and why. The user basically had three fields:

  • Type
  • CompanyID
  • Date

And was wondering what the difference would be between using a compound index on all three fields and using a single key index per field. The main difference is that MongoDB can only use ONE index per query, so the query optimiser would only be able to pick one of your three indexes, rendering the other two dead weight in the DB, slowing inserts and using valuable index space.

With that in mind, it is of course best to use a compound index covering all three fields rather than an index on each field.

Also note that queries will be planned better if your index order matches the order in which you query, so for example:

find({ type: '', companyId: '' }).sort({ Date: -1 })

Would mean a version of an optimal index would be of:

ensureIndex({ type: 1, companyId: 1, Date: 1 })
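The intuition behind that ordering can be sketched in plain JavaScript; the array below is an illustrative stand-in for the index’s sorted entries (field names from the question, data made up):

```javascript
// Illustrative stand-in: a compound index on (type, companyId, Date) is
// conceptually just the documents kept sorted by that triple.
const docs = [
  { type: "a", companyId: 1, Date: 3 },
  { type: "a", companyId: 1, Date: 1 },
  { type: "a", companyId: 2, Date: 2 },
  { type: "b", companyId: 1, Date: 5 },
];

const byCompound = (a, b) =>
  a.type.localeCompare(b.type) || a.companyId - b.companyId || a.Date - b.Date;

const index = [...docs].sort(byCompound);

// An equality match on the index prefix (type, companyId) selects a
// contiguous run that is already in Date order, so sort({ Date: -1 })
// is satisfied by walking that run backwards -- no extra sort pass.
const run = index.filter((d) => d.type === "a" && d.companyId === 1);
```

Three separate single-field indexes give you three separately sorted lists, none of which produce this contiguous, pre-sorted run on their own.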

Hope this helps,

MongoDB

Realtime Statistics in MongoDB

There is now an example of this sort of aggregation here: Realtime Stats in MongoDB Example (PHP)

As I troll the MongoDB user group (mongodb-user) I notice a LOT of questions about realtime statistics handling in Mongo. As I have repeated myself about 100 times over, I will write a blog post about it.

Many individuals try to treat Mongo like SQL and ask: “OK, how do I do big massive aggregated JOINs between massive tables?” The answer, my friends, is that you don’t. Yes, that’s right: Mongo has no JOINs, so there is no real way to do aggregated JOINs across many pieces of data. Instead Mongo has two methods:

  • Pre-aggregated
  • Map Reduce

As a disclaimer, I personally do not advertise the use of MRs at all. I don’t like them, to be honest with you, so I will not be talking any more about MRs (I will do some MR articles another day, but not right now).

The method I will discuss instead is pre-aggregation. When picking data straight from a large collection, MongoDB can be much faster than SQL; it is specifically designed to be used this way and is very fast at doing tens of queries a second. So why try to use it like an RDB and shove as much as possible into one single call (or, god forbid, fall into the trap of trying to use… stored procedures)?

Lets start with an example of pre-aggregation. Imagine I have a video site with a load of videos on. I want to record all hits on videos within certain time ranges but I also want to store repeated views and unique views so as to understand how many returning viewers a user has over a period of time and how many unique viewers they drew in. I also would like to pre-aggregate stuff like browser type and referrer and find the best referrers to that video over a period of time.

Now this period of time I keep talking about could exist in the form of:

  • all time
  • day
  • month
  • year

I decide these “time buckets” by saying how I would like to query; the words in brackets name the time bucket that can handle each query (I’ll explain that part later):

  • I would like to get hourly stats (all time)
  • I would like to get daily stats (all time)
  • I would like to get stats over days, possibly weeks (day)
  • I would like to get stats over months (month)
  • I would like to get stats over years (year)

Now let’s think about the collections I would need. To make querying as easy as possible I will break the time buckets down into collections like so:

  • stats.all_time
  • stats.day
  • stats.month
  • stats.year

Note: I have combined the unique and recurring views collections using an “is_unique” boolean field in the document. If your collections grow up to trillions of rows you can split the two.

OK, so now you have your time bucket collections for the time sections you would like to query on.
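Each bucket collection keys its documents on the zeroed-out start of the period, so every hit in the same day, month or year upserts into the same document. A sketch of deriving those keys (the helper name and the use of UTC are my own, for illustration):

```javascript
// Hypothetical helper: derive the zeroed-out timestamp that keys each
// time-bucket document for a given hit (UTC assumed for illustration).
function bucketKeys(date) {
  return {
    day: Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()),
    month: Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1),
    year: Date.UTC(date.getUTCFullYear(), 0, 1),
  };
}

// A hit at 13:45 on 15 June 2012 lands in these three bucket documents;
// every other hit in the same day/month/year upserts into the same ones.
const keys = bucketKeys(new Date(Date.UTC(2012, 5, 15, 13, 45)));
```

Because the key is the period’s zero point, the upsert query shown further down always matches exactly one document per period.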

Now let’s think about what we would like to store:

  • Unique hit
  • Recurring hit
  • Browser total

So here is an example document I thought up:

{
  _id: {},
  video: {},
  session_id: {},
  hits: 0,
  unique_hit: true, // This denotes whether the hit is unique; if false it is recurring
  browsers: { // This array holds the browsers seen over the period of time, we will $inc this according to what we see
    chrome: 0,
    ie: 0,
    firefox: 0,
    other: 0
  },
  ts: number() // This holds a UNIX TIMESTAMP for the time bucket of the document
}

So we have our collections and a sample document. Let’s walk through the updates.

So a new user comes in and gets assigned a session_id for the lifetime of their browser session (cookie lifetime of 0). We need to find out if this session_id has been here before, so let’s search for this video and session_id in the all_time collection:

var res = db.stats.all_time.findOne({ video: 2394785, session_id: 34857 });

if(res){ // This user is not unique so we update the non-unique documents for day, month and year
// We upsert the documents just in case they are not already there.
// For the purpose of this test we will say they are using the chrome browser to access our lovely pages.

     db.stats.day.update({video: 2394785, unique_hit: false, ts: //ts of 0 hour of day}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
     db.stats.month.update({video: 2394785, unique_hit: false, ts: //ts of 0 day of month}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
     db.stats.year.update({video: 2394785, unique_hit: false, ts: //ts of year only}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
}else{

     // Insert a new row into all_time so that we can understand whether this is unique next time.
     db.stats.all_time.insert({ session_id: session_id, video: 2394785, ts: new Date() });

     db.stats.day.update({video: 2394785, unique_hit: true, ts: //ts of 0 hour of day}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
     db.stats.month.update({video: 2394785, unique_hit: true, ts: //ts of 0 day of month}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
     db.stats.year.update({video: 2394785, unique_hit: true, ts: //ts of year only}, {$inc: {hits: 1, 'browsers.chrome': 1}}, {upsert: true});
}
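The unique/recurring branch above can be simulated in memory to see the counters it produces; `allTime` and `day` are illustrative stand-ins for the collections:

```javascript
// In-memory simulation of the unique/recurring split; "allTime" stands in
// for stats.all_time and "day" for the stats.day bucket (names illustrative).
const allTime = new Set();
const day = {}; // "unique" / "recurring" -> { hits, browsers }

function recordHit(video, sessionId, browser) {
  const seen = allTime.has(video + ":" + sessionId);
  if (!seen) allTime.add(video + ":" + sessionId);
  const key = seen ? "recurring" : "unique";
  // Equivalent of the upsert + $inc pair of operations
  if (!day[key]) day[key] = { hits: 0, browsers: {} };
  day[key].hits += 1;
  day[key].browsers[browser] = (day[key].browsers[browser] || 0) + 1;
}

recordHit(2394785, 34857, "chrome"); // first visit: counted as unique
recordHit(2394785, 34857, "chrome"); // same session again: recurring
recordHit(2394785, 99999, "ie");     // new session: unique again
```

Every read then becomes a plain lookup of pre-counted totals, which is the whole point of pre-aggregation.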

And there you have it! You can now track hits on a video and browser types. So, for example, if you wish to show (on a graph) all hits on a video for the last 7 days:

db.stats.day.find({ video: 2394785, ts: {$lt: new Date(), $gt: new Date(//ts of 7 days ago)} })

I know what many will say: “urgh, but this means I gotta de-normalise and that’s bad!!!1111”. What’s so different in SQL? Yes, the query would be a single aggregated one, but that query would be so heavy on your server that you’d end up using memcached or big-table designs to make it go fast anyway, and memcached ain’t realtime, bro. So quit your whining and get on it!

Now we’ve got that sorted, you will notice that I missed out referrer; why don’t you try that yourself?

If you need anymore explanations etc just post a comment!

Happy coding,

PHP

Mongo Nosql Session Handler (PHP)

This is the session handler I use in my websites which require NoSQL support. It works pretty much the same as a normal MySQL session handler.

I should probably explain why some of the Mongo functions look a little weird (ensureIndex()). Do not worry, I’m not using a different version of Mongo; I just use an abstraction layer I made for my MVC framework. The key functions, such as the actual manipulation commands, are the same as in normal Mongo.

I also never delete my sessions in this handler; a cronjob does that for me, and all I do here is set them inactive. There is a reason I never delete a single session through this script: session resurrection. To use sessions as a reliable storage medium while having quickly depleting session lifetimes, don’t do the conventional thing of deleting a session; instead mark it inactive until it is used again.
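The resurrection idea can be sketched in miniature; this in-memory JavaScript model is illustrative only, with the real PHP handler following below:

```javascript
// Miniature in-memory sketch of "mark inactive, never delete": GC only
// deactivates expired sessions, and a returning visitor resurrects theirs.
const sessions = {}; // session_id -> { data, active, expires }

function write(id, data, lifeTime, now) {
  sessions[id] = { data: data, active: 1, expires: now + lifeTime };
}

function gc(now) {
  for (const id in sessions) {
    if (sessions[id].expires < now) sessions[id].active = 0; // no delete here
  }
}

function read(id, lifeTime, now) {
  const s = sessions[id];
  if (!s) return "";
  s.active = 1;               // resurrection: use brings it back to life
  s.expires = now + lifeTime; // and extends its lease
  return s.data;
}

write("abc", "cart=3", 60, 0);
gc(100);                           // expired, so marked inactive
const data = read("abc", 60, 100); // but the data survives and is revived
```

Only the external cronjob ever removes documents, once a session has stayed inactive long past any chance of resurrection.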

<?php

/*
 * This handles the user session and also stores information about the user that other users might need to see.
 */

class session {
	protected $mongo_db;
	
   	private $life_time;
   	private $session;

   function session($mongo) {
   	  
   	/*
   	 * Lets do some operations on our db to ensure its all setup
   	 */
   	  $this->mongo_db = $mongo->getCollection("sessions");
      $this->mongo_db->ensureIndex("session_id", array('session_id' => 1));
      
      // Read the maxlifetime setting from PHP
      $this->life_time = get_cfg_var("session.gc_maxlifetime");

      // Register this object as the session handler
      session_set_save_handler( 
        array( &$this, "open" ), 
        array( &$this, "close" ),
        array( &$this, "read" ),
        array( &$this, "write"),
        array( &$this, "destroy"),
        array( &$this, "gc" )
      );
	}

   function open( $save_path, $session_name ) {

      global $sess_save_path;

      $sess_save_path = $save_path;

      // Don't need to do anything. Just return TRUE.
      return true;

   }

   function close() {

      return true;

   }

   function read( $id ) {

      // Set empty result
      $data = '';

      // Fetch session data from the selected database
      $time = time();
      
      $result = $this->mongo_db->findOne(array("session_id" => $id));
		
	  if ($result) {
	  	$this->session = $result;
       	$data = $result['session_data'];
      }
      
      return $data;

   }

   function write( $id, $data ) {

      //Write details to session table             
      $time = time() + $this->life_time;
      $uid = $_SESSION['logged'] ? $_SESSION['uid'] : 0;
      
      if((array)$this->session){
      	$object = $this->session;
      	
      	$object["session_id"] = $id;
      	$object["user_id"] = $uid;
      	$object["session_data"] = $data;
      	$object["active"] = 1;
      	$object["expires"] = $time;
      }else{
      	$object = array("session_id" => $id, "user_id" => $uid, "session_data" => $data, "active" => 1, "expires" => $time);
      }

      $this->mongo_db->save($object);
      
      return true;
   }

   function destroy( $id ) {
      
   	/*
   	 * This will remove the session from db when the user requests to logout
   	 */
   	
     $this->mongo_db->remove(array("session_id" => $id), true);
      
     return true;
   }

	function gc() {

		/*
		 * Garbage collection will remove all old and deceased sessions
		 */
		$time = time();
		
		/*
		 * XXX @todo: Make a script that can check whether the session expiry has passed nth time and delete it if it has 
		 */
		$this->mongo_db->update(array("expires" => array('$lt' => $time)), array('$set' => array("active" => 0)));
      
    	return true;

   	}

}

PHP

Mongo Abstraction layer concept

This page is very, very old. I have since made a MongoDB active record abstraction.

Mongo_db.php

<?php

/*
 * This is the main controller class for mongo db.
 * 
 * From here everything happens. Always reference this class when using the database. Try not to reference other classes as it will
 * just break this script.
 */

require_once("mongo_collection.php");

class mongo_db{
	
	protected $connection_string;
	protected $persistent;
	protected $autoConnect;
	protected $db;
	
	private $_mongo;
	private $_collections = array();
	
	public function __get($key){
		return $this->$key;
	}
	
	public function __set($key, $val){
		$this->$key = $val;
	}
	
	function __destruct(){
		$this->close();
	}
	
	function __construct($connection_string = 'mongodb://localhost:27017', $db = 'the_stage', $persistent = true, $autoConnect = true){
		
		$this->connection_string = $connection_string;
		$this->persistent = $persistent;
		$this->autoConnect = $autoConnect;
		$this->db = $db;
		
		if(!$this->_mongo){
			$this->instantiate_db();
		}
		
	}
	
	public function instantiate_db(){
		try{
			$this->_mongo = new Mongo($this->connection_string, array("connect" => $this->autoConnect, "persist" => $this->persistent));
			if(!$this->autoConnect){
				$this->_mongo->connect();
			}
		}catch(MongoConnectionException $e){
			//throw new error
		}
	}
	
	public function connect(){
		if(empty($this->_mongo)){
			throw new MongoConnectionException();
		}
		$this->_mongo->connect();
	}
	
	public function drop(){
		if(empty($this->db)){
			//throw new exception	
		}
		
		$this->_mongo->drop($this->db);
	}
	
	public function getCollection($collectionName, $model_class = null){
		if(!isset($this->_collections[$collectionName])){
			if(empty($this->db)){
				//throw new error
			}	
			$collection = $this->getDb()->selectCollection($collectionName);
			$this->_collections[$collectionName] = new mongo_collection($collectionName, $collection, $model_class);
		}
			
		return $this->_collections[$collectionName];  
	}
	
	public function getDb(){
		return $this->_mongo->selectDB($this->db);
	}
	
	public function command($data){
		return $this->_mongo->command($data);
	}
	
	public function close(){
		if($this->_mongo){
			if(!$this->persistent){
				$this->_mongo->close();
			}
		}		
	}
}

mongo_collection.php

<?php

/*
 * This is where the action really happens. This allows the user to control collections in their database using various functions.
 * 
 * A model class can be added which will form an automatic object array of the results of any find/findOne.
 * If no class is detected it will just return the result for use manually.
 */

class mongo_collection{
	
	protected $model_class;
	
	private $_collection_name;
	private $_ensured_indexes = array();
	private $_collection;
	
	public function __get($key){
		return $this->$key;
	}
	
	public function __set($key, $val){
		$this->$key = $val;
	}	
	
	function __construct($collectionName, MongoCollection $collection, $model_class){
		
		$this->_collection_name = $collectionName;
		$this->model_class = $model_class;
		$this->_collection = $collection;
		
		if(isset($this->model_class)){
			$this->include_model();
		}
	}
	
	public function include_model(){
		require_once("document_models/".$this->model_class.".php");
	}
	
	public function ensureIndex($name, $fields, $unique = true){
		
		$constraint = array("unique" => $unique);
		$this->_ensured_indexes[$name] = array($fields, $constraint);
		
		$this->_collection->ensureIndex($fields, $constraint);
		
	}	
	
	public function findOne($query = null){
		$result = $this->_collection->findOne($query);
		
		if(empty($result)){
			throw new Exception();	
		}elseif($this->model_class == null){
			return $result;
		}
		
		$model_class = $this->model_class;
		return new $model_class($result);
	}
	
	public function find($query){
	
	}
	
	public function batchInsert($object){
		
	}
	
	public function removeById($id){
		
	}
	
	public function remove($criteria, $justone = false){
		
	}
}