MongoDB, PHP

Realtime Stats in MongoDB Example (PHP)

Ok so you’ll remember some time ago I talked about realtime statistics in MongoDB, right?

Well here is a working version. This script basically records the basic statistics on videos.

So this time I am going to skip why and how I made my document structure and just show you my document structure:

	/**
	 * Document structure:
	 *
	 * {
	 *   _id: {},
	 *   hits: 0,
	 *   u_hits: 0,
	 *   hours: {
	 *   	1: {v: 5, u: 0},
	 *   	2: {v: 3, u: 9},
	 *   },
	 *   browser: {
	 *   	chrome: 1,
	 *   	ie: 2
	 *   },
	 *   age: {
	 *   	13_16: 0,
	 *   	17_25: 0
	 *   },
	 *   video_comments: 0,
	 *   text_comments: 0,
	 *   video_likes: 0,
	 *   video_dislikes: 0,
	 *   age_total: 0,
	 *   browser_total: 0,
	 *   male: 0,
	 *   female: 0,
	 *   day: 0
	 * }
	 */

This may look very overwhelming at first but if you just take a minute to sit down a comprehend it you’ll realise it is very simple:

  • Every document works only on a day level
  • _id – The unique _id of the document
  • hits – The amount of hits (recurring) for that day
  • u_hits – The unique hits for that day
  • hours – This is an array of hour values (1-24) with another level of complexity within each hour weighting. The sub array within each hour contains a “v” and a “u”. The v is recurring views and the u is unique views.
  • browser – This is basically an array of browser types with the amount used beside each one. This variable is for unique hits only.
  • age – This denotes the age of logged in users who view this video and it separated into age groups which is then $inc’d for each individual unique visit
  • video_comments – This denotes the amount of video comments for that day
  • text_comments – This denote the amount of text comments for that day
  • video_likes – The amount of likes for that day on the video
  • video_dislikes – The amount of dislikes for that day on the video
  • age_total – This is an aggregation field of all age groups summed up in amount
  • browser_title – Same as above but for browsers
  • male – The amount of males who visited this video
  • female – The amount of females who visited this video
  • day – A timestamp of the zero hour of the day

Now for some of the meaty stuff. Lets make a function which can manipulate this schema.

        // This function would be part of say a bigger class called "Video" is whic the object we are collecting stats about, so if you see $this->blah + 1 or whatever it means I am preaggregating the parent variables to make querying stats much easier on things like user video pages.
	function recordHit(){

		if(!is_search_bot($_SERVER['HTTP_USER_AGENT'])){ // Is the user a search bot? is so we don't want to add them

			$user = glue::session()->user;
			$u_brows_key = get_ua_browser();

	        $u_age_key = 'u'; // Uknown age
	        if($_SESSION['logged']){ // If the user is a logged in user lets get some details
		        ///Other, 13-16, 17-25, 26-35, 36-50, 50+
		        $u_age = isset($user->birth_day) && isset($user->birth_month) && isset($user->birth_year) ?
		        	mktime(0, 0, 0, $user->birth_month, $user->birth_day, $user->birth_year) : 0; // Lets get their birthday
				$u_age_diff = (time() - $u_age)/(60*60*24*365); // Lets find the difference between today and then

				switch(true){ // Lets slap them into the right age group
					case $u_age_diff > 12 && $u_age_diff < 17:
						$u_age_key = '13_16';
						break;
					case $u_age_diff > 17 && $u_age_diff < 26:
						$u_age_key = '17_25';
						break;
					case $u_age_diff > 25 && $u_age_diff < 36:
						$u_age_key = '26_35';
						break;
					case $u_age_diff > 35 && $u_age_diff < 51:
						$u_age_key = '36_50';
						break;
					case $u_age_diff > 50:
						$u_age_key = '50_plus';
						break;
				}
	        }

			$is_unique = glue::db()->{video_stats_all}->find(array(
				"sid" => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
				"vid" => $this->_id
			))->count() <= 0; // Is this view unique?

			$update_doc = array( '$inc' => array( 'hits' => 1 ) ); // Lets always add a recurring hit
			if($is_unique){
				$update_doc['$inc']['u_hits'] = 1; // Lets $inc all the document (atomic ops)
				$update_doc['$inc']['age.'.$u_age_key] = 1;
				$update_doc['$inc']['browser.'.$u_brows_key] = 1;

				// These are used to make my life a little easier
				$update_doc['$inc']['age_total'] = 1;
				$update_doc['$inc']['browser_total'] = 1;

				if(glue::session()->user->gender == 'm'){
					$update_doc['$inc']['male'] = 1;
				}elseif(glue::session()->user->gender == 'f'){
					$update_doc['$inc']['female'] = 1;
				}
			}

			$day_update_doc = $update_doc;
			$day_update_doc['$inc']['hours.'.date('G').'.v'] = 1; // Lets $inc the right hour in the day for this guy
			if($is_unique): $day_update_doc['$inc']['hours.'.date('G').'.u'] = 1; endif; // Unique edition of $inc'ing the right hour

			glue::db()->{video_stats_day}->update(array("day"=>new MongoDate(mktime(0, 0, 0, date("m"), date("d"), date("Y"))), "vid"=>$this->_id), $day_update_doc, array("upsert"=>true)); // Lets send down the query!

			if($is_unique){ // If this view is unqiue lets place it witin a all views collection so we can check if the view is unqiue later
				$this->unique_views = $this->unique_views+1;

				glue::db()->video_stats_all->insert(array(
					'sid' => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
					'vid' => $this->_id,
					'ts' => new MongoDate()
				));
			}
			$this->views = $this->views+1; // Lets add pre-aggregation to the parent video object for which this function belongs (in reality you would just add 1 onto whatever parent object you have
			$this->save();

			// Now lets do some referers
			$referer = glue::url()->getNormalisedReferer();
			if($referer){ // We have a referrer who has gone from https://facebook.com/i.php?534986465mmde040 to facebook so we won't fill our table with useless referrers.
				glue::db()->{video_referers}->update(array('video_id' => $this->_id, 'referer' => $referer), array('$inc' => array('c' => 1),
					'$set' => array('ts' => new MongoDate())), array('upsert' => true));
			}
		}
	}

Now let’s write the code to actually get something out of this schema. I have written this with highcharts in mind. I am not going to say this function is easy and what not because it is a bastard and was one to get it working exactly the way I needed it but here goes.

	function getStatistics_dateRange($fromTs /* timestamp */, $toTs /* timestamp */){

                // Lets set some defaults up for the data to be returned
		$unique_views_range = array();
		$non_unique_views_range = array();

		// These totals make my percentage calcing life easier
		$total_browsers = 0;
		$total_ages = 0;

		$sum_browser = array();
		$sum_ages = array();
		$sum_video_comments = 0;
		$sum_text_comments = 0;
		$sum_video_likes = 0;
		$sum_video_dislikes = 0;

		$sum_males = 0;
		$sum_females = 0;

                // If an invalid time range lets default to 7 days ago
		if($fromTs > $toTs){
			$dateFrom = mktime(0, 0, 0, date("m"), date("d")-7, date("Y"));
			$dateTo = time();
		}

		if($fromTs < strtotime('-4 days', $toTs)){ // If this is more than 4 days ago then lets aggregate on day

		  	$newts = $fromTs;
			while ($newts <= $toTs) { // This prefills the graph axis' with the notation we require
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
				$newts = mktime(0,0,0, date('m', $newts), date('d', $newts)+1, date('Y', $newts));
			}

			foreach(glue::db()->{video_stats_day}->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get our days!
				// Lets pre-aggregate our stuff
				$non_unique_views_range[$day['day']->sec] = $day['hits'] > 0 ? $day['hits'] : 0;
				$unique_views_range[$day['day']->sec] = $day['u_hits'] > 0 ? $day['u_hits'] : 0;

				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['males'];
				$sum_females += (int)$day['females'];
			}

		}else{ // else with less than 4 days lets aggregate on hours of those days

			$newts = $fromTs;
			while($newts < $toTs){ // Lets pre-fill our graph axis' to the notation we require
				$newts = $newts+(60*60);
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
			}

			foreach(glue::db()->{video_stats_day}->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get the days
				foreach($day['hours'] as $k => $v){ // Now for each of the hours in those days mark the views as an hour of that day
					$k = $k+1;
					$non_unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['v'] > 0 ? $v['v'] : 0;
					$unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['u'] > 0 ? $v['u'] : 0;
				}

                                // Lets aggregate all the other stats too
				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['male'];
				$sum_females += (int)$day['female'];
			}
		}

		// Now lets get the browser crap
		$browsers_highCharts_array = array();
		$u_brows_capt = 'Other';
		foreach($sum_browser as $k => $sum){
			if($k =='ie'){
				$u_brows_capt = "IE";
			}elseif($k == 'ff'){
				$u_brows_capt = "Firefox";
			}elseif($k == 'chrome'){
				$u_brows_capt = "Chrome";
			}elseif($k == 'safari'){
				$u_brows_capt = "Safari";
			}elseif($k == 'opera'){
				$u_brows_capt = "Opera";
			}elseif($k == 'netscape'){
				$u_brows_capt = "Netscape";
			}
			$browsers_highCharts_array[] = array($u_brows_capt, ($sum/$total_browsers)*100);
		}

                // And lets understand our age density
		$ages_highCharts_array = array();
		$u_age_capt = 'Unknown';
		foreach($sum_ages as $k => $sum){
			if($k == '13_16'){
				$u_age_capt = '13-16';
			}elseif($k == '17_25'){
				$u_age_capt = '17-25';
			}elseif($k == '26_35'){
				$u_age_capt = '26-35';
			}elseif($k == '36_50'){
				$u_age_capt = '36-50';
			}elseif($k == '50_plus'){
				$u_age_capt = '50+';
			}
			$ages_highCharts_array[] = array($u_age_capt, ($sum/$total_ages)*100);
		}

		if(sizeof($ages_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$ages_highCharts_array = array(array('None', 100));
		}

		if(sizeof($browsers_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$browsers_highCharts_array = array(array('None', 100));
		}

		$total_males_females = $sum_males+$sum_females; // Sum of both males and females who visited (vital for pie chart, mm pie)

                // Now lets form the returning array in the format we require!
		return array(
			'hits' => $this->formatHighChart(array(
				"Views"=>$non_unique_views_range,
				"Unique Views"=>$unique_views_range
			)),
			'browsers' => $browsers_highCharts_array,
			'ages' => $ages_highCharts_array,
			'video_comments' => $sum_video_comments,
			'text_comments' => $sum_text_comments,
			'video_likes' => $sum_video_likes,
			'video_dislikes' => $sum_video_dislikes,
			'males' => $sum_males > 0 ? number_format(($total_males_females/$sum_males)*100, 0) : 0,
			'females' => $sum_females > 0 ? number_format(($total_males_females/$sum_females)*100, 0) : 0
		);
	}

And there you have it a very basic aggregation example in MongoDB.

If you have any questions or need clarification or notice errors etc just post a comment below.

Enjoy and happy MongoDB’ing

Advertisements

One thought on “Realtime Stats in MongoDB Example (PHP)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s