SEO Keyword Density

Okay, so recently I was asked about SEO keyword density. Basically I was tasked with adding a feature to the administration section of the site of the company I work for, to check that the description of a product had a keyword density that mattered to search engines. As a reference I was given the Yoast WordPress plugin, which includes a keyword density checker.
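To make it clear what such a checker actually measures, here is a minimal sketch. I am assuming density simply means occurrences of a single-word keyword divided by the total word count; the function name keywordDensity and the naive word splitting are my own assumptions, not how Yoast does it.

```javascript
// Hypothetical helper: percentage of words in `text` that equal `keyword`.
// Assumes a single-word keyword and a naive split on non-alphanumeric characters.
function keywordDensity(text, keyword) {
  const words = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  if (words.length === 0) return 0;
  const target = keyword.toLowerCase();
  const hits = words.filter(function (w) { return w === target; }).length;
  return (hits / words.length) * 100;
}

const description = "Red shoes for running. These shoes are light and the shoes last.";
console.log(keywordDensity(description, "shoes").toFixed(1) + "%"); // 25.0%
```

Three of the twelve words are the keyword, hence 25%, which would be far above any of the figures quoted further down.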

I was sceptical about exactly why keyword density was so important rather than placement of the keywords, having seen many “myths” in SEO.

What do I mean by placement over density? Placement could be classed as putting the keywords in the right places, like in the meta, page title, header tags and content, in a natural manner.

So I decided to research this a little. I soon came across: http://www.highervisibility.com/blog/what-is-the-proper-keyword-density-for-seo/ which not only provides a meta type discussion between numerous SEO experts but also provides a short but descriptive abstract from Matt Cutts:

“the first time you mention a word, you know, ‘Hey, that’s pretty interesting. It’s about that word.’ The next time you mention that word, ‘oh, OK. It’s still about that word.’ And once you start to mention it a whole lot, it really doesn’t help that much more. There’s diminishing returns. It’s just an incremental benefit, but it’s really not that large.”

So already Matt states that really once you start going crazy with those words and trying to cram them in they become almost useless.

What is even more interesting is that one of the consulted SEO experts actually did some research to find out how the optimum keyword density for Google and other search engines varies:

He used pictures from gorank.com to determine that Yahoo recommends a keyword density of about 3% while Google seems to like sites that have a 1-2% keyword density. Below is an example of the chart he used to form this opinion:

With this in mind, assuming one single density for all search engines could be dangerous: what if the search engine thinks you are keyword stuffing? This used to be (and can still be) a common problem, whereby scammers and hackers would keyword stuff to get fake sites to the top. So you can imagine that if a search engine thinks you are keyword stuffing it may well give you a penalty.

In fact all of the consulted SEO experts seem to agree that keyword density:

  • Is not a fixed, calculable number
  • Is not a big problem to most sites
  • Could be premature optimisation for a page (if you are a programmer you will understand this one)

So with all this in mind I decided to recommend, with evidence, that maybe we should rethink our SEO tactics and not fall into this common myth trap.

MongoYii – A Yii MongoDB ORM

I had more spare time recently and decided to make an ORM for Yii.

It supports MongoDB and attempts to conform to Yii’s own standards as much as possible; it is named MongoYii.

I won’t explain much more here since the repository has a pretty comprehensive readme that will describe the project fully to you:

https://github.com/Sammaye/MongoYii

Debug MongoDB Map Reduce with the Mongo MapReduce Web browser

I got wind of a very nice new tool out there recently.

It is called the Mongo MapReduce WebBrowser and its GitHub repository can be found here: https://github.com/angelozerr/mongo-mapreduce-webbrowser.

Why is this tool so great? It actually allows you to debug your custom-made map reduces within a jQuery environment in your own browser.

Now, I have only been playing with this tool for a short while and have only made some basic tests, all of them on Firefox. Even though the program looks a bit bare on documentation in the readme, in fact it isn’t: the program is so simple that the couple of screenshots provided were all I needed to start debugging my map reduce. So already this program is simple and easy to learn.

Compatibility

I have tested this program with:

  • Opera
  • Safari
  • Firefox
  • Internet Explorer

I have found that in versions of Opera just before the latest (maybe 6 months ago) you just get a white screen saying “ACTIVATING…” and nothing else. Updating Opera solved this problem.

On Internet Explorer I could not find the dynamic script entry within the scripts tab; it simply wasn’t there. I believe this is because IE does not reload this tab for dynamically added scripts, so I would stay away from using IE with this program for the time being. Things may have changed in IE 10 and it may be up to scratch with the other browsers, but I wouldn’t bet on it (IE, the most popular browser for getting another browser).

A note about Safari on Windows: it contains no developer tools by default! This means that this program, through no fault of its own, is useless on this browser unless you download a Firebug type program. I did not do this since I don’t like Apple made products, so I just quickly tested and left it alone.

Apart from these few problems the program works a treat in all other conditions. However, the browsers I would recommend for this program are:

  • Chrome (latest)
  • Firefox (latest)

So with compatibility aside, let’s look at actually using this, shall we?

Usage

I have found out the online version is not great for testing this program. It does not seem to refresh the input document of the Map Reduce on each run.

When downloading this program yourself there is no documentation or pointer for non-Java individuals on how to get it running. This means that, for the time being, you will need some Java knowledge of your own to get it running.

Unfortunately if you intend to get this program running on the small knowledge you have of Java by installing a JRE and then accessing the StartServer.java file (which starts a jetty server) you will be hit with:

$ java StartServer.java
Exception in thread "main" java.lang.NoClassDefFoundError: StartServer/java
Caused by: java.lang.ClassNotFoundException: StartServer.java
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: StartServer.java. Program will exit.

I tried to be a sneaky little bastard about this and actually access the webapp directly from a browser. It kind of worked however without the jetty server running it gave a 404 error about the jaxrs folder.

My verdict on this is that it is new. However, I do see the need for a Java Jetty server as a downside. The program really needs to be independent and instantly integratable with almost any server. Technically it runs in its entirety in JavaScript (jQuery to be more precise), so it should be relatively easy to port this program to work without the need for starting a Java server.

That being said, the author of this program intends to make it able to dynamically save map reduces to MongoDB, along with other such integrations, which means that in time the web server will undoubtedly be required. I am personally used to using programs such as Solr so this is no biggy for me :) .

So I have only used the online edition of this program since my knowledge of Java (now-a-days) is a little too rusty to work through the Java scripts, debugging them.

So, with that aside, first off this program, being brand new, does output a lot of debug info to the console:

(Screenshot: Firebug output of the MapReduce debug)

In fact enough to fill your screen a couple of times. Not only that, but it does have a deprecation error: JQMIGRATE: jQuery.browser is deprecated. However, this does not seem to affect the overall running of the program and I can still easily debug with it.

However, once you have got past this and onto the Map Reduce script, adding break points, it easily debugs the three functions with whatever stepping tools your browser console has.

My end verdict is that this program has undeniable potential but at the moment there are just too many teething problems to really test any Map Reduces correctly.

You should try the online version for yourself here: http://mongo-mapreduce-webbrowser.opensagres.cloudbees.net/ and if you know Java and are willing to contribute it would be so good to have a program like this running and maintained.

MongoDB Active Record with mongoglue

I recently had some spare/slack time on my hands and decided randomly to build a MongoDB Active Record.

I am not going to explain and document the active record here since the readme on the project (hosted on GitHub) itself does that so I am just going to direct you there instead:

https://github.com/Sammaye/mongoglue

Enjoy,

Amazon like Product Gallery with Prodigal

I recently had a need to create a product gallery lightbox like on Amazon capable of using images and videos and having a filter for showing one or the other or both. So I went off, procrastinated about it for about 3 days then made it in about 2.

I have called it prodigal.

You can find the Github repo to fork, love etc here: https://github.com/Sammaye/prodigal.

The browsers I have tested this on as working are:

  • Firefox (latest)
  • IE (7 and above)
  • Safari 5.1 and above
  • Opera 12 and above
  • and Chrome (latest)

All you need to do to get started is to look at the example.html page within the repo and you can use it like:

$('.some-thumbs').prodigal({ /* options */ });

With .some-thumbs being a class that is on all the thumbnails you wish to enter prodigal.

Enjoy,

SEO: 100 links per Page?

I recently had an SEO consultant state that Google recommends, for SEO or “linkvalue” for a page, 100 links per page.

I had never seen the word “linkvalue” before so I decided, ironically, to Google it, and found nothing. After that I decided to Google about 100 links per page. I soon came across a post made by Matt Cutts: http://www.mattcutts.com/blog/how-many-links-per-page/ whereby he states:

The original reason we provided that recommendation is that Google used to index only about 100 kilobytes of a page. When we thought about how many links a page might reasonably have and still be under 100K, it seemed about right to recommend 100 links or so.

So the problem was that Google’s searchbot would only read up to 100KB of data before potentially truncating a page. Of course, even back then this was just advisory (this was back in 2009) and it was Google’s attempt to predict a maximum page size.

Matt goes on to explain about the modern day tactics of the Google searchbot:

Does Google automatically consider a page spam if your page has over 100 links? No, not at all. The “100 links” recommendation is in the “Design and content” guidelines section

So you see, the whole 100 link thing is really only about design. In reality, as can be seen from this pictogram: http://www.nickbilton.com/98/ many of the top 98 sites on the web from back then had a lot more than 100 links, and still do.

That being said, Matt does go on to explain:

At any rate, you’re dividing the PageRank of that page between hundreds of links

So if you are really keen on passing as much of your PageRank as possible to pages on your site that might not be able to hold their own (in which case you should probably question why you want them listed), and you have 600 odd links on one page, then you probably want to cut down on the links a little. Otherwise it is completely up to the design of the site.
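As a back-of-the-envelope sketch of that dividing, assume (and this is a big simplification of my own that ignores damping and everything else Google does) that a page splits whatever value it can pass on evenly between its links. The function name sharePerLink is made up for illustration.

```javascript
// Simplified assumption: a page's passable value is split evenly between its links.
// This is NOT Google's actual formula, only the arithmetic behind "dividing".
function sharePerLink(pageValue, linkCount) {
  if (linkCount <= 0) return 0;
  return pageValue / linkCount;
}

console.log(sharePerLink(1, 100)); // 0.01
console.log(sharePerLink(1, 600)); // six times smaller than with 100 links
```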

Running background process in PHP

An interesting method: http://nsaunders.wordpress.com/2007/01/12/running-a-background-process-in-php/

MongoDB Paging using Ranged Queries (Avoiding Skip())

This post has been rewritten

Originally I gave only one scenario and only one form of paging, and that paging was open to loopholes. I also did not take sharding into account for certain paging scenarios.

So what is wrong with skip()?

Nothing. Skip is an awesome and necessary function for 90% of paging cases, however, there are certain times where skip becomes slow and cumbersome for the server to work on.

The first consideration with using skip is to understand exactly what is required to skip x documents. In order to do a skip MongoDB must effectively do, server-side, the work you would normally do client-side: reading each record out and passing over it until it reaches the right one.

A skip does not make very effective use of indexes, so adding indexes to a collection purely to speed up skipping is mostly pointless.

That being said, skip will work wonders on small result sets, so you must be aware at all times whether applying this post to your queries is actually micro-optimisation or not. I personally cannot tell you whether you are micro-optimising your queries. Moving away from skip should only be done after massive testing; I personally have found that skip can work effectively into the hundreds of thousands of rows, maybe up to a million.

Believing that avoiding skip will solve all your problems by default is like… think of “stored procedures, not triggers”, or certain crazy querying patterns in SQL. It’s all bullshit.

So I can no longer live with Skip() what are my choices?

Well my furry little friend, that all depends. It depends on what application you are making, what your data is like and what your indexes are like. However, I am going to outline the solution. RANGED QUERIES. Yep, I said it, ranged queries. The idea is to first exclude everything before your highest skip with a range operator like $gt or $lt (whichever matches your sort direction) and then use limit() to create the limited page you want.

Let’s look at an example. We want to make a Facebook feed for what would normally be the users homepage when they login. To begin with we want to get our first page:

var posts = db.posts.find({}).sort({ ts: -1 }).limit(20).toArray()

Here I am using the document’s timestamp (ISODate) to sort and range my documents; posts is just an example collection name.

Note: a date in MongoDB is only stored to the millisecond (and many applications only write it to the second), so if you are likely to get documents with the same timestamp then use something else. We will discuss this later.

So now we have the first 20 stories for our feed. When the user scrolls to the bottom of the page we want to fire off another query which will get our next page. In order to accomplish the range we need to know the timestamp of the last document (story) on the page:

var last_ts = posts[posts.length - 1].ts;

With this last timestamp we then make the query that will get the next lot of stories. Since we are sorting newest first, the next page is everything older than that timestamp, hence $lt:

var posts = db.posts.find({ ts: { $lt: last_ts } }).sort({ ts: -1 }).limit(20).toArray()

And bingo! Keep repeating steps 1 and 2 and you will have successful paging for large data sets.
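If you want to see the two steps run without a database, here is a small sketch that imitates them on a plain array. Everything in it (the allPosts array, the nextPage function) is made up for illustration; it only mimics the filter, sort and limit of a newest-first ranged query.

```javascript
// In-memory stand-in for a collection: 50 posts with increasing timestamps.
const allPosts = [];
for (let i = 1; i <= 50; i++) {
  allPosts.push({ _id: i, ts: 1000 + i });
}

// Mimics find({ ts: { $lt: lastTs } }).sort({ ts: -1 }).limit(pageSize).
// Pass null for lastTs to fetch the first page.
function nextPage(posts, lastTs, pageSize) {
  return posts
    .filter(function (p) { return lastTs === null || p.ts < lastTs; })
    .sort(function (a, b) { return b.ts - a.ts; })
    .slice(0, pageSize);
}

const page1 = nextPage(allPosts, null, 20);
const page2 = nextPage(allPosts, page1[page1.length - 1].ts, 20);
console.log(page1[0].ts, page1[19].ts); // 1050 1031
console.log(page2[0].ts, page2[19].ts); // 1030 1011
```

Notice that page 2 never needs to know how many documents came before it, only where page 1 ended.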

But…what about skipping huge sections of my data?

Now you have come to the gritty nasty bit. If skipping huge amounts of data were that easy don’t you think MongoDB would already do it for you?

To skip huge sections you need to understand that it isn’t just the DB that has to do the work; the application has to do something as well, just as many sites that deal with log data or other people’s drunk nights out, like Cloudflare and Facebook, do.

Facebook timelines and many other large paging systems now accommodate a large skip via the interface. Notice how you have that year filter on the right hand side of your new timeline? Well that’s how they do it! They simply narrow down the index via a year you pick. Good, isn’t it? Simple, clean and more effective and user friendly than putting a random page number into the address bar.
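I obviously have no idea what Facebook really runs behind that filter, but the idea can be sketched as turning the picked year into a range on a date field. The helper name yearFilter and the field name ts are assumptions for illustration.

```javascript
// Hypothetical helper: turn a picked year into a range filter on a date field.
// The resulting object has the shape you would hand to find().
function yearFilter(year) {
  return {
    ts: {
      $gte: new Date(Date.UTC(year, 0, 1)),
      $lt: new Date(Date.UTC(year + 1, 0, 1))
    }
  };
}

const filter2011 = yearFilter(2011);
console.log(filter2011.ts.$gte.toISOString()); // 2011-01-01T00:00:00.000Z
console.log(filter2011.ts.$lt.toISOString());  // 2012-01-01T00:00:00.000Z
```

From there you page inside the year with a ranged query and limit() as before, so no page ever has to skip more than a slice of one year.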

Cloudflare, on their statistics page, gives you a huge summary graph allowing you to pick out specific ranges by using your mouse to highlight parts of the graph. Again this works a lot like Facebook’s own, whittling it down to a pre-defined set, or range, of documents.

In fact if you look at 99% of all applications, whether web based or not, that have to deal with huge amounts of data (like Apache logs) you will realise they do not implement normal paging.

So the first and most important thing to understand about how to do large scale paging is that you can no longer solve your problems with just a bit of HTML and a skip(), you need an interface.

Just look at how other programs/sites do it and you will realise how you should do it.

What if I need to pick random records out for paging or just in a normal query?

This is a little off topic but can be used for paging as well.

Sometimes people like to pick random records out from their collection. One option is to use skip() like:

$cursor = $collection->find();
$cursor->skip(3);
$cursor->next();
$doc = $cursor->current();

And again, this is good for small skips. But what if you need to do this continuously and at random points?

If you were to do this often and in random ways it might be better to use an incrementing ID combining another collection with findAndModify to produce an accurate document number ( Doc Link ).

This, however, introduces problems: you must maintain this ID, especially when deletes occur. One method around this is to mark documents as deleted instead of actually deleting them. When you query for the exact document you omit deleted documents, sort on the ID and limit() by one, allowing you to get the next closest document to that position like so:

$cursor = $db->collection->find(array('ai_id' => array('$gte' => 403454), 'deleted' => 0))->sort(array('ai_id' => 1))->limit(1);
$cursor->next();
$doc = $cursor->current();
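The same idea can be tried out on a plain array. This is only a sketch: the docs array and the function names are made up, and maxId stands for the highest ai_id handed out so far, which in a real setup would come from the counter collection.

```javascript
// In-memory stand-in: documents carrying a hypothetical incrementing ai_id.
const docs = [
  { ai_id: 1, deleted: 0 },
  { ai_id: 2, deleted: 1 },
  { ai_id: 3, deleted: 1 },
  { ai_id: 4, deleted: 0 },
  { ai_id: 5, deleted: 0 }
];

// Mimics find({ ai_id: { $gte: n }, deleted: 0 }).sort({ ai_id: 1 }).limit(1).
function firstLiveDocFrom(collection, n) {
  const matches = collection
    .filter(function (d) { return d.ai_id >= n && d.deleted === 0; })
    .sort(function (a, b) { return a.ai_id - b.ai_id; });
  return matches.length > 0 ? matches[0] : null;
}

// Pick a random position between 1 and maxId, then take the next live document.
function randomLiveDoc(collection, maxId) {
  const n = 1 + Math.floor(Math.random() * maxId);
  return firstLiveDocFrom(collection, n);
}

console.log(firstLiveDocFrom(docs, 2)); // { ai_id: 4, deleted: 0 }
```

One caveat of this sketch: documents that follow a run of deleted ones get picked more often, and if the highest IDs are all deleted you get nothing back and need to retry.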

How do I choose my range key?

As I said earlier, a timestamp may not always work for you. Choosing the right cardinality and granularity for your range key is very important. A timestamp is not always unique especially on a Facebook wall. You might have two posts that occur the same second. As such when getting the next page you will actually miss this post. So the better key to use here is _id with a sort on timestamp and a compound index on _id and your timestamp field “ts”.
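To show the missed-post problem and one way around it, here is a sketch on a plain array. Note that it is a slight variation on what I describe above: it still ranges on the timestamp first and only falls back to _id to break ties. The names feed, byTsThenId and pageAfter are made up for illustration.

```javascript
// Three posts, two of which share the same timestamp.
const feed = [
  { _id: 3, ts: 200 },
  { _id: 2, ts: 100 },
  { _id: 1, ts: 100 }
];

// Sort newest first, breaking timestamp ties on _id (also descending).
function byTsThenId(a, b) {
  return b.ts - a.ts || b._id - a._id;
}

// Keep only documents that come strictly after `last` in that ordering.
// Equivalent filter: { $or: [ { ts: { $lt: last.ts } },
//                             { ts: last.ts, _id: { $lt: last._id } } ] }
function pageAfter(posts, last, pageSize) {
  return posts
    .filter(function (p) {
      return p.ts < last.ts || (p.ts === last.ts && p._id < last._id);
    })
    .sort(byTsThenId)
    .slice(0, pageSize);
}

const firstPage = feed.slice().sort(byTsThenId).slice(0, 2); // _id 3, then _id 2
const tsOnly = feed.filter(function (p) { return p.ts < firstPage[1].ts; });
const withTieBreak = pageAfter(feed, firstPage[1], 2);
console.log(tsOnly.length);       // 0: the post with _id 1 is never shown
console.log(withTieBreak[0]._id); // 1
```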

So what is the best key in your scenario? Again I cannot answer that from this post, however, I will try and give a few points and hints on how to choose a good one. I will also try and throw some scenarios in.

The first thing is to get a combination THAT WILL be unique to each document. So say you are doing some rated items and you want to sort by specific ratings. You would normally create a compound index, range on it and use it as your sort as well, like:

$db->collection->find(array('type' => 'video', 'liked' => 1, 'reaction' => 'love'))->sort(array('type' => 1, 'liked' => 1, 'reaction' => 1))

Of course I do not recommend that schema but this is just an example of how a unique range key would work.

The second thing to consider is your sharding key. This is important because you don’t want to scan all computers for a small range that you might be able to find on one computer, right? So make sure to take your shard key into consideration. For example, when doing the Facebook feed over an _id range key and ts sort key I would make a shard key of _id and ts to make my ranging as fast as possible.

The problem that can be experienced here, as stated in the comments below, is that you do not hit the shard key when ranging. This is still ok, since MongoDB will house index ranges and will still be able to hit specific areas of the collection; however, it will still force a global operation.

You can find a very nice (as @Matt pointed out) pictographic presentation of how queries work in a sharding environment here: http://www.mongodb.org/download/attachments/2097354/how+queries+work+with+sharding.pdf. This will show you how your queries might react on certain shards and under certain conditions.

Now we have covered the absolute basics of how to page without skip() in MongoDB. I am sure there are more parts I have forgotten to add and will probably update this post as time goes on.

Post a comment if you would like further info or see errors.

Realtime Stats in MongoDB Example (PHP)

Ok so you’ll remember some time ago I talked about realtime statistics in MongoDB, right?

Well here is a working version. This script basically records the basic statistics on videos.

So this time I am going to skip why and how I made my document structure and just show you my document structure:

	/**
	 * Document structure:
	 *
	 * {
	 *   _id: {},
	 *   hits: 0,
	 *   u_hits: 0,
	 *   hours: {
	 *   	1: {v: 5, u: 0},
	 *   	2: {v: 3, u: 9},
	 *   },
	 *   browser: {
	 *   	chrome: 1,
	 *   	ie: 2
	 *   },
	 *   age: {
	 *   	13_16: 0,
	 *   	17_25: 0
	 *   },
	 *   video_comments: 0,
	 *   text_comments: 0,
	 *   video_likes: 0,
	 *   video_dislikes: 0,
	 *   age_total: 0,
	 *   browser_total: 0,
	 *   male: 0,
	 *   female: 0,
	 *   day: 0
	 * }
	 */

This may look very overwhelming at first but if you just take a minute to sit down and comprehend it you’ll realise it is very simple:

  • Every document works only on a day level
  • _id – The unique _id of the document
  • hits – The amount of hits (recurring) for that day
  • u_hits – The unique hits for that day
  • hours – This is an array of hour values (0-23, as produced by date('G')) with another level of complexity within each hour. The sub array within each hour contains a “v” and a “u”. The v is recurring views and the u is unique views.
  • browser – This is basically an array of browser types with the amount used beside each one. This variable is for unique hits only.
  • age – This denotes the age of logged in users who view this video and is separated into age groups, each of which is $inc’d for each individual unique visit
  • video_comments – This denotes the amount of video comments for that day
  • text_comments – This denotes the amount of text comments for that day
  • video_likes – The amount of likes for that day on the video
  • video_dislikes – The amount of dislikes for that day on the video
  • age_total – This is an aggregation field of all age groups summed up in amount
  • browser_total – Same as above but for browsers
  • male – The amount of males who visited this video
  • female – The amount of females who visited this video
  • day – A timestamp of the zero hour of the day
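To get a feel for how a day document fills up, here is a sketch that imitates an upserted $inc on a plain object. It is only an imitation under my own assumptions: applyInc and buildInc are made-up names, and the real server does the dotted-path walking for you.

```javascript
// Minimal imitation of an upserted $inc: walks dotted paths such as "hours.14.v"
// and creates missing levels along the way.
function applyInc(doc, inc) {
  Object.keys(inc).forEach(function (path) {
    const parts = path.split(".");
    let node = doc;
    for (let i = 0; i < parts.length - 1; i++) {
      if (typeof node[parts[i]] !== "object" || node[parts[i]] === null) {
        node[parts[i]] = {};
      }
      node = node[parts[i]];
    }
    const leaf = parts[parts.length - 1];
    node[leaf] = (node[leaf] || 0) + inc[path];
  });
  return doc;
}

// Builds the $inc part for one hit; `hit` is a hypothetical plain object.
function buildInc(hit) {
  const inc = { hits: 1 };
  inc["hours." + hit.hour + ".v"] = 1;
  if (hit.unique) {
    inc.u_hits = 1;
    inc["hours." + hit.hour + ".u"] = 1;
    inc["browser." + hit.browser] = 1;
    inc.browser_total = 1;
  }
  return inc;
}

const dayDoc = {};
applyInc(dayDoc, buildInc({ hour: 14, unique: true, browser: "chrome" }));
applyInc(dayDoc, buildInc({ hour: 14, unique: false, browser: "chrome" }));
console.log(JSON.stringify(dayDoc));
```

After one unique and one recurring hit in the same hour, the document holds two hits, one unique hit and a single chrome entry, which is exactly the split between the v and u counters.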

Now for some of the meaty stuff. Let’s make a function which can manipulate this schema.

	// This function would be part of, say, a bigger class called "Video", which is the object we are collecting stats about.
	// So if you see $this->blah + 1 or whatever, it means I am pre-aggregating the parent variables to make querying stats much easier on things like user video pages.
	function recordHit(){

		if(!is_search_bot($_SERVER['HTTP_USER_AGENT'])){ // Is the user a search bot? If so we don't want to add them

			$user = glue::session()->user;
			$u_brows_key = get_ua_browser();

	        $u_age_key = 'u'; // Unknown age
	        if($_SESSION['logged']){ // If the user is a logged in user lets get some details
		        ///Other, 13-16, 17-25, 26-35, 36-50, 50+
		        $u_age = isset($user->birth_day) && isset($user->birth_month) && isset($user->birth_year) ?
		        	mktime(0, 0, 0, $user->birth_month, $user->birth_day, $user->birth_year) : 0; // Lets get their birthday
				$u_age_diff = (time() - $u_age)/(60*60*24*365); // Lets find the difference between today and then

				switch(true){ // Lets slap them into the right age group
					case $u_age === 0: // No birthday on record, so keep the unknown key
						break;
					case $u_age_diff >= 13 && $u_age_diff < 17:
						$u_age_key = '13_16';
						break;
					case $u_age_diff >= 17 && $u_age_diff < 26:
						$u_age_key = '17_25';
						break;
					case $u_age_diff >= 26 && $u_age_diff < 36:
						$u_age_key = '26_35';
						break;
					case $u_age_diff >= 36 && $u_age_diff < 51:
						$u_age_key = '36_50';
						break;
					case $u_age_diff >= 51:
						$u_age_key = '50_plus';
						break;
				}
	        }

			$is_unique = glue::db()->video_stats_all->find(array(
				"sid" => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
				"vid" => $this->_id
			))->count() <= 0; // Is this view unique?

			$update_doc = array( '$inc' => array( 'hits' => 1 ) ); // Lets always add a recurring hit
			if($is_unique){
				$update_doc['$inc']['u_hits'] = 1; // Lets $inc all the document (atomic ops)
				$update_doc['$inc']['age.'.$u_age_key] = 1;
				$update_doc['$inc']['browser.'.$u_brows_key] = 1;

				// These are used to make my life a little easier
				$update_doc['$inc']['age_total'] = 1;
				$update_doc['$inc']['browser_total'] = 1;

				if(glue::session()->user->gender == 'm'){
					$update_doc['$inc']['male'] = 1;
				}elseif(glue::session()->user->gender == 'f'){
					$update_doc['$inc']['female'] = 1;
				}
			}

			$day_update_doc = $update_doc;
			$day_update_doc['$inc']['hours.'.date('G').'.v'] = 1; // Lets $inc the right hour in the day for this guy
			if($is_unique): $day_update_doc['$inc']['hours.'.date('G').'.u'] = 1; endif; // Unique edition of $inc'ing the right hour

			glue::db()->video_stats_day->update(array("day"=>new MongoDate(mktime(0, 0, 0, date("m"), date("d"), date("Y"))), "vid"=>$this->_id), $day_update_doc, array("upsert"=>true)); // Lets send down the query!

			if($is_unique){ // If this view is unique lets place it within an all views collection so we can check if the view is unique later
				$this->unique_views = $this->unique_views+1;

				glue::db()->video_stats_all->insert(array(
					'sid' => glue::session()->user->_id instanceof MongoId ? glue::session()->user->_id : session_id(),
					'vid' => $this->_id,
					'ts' => new MongoDate()
				));
			}
			$this->views = $this->views+1; // Lets add pre-aggregation to the parent video object to which this function belongs (in reality you would just add 1 onto whatever parent object you have)
			$this->save();

			// Now lets do some referers
			$referer = glue::url()->getNormalisedReferer();
			if($referer){ // The referer has been normalised (e.g. https://facebook.com/i.php?534986465mmde040 becomes facebook) so we won't fill our table with useless referrers.
				glue::db()->video_referers->update(array('video_id' => $this->_id, 'referer' => $referer), array('$inc' => array('c' => 1),
					'$set' => array('ts' => new MongoDate())), array('upsert' => true));
			}
		}
	}

Now let’s write the code to actually get something out of this schema. I have written this with highcharts in mind. I am not going to say this function is easy and what not because it is a bastard and was one to get it working exactly the way I needed it but here goes.

	function getStatistics_dateRange($fromTs /* timestamp */, $toTs /* timestamp */){

                // Lets set some defaults up for the data to be returned
		$unique_views_range = array();
		$non_unique_views_range = array();

		// These totals make my percentage calcing life easier
		$total_browsers = 0;
		$total_ages = 0;

		$sum_browser = array();
		$sum_ages = array();
		$sum_video_comments = 0;
		$sum_text_comments = 0;
		$sum_video_likes = 0;
		$sum_video_dislikes = 0;

		$sum_males = 0;
		$sum_females = 0;

                // If an invalid time range lets default to the last 7 days
		if($fromTs > $toTs){
			$fromTs = mktime(0, 0, 0, date("m"), date("d")-7, date("Y"));
			$toTs = time();
		}

		if($fromTs < strtotime('-4 days', $toTs)){ // If this is more than 4 days ago then lets aggregate on day

		  	$newts = $fromTs;
			while ($newts <= $toTs) { // This prefills the graph axis' with the notation we require
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
				$newts = mktime(0,0,0, date('m', $newts), date('d', $newts)+1, date('Y', $newts));
			}

			foreach(glue::db()->video_stats_day->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get our days!
				// Lets pre-aggregate our stuff
				$non_unique_views_range[$day['day']->sec] = $day['hits'] > 0 ? $day['hits'] : 0;
				$unique_views_range[$day['day']->sec] = $day['u_hits'] > 0 ? $day['u_hits'] : 0;

				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['male'];
				$sum_females += (int)$day['female'];
			}

		}else{ // else with less than 4 days lets aggregate on hours of those days

			$newts = $fromTs;
			while($newts < $toTs){ // Lets pre-fill our graph axis' to the notation we require
				$newts = $newts+(60*60);
				$unique_views_range[$newts] = 0;
				$non_unique_views_range[$newts] = 0;
			}

			foreach(glue::db()->video_stats_day->find(array(
				"vid"=>$this->_id,
				"day"=> array("\$gte" => new MongoDate($fromTs), "\$lte" => new MongoDate($toTs) ),
			)) as $day){ // Lets get the days
				foreach($day['hours'] as $k => $v){ // Now for each of the hours in those days mark the views as an hour of that day
					$k = $k+1;
					$non_unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['v'] > 0 ? $v['v'] : 0;
					$unique_views_range[mktime($k, 0, 0, date('m', $day['day']->sec), date('d', $day['day']->sec), date('Y', $day['day']->sec))] = $v['u'] > 0 ? $v['u'] : 0;
				}

                                // Lets aggregate all the other stats too
				$sum_browser = summarise_array_row($day['browser'], $sum_browser);
				$sum_ages = summarise_array_row($day['age'], $sum_ages);

				$total_browsers += (int)$day['browser_total'];
				$total_ages += (int)$day['age_total'];

				$sum_video_comments += (int)$day['video_comments'];
				$sum_text_comments += (int)$day['text_comments'];
				$sum_video_likes += (int)$day['video_likes'];
				$sum_video_dislikes += (int)$day['video_dislikes'];

				$sum_males += (int)$day['male'];
				$sum_females += (int)$day['female'];
			}
		}

		// Now lets get the browser crap
		$browsers_highCharts_array = array();
		foreach($sum_browser as $k => $sum){
			$u_brows_capt = 'Other'; // Reset the caption for every browser so unknown ones do not inherit the previous label
			if($k =='ie'){
				$u_brows_capt = "IE";
			}elseif($k == 'ff'){
				$u_brows_capt = "Firefox";
			}elseif($k == 'chrome'){
				$u_brows_capt = "Chrome";
			}elseif($k == 'safari'){
				$u_brows_capt = "Safari";
			}elseif($k == 'opera'){
				$u_brows_capt = "Opera";
			}elseif($k == 'netscape'){
				$u_brows_capt = "Netscape";
			}
			$browsers_highCharts_array[] = array($u_brows_capt, ($sum/$total_browsers)*100);
		}

                // And lets understand our age density
		$ages_highCharts_array = array();
		foreach($sum_ages as $k => $sum){
			$u_age_capt = 'Unknown'; // Reset the caption for every age group so unknown ones do not inherit the previous label
			if($k == '13_16'){
				$u_age_capt = '13-16';
			}elseif($k == '17_25'){
				$u_age_capt = '17-25';
			}elseif($k == '26_35'){
				$u_age_capt = '26-35';
			}elseif($k == '36_50'){
				$u_age_capt = '36-50';
			}elseif($k == '50_plus'){
				$u_age_capt = '50+';
			}
			$ages_highCharts_array[] = array($u_age_capt, ($sum/$total_ages)*100);
		}

		if(sizeof($ages_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$ages_highCharts_array = array(array('None', 100));
		}

		if(sizeof($browsers_highCharts_array) <= 0){ // Some defaults to stop broken graphs
			$browsers_highCharts_array = array(array('None', 100));
		}

		$total_males_females = $sum_males+$sum_females; // Sum of both males and females who visited (vital for pie chart, mm pie)

                // Now lets form the returning array in the format we require!
		return array(
			'hits' => $this->formatHighChart(array(
				"Views"=>$non_unique_views_range,
				"Unique Views"=>$unique_views_range
			)),
			'browsers' => $browsers_highCharts_array,
			'ages' => $ages_highCharts_array,
			'video_comments' => $sum_video_comments,
			'text_comments' => $sum_text_comments,
			'video_likes' => $sum_video_likes,
			'video_dislikes' => $sum_video_dislikes,
			'males' => $total_males_females > 0 ? number_format(($sum_males/$total_males_females)*100, 0) : 0,
			'females' => $total_males_females > 0 ? number_format(($sum_females/$total_males_females)*100, 0) : 0
		);
	}
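The percentage maths at the end is the part most easily got backwards, so here is the same step as a small stand-alone sketch: each count divided by the total, with the same “None” fallback to stop broken graphs. The function name toPercentages is made up, and the [label, value] pair shape is simply what the function above returns for its pie data.

```javascript
// Turns raw counts into [label, percentage] pairs for a pie chart series.
// Falls back to a single "None" slice when there is nothing to show.
function toPercentages(counts) {
  const labels = Object.keys(counts);
  const total = labels.reduce(function (sum, k) { return sum + counts[k]; }, 0);
  if (total === 0) return [["None", 100]];
  return labels.map(function (k) { return [k, (counts[k] / total) * 100]; });
}

console.log(toPercentages({ male: 30, female: 10 })); // [ [ 'male', 75 ], [ 'female', 25 ] ]
console.log(toPercentages({}));                       // [ [ 'None', 100 ] ]
```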

And there you have it, a very basic aggregation example in MongoDB.

If you have any questions or need clarification or notice errors etc just post a comment below.

Enjoy and happy MongoDB’ing
