There is no nicer feeling for a small website owner than signing into your Google Analytics account only to see a nice spike in traffic. You get all excited and drill down to try to analyse the source and see you have a new referral that has created 25 sessions in less than a week! Great news… Except, you don’t remember posting anything relevant to ‘buttons-for-your-website.com’ and upon inspecting the site you can’t find the back link anywhere.
Analytics spam can be more than just a flicker of disappointment when you realise those 25 shiny new sessions were just a misbehaving bot crawling your website or spamming Google’s tracking service. It can and does occur in quantities large enough to seriously distort your data, creating false patterns and trends or masking real ones. For example, if last month your website generated 21,000 sessions and this month it generates 23,000, you may be inclined to start celebrating: your new marketing push has paid off! However, if this month also saw a 3,000 session increase in spam traffic, then your meaningful traffic has actually fallen by 1,000. Spotting and removing analytics spam could be the difference between recognising a failing marketing effort and missing it, or at the very least it can stop you scratching your head over mysterious referral traffic.
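The arithmetic in that example is easy to check for your own figures. Here is a minimal sketch in Python; the function name is made up for illustration and the numbers are the ones from the example above.

```python
def real_traffic_change(prev_total, curr_total, spam_increase):
    """Change in genuine sessions once the growth in spam is subtracted."""
    headline_change = curr_total - prev_total
    return headline_change - spam_increase


# Figures from the example: 21,000 sessions last month, 23,000 this month,
# of which 3,000 extra sessions were spam.
change = real_traffic_change(21_000, 23_000, 3_000)
print(change)  # -1000
```

A headline rise of 2,000 sessions turns into a real drop of 1,000 once the spam is accounted for.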
Spotting Referral Spam
Spam comes in five main categories:
The Obvious Sales Pitch
Some spam websites make it all too easy for you to spot them. With names like ‘best-seo-offer.com’ and ‘100dollars-seo.com’, it’s not going to take long to realise these are not websites genuinely linking to your site. It’s a fair assumption that any SEO-based URLs you don’t recognise as companies you have done business with are just trying to sell you their services in a very strange manner.
Buying SEO services from a website that caught your attention by spamming your web analytics seems a bit like buying a burglar alarm from someone who sneaks in and leaves their business card on your table.
The Malicious Referral
Again, these websites tend to be relatively easy to spot. We’ve all grown suspicious of random links within our emails. Long gone are the days when we naively clicked on a link from a friend we haven’t spoken to in years who has just found a great new investment opportunity abroad and has forgotten how to spell our name. We’ve wised up. Unfortunately the same cannot be said about analytics spam: many website owners and analytics enthusiasts are still unaware that it exists at all. So when you see a referral from ‘freemoneyonline.biz’ (I don’t know if that is an existing website, but I wouldn’t recommend trying it) you don’t stop and think ‘well that looks dodgy’, but instead wonder why such a website is linking to your homepage, and before you know it your computer is running strangely slowly. If you’re unsure whether a site is spam or dangerous, Google first, visit later.
The Just For Fun Referral
There is one culprit in particular who springs to mind when thinking of this category: Vitaly Popov, a Russian citizen who spams referrals for fun. Why? Because he can. He has found a vulnerability in Google and he intends to make sure everyone knows it. Don’t worry though, if his spam appears in your analytics you’ll most likely spot it easily enough.
The Clever Spam
Clever may be a bit of a stretch, but this category covers the spam that can actually be difficult to identify. Here are a couple of examples from our website (plugandplaydesign.co.uk)
You would be forgiven for thinking links from these websites are genuine referrals, and not just genuine but also great referrals for any website: a website awards website and a well trusted news source. Unfortunately all is not as it seems. Notice anything strange about ‘theguardlan.com’? No? Neither did I at first. If you’re scanning through hundreds of referrals you’re unlikely to spot that the letter ‘i’ is actually an ‘l’, making the website ‘the guard lan.com’, a redirect to aliexpress.com, a reputable eBay-style auction site.
As for ‘bestwebsiteawards.com’, a quick Google search revealed this to be a redirect URL used by major spam player Semalt.
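A one-letter swap like ‘theguardlan.com’ is hard to catch by eye but easy to catch by comparing referrers against domains you would expect to see. The sketch below is one possible approach, not something Google Analytics offers: it uses Python’s standard difflib module, and the list of known domains and the 0.9 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of well-known sites that spammers might imitate.
KNOWN_DOMAINS = ["theguardian.com", "bbc.co.uk"]


def lookalike_of(referrer, known=KNOWN_DOMAINS, threshold=0.9):
    """Return the known domain that `referrer` closely imitates, or None.

    An exact match is the real site, so it is never flagged.
    """
    if referrer in known:
        return None
    for domain in known:
        similarity = SequenceMatcher(None, referrer, domain).ratio()
        if similarity >= threshold:
            return domain
    return None


print(lookalike_of("theguardlan.com"))        # theguardian.com
print(lookalike_of("theguardian.com"))        # None
print(lookalike_of("bestwebsiteawards.com"))  # None
```

Note that this only catches near-miss spellings. A name like ‘bestwebsiteawards.com’ imitates nothing in particular, so it still needs the quick Google search.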
The Unfortunate Client
Sometimes you’ll find a website in your referrals that you know is spam, there’s no actual link to your website on theirs and they have no connection or reason to be linking to your site. Yet something doesn’t seem quite right. The website seems genuine enough and isn’t trying to force a virus or a sale down your throat straight from the get go, so why are they in the analytics spamming racket? Chances are they don’t even know that they are!
With an ever growing number of shady SEO companies offering instant guaranteed traffic, someone somewhere was going to bite. An example from our website’s analytics was a fire station in Philadelphia: an instantly suspicious backlink for a London-based web design agency! Despite my not being in America or requiring the fire service, as far as Google is concerned, by checking out this referral I have viewed their website. That is traffic, most likely the traffic promised by a shady SEO agency who failed to elaborate on their methods.
Understanding the different types of analytics spam is useful in identifying your own spam, however if you’re a reasonably large business with a high level of traffic over multiple years it’s going to be a costly process filtering through thousands of referral sources trying to work out what is and isn’t genuine. Below are a few other tips and tricks that can be used to quickly identify spam from personal practice:
Google Analytics – New Session Percentage
One strong indicator of spam is a referral with a high volume of sessions and a 100% new sessions percentage. Let’s look at this logically: if you have a link on a website that has generated 6,000 sessions to your website, what are the chances that all of those sessions will be first time visits? That means nobody clicking that link has visited your website before, or clicked that link more than once. Yes it is possible, but it is unlikely. At the very least you would want to take a quick look and see if it fits any of the categories mentioned above.
Google Analytics – Average Session Duration
Another strong indicator that a referral link is spam is the average session duration. Looking at the above referral entry you can see the final column contains the average session duration. Out of 921 sessions the average time spent on the site is less than 1 second. Either our website is incredibly off-putting or there’s something suspicious about this entry. Combine this with the 100% new session percentage mentioned above and it’s pretty safe to assume you’ve found analytics spam.
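If you export your referral report, the two indicators above can be checked together in a few lines. This is a rough sketch in Python: the row structure, the example sources and the thresholds (at least 100 sessions, under 1 second average duration) are assumptions for illustration, not values recommended by Google.

```python
from dataclasses import dataclass


@dataclass
class ReferralRow:
    source: str
    sessions: int
    new_session_pct: float   # 0 to 100
    avg_duration_sec: float


def looks_like_spam(row, min_sessions=100, max_duration_sec=1.0):
    """Flag busy referrals where every session is new and nobody stays."""
    return (
        row.sessions >= min_sessions
        and row.new_session_pct >= 100.0
        and row.avg_duration_sec < max_duration_sec
    )


# Made-up rows standing in for an exported referral report.
rows = [
    ReferralRow("spam-example.test", 921, 100.0, 0.4),
    ReferralRow("partner-example.test", 350, 62.5, 95.0),
]

suspects = [row.source for row in rows if looks_like_spam(row)]
print(suspects)  # ['spam-example.test']
```

Anything this flags is only a suspect; it is still worth a quick look to see which of the categories above it falls into.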
Stopping Referral Spam
Now that you’ve identified the spam in your analytics you may be worried about how much this is skewing your beloved traffic data. Luckily there are ways we can defuse this spam-bomb and find out.
Filters are the main recommendation for dealing with analytics spam. Establishing different filtered views early on allows you to prevent spam visits from being recorded in your analytics. This is ideal if you’ve just spotted your first spam and gone off in search of a solution! Depending on your level of analytics best practice knowledge you may or may not have set up multiple views when you launched your website. If not then now is a good time to start! It is recommended that you have three different views:
- Unfiltered View – This will be the raw data, all your spam and genuine data rolled into one unfiltered data pile.
- Test View – This will be a copy of your unfiltered view where you can add different filters for testing. As filtering out traffic stops it at the source, having a test data set stops you from accidentally filtering out your own domain and losing valuable data!
- Master View – Once you have established that your test filters work as expected you can copy them over onto your master view, this will then be the main view you use for reporting.
The main drawback of filters is that they have no effect on historical data. Think of your analytics as a production line. If for multiple years you have been producing faulty toys, fixing the production line isn’t going to fix all the toys already stocking the shelves of the local toy store.
For those of you who require historical accuracy and a backdated removal of spam, fear not as there is an alternative.
Google Analytics allows the creation of custom ‘segments’. These segments can be used to filter out data, such as spam, from historic reports! If filters are fixing the production line, then a segment would be recalling the stock from the stores. Using segments we can eliminate the traffic we’re not interested in and leave only the good stuff.
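The difference between the two can be modelled in a few lines of Python. This is only a toy illustration of the behaviour described above, not how Google Analytics is implemented; the dates, sources and function names are all made up.

```python
from datetime import date

# Hypothetical session log: (day recorded, referral source).
SESSIONS = [
    (date(2015, 1, 10), "spam-example.test"),
    (date(2015, 1, 12), "partner-example.test"),
    (date(2015, 3, 5), "spam-example.test"),
    (date(2015, 3, 6), "partner-example.test"),
]
SPAM_SOURCES = {"spam-example.test"}


def view_with_filter(sessions, filter_created):
    """A filter only stops spam recorded on or after the day it was created."""
    return [
        (day, source)
        for day, source in sessions
        if day < filter_created or source not in SPAM_SOURCES
    ]


def report_with_segment(sessions):
    """A segment is applied at report time, so it also covers old data."""
    return [(day, source) for day, source in sessions if source not in SPAM_SOURCES]


filtered = view_with_filter(SESSIONS, filter_created=date(2015, 2, 1))
segmented = report_with_segment(SESSIONS)
print(len(filtered))   # 3 (the January spam is still on the shelves)
print(len(segmented))  # 2 (all spam recalled)
```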
Below we’ll look into how to accomplish both methods!
Filtering Your Analytics Data
Before we apply any filters or segments we need to understand one last thing: the two kinds of spam, Ghost Referrals and Misbehaving Bots!
The ‘Ghost Referral’, a term coined by Mike Sullivan at Analytics Edge, refers to websites that never make contact with your website. Instead they post fake page views to Google’s tracking service and end up hitting your tracking ID. These are the easiest to filter out: because they never actually hit your website, they won’t report your hostnames, whereas legitimate traffic will!
The misbehaving bot is a crawler that searches the internet one domain at a time seeking out vulnerabilities, in this case looking for opportunities to spam analytics. These bots actually do hit your website, even if only briefly. Because of this they can be slightly harder to spot, and some smarter bots will even linger a while to avoid giving off suspiciously low average session durations! Unfortunately these are more difficult to filter out and require some upkeep.
We will start with the Ghost Referrals. To filter out these we will be telling analytics to only pay attention to OUR hostnames. So the first step is establishing what your hostnames are. To do this go to your analytics and go to:
Audience > Technology > Network
Within here you will see the option to use Hostname as your primary dimension, select this.
You will now have a list of all of your sessions and the Hostnames that generated them.
Note: Remember to first filter by the correct time frame. If you want to remove spam historically you should be looking at the full duration of your website’s analytics.
You can see in the image above the top five hostnames appearing for our website. The standard approach is to assume that anywhere you have entered your analytics tracking code is an acceptable hostname. So for us all of these results seem fine, except number 2. This is not a domain of ours, nor have we ever entered our tracking code on this website. A quick Google search reveals this to be spam!
Below are some common host names you will want to include:
- www.yoursite.com – Your website’s main domain.
- example.yoursite.com – Sub domains of your website.
- translate.googleusercontent.com – Anyone visiting your website and utilising Google Translate.
- Your Payment Gateway.
- Any other pages you have added your tracking ID to, such as social media or YouTube.
- (not set) – This can often be email campaigns or other useful traffic. If possible investigate this further; if you’re not sure then it’s probably worth keeping this data.
A common mistake here is seeing a website you recognise, such as Google.com, and assuming it is a hostname you should include. Remember, if you didn’t add your tracking code there then you don’t want it!
Once you have established your hostnames it’s time to create your first filter / segment!
To create your hostname filter follow these steps:
- Ensure you have admin access to the analytics account in question.
- Select the Admin tab within your analytics account.
- Click on the ‘View’ dropdown and select ‘Create New View’
- Name your view and set your timezone.
- Once this view has been created ensure it is selected then click ‘Filters’ in the view menu.
- Click the red ‘+New Filter’ button and give your filter a name.
- Select ‘Custom’ filter type and then ‘Include’, the filter field you want selected is ‘Hostname’.
- The next step is the complicated part: here you need to enter your hostnames in the form of a regular expression. To do this, use no spaces, separate each hostname with a | symbol, escape each literal dot with a backslash (an unescaped dot matches any character) and use “.*” to include subdomains. For example, to include the information for both “plugandplaydesign.co.uk” and “example.plugandplaydesign.co.uk”, I would use “.*plugandplaydesign\.co\.uk”.
- Once this is done, click save and congratulations! You have setup a hostname filter. Now you just have to wait for the data to populate.
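Before pasting a pattern into the filter, it can help to try it against a few hostnames locally. The sketch below uses Python’s re module, which behaves like Google Analytics for simple patterns such as this one, though that equivalence is an assumption worth checking against your own test view. The helper names are made up, and the pattern it builds is a slightly stricter variant of the one in the steps above: it is anchored at both ends so that a lookalike such as “notplugandplaydesign.co.uk” is rejected.

```python
import re


def build_hostname_pattern(domains):
    """Build an include pattern matching each domain and any of its subdomains."""
    # re.escape turns every literal dot into \. so it cannot match other characters.
    alternatives = "|".join(re.escape(domain) for domain in domains)
    return rf"^(.*\.)?({alternatives})$"


def is_valid_hostname(hostname, pattern):
    return re.search(pattern, hostname) is not None


pattern = build_hostname_pattern(
    ["plugandplaydesign.co.uk", "translate.googleusercontent.com"]
)

for hostname in [
    "plugandplaydesign.co.uk",
    "example.plugandplaydesign.co.uk",
    "translate.googleusercontent.com",
    "google.com",
    "notplugandplaydesign.co.uk",
]:
    print(hostname, is_valid_hostname(hostname, pattern))
```

The first three hostnames pass and the last two are rejected, which is the behaviour you want from the include filter.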
The process for creating a segment is similar to that of a filter, however you utilise your main view instead of creating a new one.
To set up the hostname segment you will need to follow these steps:
- On your analytics dashboard click the ‘Add Segment’ option next to your ‘All Sessions’ default segment.
- Click the ‘+New Segment’ button that appears in the popup window.
- On the left hand menu select ‘Conditions’ under the advanced column.
- You will now want to ensure the filter options state ‘Sessions’ and ‘Include’
- Select ‘Hostname’ and ‘matches regex’ from the filter details and then using the regular expression detailed in the filter section add your hostnames.
- Click ‘Save’ and your segment will automatically apply itself in comparison to the ‘All Sessions’ segment.
We have now successfully filtered out all ‘ghost referrals’ and can turn our attention to the ‘misbehaving bots’.
The process of removing the ‘misbehaving bots’ is very similar to hostname filtering. The difference is that instead of including your valid hostnames, you are excluding the spam domain names.
To create this filter or segment you first need to establish who you want to remove. Using the methods detailed earlier in this article you can analyse your referral traffic and pick out the spam. If your traffic is excessive and you don’t have the time to sift through thousands of referrals picking out what is and isn’t spam, then a simple approach would be to set a cut-off point. Decide, for example, that any spam source with fewer than 10 sessions is not worth the effort. By eliminating the majority of the spam you will be left with much more accurate data, and if small amounts of spam remain it shouldn’t have too much of an effect.
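The cut-off can be applied mechanically once you have a list of suspected spam sources and their session counts. A small sketch in Python, where the session counts are invented and the function name is hypothetical:

```python
def spam_worth_filtering(spam_sessions, min_sessions=10):
    """Keep only the spam sources with enough sessions to be worth blocking."""
    return sorted(
        source for source, sessions in spam_sessions.items()
        if sessions >= min_sessions
    )


# Made-up session counts for sources already identified as spam.
spam_sessions = {
    "best-seo-offer.com": 420,
    "100dollars-seo.com": 37,
    "tiny-spam.test": 4,
}

to_block = spam_worth_filtering(spam_sessions)
print(to_block)  # ['100dollars-seo.com', 'best-seo-offer.com']
```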
Domain Exclusion Filter
We’ll start by looking at how to create the exclusion filter by following these steps:
- Return to the view that was created for use with the ‘Hostname Filter’ and click the filters button again.
- Click the red “+New Filter” button to create an additional filter on this view.
- As before name the filter appropriately and click ‘Custom’ filter type.
- Here we want to select ‘Exclude’ and ‘Campaign Source’ as the filter field.
- Using a regular expression add the domains you wish to block.
- Once this is completed, save the filter and wait for the data to populate!
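Writing the exclusion expression by hand gets fiddly as the list grows, so it can be generated from a plain list of domains instead. This is a sketch in Python with made-up helper names; it assumes the filter treats the pattern as a partial match, which is why “www.best-seo-offer.com” is caught as well.

```python
import re


def build_exclusion_pattern(domains):
    """Join escaped spam domains with | so that any one of them matches."""
    return "|".join(re.escape(domain) for domain in domains)


spam_domains = ["best-seo-offer.com", "100dollars-seo.com", "theguardlan.com"]
pattern = build_exclusion_pattern(spam_domains)


def is_blocked(source):
    return re.search(pattern, source) is not None


print(is_blocked("theguardlan.com"))         # True
print(is_blocked("www.best-seo-offer.com"))  # True
print(is_blocked("theguardian.com"))         # False
```

When a new spam domain turns up, add it to the list, regenerate the pattern and paste it back into the filter.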
Domain Exclusion Segment
For creating a backdated domain exclusion segment we could create a separate segment to the hostname filter, though given that it is unlikely you would only want to filter one element of spam and not the other it makes more sense to amend the original segment we set up. To do this we will need to complete the following steps:
- Click on ‘Add Segment’ or the default ‘All Sessions’ section and select your previously created segment from the list presented.
- Click ‘Edit’ to open the segment for alteration.
- Click ‘+Add Filter’.
- From here select ‘Sessions’ and ‘Exclude’ as the base filter settings.
- For the filter criteria select ‘Medium’, ‘exactly matches’ and then enter ‘referral’ into the appropriate box.
- Now you have established that you are filtering the referral traffic specifically, click the ‘AND’ button to add an AND clause to your filter.
- The next step is to select ‘Source’ and ‘matches regex’ for the filter criteria and enter the regular expression for all of the domains you wish to filter.
- Save these changes and you’re done!
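Put together, the segment keeps a session only if its hostname is one of yours and it is not a referral from a blocked source. The logic can be sketched in Python as follows; the session records and patterns are illustrative assumptions, not an export format Google Analytics provides.

```python
import re
from typing import NamedTuple


class Session(NamedTuple):
    hostname: str
    medium: str
    source: str


# Illustrative patterns: include our hostnames, exclude known spam sources.
HOSTNAME_INCLUDE = re.compile(r"^(.*\.)?plugandplaydesign\.co\.uk$")
SOURCE_EXCLUDE = re.compile(r"best-seo-offer\.com|theguardlan\.com")


def in_segment(session):
    """Include valid hostnames, then exclude (medium == referral AND spam source)."""
    valid_host = HOSTNAME_INCLUDE.search(session.hostname) is not None
    spam_referral = (
        session.medium == "referral"
        and SOURCE_EXCLUDE.search(session.source) is not None
    )
    return valid_host and not spam_referral


sessions = [
    Session("plugandplaydesign.co.uk", "organic", "google"),
    Session("plugandplaydesign.co.uk", "referral", "theguardlan.com"),
    Session("ghost-host.test", "referral", "partner-example.test"),
    Session("www.plugandplaydesign.co.uk", "referral", "partner-example.test"),
]

kept = [s for s in sessions if in_segment(s)]
print(len(kept))  # 2
```

The second session is dropped by the source exclusion (a misbehaving bot) and the third by the hostname condition (a ghost referral).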
If you have followed all of the above steps correctly you will now have at least 2 Google Analytics views, one which contains your main unfiltered data and another containing your spam filters. You will within your unfiltered view also have a data segment that can be used to view your historic data spam free!
Maintaining your filters and segments is as easy as appending a domain to the domain exclusion regular expression whenever you spot a new spam source you don’t want.
Now go and enjoy your spam free reporting!