As the digital landscape continues to evolve, so does the prevalence of bot traffic. Bots have become a critical aspect of online activity, accounting for a significant portion of internet traffic. While bots can serve legitimate purposes, they can also be used for malicious activities.
As businesses and organizations rely heavily on online channels for various purposes, understanding and managing bot traffic has become crucial to ensure the integrity, security, and performance of their digital assets. In this article, we will delve into the significance of bot traffic, explore its impact on online ecosystems, and provide insights on how to identify and mitigate bot traffic to safeguard your online presence.
What is a bot?
First things first, a bot is a program or a script that runs automated tasks over the Internet.
Typically, it performs simple, repetitive tasks that would be time-consuming, mundane, or impossible for a human to perform manually.
Not all bots are bad
There are organizations or services that use bots to:
crawl the internet to index content or improve SEO
monitor parts of the website
improve customer service by interacting with users via chatbots
generate content
Black hat bots
However, there are actors with malicious intentions. They use bots to:
steal content or valuable data
damage SEO
contribute to DDoS attacks
increase CPC costs
Why detecting bots is important
At this point, you might ask yourself: why do I care if bots visit my website? Well, there are two main reasons:
Detecting a bad actor behind a bot means you can take action against them, such as blocking their traffic
Even if a bot is not used for malicious purposes, it still pollutes your data. And poor data means poor decision making.
Bots. Bots everywhere
Finding bots is a very demanding task. It’s literally a competition between humans and machines!
And if that wasn't already enough, there are also millions of people out there who create bots. There are even software services that enable anyone to create bots that monitor websites!
Known bot-traffic exclusion
Fortunately, analytics vendors like Google Analytics and Adobe Analytics already exclude known bots. This exclusion is based on the International Spiders and Bots List, maintained by the Interactive Advertising Bureau (IAB).
Unfortunately, not all bots follow the rules set by the IAB. In fact, according to studies cited by Statista, as much as 40% of internet traffic is generated by bots.
Detecting bots
But how can you detect bots using the tools that analytics vendors provide you with?
Essentially, you have to separate bot-generated traffic from legitimate traffic. To do so, you must find and follow the “tracks” that bots have left behind.
And why is this difficult?
Because searching for tracks of bot traffic means searching for indications across a myriad of dimension and metric combinations.
To enumerate those components let’s first make an important distinction between bots.
Bots with or without persistent sessions
When humans visit your website, analytics vendors can track and stitch all the interactions they perform into “visits” or “sessions”. That’s why you can tell that user xyz viewed your homepage, proceeded to the contact page, and then left without further interactions.
In other words, human visitors, who are engaging with the website, have persistent sessions.
However, this is not always the case for bots.
There are two different types of bot activity, since programmers have the option to create bots with or without a persistent session when they visit your website.
How does this translate to web analytics?
- A bot with a persistent session makes multiple page views per session
- A bot without a persistent session makes only one page view per session
| Persistent session | Page views per session |
| --- | --- |
| Yes | Multiple |
| No | Single |
To put it simply, the tracks that bots leave behind differ depending on whether their sessions are persistent. Armed with this information, let’s start searching for bots!
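As a minimal sketch of the distinction above: assuming you can export hit-level data as (session ID, page URL) pairs, a few lines of Python can separate single-page-view sessions from multi-page-view ones. The data shape here is hypothetical; a real analytics export will look different.

```python
from collections import Counter

def classify_sessions(hits):
    """Count page views per session ID and flag the session type.

    `hits` is a list of (session_id, page_url) tuples -- a simplified
    stand-in for the hit-level data an analytics export would give you.
    """
    pageviews = Counter(session_id for session_id, _ in hits)
    return {
        sid: "persistent (multiple page views)" if count > 1
        else "non-persistent (single page view)"
        for sid, count in pageviews.items()
    }

hits = [
    ("s1", "/home"), ("s1", "/contact"),  # two page views -> persistent
    ("s2", "/home"),                      # one page view  -> non-persistent
]
print(classify_sessions(hits))
```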
Recipes for bot detection
We start by listing all the possible components (dimensions and metrics) we can use to detect bots.
Dimensions:
- Day
- Hour
- Minute
- Country
- City
- IP address (assuming you have not obfuscated it)
- Channel
- Device Type
- Browser
- Browser Type
- Device Manufacturer
- Operating System
- User Agent
- Screen resolution
- Visit number
- Hit depth
- Timezone
- Entry Page / Entry Page URL
- Page URL
- Visitor/User ID
- Referrer / Referring domain

Metrics:
- Visits/Entries
- Page Views
- Single Page Visit
- Bounce Rate
- Unique Visitor
- Media Clicks
- Media Impressions
- Conversions (clicks, orders)
- Visit duration
- Login
- Mouse movement
- Exits
That’s quite a handful, right?
Now let’s move on with the description of the patterns we search for.
Bot patterns
General
Below we list patterns that apply regardless of whether bots have persistent sessions:
Repetitive time patterns: Traffic spikes that recur hourly, daily, weekly, or on some other cadence. This pattern may be evident only on specific pages or sections of the website.
Unusual traffic peaks within the day: Similar but not identical to the first pattern, this one concerns a one-off anomalous spike, so there is no seasonality here. Quite possibly the spike occurred at a specific hour or hour range within the day.
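One simple way to surface a one-off anomalous spike like the one described above is a z-score check over hourly page views. This is a sketch, not a full anomaly-detection setup; the 3-sigma threshold and the sample data are assumptions to tune against your own traffic.

```python
import statistics

def find_spikes(hourly_pageviews, threshold=3.0):
    """Flag hours whose page-view count sits more than `threshold`
    standard deviations above the mean -- a one-off anomalous spike."""
    mean = statistics.mean(hourly_pageviews)
    stdev = statistics.stdev(hourly_pageviews)
    if stdev == 0:
        return []
    return [hour for hour, pv in enumerate(hourly_pageviews)
            if (pv - mean) / stdev > threshold]

# 24 hours of traffic with a suspicious burst at 03:00
traffic = [120, 110, 115, 2400] + [118, 125, 130, 122, 119] * 4
print(find_spikes(traffic))  # -> [3]
```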
Suspicious geo-locations: There might be increased activity from unexpected geo-locations (based on IPs), for example traffic from another continent.
Sharp day-over-day (DoD) increases from a single or specific:
- Channel
- Device
- Browser
- Operating System
- User Agent
or any combination of the above.
Traffic from obsolete Operating Systems (OS): Using a PC operating system that is 15 years old, or a mobile one that is 8 years old, is not common. So traffic from an old OS like Windows XP, Android 4, or iOS 7 might indicate bot-generated traffic.
Operating System information missing or set to Linux: Bots may avoid setting an operating system when generating traffic, which shows up as a “Not Specified” operating system. Additionally, some bots set their operating system to Linux, probably due to the OS’s prevalence on servers and in the programming community. Be cautious, though, not to exclude legitimate users on Linux.
Unusual User Agents: Custom user agents are often used by bot creators. So, look for any strange User Agent value, such as an unknown version of a standard browser (especially Safari and Opera) or a value with a simplified format compared to the standard User Agent convention.
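A rough heuristic for the User Agent checks above can be sketched as follows. The token list and the length cutoff are assumptions for illustration, not an authoritative bot signature list; tune them against the traffic you actually see.

```python
import re

# Hypothetical heuristics: a handful of substrings and a "too short"
# check that often betray scripted clients. Tune these to your traffic.
SUSPICIOUS_TOKENS = ("python-requests", "curl", "bot", "spider", "headless")

def looks_like_bot_ua(user_agent):
    ua = user_agent.lower()
    if any(token in ua for token in SUSPICIOUS_TOKENS):
        return True
    # Real browser UAs follow the long "Mozilla/5.0 (...)" convention;
    # a very short or bare value suggests a simplified custom agent.
    return not re.match(r"mozilla/\d", ua) or len(ua) < 40

print(looks_like_bot_ua("python-requests/2.31.0"))  # True
print(looks_like_bot_ua(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"))  # False
```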
Traffic from obsolete browser versions: These days, browsers tend to release new versions every month, so traffic from an old version like Chrome 43 points to a bot, since the automated systems behind bots rarely update their browsers.
Lower monitor resolutions: Nowadays most of us use high-resolution monitors, so traffic from lower resolutions may indicate bot traffic. Specifically, the following resolutions are popular among bots:
- 1024 x 768
- 1366 x 768
- 1600 x 864
- 800 x 600
- 1600 x 1200
- 1024 x 667
- Not Specified
Abnormally high campaign CTR: The click-through rate (CTR) for paid campaigns usually falls within a specific range, so an irregularly high CTR suggests that a fraudulent bot drove that performance.
Unusual patterns in paid traffic: Expanding on the previous point, any other unusual pattern in paid traffic can indicate bots: unexpected clicks, traffic spikes within a day, or the exact opposite, a lack of stochasticity.
Extremely low conversion / no KPIs in the visit: The majority of bots are not set up to interact with your pages, where interactions include adding products to the cart, checking out, submitting lead forms, or watching a video. After removing bot traffic from your data, the conversion rate should return close to its average.
No mouse movement or page scrolling: An extension of the previous pattern. Analytics implementations that track mouse movement or page scrolling provide yet another signal of bot traffic when this type of page interaction is missing.
Non-logged-in users: If your website lets users log in, focus on the non-logged-in traffic. While bots can be configured to authenticate automatically, the majority of them aren’t that “smart”.
IP addresses that originate from distributed computing platforms: As cloud providers like Amazon Web Services and Google Cloud become an integral piece of modern software infrastructure, they are inevitably used for bot farms too. IP addresses that resolve to cloud computing services therefore run a high risk of being bots. For instance, some Google Cloud IP addresses start with 35.199 or 35.194.
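Checking an IP against known cloud ranges can be sketched with Python's standard `ipaddress` module. The prefixes below are illustrative examples only; cloud providers publish full, regularly updated range lists that you should load in practice.

```python
import ipaddress

# Illustrative prefixes only -- real providers publish complete,
# regularly updated ranges that a production check should use.
CLOUD_RANGES = [
    ipaddress.ip_network("35.199.0.0/16"),  # example Google Cloud block
    ipaddress.ip_network("35.194.0.0/16"),  # example Google Cloud block
    ipaddress.ip_network("3.0.0.0/9"),      # example AWS block
]

def from_cloud_provider(ip):
    """Return True if `ip` falls inside any of the listed cloud ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUD_RANGES)

print(from_cloud_provider("35.199.12.34"))  # True
print(from_cloud_provider("203.0.113.7"))   # False
```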
Specific query strings present / page URLs that don’t exist: Bots can sometimes attempt to overload a site’s cache or otherwise cause damage by accessing URLs that do not exist or that have invalid formatting. These can include URLs for typical LAMP or WordPress admin pages, as well as URLs with specific query strings appended to them.
Persistent-session bots
Repetitive behavior 1: A high number of page views or interactions per visit. Bots often rack up page views per visit ranging from several hundred to thousands. So when looking at a report of Users, IPs, or User Agents against Page Views per Visit, sort from highest to lowest: bots will gather at the top of the page, and you will see a sharp decline as you move downwards into actual real traffic.
Repetitive behavior 2: Multiple page views or interactions within a short period of time. It’s safe to assume that viewing multiple pages or interacting with a page in rapid succession indicates bot behavior. We are talking about 60 page views or interactions within a minute, or any other traffic volume that is impossible for a human to generate.
Repetitive behavior 3: The same number of page views or interactions every minute or hour, for many minutes or hours in a row.
Repetitive behavior 4: The number of page views or interactions follows an algorithmic pattern over time. For example, the number of page views from the same visitor decreases by a fixed amount every minute, for many minutes or even hours.
Long-duration visits: Even your most engaged users will eventually terminate their visits, so multi-hour sessions (e.g. above 3 hours) are a signal of non-human behavior.
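The sort-and-flag approach described in these patterns can be sketched in a few lines. The data shape and the cutoffs (300 page views, 180 minutes) are hypothetical illustrations; pick thresholds from your own report distributions.

```python
def rank_by_pageviews(visits):
    """Sort visits from highest to lowest page views, so that bot-like
    sessions gather at the top of the report.

    `visits` maps a visitor ID to (page_views, duration_minutes).
    """
    return sorted(visits.items(), key=lambda kv: kv[1][0], reverse=True)

def flag_bot_like(visits, max_pv=300, max_minutes=180):
    """Flag visitors with implausibly many page views or multi-hour visits."""
    return [vid for vid, (pv, minutes) in visits.items()
            if pv > max_pv or minutes > max_minutes]

visits = {
    "visitor_a": (1450, 240),  # hundreds of page views, 4-hour visit
    "visitor_b": (6, 12),
    "visitor_c": (3, 5),
}
print(rank_by_pageviews(visits)[0][0])  # visitor_a tops the report
print(flag_bot_like(visits))            # ['visitor_a']
```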
Non-persistent-session bots
Page Views = Visits = Unique Visitors, with a visit number of 1: This type of bot gets a new visitor ID every time it views a page. Consequently, each such visitor incurs only one visit with a visit number of 1, and only one page view within that visit.
Extremely high bounce rate: As a consequence of the above, their bounce rate will equal 100%. Slicing and dicing your data for high-bounce-rate traffic can help you locate those bots. There may also be bot activity where the behavior is slightly different and the bounce rate is close to, but not exactly, 100%.
Zero average time spent on site: Bots don’t linger on the website because they are designed to complete their tasks instantly, so 0 time spent on site is another indicator of such activity.
Large volume of Single Page Visits: Another way to detect bots without a persistent session is the Single Page Visits metric, which returns the visits where the Page dimension assumes only one value. Beware that users who interacted on that page will still count as Single Page Visits.
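The non-persistent-session fingerprint described above (Page Views = Visits = Unique Visitors, visit number 1) can be turned into a simple filter. The row format is a hypothetical stand-in for an analytics report export.

```python
def non_persistent_bot_candidates(rows):
    """Filter report rows matching the non-persistent-session fingerprint:
    Page Views = Visits = Unique Visitors and every session a first visit.

    Each row: (segment, page_views, visits, unique_visitors, avg_visit_number)
    """
    return [
        seg for seg, pv, visits, uv, avg_visit_no in rows
        if pv == visits == uv and avg_visit_no == 1
    ]

rows = [
    ("ip_10.0.0.1", 5000, 5000, 5000, 1),   # classic non-persistent bot
    ("organic",     9000, 3000, 2400, 3.2),  # normal returning-user traffic
]
print(non_persistent_bot_candidates(rows))  # ['ip_10.0.0.1']
```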
Set up alerts to spot bot traffic
Bot traffic can be identified from unusual patterns within the data. It’s important to distinguish between bots that accept cookies (persistent session) and bots that don’t (no persistent session). Below we’ll see some of the scenarios that would indicate bot traffic and how you can prepare alerts within Google or Adobe Analytics to spot such trends.
A sudden spike in Pageviews per Session from Direct Traffic
Bots often mimic human behavior: Some bots are designed to generate traffic that appears to be similar to human behavior. They may send requests to websites without passing through referral links or search engines, making it appear as if the traffic is coming directly to the website, even though it may not be genuine human traffic.
High Pageviews per Session is usually a pattern that indicates bot activity.
Actions to take
Set up an alert to notify you of sudden Pageviews per Session increases in Direct traffic.
How to set up alerts in Google Analytics
How to set up alerts in Adobe Analytics
Limitations of alerts
Regardless of whether you are using Google or Adobe Analytics alerts, both systems have some constraints.
Limitations of Adobe Analytics Alerts
Alerts cannot be broken down by additional dimensions to view top contributors, so each anomaly requires a separate analysis
A limited number of Contribution Analyses per month per company, depending on the product license
Limitations of Google Analytics Alerts
UA alerts cannot be broken down by additional dimensions to view top contributors
UA alerts cannot be set up for custom, calculated metrics, e.g. Pageviews per Session
GA4 alerts (Insights) may or may not break down by additional dimensions
Drive efficiency with automation
Don’t want to waste hundreds of man-hours of analytics resources hunting for bots?
If you want to stay on top of untracked campaigns and avoid costly mistakes in your campaign optimization, please visit our ultimate guide to learn everything you need to know.