As the digital landscape continues to evolve, so does the prevalence of bot traffic. Bots have become a critical aspect of online activity, accounting for a significant portion of internet traffic. While bots can serve legitimate purposes, they can also be used for malicious activities.
As businesses and organizations rely heavily on online channels for various purposes, understanding and managing bot traffic has become crucial to ensure the integrity, security, and performance of their digital assets. In this article, we will delve into the significance of bot traffic, explore its impact on online ecosystems, and provide insights on how to identify and mitigate bot traffic to safeguard your online presence.
What is a bot?
First things first, a bot is a program or a script that runs automated tasks over the Internet.
Typically, it performs simple, repetitive tasks that would be time-consuming, mundane, or impossible for a human to perform manually.
Not all bots are bad
There are organizations or services that use bots to:
crawl the internet to index content or improve SEO
monitor parts of the website
improve customer service by interacting with users via chatbots
generate content
Black hat bots
However, there are actors with malicious intentions. They use bots to:
steal content or valuable data
damage SEO
contribute to DDoS attacks
increase CPC costs
Why detecting bots is important
At this point, you might ask yourself: why do I care if bots visit my website? Well, there are two main reasons:
Detecting a bad actor behind a bot means you can take action against them, such as blocking their traffic
Even if a bot is not used for malicious purposes, it still pollutes your data. And poor data means poor decision making.
Bots. Bots everywhere
Finding bots is a very demanding task. It’s literally a competition between humans and machines!
And if that wasn't already enough, there are also millions of people out there who create bots. There are even software services that enable anyone to create bots that monitor websites!
Known bot-traffic exclusion
Fortunately, analytics vendors like Google Analytics and Adobe Analytics already exclude known bots. This exclusion is based on the International Spiders and Bots List, maintained by the Interactive Advertising Bureau (IAB).
Unfortunately, not all bots follow the rules set by the IAB. In fact, according to studies cited by Statista, as much as 40% of internet traffic is generated by bots.
Detecting bots
But how can you detect bots using the tools that analytics vendors provide you with?
Essentially, you have to separate bot-generated traffic from legitimate traffic. To do so, you must find and follow the “tracks” that bots have left behind.
And why is this difficult?
Because searching for tracks of bot traffic means searching for indications across a myriad of dimension and metric combinations.
To enumerate those components let’s first make an important distinction between bots.
Bots with or without persistent sessions
When humans visit your website, analytics vendors can track and stitch all the interactions they perform into “visits” or “sessions”. That’s why you can tell that user xyz viewed your homepage, proceeded to the contact page, and then left without further interactions.
In other words, human visitors, who are engaging with the website, have persistent sessions.
However, this is not always the case for bots.
There are two different types of bot activity, since programmers have the option to create bots with or without a persistent session when they visit your website.
How does this translate to web analytics?
- A bot with a persistent session makes multiple page views per session
- A bot without a persistent session makes only one page view per session
| Persistent session | Page views per session |
| --- | --- |
| Yes | Multiple |
| No | Single |
To put it simply, the tracks that bots leave behind differ depending on whether their sessions are persistent. Armed with this information, let’s start searching for bots!
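As a minimal sketch of the distinction above: assuming you can export hit-level data as (session ID, page URL) pairs, a few lines of Python can separate single-page-view sessions from multi-page-view ones. The data shape here is hypothetical; a real analytics export will look different.

```python
from collections import Counter

def classify_sessions(hits):
    """Count page views per session ID and flag the session type.

    `hits` is a list of (session_id, page_url) tuples -- a simplified
    stand-in for the hit-level data an analytics export would give you.
    """
    pageviews = Counter(session_id for session_id, _ in hits)
    return {
        sid: "persistent (multiple page views)" if count > 1
        else "non-persistent (single page view)"
        for sid, count in pageviews.items()
    }

hits = [
    ("s1", "/home"), ("s1", "/contact"),  # two page views -> persistent
    ("s2", "/home"),                      # one page view  -> non-persistent
]
print(classify_sessions(hits))
```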
Recipes for bot detection
We start by listing all the possible components (dimensions and metrics) we can use to detect bots.
Dimensions:
- Day
- Hour
- Minute
- Country
- City
- IP address (assuming you have not obfuscated it)
- Channel
- Device Type
- Browser
- Browser Type
- Device Manufacturer
- Operating System
- User Agent
- Screen resolution
- Visit number
- Hit depth
- Timezone
- Entry Page / Entry Page URL
- Page URL
- Visitor/User ID
- Referrer / Referring domain

Metrics:
- Visits/Entries
- Page Views
- Single Page Visit
- Bounce Rate
- Unique Visitor
- Media Clicks
- Media Impressions
- Conversions (clicks, orders)
- Visit duration
- Login
- Mouse movement
- Exits
That’s quite a handful, right?
Now let’s move on with the description of the patterns we search for.
Bot patterns
General
Below we list patterns that apply regardless of whether bots have persistent sessions:
Repetitive time patterns: Traffic spikes that recur hourly, daily, weekly, or on some other cadence. This pattern may be evident only on specific pages or sections of the website.
Unusual traffic peaks within the day: Similar but not identical to the first pattern, this one concerns a one-off anomalous spike, so there is no seasonality here. Quite possibly the spike occurred at a specific hour or hour range within the day.
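One simple way to surface a one-off anomalous spike like the one described above is a z-score check over hourly page views. This is a sketch, not a full anomaly-detection setup; the 3-sigma threshold and the sample data are assumptions to tune against your own traffic.

```python
import statistics

def find_spikes(hourly_pageviews, threshold=3.0):
    """Flag hours whose page-view count sits more than `threshold`
    standard deviations above the mean -- a one-off anomalous spike."""
    mean = statistics.mean(hourly_pageviews)
    stdev = statistics.stdev(hourly_pageviews)
    if stdev == 0:
        return []
    return [hour for hour, pv in enumerate(hourly_pageviews)
            if (pv - mean) / stdev > threshold]

# 24 hours of traffic with a suspicious burst at 03:00
traffic = [120, 110, 115, 2400] + [118, 125, 130, 122, 119] * 4
print(find_spikes(traffic))  # -> [3]
```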
Suspicious geo-locations: There might be increased activity from unexpected geo-locations (based on IPs), for example traffic from another continent.
Sharp day-over-day (DoD) increases from a single or specific:
- Channel
- Device
- Browser
- Operating System
- User Agent
or any combination of the above.
Traffic from obsolete Operating Systems (OS): Using a PC operating system that is 15 years old, or a mobile one that is 8 years old, is not common. So traffic from an old OS like Windows XP, Android 4, or iOS 7 might indicate bot-generated traffic.
Operating System information missing or set to Linux: Bots may avoid setting an operating system when generating traffic, which shows up as a “Not Specified” operating system. Additionally, some bots set their operating system to Linux, probably due to the OS’s prevalence on servers and in the programming community. Be cautious, though, not to exclude legitimate users on Linux.
Unusual User Agents: Custom user agents are often used by bot creators. So, look for any strange User Agent value, such as an unknown version of a standard browser (especially Safari and Opera) or a value with a simplified format compared to the standard User Agent convention.
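A rough heuristic for the User Agent checks above can be sketched as follows. The token list and the length cutoff are assumptions for illustration, not an authoritative bot signature list; tune them against the traffic you actually see.

```python
import re

# Hypothetical heuristics: a handful of substrings and a "too short"
# check that often betray scripted clients. Tune these to your traffic.
SUSPICIOUS_TOKENS = ("python-requests", "curl", "bot", "spider", "headless")

def looks_like_bot_ua(user_agent):
    ua = user_agent.lower()
    if any(token in ua for token in SUSPICIOUS_TOKENS):
        return True
    # Real browser UAs follow the long "Mozilla/5.0 (...)" convention;
    # a very short or bare value suggests a simplified custom agent.
    return not re.match(r"mozilla/\d", ua) or len(ua) < 40

print(looks_like_bot_ua("python-requests/2.31.0"))  # True
print(looks_like_bot_ua(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"))  # False
```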
Traffic from obsolete browser versions: These days, browsers tend to release new versions every month, so traffic from an old version like Chrome 43 points to a bot, since the automated systems behind bots rarely update their browsers.
Lower monitor resolutions: Nowadays most of us use high-resolution monitors, so traffic from lower resolutions may indicate bot traffic. Specifically, the following resolutions are popular among bots:
- 1024 x 768
- 1366 x 768
- 1600 x 864
- 800 x 600
- 1600 x 1200
- 1024 x 667
- Not Specified
Abnormally high campaign CTR: The click-through rate (CTR) for paid campaigns usually falls within a specific range, so an irregularly high CTR suggests that a fraudulent bot drove that performance.
Unusual patterns in paid traffic: Expanding on the previous point, any other unusual pattern in paid traffic can indicate bots: unexpected clicks, traffic spikes within a day, or the exact opposite, a lack of stochasticity.
Extremely low conversion / no KPIs in the visit: The majority of bots are not set up to interact with your pages, where interactions include adding products to the cart, checking out, submitting lead forms, or watching a video. After removing bot traffic from your data, the conversion rate should return close to its average.
No mouse movement or page scrolling: An extension of the previous pattern. Analytics implementations that track mouse movement or page scrolling provide yet another signal of bot traffic when this type of page interaction is missing.
Non-logged-in users: If your website lets users log in, focus on the non-logged-in traffic. While bots can be configured to authenticate automatically, the majority of them aren’t that “smart”.
IP addresses that originate from distributed computing platforms: As cloud providers like Amazon Web Services and Google Cloud become an integral piece of modern software infrastructure, they are inevitably used for bot farms too. IP addresses that resolve to cloud computing services therefore run a high risk of being bots. For instance, some Google Cloud IP addresses start with 35.199 or 35.194.
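Checking an IP against known cloud ranges can be sketched with Python's standard `ipaddress` module. The prefixes below are illustrative examples only; cloud providers publish full, regularly updated range lists that you should load in practice.

```python
import ipaddress

# Illustrative prefixes only -- real providers publish complete,
# regularly updated ranges that a production check should use.
CLOUD_RANGES = [
    ipaddress.ip_network("35.199.0.0/16"),  # example Google Cloud block
    ipaddress.ip_network("35.194.0.0/16"),  # example Google Cloud block
    ipaddress.ip_network("3.0.0.0/9"),      # example AWS block
]

def from_cloud_provider(ip):
    """Return True if `ip` falls inside any of the listed cloud ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUD_RANGES)

print(from_cloud_provider("35.199.12.34"))  # True
print(from_cloud_provider("203.0.113.7"))   # False
```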
Specific query strings present / page URLs that don’t exist: Bots can sometimes attempt to overload a site’s cache or otherwise cause damage by accessing URLs that do not exist or that have invalid formatting. These can include URLs for typical LAMP or WordPress admin pages, as well as URLs with specific query strings appended to them.
Persistent-session bots
Repetitive behavior 1: A high number of page views or interactions per visit. Bots often rack up page views per visit ranging from several hundred to thousands. So when looking at a report of Users, IPs, or User Agents against Page Views per Visit, sort from highest to lowest: bots will gather at the top of the page, and you will see a sharp decline as you move downwards into actual real traffic.
Repetitive behavior 2: Multiple page views or interactions within a short period of time. It’s safe to assume that viewing multiple pages or interacting with a page in rapid succession indicates bot behavior. We are talking about 60 page views or interactions within a minute, or any other traffic volume that is impossible for a human to generate.
Repetitive behavior 3: The same number of page views or interactions every minute or hour, for many minutes or hours in a row.
Repetitive behavior 4: The number of page views or interactions follows an algorithmic pattern over time. For example, the number of page views from the same visitor decreases by a fixed amount every minute, for many minutes or even hours.
Long-duration visits: Even your most engaged users will eventually terminate their visits, so multi-hour sessions (e.g. above 3 hours) are a signal of non-human behavior.
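The sort-and-flag approach described in these patterns can be sketched in a few lines. The data shape and the cutoffs (300 page views, 180 minutes) are hypothetical illustrations; pick thresholds from your own report distributions.

```python
def rank_by_pageviews(visits):
    """Sort visits from highest to lowest page views, so that bot-like
    sessions gather at the top of the report.

    `visits` maps a visitor ID to (page_views, duration_minutes).
    """
    return sorted(visits.items(), key=lambda kv: kv[1][0], reverse=True)

def flag_bot_like(visits, max_pv=300, max_minutes=180):
    """Flag visitors with implausibly many page views or multi-hour visits."""
    return [vid for vid, (pv, minutes) in visits.items()
            if pv > max_pv or minutes > max_minutes]

visits = {
    "visitor_a": (1450, 240),  # hundreds of page views, 4-hour visit
    "visitor_b": (6, 12),
    "visitor_c": (3, 5),
}
print(rank_by_pageviews(visits)[0][0])  # visitor_a tops the report
print(flag_bot_like(visits))            # ['visitor_a']
```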
Non-persistent-session bots
Page Views = Visits = Unique Visitors, with a visit number of 1: This type of bot gets a new visitor ID every time it views a page. Consequently, each such visitor incurs only one visit with a visit number of 1, and only one page view within that visit.
Extremely high bounce rate: As a consequence of the above, their bounce rate will equal 100%. Slicing and dicing your data for high-bounce-rate traffic can help you locate those bots. There may also be bot activity where the behavior is slightly different and the bounce rate is close to, but not exactly, 100%.
Zero average time spent on site: Bots don’t linger on the website because they are designed to complete their tasks instantly, so 0 time spent on site is another indicator of such activity.
Large volume of Single Page Visits: Another way to detect bots without a persistent session is the Single Page Visits metric, which returns the visits where the Page dimension assumes only one value. Beware that users who interacted on that page will still count as Single Page Visits.
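The non-persistent-session fingerprint described above (Page Views = Visits = Unique Visitors, visit number 1) can be turned into a simple filter. The row format is a hypothetical stand-in for an analytics report export.

```python
def non_persistent_bot_candidates(rows):
    """Filter report rows matching the non-persistent-session fingerprint:
    Page Views = Visits = Unique Visitors and every session a first visit.

    Each row: (segment, page_views, visits, unique_visitors, avg_visit_number)
    """
    return [
        seg for seg, pv, visits, uv, avg_visit_no in rows
        if pv == visits == uv and avg_visit_no == 1
    ]

rows = [
    ("ip_10.0.0.1", 5000, 5000, 5000, 1),   # classic non-persistent bot
    ("organic",     9000, 3000, 2400, 3.2),  # normal returning-user traffic
]
print(non_persistent_bot_candidates(rows))  # ['ip_10.0.0.1']
```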
Set up alerts to spot bot traffic
Bot traffic can be identified from unusual patterns within the data. It’s important to distinguish between bots that accept cookies (persistent session) and bots that don’t (no persistent session). Below we’ll see some of the scenarios that would indicate bot traffic and how you can prepare alerts within Google or Adobe Analytics to spot such trends.
A sudden spike in Pageviews per Session from Direct Traffic
Bots often mimic human behavior: Some bots are designed to generate traffic that appears to be similar to human behavior. They may send requests to websites without passing through referral links or search engines, making it appear as if the traffic is coming directly to the website, even though it may not be genuine human traffic.
High Pageviews per Session is usually a pattern that indicates bot activity.
Actions to take
Set up an alert to notify you of sudden Pageviews per Session increases in Direct traffic.
How to set up alerts in Google Analytics
How to set up alerts in Adobe Analytics
Limitations of alerts
Regardless of whether you are using Google or Adobe Analytics alerts, both systems have some constraints.
Limitations of Adobe Analytics Alerts
Alerts cannot be broken down by additional dimensions to view top contributors, so each anomaly requires a separate analysis
A limited number of Contribution Analyses per month per company, depending on the product license
Limitations of Google Analytics Alerts
UA alerts cannot be broken down by additional dimensions to view top contributors
UA alerts cannot be set up for custom, calculated metrics, e.g. Pageviews per Session
GA4 alerts (Insights) may or may not break down by additional dimensions
Drive efficiency with automation
Don’t want to waste hundreds of man-hours of analytics resources hunting for bots?
If you want to stay on top of untracked campaigns and avoid costly mistakes in your campaign optimization, please visit our ultimate guide to learn everything you need to know.