Ultimate Guide: How to Spot Bot Traffic

September 20, 2023
17 min read

Introduction

As the digital landscape continues to evolve, so does the prevalence of bot traffic. Bots have become a critical aspect of online activity, accounting for a significant portion of internet traffic. While bots can serve legitimate purposes, they can also be used for malicious activities. 

As businesses and organizations rely heavily on online channels for various purposes, understanding and managing bot traffic has become crucial to ensure the integrity, security, and performance of their digital assets. In this article, we will delve into the significance of bot traffic, explore its impact on online ecosystems, and provide insights on how to identify and mitigate bot traffic to safeguard your online presence.

What is a bot?

First things first: a bot is a program or script that runs automated tasks over the Internet.

Typically, it is intended to perform simple, repetitive tasks that would be time-consuming, mundane, or impossible for a human to perform manually.

Not all bots are bad

Many organizations and services use bots to:

  • crawl the internet or improve SEO
  • monitor parts of a website
  • improve customer service by interacting with users via chatbots
  • generate content

Black hat bots

However, there are also actors with malicious intentions. They use bots to:

  • steal content or valuable data
  • damage SEO
  • contribute to DDoS attacks
  • inflate CPC (cost-per-click) advertising costs with fake ad clicks

Why detecting bots is important

At this point, you might ask yourself: why should I care if bots visit my website?
Well, there are two main reasons:

  1. Detecting a bad actor behind a bot means you can take action against them, such as blocking them.
  2. Even if a bot is not used for malicious purposes, it still pollutes your data. And poor data means poor decision-making.

Bots. Bots everywhere

Finding bots is a very demanding task. It’s literally a competition between humans and machines!

And if that wasn't already enough, there are also millions of people out there who create bots. There are even software services that enable anyone to create bots that monitor websites!

Known bot-traffic exclusion

Fortunately, analytics vendors like Google Analytics and Adobe Analytics can already exclude known bots (GA4 does this automatically, while Adobe Analytics offers it as a report suite setting). This filtering is based on the International Spiders and Bots List, maintained by the Interactive Advertising Bureau (IAB).

Unfortunately, the list can only catch bots that identify themselves, and not all bots follow the rules set by the IAB. In fact, studies cited by Statista.com estimate that as much as 40% of internet traffic is generated by bots.

Detecting bots

But how can you detect bots using the tools that analytics vendors provide you with? 

Essentially, you have to separate bot-generated traffic from legitimate traffic. To do so, you must find and follow the “tracks” that bots have left behind. 

And why is this difficult?

Because searching for tracks of bot traffic means searching for indications across a myriad of dimension and metric combinations.

To enumerate those components, let’s first make an important distinction between bots.

Bots with or without persistent sessions

When humans visit your website, analytics vendors can track their interactions and stitch them together into “visits” or “sessions”. That’s why you can tell that user xyz viewed your homepage, proceeded to the Contact Us page, and then left without further interactions.

In other words, human visitors, who are engaging with the website, have persistent sessions.

However, this is not always the case for bots. 

Bot programmers can choose whether their bots keep a persistent session (for example, by accepting cookies) when visiting your website. This results in two different types of bot activity.

Here is how this translates to web analytics:

  • A bot with a persistent session makes multiple page views per session.
  • A bot without a persistent session makes only one page view per session.

Persistent session | Page views per session
Yes                | Multiple
No                 | Single

To put it simply, the tracks that bots leave behind differ depending on whether or not they have persistent sessions. Armed with this information, let’s start searching for bots!

Recipes for bot detection

We start by listing all the possible components (dimensions and metrics) we can use to detect bots.

Components

Dimensions
  1. Day
  2. Hour
  3. Minute
  4. Country
  5. City
  6. IP address (assuming you have not obfuscated it)
  7. Channel
  8. Device Type
  9. Browser
  10. Browser Type
  11. Device Manufacturer
  12. Operating System
  13. User Agent
  14. Screen resolution
  15. Visit number
  16. Hit depth
  17. Timezone
  18. Entry Page / Entry Page URL
  19. Page URL
  20. Visitor/User ID
  21. Referrer / Referring domain

Metrics
  1. Visits/Entries
  2. Page Views
  3. Single Page Visits
  4. Bounce Rate
  5. Unique Visitors
  6. Media Clicks
  7. Media Impressions
  8. Conversions (clicks, orders)
  9. Visit duration
  10. Login
  11. Mouse movement
  12. Exits

That’s quite a handful, right?

Now let’s move on to the patterns we search for.

Bot patterns

General

The patterns below apply regardless of whether bots have persistent sessions:

  1. Repetitive time patterns
    Traffic spikes that recur with some seasonality, which may be hourly, daily, weekly, etc. This pattern may be evident only in specific pages or sections of the website.
  2. Unusual traffic peaks within the day
    This is similar, but not identical, to the first pattern. It concerns a one-off anomalous spike, so there is no seasonality here. In addition, the spike quite possibly occurred at a specific hour or range of hours within a day.
  3. Suspicious geo-locations
    There might be increased activity from unusual geo-locations (based on IP addresses), for example, traffic from another continent.
  4. Sharp day-over-day (DoD) increases from a single or specific
    a. Channel
    b. Device
    c. Browser
    d. Operating System
    e. User Agent
    or a combination of the above
  5. Traffic from obsolete Operating Systems (OS)
    It is not common to use a PC operating system that is 15 years old, or a mobile operating system that is 8 years old. So traffic from old operating systems, like Windows XP, Android 4, or iOS 7, might indicate that bots generated it.
  6. Operating System information missing or set to Linux
    Bots may avoid setting an operating system when generating traffic. This shows up as a “Not Specified” operating system.
    Additionally, some bots may set their operating system to Linux, probably due to the OS’s prevalence on servers and in the programming community. However, be careful not to exclude legitimate users running Linux.
  7. Unusual User Agents
    Bot creators often use custom user agents. So, look for any strange User Agent value, like an unknown version of a standard browser (especially Safari and Opera), or a value with a simplified format compared to the standard User Agent convention.
  8. Traffic from obsolete browser versions
    These days, browsers tend to release new versions roughly every month! So traffic from an old version like Chrome 43 likely points to a bot, as the automated systems behind bots don’t update their browsers that often.
  9. Low screen resolutions
    Nowadays, most of us use high-resolution monitors, so traffic from lower resolutions may indicate bot traffic. Specifically, the following resolutions are popular for bots:
    a. 1024 x 768
    b. 1366 x 768
    c. 1600 x 864
    d. 800 x 600
    e. 1600 x 1200
    f. 1024 x 667
    g. Not Specified
  10. Abnormally high campaign CTR
    The Click-Through Rate (CTR) of paid campaigns usually falls within a specific range. So an irregularly high CTR may mean that a fraudulent bot drove that performance.
  11. Unusual patterns in paid traffic
    Expanding on the previous point, any other unusual pattern in paid traffic can indicate bots. It could be unexpected clicks or traffic spikes within a day, or the exact opposite: a lack of stochasticity (randomness) in the traffic.
  12. Extremely low conversion / No KPIs in the visit
    The majority of bots are not set up to interact with your pages, for example, by adding products to the cart, checking out, submitting lead forms, or watching a video. So bot traffic drags your conversion rate down; once you remove it from your data, the conversion rate should return close to its usual average.
  13. No mouse movement or page scrolling
    Expanding on the previous pattern: if your analytics implementation tracks mouse movement or page scrolling, the absence of this type of interaction provides yet another signal of bot traffic.
  14. Non-logged-in users
    If your website has an option for users to log in, then focus on the non-logged-in traffic. While bots can be configured to automatically authenticate, the majority of them aren’t that “smart”. 
  15. IP addresses that originate from distributed computing platforms
    As cloud computing providers like Amazon Web Services and Google Cloud have become an integral part of modern software infrastructure, they are inevitably used for bot farms. So traffic from IP addresses that belong to cloud computing services runs a high risk of being bot traffic. For instance, some Google Cloud IP addresses start with 35.199 or 35.194.
  16. Specific query strings present / Page URLs that don’t exist
    Bots sometimes attempt to overload a site’s cache or otherwise cause damage by accessing URLs that do not exist or that are invalidly formatted. These can include URLs of typical LAMP or WordPress admin pages, as well as URLs with specific query strings appended to them.
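To make pattern 4 concrete, here is a minimal Python sketch that flags sharp day-over-day increases per segment. It assumes you have exported daily page views broken down by one dimension (for example, Browser or User Agent) as (day, segment, page_views) rows; the function name dod_increases and the 3x / 100-page-view thresholds are hypothetical choices, not values from any analytics tool.

```python
from collections import defaultdict


def dod_increases(rows, min_ratio=3.0, min_views=100):
    """Flag segments whose page views jumped sharply day over day.

    rows: iterable of (day, segment, page_views) tuples. `day` must sort
    chronologically (ISO date strings do); `segment` is a dimension value
    such as a browser, channel or user agent.
    Returns a list of (day, segment, previous_views, current_views).
    """
    views = defaultdict(lambda: defaultdict(int))  # segment -> day -> views
    days = set()
    for day, segment, page_views in rows:
        views[segment][day] += page_views
        days.add(day)

    ordered_days = sorted(days)
    flagged = []
    # Compare each pair of consecutive days present in the data.
    for prev_day, day in zip(ordered_days, ordered_days[1:]):
        for segment, by_day in views.items():
            before = by_day.get(prev_day, 0)
            after = by_day.get(day, 0)
            # A segment with no traffic the day before counts as an infinite
            # increase, so brand-new sources are flagged once they pass min_views.
            ratio = after / before if before else float("inf")
            if after >= min_views and ratio >= min_ratio:
                flagged.append((day, segment, before, after))
    return flagged


rows = [
    ("2023-09-18", "Chrome 116", 1000),
    ("2023-09-19", "Chrome 116", 1100),
    ("2023-09-18", "Chrome 43", 20),
    ("2023-09-19", "Chrome 43", 900),
]
print(dod_increases(rows))  # [('2023-09-19', 'Chrome 43', 20, 900)]
```

The minimum-volume threshold keeps tiny segments (say, from 2 to 8 views) from flooding the results, while a steady segment such as "Chrome 116" is ignored.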
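Several of the hit-level patterns above (5, 6, 8 and 9) can be checked with simple rules. The sketch below assumes each hit has been exported as a Python dict with the hypothetical keys os, browser, browser_version and resolution; the obsolete OS and resolution values come from the lists above, while the minimum browser versions are placeholder assumptions you would set yourself.

```python
# Values taken from the patterns above; extend them for your own audience.
OBSOLETE_OS_PREFIXES = ("Windows XP", "Android 4", "iOS 7")
BOT_RESOLUTIONS = {
    "1024x768", "1366x768", "1600x864", "800x600", "1600x1200", "1024x667",
}
# Hypothetical cut-offs: the oldest major version you still consider current.
MIN_BROWSER_MAJOR = {"Chrome": 100, "Firefox": 100}


def _clean(value):
    """Normalize empty values and "Not Specified" to an empty string."""
    value = (value or "").strip()
    return "" if value.lower() == "not specified" else value


def bot_signals(hit):
    """Return the heuristic bot signals matched by one hit (a dict)."""
    signals = []

    os_name = _clean(hit.get("os"))
    if os_name.startswith(OBSOLETE_OS_PREFIXES):
        signals.append("obsolete OS")
    # Linux is common among real developers too, so this is a weak signal.
    if not os_name or os_name.startswith("Linux"):
        signals.append("missing or Linux OS")

    browser = _clean(hit.get("browser"))
    major = _clean(hit.get("browser_version")).split(".")[0]
    if browser in MIN_BROWSER_MAJOR and major.isdigit():
        if int(major) < MIN_BROWSER_MAJOR[browser]:
            signals.append("obsolete browser version")

    # Compare resolutions without spaces, so "1024 x 768" matches "1024x768".
    resolution = _clean(hit.get("resolution")).replace(" ", "")
    if not resolution or resolution in BOT_RESOLUTIONS:
        signals.append("bot-typical or missing resolution")

    return signals


hit = {"os": "Windows XP", "browser": "Chrome",
       "browser_version": "43.0.2357", "resolution": "1024 x 768"}
print(bot_signals(hit))
# ['obsolete OS', 'obsolete browser version', 'bot-typical or missing resolution']
```

Treat each returned signal as a hint rather than a verdict: a single Linux or 1366 x 768 hit is usually a real person, so it is the combination of several signals, concentrated in one segment, that is worth investigating.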
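For pattern 15, Python’s standard ipaddress module can test whether a visitor’s IP falls inside known cloud ranges. The two networks below are only illustrative /16 blocks built from the Google Cloud prefixes mentioned above; cloud providers such as AWS and Google Cloud publish their full, regularly updated IP range lists, which you would load in their place. is_cloud_ip is a hypothetical helper name.

```python
import ipaddress

# Illustrative networks built from the Google Cloud prefixes mentioned above.
# In practice, load the full range lists that cloud providers publish.
CLOUD_NETWORKS = [
    ipaddress.ip_network("35.194.0.0/16"),
    ipaddress.ip_network("35.199.0.0/16"),
]


def is_cloud_ip(address, networks=CLOUD_NETWORKS):
    """Return True if `address` lies inside any of the given networks."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Anonymized or malformed values (e.g. "35.199.12.xxx") can't be checked.
        return False
    # An IPv6 address is never "in" an IPv4 network, so mixed lists are safe.
    return any(ip in network for network in networks)


for address in ["35.199.12.34", "81.2.69.160", "2001:db8::1"]:
    print(address, is_cloud_ip(address))
# 35.199.12.34 True
# 81.2.69.160 False
# 2001:db8::1 False
```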

Persistent-session bots

  1. Repetitive behavior 1
    A high number of page views or interactions per visit.
    Bots often have a very high number of page views per visit, ranging from several hundred to thousands. So when looking at a report of Page Views per Visit broken down by User, IP address, or User Agent, sort from highest to lowest. Bots will gather at the top, and you will see a sharp decline as you move down into real traffic.
  2. Repetitive behavior 2
    Multiple page views or interactions within a short period of time.
    It’s safe to assume that viewing multiple pages, or interacting with a page, in rapid succession indicates bot behavior. We are talking about 60 page views or interactions within a minute, or any other traffic volume that is impossible for a human to generate.
  3. Repetitive behavior 3
    Same number of page views or interactions every minute/hour, for many minutes/hours.
  4. Repetitive behavior 4
    The number of page views or interactions follows an algorithmic pattern over time.
    For example, the number of page views from the same visitor decreases by a specific number every minute for many minutes or even hours.
  5. Long duration visits
    Even your most engaged users eventually end their visits. So multi-hour sessions (e.g., above 3 hours) are a signal of non-human behavior.
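Persistent-session patterns 1, 2 and 5 can be sketched over raw page-view hits. This assumes an export of (visit_id, timestamp) pairs with Unix timestamps in seconds; the 60-views-per-minute and 3-hour limits come from the text above, while MAX_VIEWS_PER_VISIT = 200 is a hypothetical cut-off in the “several hundred” range, and flag_persistent_bots is an illustrative name.

```python
from collections import defaultdict

# Limits from the patterns above, plus one hypothetical cap.
MAX_VIEWS_PER_VISIT = 200          # hypothetical; bots often reach hundreds
MAX_VIEWS_PER_MINUTE = 60          # "60 page views ... within a minute"
MAX_VISIT_SECONDS = 3 * 60 * 60    # "above 3 hours"


def flag_persistent_bots(hits):
    """hits: iterable of (visit_id, unix_timestamp_seconds) page-view hits.

    Returns {visit_id: [reasons]} for visits that match at least one pattern.
    """
    times = defaultdict(list)
    for visit_id, ts in hits:
        times[visit_id].append(ts)

    flagged = {}
    for visit_id, stamps in times.items():
        stamps.sort()
        reasons = []
        if len(stamps) > MAX_VIEWS_PER_VISIT:
            reasons.append("too many page views")

        # Sliding window: the most hits that fall within any 60-second span.
        start, busiest = 0, 0
        for end, ts in enumerate(stamps):
            while ts - stamps[start] >= 60:
                start += 1
            busiest = max(busiest, end - start + 1)
        if busiest >= MAX_VIEWS_PER_MINUTE:
            reasons.append("too many page views per minute")

        if stamps[-1] - stamps[0] > MAX_VISIT_SECONDS:
            reasons.append("visit too long")
        if reasons:
            flagged[visit_id] = reasons
    return flagged


hits = [("bot", t) for t in range(60)]                    # 60 views in 60 seconds
hits += [("human", 0), ("human", 30), ("human", 400)]
hits += [("long", t) for t in range(0, 4 * 3600 + 1, 600)]  # a view every 10 min for 4 h
print(flag_persistent_bots(hits))
# {'bot': ['too many page views per minute'], 'long': ['visit too long']}
```

The sliding window counts the busiest 60-second stretch of each visit rather than calendar minutes, so a burst that straddles a minute boundary is still caught.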

Non-persistent-session bots

  1. Page Views = Visits = Unique Visitors, with a Visit number of 1
    This type of bot gets a new visitor ID every time it views a page. Consequently, each of these visitors has only one visit, with a visit number of 1, and only one Page View within that visit.
  2. Extremely high bounce rate
    As a consequence of the above, their Bounce Rate will be 100%. Slicing and dicing your data for high-bounce-rate traffic can help you locate those bots. Some bot activity may behave slightly differently, with a Bounce Rate close to, but not exactly, 100%.
  3. Zero average time spent on site
    Bots don’t stay on the website for any amount of time because they are designed to complete their tasks instantly. So, zero time spent on site is another indicator of such activity.
  4. Large volume of Single Page Visits
    Another way to detect bots that don’t have a persistent session is to use Single Page Visits. This metric counts the Visits in which the Page dimension takes only one value. Beware that visits in which users interacted with that page still count as Single Page Visits.
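For non-persistent-session bots, patterns 1 to 3 can be checked on aggregated metrics per segment (for example, one User Agent on one day) rather than on individual visits. The SegmentStats fields below are hypothetical names for metrics from the Components list, plus new_visits, an assumed count of visits with a Visit number of 1; the 95% bounce-rate threshold is an arbitrary allowance for the “close to, but not exactly, 100%” case.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Aggregated metrics for one segment, e.g. one User Agent on one day."""
    page_views: int
    visits: int
    unique_visitors: int
    new_visits: int          # visits with a Visit number of 1 (assumed metric)
    bounces: int
    total_time_seconds: float


def non_persistent_signals(s: SegmentStats, min_bounce_rate: float = 0.95) -> list[str]:
    """Return which non-persistent-session patterns the segment matches."""
    if s.visits == 0:
        return []
    signals = []
    # Pattern 1: every page view is a new visitor's first and only visit.
    if s.page_views == s.visits == s.unique_visitors == s.new_visits:
        signals.append("one page view per new visitor")
    # Pattern 2: bounce rate at or near 100%.
    if s.bounces / s.visits >= min_bounce_rate:
        signals.append("bounce rate near 100%")
    # Pattern 3: no measurable time on site.
    if s.total_time_seconds == 0:
        signals.append("zero time on site")
    return signals


bot = SegmentStats(page_views=500, visits=500, unique_visitors=500,
                   new_visits=500, bounces=500, total_time_seconds=0)
print(non_persistent_signals(bot))
# ['one page view per new visitor', 'bounce rate near 100%', 'zero time on site']
```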

Set up alerts to spot bot traffic (indicative examples)

Bot traffic can be identified from unusual patterns in the data. It’s important to distinguish between bots that accept cookies (persistent session) and bots that don’t (no persistent session). Below, we’ll look at scenarios that would indicate bot traffic and how you can set up alerts in Google or Adobe Analytics to spot such trends.

A sudden spike in Pageviews per Session from Direct Traffic

Some bots are designed to generate traffic that mimics human behavior. They may send requests to websites without passing through referral links or search engines, so the traffic appears to be coming directly to the website (Direct traffic), even though it is not genuine human traffic.

An unusually high number of Pageviews per Session is a common indicator of bot activity.

Actions to take

Set up an alert to notify you of sudden Pageviews per Session increases in Direct traffic.
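Google and Adobe Analytics configure alerts through their own interfaces. As a tool-agnostic illustration of the rule such an alert encodes, here is a sketch that compares today’s Pageviews per Session for Direct traffic against the mean plus k standard deviations of previous days. The (page_views, sessions) input format, the function name pps_alert and k = 3 are assumptions, not settings from either product.

```python
import statistics


def pps_alert(history, today, k=3.0):
    """Return True if today's Pageviews per Session for Direct traffic is
    unusually high compared with previous days.

    history: list of (page_views, sessions) tuples for previous days.
    today:   (page_views, sessions) for the day being checked.
    """
    ratios = [pv / s for pv, s in history if s]
    page_views, sessions = today
    if len(ratios) < 2 or not sessions:
        return False  # not enough data to judge; stdev needs 2+ points
    threshold = statistics.mean(ratios) + k * statistics.stdev(ratios)
    return page_views / sessions > threshold


history = [(3000, 1000), (3100, 1000), (2900, 1000), (3050, 1000), (2950, 1000)]
print(pps_alert(history, (9000, 1000)))  # True: 9.0 vs a threshold of about 3.24
print(pps_alert(history, (3100, 1000)))  # False
```

A larger k means fewer, more certain alerts; a smaller k catches subtler bots at the cost of more false alarms. Note that a perfectly flat history has a standard deviation of 0, so any increase at all would trigger the alert.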

How to set up alerts in Google Analytics

How to set up alerts in Adobe Analytics

Limitations of alerts

Regardless of whether you are using Google or Adobe Analytics alerts, both systems have some constraints.

Limitations of Adobe Analytics Alerts

  • Alerts cannot be broken down by additional dimensions to view top contributors. Therefore, each anomaly requires a separate analysis.
  • The number of Contribution Analyses per month per company is limited, depending on the product license.

Limitations of Google Analytics Alerts

  • Universal Analytics (UA) alerts cannot be broken down by additional dimensions to view top contributors.
  • UA alerts cannot be set up for custom, calculated metrics, e.g., page views per session.
  • GA4 alerts may or may not break down the Insights they create by additional dimensions.

Drive efficiency with automation

Don’t want to waste a hundred hours of analyst time hunting for bots?

Contact Baresquare to discuss how we can help you.
