
Build a Custom SEO Crawler with advertools: a Step-by-Step Guide

May 10, 2023

Looking to take your website's SEO to the next level? Learn how to build a custom SEO crawler with advertools, a powerful open-source Python library designed for digital marketing, SEM, crawling, and text and content analysis for SEO and social media.

In this step-by-step guide, I'll walk you through the process of creating an SEO crawler, helping you identify technical issues like broken links and missing meta tags that could negatively impact your site's ranking on search engine results pages (SERPs).

But first, let's start with the basics - what exactly is a web crawler, and why is it crucial for SEO?

What Is a Web Crawler and Why Use it for SEO?

A web crawler, or simply a crawler, is a computer program that systematically visits pages on the internet and collects their content so that it can be indexed. Search engines use web crawlers to index web pages and rank them based on their relevance and authority.
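
To make the idea concrete: a crawler fetches a page, extracts the links on it, and repeats the process for each new link. The toy sketch below covers only the link-extraction step, using Python's standard library. The names LinkExtractor and extract_links are made up for illustration, and this is not how advertools works internally (its crawler is built on Scrapy).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A real crawler would add fetching, deduplication, politeness delays, and robots.txt handling on top of this, which is exactly the work a library takes off your hands.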

Using a web crawler for SEO can help you identify technical issues on your site, such as:

  • broken links
  • duplicate content
  • missing meta tags

These issues can negatively impact your site's ranking on search engine results pages (SERPs), making it harder for potential customers to find your business online. By using a web crawler to identify and fix these issues, you can improve your site's visibility and drive more traffic to your site.

Step-by-Step Instructions

Get ready to build your own SEO crawler! In this guide, we will be using a Colab notebook, a tool that lets you write and execute Python directly in the browser with no configuration required. It is free of charge and even includes access to GPUs, although a crawl like this one does not need them.

Whether you're a seasoned Python developer or just starting out, Colab is a fantastic platform for building your own custom crawler and taking your SEO game to the next level!

Step 1: Installing advertools

The first thing you need to do before creating your SEO crawler is to install advertools. It’s easy: just run the following command to get started:

code example: pip install advertools
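
In a Colab cell, the install command is usually written with a leading exclamation mark (!pip install advertools). As a Python-only alternative, the sketch below checks whether the package can be imported and calls pip only when it is missing. The helper names is_installed and ensure_installed are hypothetical, not part of advertools.

```python
import importlib.util
import subprocess
import sys


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


def ensure_installed(package: str) -> None:
    """Install `package` with pip only when it is not already available."""
    if not is_installed(package):
        # Same effect as `pip install <package>` for the running interpreter.
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
```

Calling ensure_installed("advertools") at the top of the notebook would then make the install step safe to re-run.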

Step 2: Importing advertools and Crawling the Website

With advertools installed, it's time to start crawling! To do this, you'll need to import advertools and call the SEO-customized crawl() function. 

This function systematically fetches your website's pages and extracts their key on-page elements, helping you identify technical issues that might be hurting your SEO.

code example: import advertools
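
The import itself is typically a single line. The adv alias is a common convention rather than a requirement, and the try/except guard is an addition for illustration so that the cell fails gracefully if the install step was skipped.

```python
try:
    import advertools as adv  # "adv" is the alias commonly used for the library
except ImportError:
    adv = None  # not installed yet; run the installation step first

print("advertools available:", adv is not None)
```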

Step 3: Feeding in the URLs

After importing advertools, it's time to start feeding in URLs to crawl! This can be done easily by providing a list of URLs that you want to crawl.

code example: urls list
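
A URL list for this step might look like the sketch below. The example.com addresses are placeholders, and the is_crawlable_url helper is a hypothetical addition that filters out malformed entries before they reach the crawler.

```python
from urllib.parse import urlparse

# Placeholder URLs; replace them with pages from the site you want to crawl.
urls = [
    "https://example.com/",
    "https://example.com/blog/",
    "https://example.com/contact/",
]


def is_crawlable_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Drop malformed entries and duplicates while keeping the original order.
urls = list(dict.fromkeys(u for u in urls if is_crawlable_url(u)))
```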

Step 4: Calling the crawl() Function

With advertools installed and URLs provided, you can now call the crawl() function with the necessary parameters as below:

code example: crawl the urls
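
As a rough sketch, the call could be wrapped as follows. The parameter names url_list, output_file, and follow_links come from the advertools documentation; the run_crawl wrapper, the default file name crawl_output.jl, and the up-front extension check are assumptions made for this example.

```python
def run_crawl(urls, output_file="crawl_output.jl", follow_links=False):
    """Crawl `urls` with advertools and write the results to `output_file`."""
    # advertools writes JSON Lines, so the file name is expected to end in ".jl".
    if not output_file.endswith(".jl"):
        raise ValueError("output_file should end with '.jl'")

    import advertools as adv  # imported here so the check above runs without it

    adv.crawl(
        url_list=urls,
        output_file=output_file,
        follow_links=follow_links,  # False: crawl only the URLs provided
    )
    return output_file
```

With a list of URLs in hand, run_crawl(urls) would crawl just those pages, while run_crawl(urls, follow_links=True) would let the crawler discover further pages from them. If you re-run a crawl, consider using a fresh output file name so that results from different runs do not end up mixed together.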

Please note the following:

  • The crawl() function first checks the site's robots.txt file and respects any crawl rules it finds.
  • follow_links can be either True or False:
      • True: the crawler follows the links it finds on each page (by default, only within the domains of the provided URLs)
      • False: the crawler fetches only the provided URLs
  • The output file is a JSON Lines file (one JSON object per crawled page) that contains the most important page elements. Some of them are:
      • title (SEO title of the page)
      • meta_desc (meta description of the page)
      • canonical (the canonical tag, if available)
      • download_latency (the time it took to get the page HTML, in seconds)
      • redirect_urls (the chain of URLs from the requested URL to the one actually fetched)
      • redirect_reasons (the type of each redirection: 301, 302, etc.)
      • status (response status code, e.g. 200, 404)

For the full list and notes for all the elements that are extracted, please refer to the documentation of advertools.
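
To show how these fields can be put to work, here is a minimal sketch that reads a JSON Lines crawl file with Python's standard library and flags pages with a non-200 status, a missing title, or a missing meta description. The function names load_crawl and find_issues are hypothetical, and the sketch assumes each record also carries a url field.

```python
import json


def load_crawl(path):
    """Read a JSON Lines file: one JSON object (one crawled page) per line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def find_issues(pages):
    """Return (url, problem) pairs for common technical SEO problems."""
    issues = []
    for page in pages:
        url = page.get("url", "<unknown>")  # assumes a "url" field per record
        if page.get("status") != 200:
            issues.append((url, f"status {page.get('status')}"))
        if not page.get("title"):
            issues.append((url, "missing title"))
        if not page.get("meta_desc"):
            issues.append((url, "missing meta description"))
    return issues
```

Something like find_issues(load_crawl("crawl_output.jl")) would then give a quick to-do list, where crawl_output.jl stands for whatever output file name you passed to crawl().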

Step 5: Exporting Output to an Excel file

Ready to see the results of your custom crawler in a readable format? After running the crawl() function, you can save the output as an Excel file by entering the following commands:

code example: save to an excel file
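
One way to do the conversion is with pandas, which can read JSON Lines via read_json(..., lines=True) and write Excel files via to_excel(). The sketch below assumes pandas and an Excel engine such as openpyxl are installed (Colab normally ships with both); the helper names excel_path_for and export_to_excel are made up for this example.

```python
from pathlib import Path


def excel_path_for(jl_path):
    """Return the crawl file's path with an .xlsx extension instead of .jl."""
    return str(Path(jl_path).with_suffix(".xlsx"))


def export_to_excel(jl_path):
    """Load the JSON Lines crawl output and save it as an Excel workbook."""
    import pandas as pd  # writing .xlsx also needs an engine such as openpyxl

    crawl_df = pd.read_json(jl_path, lines=True)  # one row per crawled page
    xlsx_path = excel_path_for(jl_path)
    crawl_df.to_excel(xlsx_path, index=False)  # skip the numeric row index
    return xlsx_path
```

For example, export_to_excel("crawl_output.jl") would write crawl_output.xlsx next to the crawl file.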

Finally, you can download the file to the destination of your choice:

code example: download excel file
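
In Colab, a browser download can be triggered with files.download() from the google.colab module. The sketch below wraps that call so it does nothing harmful when the file is missing or when the code runs outside Colab; the download_if_colab name is a hypothetical helper.

```python
from pathlib import Path


def download_if_colab(path):
    """Trigger a browser download in Colab; return False when not possible."""
    if not Path(path).is_file():
        return False  # nothing to download yet
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False  # outside Colab the file is already on the local disk
    files.download(path)
    return True
```

Calling download_if_colab("crawl_output.xlsx") after the export step would hand the workbook to your browser, where crawl_output.xlsx is the example file name assumed here.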

Your custom-built SEO crawler is ready. 

Success!

Congratulations - you've just created your very own custom SEO crawler with advertools! By following the steps outlined in this article, you've gained the power to systematically search and index your site's content, identify technical issues, and optimize your site for search engines.

Whether it's broken links, missing metadata, or duplicate content, your custom crawler will help you pinpoint and fix any issues that might be hurting your site's SEO. And with advertools at your disposal, you have a powerful tool that can help you take your site's SEO and overall online presence to the next level.

So go forth and crawl, and let's drive more traffic to your site!
