Understanding how Googlebot and visitors interact with your website is of paramount importance for the effectiveness and success of your SEO efforts. In turn, this understanding will help you improve your rankings, generate high-quality traffic, and boost your conversions.
When Googlebot visits a website, its main task is to crawl a certain number of pages, determined by the website’s crawl budget, so that Google can store them in its index. Each of those requests, along with every request from human visitors, is recorded in log files that your server generates automatically.
A log file analysis is an essential component of every technical and on-site audit, as it shows you which parts of your website receive the most attention from bots and visitors.
Here’s what server log files are used for in SEO and how to make the most of them.
Why crawlability & indexing are important
For a web page to appear in Google’s search results, Google must first index it. During indexing, Google analyzes the page’s content, which it then uses to decide which keywords the page appears for and how it ranks in search results. (This can change over time.)
To be indexed, a page needs to be crawlable. That is, nothing on the page or website can block Googlebot from accessing the information on the page. HTML errors do not stop Google from crawling a page or website, though they may prevent Google from fully understanding the information being displayed.
There are some common reasons to stop Google from crawling a page. If a web page offers no useful information to searchers, developers will often block it in the robots.txt file, a file that tells bots which parts of a website they may or may not crawl. Typical examples include landing pages set up only for ad traffic, checkout pages, cart pages, and admin login pages.
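To see how such rules apply, here is a minimal Python sketch that uses the standard library’s `urllib.robotparser` to check whether Googlebot may fetch a given URL. The robots.txt rules and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks cart, checkout, and admin pages for all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /admin/
"""


def crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/products/shoes",
                "https://example.com/cart/",
                "https://example.com/admin/login"):
        print(url, "->", "allowed" if crawlable(url) else "blocked")
```

Keep in mind that robots.txt controls crawling, not indexing: Google can still index a blocked URL if other pages link to it.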
What is a server log file analysis and what are its advantages?
In terms of SEO, log files are text files that contain the history of page requests for your website made by both humans and robots.
With a server log file analysis, you’ll be able to find out how Googlebot crawls and processes your website. This provides valuable insight into how the search engine views your site and whether it can index it properly.
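To make this concrete, here is a minimal Python sketch that splits one log line into named fields. It assumes your server writes the widely used Apache/Nginx “combined” log format; the sample line is made up for illustration.

```python
import re
from typing import Optional

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /products/shoes HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')


def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format log line into fields, or return None if it doesn't match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # "-" means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry["path"], entry["status"], "Googlebot" in entry["agent"])
```

Once each line is a dictionary, the questions above become simple filters and counts over the parsed entries.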
This analysis will allow you to track some important findings, such as:
- Do you spend your crawl budget efficiently?
- Were there any accessibility errors during the crawl?
- Are there any pages that can’t be crawled?
- What are your most popular and active pages?
- Are there any pages that Google doesn’t know about?
These are only a few of the things you can pinpoint and improve on your website.
It’s worth mentioning that since 2015, Google has been using RankBrain, an AI-based algorithm, to deliver the most relevant results to its users, which is why you should keep polishing your website and SEO if you want to stay on top of your game.
How to conduct a server log file analysis
Let’s discuss several ways to analyze log files and use the results for search engine optimization.
1. Is your crawl budget being wasted and where?
Crawl budget refers to the number of pages on your website a search engine crawls within a given period of time. A page that hasn’t been crawled won’t be indexed, and as a result, won’t be ranked. In other words, Google won’t show it in relevant searches.
If the number of pages on your website exceeds your crawl budget, some of your pages will go uncrawled and, therefore, unindexed.
In most cases, you won’t have to worry about this, but there are three situations in which you should keep an eye on your crawl budget:
- Your website is big. If you’re running an eCommerce website with more than 10,000 pages, Google can have a hard time crawling and indexing them all.
- You’ve added a lot of new pages. If you add a new section with more than a hundred pages, make sure you have enough crawl budget for them to be indexed quickly.
- You have a lot of redirects. A large number of redirects and redirect chains will use up your crawl budget quickly.
Your crawl budget can also be wasted on low-value and irrelevant pages, which prevents some important pages from being indexed and ranked.
Low-value URLs that waste your crawl budget and hurt crawling and indexing include:
- Duplicate content
- Low-quality, spammy content
- Soft error pages
- Dynamically generated URLs (such as faceted navigation) and session identifiers used by eCommerce stores
By optimizing your crawl budget, you can free up a significant portion of it for the most important and valuable pages on your website. One way to do this is to use a robots.txt file to block certain URLs from being crawled.
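To see where Googlebot actually spends its requests, you could bucket them as in the Python sketch below. It assumes the combined log format, treats any URL with a query string as a possible parameter or session-ID URL, and uses hypothetical, shortened sample lines; the bucket names are illustrative, not an industry standard.

```python
import re
from collections import Counter

# Extracts the path, status code, and user agent from a combined-format log line.
LINE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines; real Googlebot user agents are longer.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /shoes?sessionid=abc123 HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:55:41 +0000] "GET /old-shoes HTTP/1.1" 301 0 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]


def classify(path: str, status: int) -> str:
    """Put one crawled URL into an illustrative bucket."""
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return "error"
    if "?" in path:  # query strings: filters, sorting, session IDs, ...
        return "parameterized"
    return "clean"


def googlebot_budget(lines) -> Counter:
    """Count Googlebot requests per bucket to show where crawl budget goes."""
    buckets = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and "Googlebot" in m["agent"]:
            buckets[classify(m["path"], int(m["status"]))] += 1
    return buckets


if __name__ == "__main__":
    for bucket, hits in googlebot_budget(SAMPLE_LOG).most_common():
        print(f"{bucket}: {hits}")
```

If the redirect, error, and parameterized buckets account for a large share of Googlebot’s requests, those URLs are candidates for cleanup or robots.txt rules.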
2. Has your website switched to mobile-first indexing?
Since July 2019, mobile-first indexing has been the default for new websites, which means Google predominantly uses the mobile version of a website for indexing and ranking purposes.
Given that the majority of people now use their mobile devices for searching the internet, it’s only logical that Google adjusted its best practices in order to optimize the user experience and offer the best and most relevant results.
If you want to make sure that all your important pages are crawled and indexed in a timely manner, and that your rankings don’t suffer, optimize your website for mobile. Focus on responsive design, which lets visitors view and interact with your website comfortably on their mobile devices, and make sure your pages load quickly on mobile.
To determine whether your website receives increased crawling from Googlebot Smartphone, check your server log files. As a rule of thumb, a website that’s still on desktop indexing gets about 80% of its Googlebot requests from the desktop crawler and 20% from the mobile one. If your site has switched to mobile-first indexing, this ratio reverses.
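Here is a minimal Python sketch of that check. It counts only requests whose user agent contains “Googlebot/2.1” (the token shared by Google’s main smartphone and desktop crawlers) and treats any of them containing “Mobile” as Googlebot Smartphone. That split is a simplification based on Google’s published user-agent strings; the sample lines and the Chrome version number are placeholders.

```python
import re

# The user agent is the last quoted field of a combined-format log line.
AGENT = re.compile(r'"([^"]*)"\s*$')

# Modeled on Google's documented user agents; the Chrome version is a placeholder.
SMARTPHONE_UA = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
                 "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile "
                 "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def log_line(agent: str) -> str:
    """Build a hypothetical combined-format line for the given user agent."""
    return f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "{agent}"'


def googlebot_split(lines) -> tuple:
    """Return (mobile_share, desktop_share) of main Googlebot crawler requests."""
    mobile = desktop = 0
    for line in lines:
        m = AGENT.search(line)
        if not m or "Googlebot/2.1" not in m.group(1):
            continue  # skip visitors and other Google crawlers
        if "Mobile" in m.group(1):
            mobile += 1
        else:
            desktop += 1
    total = mobile + desktop
    if total == 0:
        return 0.0, 0.0
    return mobile / total, desktop / total


SAMPLE_LOG = [log_line(SMARTPHONE_UA)] * 4 + [log_line(DESKTOP_UA)]

if __name__ == "__main__":
    mobile, desktop = googlebot_split(SAMPLE_LOG)
    print(f"mobile: {mobile:.0%}, desktop: {desktop:.0%}")  # mobile: 80%, desktop: 20%
```

A mobile share that stays well above the desktop share over several weeks is consistent with a switch to mobile-first indexing.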
3. Are your preferred search engine bots crawling your pages?
Since Google is the dominant search engine, you’re most likely optimizing your website for it. In that case, check whether Googlebot Smartphone and Googlebot Desktop are regularly accessing your pages.
By analyzing server log files, you’ll be able to find out which search engine bots visit your website most often. If everything is working as it should, Google’s bots will be among them.
It’s also a good idea to check how often unwanted bots visit your website. For example, if your main audience is in Europe, the UK, or the US, keep track of the number of visits by Yandex or Baidu bots, and if that number is high, you can block them.
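The Python sketch below tallies requests per search engine by matching well-known crawler names in the user-agent field of combined-format log lines; the sample lines are hypothetical. Note that user agents are easy to fake: Google recommends confirming real Googlebot traffic with a reverse DNS lookup, which this sketch does not do.

```python
import re
from collections import Counter

# The user agent is the last quoted field of a combined-format log line.
AGENT = re.compile(r'"([^"]*)"\s*$')

# User-agent substrings of a few well-known crawlers; extend as needed.
BOT_TOKENS = {
    "Googlebot": "Google",
    "bingbot": "Bing",
    "YandexBot": "Yandex",
    "Baiduspider": "Baidu",
}


def count_bots(lines) -> Counter:
    """Count requests per search engine, judged by user agent only."""
    counts = Counter()
    for line in lines:
        m = AGENT.search(line)
        if not m:
            continue
        for token, engine in BOT_TOKENS.items():
            if token in m.group(1):
                counts[engine] += 1
                break
    return counts


def log_line(agent: str) -> str:
    """Build a hypothetical combined-format line for the given user agent."""
    return f'203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "{agent}"'


SAMPLE_LOG = [
    log_line("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    log_line("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    log_line("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"),
    log_line("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"),
    log_line("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
]

if __name__ == "__main__":
    for engine, hits in count_bots(SAMPLE_LOG).most_common():
        print(engine, hits)
```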
4. Identify errors in response codes
A log file analysis can also be used to identify error response codes, such as 4xx and 5xx, which can hurt your SEO.
What are 4xx errors?
4xx errors are client errors, the most familiar being 404 “Not Found.” They mean that the server received the request but couldn’t fulfill it, most often because the requested page doesn’t exist. For example, if a highly relevant page returns a 404 code, you should use a 301 redirect to send it to the next most relevant page. Keep in mind that too many 404 errors can get in the way of proper crawling.
Similarly, if you spot a 410 error, meaning that the page has been permanently removed from the server, remove any references or links to that dead page from your content.
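To find these URLs, you could list the 4xx responses served to Googlebot, most frequent first, as in this minimal Python sketch. It assumes the combined log format and uses hypothetical sample lines with a shortened user agent.

```python
import re
from collections import Counter

# Extracts the path, status code, and user agent from a combined-format log line.
LINE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines with a shortened Googlebot user agent.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [11/Oct/2023:08:12:01 +0000] "GET /old-page HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [11/Oct/2023:08:12:09 +0000] "GET /discontinued HTTP/1.1" 410 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [11/Oct/2023:08:13:30 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]


def googlebot_client_errors(lines):
    """Return ((status, path), hits) pairs for 4xx responses to Googlebot, most hits first."""
    errors = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and "Googlebot" in m["agent"] and m["status"].startswith("4"):
            errors[(int(m["status"]), m["path"])] += 1
    return errors.most_common()


if __name__ == "__main__":
    for (status, path), hits in googlebot_client_errors(SAMPLE_LOG):
        print(status, path, hits)
```

The 404 entries at the top are candidates for 301 redirects; the 410 entries point to links you may still need to remove.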
What are 5xx errors?
5xx errors, or server errors, mean that the client made a valid request, but the server couldn’t complete it.
A 500 is an internal server error, and it prevents both bots and visitors from accessing the affected page. Recurring server errors can also cause Googlebot to slow down its crawling of your site, so you should investigate why this error appears and fix it.
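Because server errors often come in bursts (during an outage or a bad deployment, for example), it can help to count them per day. Here is a minimal Python sketch, assuming the combined log format and hypothetical sample lines; it counts 5xx responses to bots and visitors alike.

```python
import re
from collections import Counter
from datetime import datetime

# Extracts the timestamp and status code from a combined-format log line.
LINE = re.compile(r'\[(?P<time>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')

# Hypothetical sample lines.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /shoes HTTP/1.1" 503 0 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Oct/2023:13:57:02 +0000] "GET /cart HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [11/Oct/2023:09:01:44 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [11/Oct/2023:09:02:10 +0000] "GET /shoes HTTP/1.1" 502 0 "-" "Googlebot/2.1"',
]


def server_errors_per_day(lines) -> Counter:
    """Count 5xx responses per calendar day (in the timezone written in the log)."""
    per_day = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m["status"].startswith("5"):
            ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
            per_day[ts.date().isoformat()] += 1
    return per_day


if __name__ == "__main__":
    for day, errors in sorted(server_errors_per_day(SAMPLE_LOG).items()):
        print(day, errors)
```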
5. Inspect slow or large pages
Bounce rates skyrocket when a web page takes more than 3 seconds to load, so you should be very careful with large pages that contain lots of elements.
Speed also matters to bots. Metrics such as time to first byte and time to last byte show how quickly your server delivers each page, and when responses are consistently slow, Googlebot may crawl fewer pages on your website.
Page load speed is also a ranking factor, which is why you should use log files to analyze the largest and slowest pages on your website.
This analysis shows you which pages to optimize and speed up first.
Some of the factors that can result in sluggish websites are:
- Lots of high-resolution images on your web pages
- Video auto-play
- Custom fonts
Even a large web page can load quickly if its images have been compressed and video auto-play has been disabled. Check the average response time in your log analysis tool and identify the pages with the slowest load times.
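The standard combined log format records the response size in bytes but not how long the response took, so you need to add a timing field to your log format first (for example, nginx’s `$request_time`, in seconds, or Apache’s `%D`, in microseconds). The Python sketch below assumes a hypothetical format in which the response time in seconds is appended as the last field, and ranks pages by average response time alongside their average size.

```python
import re
from collections import defaultdict
from statistics import mean

# Combined format plus a trailing response time in seconds (an assumed custom format).
LINE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} (?P<size>\d+|-) "[^"]*" "[^"]*" (?P<rt>[\d.]+)$'
)

# Hypothetical sample lines.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /gallery HTTP/1.1" 200 3500000 "-" "Mozilla/5.0" 2.5',
    '66.249.66.1 - - [10/Oct/2023:13:56:10 +0000] "GET /gallery HTTP/1.1" 200 3500000 "-" "Googlebot/2.1" 3.5',
    '203.0.113.5 - - [10/Oct/2023:13:57:00 +0000] "GET / HTTP/1.1" 200 20480 "-" "Mozilla/5.0" 0.25',
    '66.249.66.1 - - [10/Oct/2023:13:58:00 +0000] "GET / HTTP/1.1" 200 20480 "-" "Googlebot/2.1" 0.75',
]


def slowest_pages(lines, top=5):
    """Return (path, avg_seconds, avg_bytes) tuples, slowest average response first."""
    times = defaultdict(list)
    sizes = defaultdict(list)
    for line in lines:
        m = LINE.search(line.rstrip())
        if not m:
            continue
        times[m["path"]].append(float(m["rt"]))
        sizes[m["path"]].append(0 if m["size"] == "-" else int(m["size"]))
    report = [(path, mean(times[path]), mean(sizes[path])) for path in times]
    report.sort(key=lambda row: row[1], reverse=True)
    return report[:top]


if __name__ == "__main__":
    for path, secs, size in slowest_pages(SAMPLE_LOG):
        print(f"{path}: {secs:.2f}s avg, {size / 1024:.0f} KiB avg")
```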
6. Find orphaned pages
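With the help of server log files, you can discover orphaned pages, that is, pages that aren’t linked to from any other page on your website, by comparing the URLs that appear in your logs with the URLs you can reach by following your site’s internal links. Because nothing links to these pages, your visitors can’t find them by browsing your website.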
Once you spot orphaned pages, use internal linking to make them easy for your visitors to find and for Googlebot to crawl.
Such pages appear for a variety of reasons, including:
- After changing the structure of your website
- After the content has been updated
- Due to incorrect internal and external linking
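The comparison boils down to a set difference, sketched below in Python. It assumes the combined log format, and `LINKED_PATHS` is a hypothetical stand-in for the list of internally linked URLs you would export from a crawl of your own site.

```python
import re

# Extracts the path and status code from a combined-format log line.
LINE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical sample lines.
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2048 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:55:50 +0000] "GET /shoes?color=red HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:56:05 +0000] "GET /summer-sale-2019 HTTP/1.1" 200 4096 "-" "Googlebot/2.1"',
]

# Hypothetical export of paths reachable through internal links (e.g., from a site crawl).
LINKED_PATHS = {"/", "/shoes", "/contact"}


def orphan_candidates(log_lines, linked_paths):
    """Paths served with status 200 in the logs but never reached via internal links."""
    requested = set()
    for line in log_lines:
        m = LINE.search(line)
        if m and m["status"] == "200":
            requested.add(m["path"].split("?")[0])  # ignore query strings
    return sorted(requested - set(linked_paths))


if __name__ == "__main__":
    print(orphan_candidates(SAMPLE_LOG, LINKED_PATHS))
```

Reversing the difference (linked paths minus requested paths) gives the opposite view: internally linked pages that no bot or visitor requested during the period your logs cover.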
Leverage server log files to optimize your site
Server log files can be used to significantly improve your SEO efforts and help your website rank better in Google search results. Leveraging them will also help you bring in more traffic and increase your on-site conversion rates. This is one simple method that can help you explore technical SEO concepts early on without having to dig too deep into your website setup.