Crawl efficiency: How to help Google slip through your site with David Pagotto (NEWBIE)

Reading Time: 25 minutes

Crawlability is super important for your website.

Ensuring humans and those sticky little Google bots can slip through your site like a damp otter down a pipe is crucial.

We don’t want duplicate content and we don’t want oodles of pointless pages and archives being crawled.

We want slick, efficient, uber crawlable websites and today we’ll be going through how to make that doable.


Tune in to learn:

  • What crawlability is and why it matters
  • The meaning of crawl budget
  • How often Google crawls your site
  • How you can get Google to crawl your site more
  • The key elements of crawlability
  • How to handle password-protected content
  • Zombie pages and how to deal with them
  • Best tools to crawl your website
  • How to handle 404s and 301 redirects
  • Dealing with duplicate content



Sponsor love

This episode is proudly sponsored by Ahrefs which offers tools to grow your search traffic, research your competitors and monitor your niche. Ahrefs helps you learn why your competitors rank so high and what you need to do to outrank them. Start a 7-day trial for $7 (Psst this is not an affiliate link.)



Share the love

If you like what you’re hearing on The Recipe for SEO Success Show, please support the show by taking a few seconds to leave a rating and/or comment on iTunes, SoundCloud, Spotify or Stitcher. Thanks!

And big thanks to Tdamji for their lovely review.


About David


David Pagotto is the Founder and Managing Director of SIXGUN, a digital marketing agency based in Melbourne. He has been involved in the digital marketing space for over 10 years, helping organisations get more customers, more reach, and more impact.

SIXGUN focuses on scaling organisational growth and building sustainable results for the long-term, with a data-driven approach to developing strategy.




Kate Toon:                          Okay. Crawlability is super important for your website. Ensuring humans and those sticky little Google bots can slip through your site like a damp otter down a pipe, well, it’s crucial. We don’t want duplicate content. We don’t want oodles of pointless pages and archives being crawled. We want slick, efficient, uber crawlable websites. And today we’re going to be telling you how to make that doable.


Hello, my name is Kate Toon and I’m the Head Chef here at The Recipe for SEO Success, an online teaching hub for all things related to search engine optimization. And I love SEO sometimes, not all the time. And today I’m talking with David Pagotto. How are you doing, David?

David Pagotto:                  I’m doing very well. What about you Kate?

Kate Toon:                          I don’t know, I’m doing okay. It’s been a tough week. School holidays here, so never fun, but let me introduce you to the world, to the world of The Recipe for SEO Success, anyway. So David Pagotto is the Founder and Managing Director of SIXGUN, a digital marketing agency based in Melbourne in Australia. He’s been involved in the digital marketing space for over 10 years, helping organisations get more customers, more reach, and more impact. SIXGUN focuses on scaling organisational growth and building sustainable results for the long term with a data-driven approach to developing strategy. That all sounds very exciting. What’s it like in Melbourne today? Because I’ll be in Melbourne next week for Copy Con. What’s the weather like?

David Pagotto:                  Very exciting. Oh look, it’s a bit of a mixed bag. You know, there’s clouds forming. It’s a bit grey outside.

Kate Toon:                          For those of you who don’t live in Australia, the joke about Melbourne is you have four seasons in a day. Is that right? Five seasons? Four.

David Pagotto:                  Yeah it does. Yeah, it does. It does, yeah.

Kate Toon:                          Yeah. But it’s a lovely city you should visit if you’re ever there. Anyway, let’s get stuck into crawlability. We’re going to start off with defining crawlability. So David, give us your definition of what crawlability actually means.

David Pagotto:                  So crawlability is all about serving valuable content to the search engine crawlers, and ensuring that valuable content is crawled as efficiently as possible. And it’s also equally about removing and blocking what you don’t want them to crawl.

Kate Toon:                          Yeah, so basically for those who are really at the beginning of their SEO journey, every search engine has three elements to it. It has a crawler, or a bot, or a spider, or whatever you want to call it. It has an algorithm which kind of sorts that data out and decides what it’s going to do with it. And then it has a search engine results page. So we’re mostly talking about Google today, but obviously things like YouTube are also search engines. Facebook, Instagram, they’re all search engines because they have those things and they all crawl through content. But we’re focusing today really on website crawlability and Google, because it has the biggest market share here in Australia and in the world. So for your average website owner, your small business, your eCommerce site owner, why does crawlability matter so much?

David Pagotto:                  Look, ultimately crawlability is all about getting that edge, that additional edge to give you that additional ranking performance boost. And crawlability and crawl efficiency really cover an umbrella of technical tasks that we’ll probably dive into throughout this podcast. And it really reminds me of, you know, I hope some of the viewers have seen the movie The Aviator, or else this isn’t going to make the most amount of sense. But in the movie The Aviator, Leonardo DiCaprio plays Howard Hughes. And in one of the scenes they try to break the speed record in an H-1 Racer aircraft, and you know, it shows Leonardo DiCaprio, or Howard Hughes, talking to his tech team and telling them to shave down the rivets on the plane to make it as slippery as possible. And it’s such a good … the way he says it, it just always reminds me about this stuff, you know, shave down the rivets, make it slippery. You know, some of this stuff with crawl efficiency, it is the one percenters. We’re not necessarily fighting over some of the bigger elements that make up search engine performance.

David Pagotto:                  Sometimes we are when we get to content, things like that, but often it’s about the one percenters, and if you want that edge, crawl efficiency is something you need to consider.

Kate Toon:                          Yeah, I like to call it the pointy end of SEO. When you’ve done all the obvious things you get to the point where you are literally just trying to find tiny things. You are shaving down rivets. But then equally I think a lot of sites have major crawlability issues, but they’re just not aware of them. The classic one, the one I see so often with WordPress site owners, is where they’ve accidentally checked the option that discourages search engines from crawling the site, and their site has been live for two years, or they’ve got a block in their robots.txt that the developer has left there for [crosstalk 00:04:36].

Kate Toon:                          So, you know, I think it’s not just the pointy end. I think today we’re going to go through the 99 percenters and the one percenters, so keep listening. If you’re thinking, well, I don’t have crawlability issues, I bet you probably do. Now, another phrase that’s bandied around a lot is crawl budget. This idea that these bots get worn out after looking at a certain number of pages. Can you take us through what crawl budget means?

David Pagotto:                  Yeah, so essentially, you know, based on the size of the site and the authority of the site, it’s basically allocated a certain amount of crawl budget. So a website like [inaudible] or the Sydney Morning Herald has a massive crawl budget, massive. The number of pages that would be crawled every day would be astronomically high. Whereas if you’re Barry the plumber for example, small WordPress site, 15 to 20 pages, you know, your crawl budget is going to be a lot less, and rightly so.

Kate Toon:                          Right, so it’s all about poor Barry, poor Barry and his tiny crawl budget.

David Pagotto:                  Exactly.

Kate Toon:                          We’re going to talk about Barry quite a lot in this podcast. Get ready. David and I have got everything for Barry. So let’s talk about some of the key elements of crawlability. Some of these you’ll have heard of, some of them you won’t, dear listeners. So the first one that you might have heard of is a robots.txt file. And that’s one of the files that tells Google what it can and can’t look at through a series of allow and disallow statements. Most of the content management systems that exist, like Shopify and Squarespace, will just generate this for you. You don’t have to really do anything.

Kate Toon:                          With WordPress you can instal Yoast, and Yoast will create a standard robots.txt file and often you don’t really have to fiddle with it. You just use the default. And then the other part of that is the sitemap.xml, which is a kind of ever-growing list of all the pages, posts, products, and maybe other things on your site. And again, most of the content management systems out there generate this for you, or you can use something like Yoast to create it if you’re on WordPad … WordPress. But David, what are some of the common problems you see in people’s robots.txt files and their sitemap.xml files?

David Pagotto:                  So I think probably the biggest thing is to make sure that you’re not blocking the entire website in the robots.txt file. That’s quite common, exactly as you described. You know, a developer builds a new website, it goes live and then, you know, a year or two later when they’re like, “We don’t get any traffic organically, what’s going on?” they reach out to an SEO agency and it’s pretty clear that they’ve done a disallow all inside the robots.txt file. So making sure that the robots.txt file isn’t blocking things that you want unblocked is pretty critical.

David Pagotto:                  But also making sure that the sections of the site that are being blocked are the ones that you want to block. And for example, if you have sections of the website that house … you know, on the server you might have like a folder for all of your assets. You might have like PDFs, you might have doc files, you might have html pages, whatever that may be. Making sure that kind of thing is blocked. Like you said, most of the content management systems handle it really well naturally and you don’t have to go in there and make changes, but sometimes you might want to go in there and block things like parameters, block things like tags and categories, depending on content being duplicated and things like that as well.
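
The check David describes can be sketched with Python’s standard urllib.robotparser module. This is only an illustration: the two robots.txt bodies and the sample paths below are invented for the example, and a real check would read the file from your own site.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that this robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical file left over from a development build: blocks the whole site.
STAGING_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical file that blocks only an assets folder.
LIVE_ROBOTS = """\
User-agent: *
Disallow: /assets/
"""

# Made-up paths standing in for pages you do and do not want crawled.
SAMPLE_PATHS = ["/", "/services/", "/assets/brochure.pdf"]

if __name__ == "__main__":
    print("Staging file blocks:", blocked_paths(STAGING_ROBOTS, SAMPLE_PATHS))
    print("Live file blocks:", blocked_paths(LIVE_ROBOTS, SAMPLE_PATHS))
```

If the home page shows up in the blocked list, that is the “disallow all” situation described above.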

Kate Toon:                          I’m just going to expand on that one a little bit because I think you know, just make sure people understand. So in like for example Yoast, you can go in to the taxonomy section of Yoast and say, “Look, I don’t really want Google crawling through my author archive or my tag archive.” Because every time you pop a tag into the … inside WordPress it’s going to generate a little archive of all the blog posts that have been tagged with that tag. And those pages could compete … be competing with your really good pages, your product pages and your service pages.

Kate Toon:                          Like, for example, if I’ve got a tag copywriter and I’ve tagged lots of posts with that, and I’ve got archive set up, I could also have my product page, which is kind of selling me as a copywriter. And those two pages could be competing with each other, so often SEO types and web developers will say, “You know, disable your tag archive, your author archive. Sometimes your projects category archive, even your blog category archive, sometimes kind of dependent on your site.” And that’s very easy to do with Yoast. So that’s a good thing. What about no indexing, no following individual pages? Can you take us through that a little bit?

David Pagotto:                  So yeah, I mean the best way to think about it is that you’ve got a robots.txt that sits at the top of the pyramid, and that controls what the search engine crawlers have the ability to actually crawl. If you disallow something there, they won’t bother even crawling it. Now you can also, in the meta robots, set up tags like noindex and nofollow, which say the page will be crawled, but if it’s noindex it won’t be added to Google’s search index. So it crawls the page, but it doesn’t put it into the index. And the nofollow attribute basically sets out that all the links on that page have nofollow applied to them by default.

Kate Toon:                          So why … what would be the benefit of having a page, which is crawled but not indexed?

David Pagotto:                  Well, generally speaking we would prefer to have it not crawled at all, because we … the problem with using … and in most instances it’s absolutely fine, particularly on a small site. But the problem with having pages that are crawled but then set to noindex is that you’re wasting crawl budget in terms of getting them to crawl a page that you’re telling them not to index. So it’s kind of a little bit counterintuitive.

Kate Toon:                          Yeah, it doesn’t really make much sense.

David Pagotto:                  Doesn’t really make much sense.
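
To see which of your pages carry a meta robots tag, something like the following sketch could help. It uses only Python’s standard html.parser module, and the sample HTML is invented for illustration. Note that it only reads the tag: a crawler still has to fetch the page to see it, which is the crawl budget cost David mentions.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for part in attributes.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives in an HTML document."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks not to be indexed and not to have its links followed.
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Tag archive</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_HTML)))
```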

Kate Toon:                          And another thing a lot of people stress about is password-protected content and content that’s behind a login or a membership, and you know most … again it should be said that most of that content you wouldn’t necessarily want to be crawled or indexed, possibly. If it’s completely, like, you know, private content, you don’t really want to give that away, and sometimes you might have it indexed but then when they click through there’s some kind of password protection. So usually on most of the membership plugins that you get with … again with WordPress and with other content management systems, this is all handled for you. As soon as you put something behind a login or password protection, it’s going to be fine. It’s not going to be, it could be crawled, it could even be indexed, but no one’s going to be able to see the content. Is that right? Does that sound right?

David Pagotto:                  That’s right. I mean it depends on the setup of how you do the password protection and depends why you want to protect it. But it’s very easy to get that content crawled and indexed. And it’s also easy to block it depending on why it’s there to begin with.

Kate Toon:                          Yeah, exactly. It depends on how you want to do it. So for example, the content within The Recipe for SEO Success is all, it’s not indexed or crawled and the content for like my SEO Nibbles course is crawled and is indexed. But you have to log in to see it because I kind of want people to know what’s in there, but I don’t want them to be able to see it until they’ve logged in. So it kind of depends what your business rules are really.

David Pagotto:                  Yeah.

Kate Toon:                          So we’ve got a couple of questions from members of my Digital Masterchefs mentoring group and the first one’s from Nadine Crow. And this is something commonly asked about large kind of E-commerce sites. “Would you instruct Google not to crawl certain sections of a very large site to preserve crawl budget?”

David Pagotto:                  So I think ultimately if there’s value in the content then it should be available to crawl. There’s no reason why you should be blocking sections of a site that are valuable in order to prioritise other sections of the site. It doesn’t matter how big your site is, the crawl budget will allow for it to be crawled over time, unless it gets stuck in some kind of infinite loop, which sometimes happens. But generally speaking, if the content is valuable, you should have it available to crawl, and it might not be every day, it might not be crawled every week, depending on your authority. But it will be crawled, it will be indexed, and if it’s valuable it will be ranked.

Kate Toon:                          So I have seen, you know, I think there was a recent talk at BrightonSEO where someone talked about how for a large e-comm site they did kind of cut off certain sections of the site to promote other sections of the site. It was kind of [inaudible]. It was, you know, they tried different things over a couple of months. You wouldn’t recommend that?

David Pagotto:                  Well it depends on what section of the site as well. So you know, if there’s valuable content it should be available. But if the content is duplicate, if it’s thin, if it’s a page that shouldn’t really be there. If it’s consolidating products where people have colour variations as separate products back to one. There’s all sorts of reasons why you would noindex or block content. But I would say that if the content is valuable, it should be crawlable and it should be indexable.



Kate Toon:                          So I guess that brings us onto the next kind of thing, which is like talking about the reasons why you would stop content being crawled, and as you’ve mentioned a couple. So, duplicate content pages, which often happens on an E-commerce site where they’ve got like 17 variations just for colour and having a dropdown. And also sites where, you know, maybe there’s lots of thin content from back in the day, and also Zombie pages. I love your … tell us what a Zombie page is. I love that expression.

David Pagotto:                  So you know, a Zombie page is … can be defined as a page that’s kind of, it’s on the site, but it has ultimately no value or very little value. So good examples of this kind of stuff would be pages that are automatically generated from the demo theme that you installed. And then there’s a whole bunch of these lorem ipsum pages.

Kate Toon:                          Yeah. Hello World.

David Pagotto:                  Hello. Yeah, people that leave the Hello World up. Like you should be … you’d be surprised how many people we get come to us and still have that blog.

Kate Toon:                          Oh my God. I have so many people on the course and I’m like, “Is … have you actually written a blog post called Hello World?” I’m just going to suggest it’s not a great title.

David Pagotto:                  That’s weird. That’s odd. Things like when you instal sliders into … or like pop up plugins for example, into WordPress. Sometimes it’ll create a bunch of elements that have their own pages.

Kate Toon:                          Yeah.

David Pagotto:                  And then you know, zombie pages can also cover pages where you’ve got content from … perhaps really thin content. Super low value. It was put up 10 years ago, you know, and it kind of covers pages that really, like, when you think about it, are pages that have little to no value.

Kate Toon:                          Yes. And we’ll call them zombie. They are the walking dead of pages and they-

David Pagotto:                  That’s right. And then there’s all sorts of ways that they manage to creep their way into the website often totally unknown to the owner.

Kate Toon:                          Yeah, exactly. And it’s one of the first things we do on my big SEO courses. We get the students to crawl their site and have a look at all the pages, and we use a couple of different methods to do that. The first is the classic “site:” search, which you just type into Google and that will give you that … I love that one because it scares people. And they go, “Why have I got 6,000 pages indexed?” Then it’s like, “Ah, now.” Because I mean you’ve got 10 pages on the site. So that’s a really good one for quickly identifying issues. And then the favourite for both of us is obviously Screaming Frog, which is free up to about 500 pages. And you might think, well, I haven’t got 500 pages. But then when you see all the random things that are being crawled, you might eat up that Screaming Frog budget quite quickly.

Kate Toon:                          And you know, this is a really important thing for anybody to do, to crawl your site and see what content is being crawled. And Screaming Frog is great. It’s going to show you your URLs, your title tags, your meta descriptions, your alt tags, your [inaudible 00:15:57] runs. It’s going to show you all your response codes, all your 404s, your 301 redirects, your 200s. Awesome. And I think that often shows up a lot of horrors, doesn’t it?

David Pagotto:                  Absolutely.

Kate Toon:                          Yeah. And that’s where you find all your zombies and you have to start deciding whether to kill them. So let’s talk about some different types of kind of crawling issues that come up and often pop up when you run a Screaming Frog crawl or the “site:” search. The first one, which is for many people a throwback to a little bug in Yoast about six or eight months ago I think, is finding that all your media attachments have been crawled. So for listeners you might not know, but by default often what happens is websites will create a page for every media file that you upload into your site, and you don’t really want that. You don’t want a page which is just your image on the page. It doesn’t make sense out of context. So how do you deal with media or attachments being crawled, what’s your solution to that?

David Pagotto:                  Generally the best way to handle that is to block them in the robots.txt file or, you know, with Yoast for example, you can set them to noindex as part of the options as well, which is an acceptable way to handle it on a smaller scale.

Kate Toon:                          Yeah, perfect. I’m sorry, I was just taking a big sip of water there and expecting you to talk for longer but you didn’t. Listening to me gurgling in the background. I do apologise. Yeah, so I think for anybody who went through that Yoast issue, it’s been resolved now. A lot of people used the Yoast purge, where they kind of purge those media attachments from the index. If you’re still seeing some show up, then I think it’s again, just check your settings in Yoast, and sometimes you do just have to wait for things to fade away. I’m not sure I would be going through and 301 redirecting lots of random [inaudible 00:17:40] attachments, would you?

David Pagotto:                  Yeah, that’s right. I would just set everything to noindex or block it, and it won’t take long for Google’s crawlers and the other search engine crawlers to work their magic.

Kate Toon:                          Now another thing that often comes up in Screaming Frog is lots of random CSS and JavaScript files that are being kind of crawled, because obviously things like WordPress, every bleeding plugin adds another CSS and another JavaScript. How do you handle those? Do you just let them be crawled or again, do you [crosstalk 00:18:08]?

David Pagotto:                  Yeah. You know I think generally we just leave them in place. It isn’t a problem if there’s, you know, a small number of these, like an expectedly small number. But sometimes, if you do a crawl and there is a massive number of JavaScript files, CSS files, all sorts of things, you know there’s definitely a consolidation phase to happen. But for the most part, you know, having a few bits and pieces pop up there in Screaming Frog isn’t going to cause any problems.

Kate Toon:                          And I think that’s a really important thing to touch on. Because I think lots of people will go into Search Console and they’ll see lots of 404s and broken links, or they’ll get some random little things coming up in Screaming Frog, and it’s like, some of these things are acceptable. It’s okay to have some 404s. You don’t need perfect site hygiene. And so I think some people get obsessive about it and then redirect every single 404. If you’ve got limited time, if you are DIY-ing your SEO, there’s probably better things you could be spending your time doing that will have a bigger impact on your ranking and conversion and traffic and all that kind of thing. So don’t get obsessed. Don’t fall down the rabbit hole. So let’s talk about a few other things.

Kate Toon:                          A big issue obviously for most people, that’s going to have the biggest impact on crawlability, is duplicate content. Now sometimes this happens without people knowing and sometimes it’s almost deliberate. The example used a lot is the kind of E-commerce store with all the iterations, but there are other instances of it as well. So what’s the … What are your thoughts on duplicate content? Because most people often say, “Oh, you know, stick a rel canonical on the one that you want Google to love and then just leave it like that.” But you don’t think that’s the best solution, do you?

David Pagotto:                  Well, look the … it’s far better. I mean just from a crawl perspective to have a single page that has the content versus 10 where nine have a canonical tag pointing back to the master.

Kate Toon:                          Yes.

David Pagotto:                  You can do that. And you know there is a level of, okay, from a duplicate content point of view, you’re not going to get pinged by Google. You know, you’re not going to get new problems. But from a crawlability point of view, you’re wasting a lot of efficiency. You’re creating a lot of drag. Those rivets certainly aren’t shaved down.

Kate Toon:                          You and your rivets. No, I agree. And I also think, more importantly than all of this, from a usability point of view it’s effing annoying to have a page where, to look at another colour swatch, I have to reload the page, because even if you’ve shaved your rivets down and the site is loading in under three seconds on a mobile phone, that’s still frustrating to wait for that. So I think there’s usability issues as well. But let’s … you know another classic [inaudible 00:20:40] content issue. I’m coming back to Barry. Barry the plumber and his plumbing Petersham, plumbing Newtown, plumbing Stanmore, plumbing whatever. Copied his page, he’s changed one word, got loads of them in his footer, and the bloody annoying thing is that they work. They’re ranking. When you Google it, he is ranking, and yet Google continuously tells us this is not what we should be doing. It’s not good for the user. It’s not good for anybody. But Barry’s getting away with it. Why, David? Why?

David Pagotto:                  So yeah, I mean look, the truth is like you said, they still … those kind of pages still really work, and there’s a lot of … when you look at the information that comes out of Google, often I like to equate it to asking a police officer about what’s legal and not legal. And often, the answer you get is so black and white. That’s definitely illegal. And Google have taken the stance that these pages are definitely illegal. They definitely frown upon them. We definitely don’t want them and they’re definitely bad. And in a lot of instances they certainly are, like in terms of where people just use the same content and change one word pertaining to the location and duplicate it out a thousand times. And we have clients come to us where they’ve done that previously and we have to start cleaning things up. And it turns … it can be very messy. So …

Kate Toon:                          Yeah.

David Pagotto:                  On one side of it, you’ve got the fact that you know, this kind of thing … I mean, I would say the duplicate, if you approach it with the attitude of duplicating out a massive number of pages and that being duplicate content and providing no user value whatsoever, that is setting yourself up for failure, big time. Even if you’re getting some initial burst of performance in the short term.

David Pagotto:                  From the other side of it, when we look at Barry, it’s possible for Barry to be servicing nearly every day three or four different suburbs around him, where he’s got awesome unique content, there’s testimonials from clients in those areas. The content speaks to the work that he does within the areas and perhaps a variation of types of work and why it’s there. You know, you can do like, when I’ve seen this done really well, I’ve done videos that talk about that specific area. I mean there’s a way to do it well. And there’s a way to do it terribly. And when it’s done, when Barry the plumber does three location pages for areas that he services and the content’s awesome. And it’s like, there’s videos and there’s client testimonials from those areas and there’s like some really specific content and it actually adds value to users going there.

David Pagotto:                  And they know that Barry is in that area that he services and that people are happy in that area. I mean there’s a way to do it well. And then there’s the way where you’ve got someone that’s trying to generate leads and then sell them. So they’ve created 5,000 pages where the only change is the suburb and every other piece of content in there is duplicate. You know, it’s this massive spec … there’s a massive difference on the spectrum between doing that well, with the right intention, with the right philosophy, and just taking it to a level that obviously, you know, Google hates. And you know, you will see the repercussions for that at some stage if not already.
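
One rough way to spot the “only the suburb changes” pattern is to compare the text of your location pages with each other. The sketch below uses Python’s standard difflib module on word lists. The page paths, the page text and the 0.8 threshold are all assumptions made up for the example, and the score says nothing about how Google itself judges duplicate content.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a, text_b):
    """Word-level similarity between two texts, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


def near_duplicates(pages, threshold=0.8):
    """Return (url_a, url_b, score) for every pair of pages at or above the threshold."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs


# Hypothetical location pages: two copies of a template and one unique page.
PAGES = {
    "/plumbing-newtown/": (
        "Barry is your local plumber in Newtown. "
        "Call today for fast, friendly service in Newtown."
    ),
    "/plumbing-stanmore/": (
        "Barry is your local plumber in Stanmore. "
        "Call today for fast, friendly service in Stanmore."
    ),
    "/plumbing-petersham/": (
        "Last month Barry replaced a burst hot water system on a terrace "
        "near the park. Here is what the owner said about the job."
    ),
}

if __name__ == "__main__":
    for url_a, url_b, score in near_duplicates(PAGES):
        print(url_a, "and", url_b, "look like near duplicates, score", score)
```

Pages that score highly against each other are candidates for rewriting with area-specific content such as testimonials, or for consolidating.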

Kate Toon:                          Yeah, I agree. And I think as well, once again as you said, adding testimonials and adding directions and your favourite coffee shop nearby and where you can park and bus details. All of that stuff is actually useful because these pages, although often they do rank, when you click through to them and start reading them from a human perspective, as you said, they’re horrible. They’re not going to help you convert. So, great, well done. It’s almost vanity ranking. You’ve got to position one, but is that page actually generating any inquiries or conversions? It’s probably not, because it is a combination of the two, so yeah.

Kate Toon:                          Okay, so we’ll leave Barry alone a little bit. Let’s talk about some common questions we often get in the I Love SEO community on Facebook. So obviously one of the ways that we can deal with pages that are no longer there or four … pages that lead to 404s is to set up 301 redirects. So this page has gone, Google, or it … we’ve created a new piece of content over here, follow this 301 redirect, and that happens, that’s awesome. But the question people ask is how long do you have to leave those 301 redirects up, like for the rest of time, or will Google eventually catch on and you can delete them?

David Pagotto:                  Look generally, you know, as a part of let’s you know … if you have a plugin for example inside WordPress that handles the redirects and you add that redirect into that kind of plugin. Generally it’s something that lasts indefinitely.

Kate Toon:                          Yeah.

David Pagotto:                  And you know, a 301 is meant to be a permanent redirect, and you have to be mindful that if you’re going to redirect one thing to another thing, it has to make sense for it to be redirected. Like it’s … sometimes a 404 is the right answer. If you’re discontinuing something and there’s no relevant pages that tie to that on the website for whatever reason, you know it’s acceptable to have a 404, like a 404 saying, look, we no longer really do this, but you know, you have an interesting 404 page and give users search functionality. You can do some fun quirky stuff that you often see these days, which is really cool. But to answer the question of how long, I mean a 301 redirect should be ultimately set up with the idea of it being permanent.

Kate Toon:                          Yup. So again, there’s some good points there. I think we told Harry in the podcast that, you know, having a few 404s is not the end of the world. In Google Search Console, John Mueller has said that they are in order of priority so you can work from top to bottom and then just mark as fixed. But yeah, and I think even if you are totally anal about this kind of thing, someone somewhere is going to see your 404 error page. So make it sexy. From a copywriting perspective I always find it a really fun area to kind of give a little brand experience and just continue that tone of voice and do something playful. Often as well I think some people, especially with e-comm stores, you know they’ll redirect … the discontinued product to another product and then that product will be discontinued and they’ll redirect it to another product.

Kate Toon:                          And that can lead to these redirect chains, which again really mess with your crawl efficiency. So if you’re listening to this and people often ask this question, if you have a product that is never ever coming back into stock, then I think it’s a good idea to redirect to the product category and let people just see the wealth of what you have. If it’s going to be coming back at some point, leave the page there with an out of stock message and have some kind of data collection that says, “Hey, this is coming back. Pop your email in here and we’ll notify you when it comes back.” It’s a really powerful way of actually driving more sales.
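To make the redirect chain problem concrete, here is a minimal Python sketch that takes a map of old URL to new URL and rewrites it so every old URL points straight at its final destination. The function name `flatten_redirects` and the kettle URLs are made up for illustration; a real site would pull this map from its redirect plugin or server config.

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old URL straight at its final destination."""
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that is not itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened


# Hypothetical shop: kettle A was replaced by B, then B by C,
# and C was finally retired in favour of the category page.
redirects = {
    "/products/kettle-a": "/products/kettle-b",
    "/products/kettle-b": "/products/kettle-c",
    "/products/kettle-c": "/category/kettles",
}
flattened = flatten_redirects(redirects)
for source, target in flattened.items():
    print(f"{source} -> {target}")
```

With the flattened map, a visitor (or Googlebot) landing on the oldest product URL takes one hop to the category page instead of three.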

David Pagotto:                  So one thing to add to this is there’s a bit of a difference between an internal 301, and an external 301 in terms of how we view it. So when you run a Screaming Frog crawl for example, what you’re seeing there is 301s that exist on the website itself, and those 301s should be swapped over to direct links where possible. You know, that’s how those should be handled. Whereas it’s okay to have the 301 in place in the background, for example, for a page that’s no longer internally linked to. So from an old page, an old product page to a new product page of the new variation of that product, you set up a 301 redirect there, but you don’t want to see any internal 301s on the website itself.
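As a rough illustration of swapping internal 301s for direct links, the Python sketch below takes a crawl export (which page links to which URLs) plus a redirect map and lists every internal link that should be updated. The data shapes, the function name `internal_links_to_fix` and the URLs are assumptions for the example; this is not Screaming Frog’s export format.

```python
def internal_links_to_fix(page_links, redirects):
    """Return (page, old_link, direct_link) for each internal link that hits a redirect."""
    fixes = []
    for page, links in page_links.items():
        for link in links:
            if link in redirects:
                fixes.append((page, link, redirects[link]))
    return fixes


# Hypothetical crawl data: each page and the internal URLs it links to.
page_links = {
    "/": ["/products/teapot-v1", "/about"],
    "/blog/tea-guide": ["/products/teapot-v1", "/products/teapot-v2"],
}
# Hypothetical redirect map: the old product page 301s to the new variation.
redirects = {"/products/teapot-v1": "/products/teapot-v2"}

fixes = internal_links_to_fix(page_links, redirects)
for page, old_link, direct_link in fixes:
    print(f"On {page}: change {old_link} to {direct_link}")
```

The 301 itself stays in place for anyone arriving from outside the site; only the links on your own pages get changed.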

Kate Toon:                          Yep, makes sense. Okay. Let’s talk about a few final bits and bobs before we wrap up this episode on crawlability. Again, other types of redirects. If we’re redirecting from http to https, which is more often done in a core file rather than a 301, like in your .htaccess file or something like that. Another one that we leave forever, right?

David Pagotto:                  Yeah, absolutely. I mean, in this day and age, with everything we know about https, everyone should be on https with an SSL certificate. So if you have a WordPress instal and it’s installed on the https version, all of the other versions, the http, the www, non-www, whatever variant there is, should 301 redirect, as a wildcard, to the https version. And it should be done forever.
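The rule that every variant should redirect to one https version can be sketched as a small Python function. This is only an illustration of the logic, not how WordPress or a server config implements it; the host `www.example.com` and the function name `canonical_redirect` are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the one version the site lives on


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return the https URL to 301 to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    bare = canonical_host.removeprefix("www.")
    if host not in {bare, "www." + bare}:
        return None  # a different site, so not ours to redirect
    if parts.scheme == "https" and host == canonical_host:
        return None  # already the canonical version
    # Keep the path and query so every old URL maps to its exact twin.
    return urlunsplit(("https", canonical_host, parts.path, parts.query, parts.fragment))


for variant in [
    "http://example.com/shop?page=2",
    "http://www.example.com/shop?page=2",
    "https://example.com/shop?page=2",
    "https://www.example.com/shop?page=2",
]:
    print(variant, "->", canonical_redirect(variant))
```

Keeping the path and query intact is the "wildcard" part: each page on a non-canonical variant goes to the same page on the https version rather than everything landing on the home page.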

Kate Toon:                          Fabulous. And I guess the other one, we’ve talked about this, but people get very antsy with me about domain redirects. So maybe they’ve changed their domain and you know, they’ve moved to a new one and they set up redirects and then the renewal comes around again. They’re like, “Well, do I need to keep the domain for another year?” And like you said, it’s like $9 or something. And they’re like, “Yeah, but you know, can’t I just let it expire?” And your approach is just pay the $9, right?

David Pagotto:                  That’s right. Keep the domain, make sure it’s redirected. I mean, we see this problem all the time when a client has a migration strategy across to a new website or a new brand name and all this kind of stuff. And we have seen some absolute disasters where people have let domains with so much authority lapse, and there should have been a proper migration strategy in terms of redirects and things like that, that wasn’t done. You know, once you lose that domain and it goes into someone else’s hands, it’s very difficult to kind of try to get it back. So for the cost of the domain, for the brand value, for everything that you’ve built previously with that domain, you just keep it.

Kate Toon:                          Yeah. And again coming back to those internal links, often internal links are set as absolute links rather than relative ones. So you’ve got the full URL in there and you’ve forgotten to change it and you’ve got links within your own site pointing to an old domain that now … someone else owns.
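One way to catch those forgotten absolute links is to scan each page’s HTML for anchors whose host is still the old domain. The Python sketch below uses only the standard library; the domain `old-brand.example`, the class name `LinkCollector` and the sample markup are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_old_domain(html, old_domain):
    """Return absolute links that still point at the old domain (with or without www)."""
    collector = LinkCollector()
    collector.feed(html)
    old_hosts = {old_domain, "www." + old_domain}
    # Relative links have no hostname, so they are skipped automatically.
    return [href for href in collector.hrefs if urlsplit(href).hostname in old_hosts]


# Hypothetical page on the new site that still carries links to the old domain.
page = """
<p><a href="https://old-brand.example/services">Services</a>
<a href="/contact">Contact</a>
<a href="https://www.old-brand.example/blog/post">Post</a>
<a href="https://new-brand.example/about">About</a></p>
"""
stale = links_to_old_domain(page, "old-brand.example")
for href in stale:
    print("Still pointing at the old domain:", href)
```

Relative links such as `/contact` never show up in the result, which is the reason they survive a domain change without any editing.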

David Pagotto:                  Yeah.

Kate Toon:                          I’m not using myself as an example. I never did this. This did not happen to me. It happened to me though. Even the people who are supposed to know what they’re doing sometimes forget that couple of different internal links that they didn’t change because they’re not idiots. Yes, keep the domain, people, don’t let it expire. So we’ve talked a lot about WordPress. And obviously sites like WordPress make all this kind of stuff quite easy because you’ve got plugins like Yoast, you’ve got 301 redirect plugins, you know you can get in there and fiddle with the robots.txt and the sitemap and kind of make changes. But then we have other platforms like Shopify which don’t let us fiddle. And so you know, is there much we can do about improving our crawlability on sites like Shopify?

David Pagotto:                  Yeah I mean, I’d focus on the tactical side of things in sites like that. Largely things like the robots.txt file and sitemap and all that kind of stuff that Shopify would handle natively is done quite well. And you don’t need to really worry about that. But things like duplicate content and things like the zombie pages we talked about and things like, even just like, you know, compressing images before they go up on the site and all of that like tactical stuff can still be executed on a site that’s built in Shopify for example. So I’d focus on the tactical side of things.

Kate Toon:                          Yeah, I mean focus on what you can fix basically, because you can’t fix what you can’t fix. So that’s fantastic David. I feel like I know a whole lot more about crawlability and I hope the listeners do too. Thank you so much for coming on the show.

David Pagotto:                  Kate. It has been an absolute pleasure. Thank you so much for having me.

Kate Toon:                          Fabulous. So as you know, if you have any questions about crawlability, feel free to head to the I love SEO Group on Facebook. And at this time I usually give a shout out to one of my lovely listeners. So this is from Tdamji and she says, “I’ve not only been learning a lot of new things about SEO, but I’ve had success with implementing the things I’ve learned.” Well there’s nothing better than that. So thank you for that review and thanks to you for listening. If you like the show, please don’t forget to leave a five star rating and review on iTunes, Stitcher, Spotify, or wherever you heard the show. I’m running out of ones to read out. So if you haven’t … if you’ve been listening for a while and have never got round to it, I would be very grateful if you would pop in and leave one, hopefully a positive one. That would be really nice.

Kate Toon:                          It helps people find out more about the lovely world of SEO. Gosh, I’m babbling, and makes me happy. And also don’t forget to check out the show notes for this episode at where you can learn more about David. Check out his website at … or some useful links about tools that we’ve mentioned in the show. And finally, don’t forget to tune in to my two other podcasts, the Hot Copy Podcast, a podcast for copywriters all about copywriting, and The Confessions of a Misfit Entrepreneur. Until next time, happy SEO-ing.