Ensure SEO tools don't complain about trailing slash #4327
Although you're right that many SEO tools complain about things that don't matter, this one actually does. Search engines do treat URLs with and without a trailing slash as duplicate content if they serve the same content. The best solution is to let users decide whether they want trailing slashes and then 301-redirect to whatever the user has set in project settings ("use trailing slashes" or "don't use trailing slashes"). The next best solution is a rel="canonical" tag on every page that points to the version of the URL (trailing slash or non-trailing slash) that search engines should treat as the primary one to index. However, that doesn't solve the crawl budget/efficiency issue: if both versions return a 200, crawlers will crawl twice as many HTML pages only to be served duplicate content, which is far from optimal. Ref: https://developers.google.com/search/blog/2010/04/to-slash-or-not-to-slash
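For illustration, here is a minimal sketch of the canonical-tag approach, assuming a plain TypeScript helper (`canonicalUrl` is a hypothetical name, not an existing Webstudio function): both slash variants of a page would emit the same canonical URL, so search engines only index the preferred form.

```ts
// Minimal sketch, not Webstudio's actual implementation: build a canonical
// URL that always points at the preferred variant, so that /blog/post and
// /blog/post/ both declare the same canonical in <head>.
function canonicalUrl(pageUrl: string, preferTrailingSlash: boolean): string {
  const url = new URL(pageUrl);
  if (url.pathname !== "/") {
    const trimmed = url.pathname.replace(/\/+$/, "");
    url.pathname = preferTrailingSlash ? `${trimmed}/` : trimmed;
  }
  return url.toString();
}

// Rendered into the <head> of every page, e.g.:
// <link rel="canonical" href="${canonicalUrl(request.url, false)}" />
```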
@infinitnet is there any evidence you can link us to about this? I am genuinely curious. It sounds really dumb if they behave that way. I would think Google and co. are smarter than this.
https://developers.google.com/search/blog/2010/04/to-slash-or-not-to-slash Also Google:
Sounds like this is generally not a big deal; the only potential problem is that Google spends twice the crawl budget on the with-slash and without-slash links. If you have a couple dozen pages this is a non-issue; if you have a giant site and you are hitting the limits of your crawl budget, then yes, it becomes a problem. Anyways, we will fix it soon.
@kof A huge part of Google's communication is tailored for PR, i.e., they tend to claim that Google is smart enough to figure out X or Y when the reality often looks different. I don't have a case study on this particular issue because 95% of websites I work with use WordPress, which redirects by default. But I'm happy to reproduce this issue for the /blog section of your site so you can see it for yourself. On your docs subdomain you already seem to redirect from non-trailing-slash to trailing-slash URLs, but your /blog section doesn't, and it also doesn't have canonical tags. If you're okay with this, just let me know and I'll run an experiment: within a few days you'll see "Duplicate Without User-Selected Canonical" errors pop up in Google Search Console, and the trailing-slash variations of your blog posts will be indexed in Google Search and should get some impressions as well.
Yes, but a lot of sites have more than a few dozen pages - especially if you do programmatic SEO, for example, which Webstudio is useful for. And even with a few dozen pages you might not actually hit crawl budget limitations where Googlebot would stop crawling all pages, but it's still a site-wide quality signal. If 50% of the URLs are duplicates, it affects the overall site quality scores negatively. Google has had issues with scale/resources for years, and AI content has only made this worse, meaning that domains with a lot of "useless" pages lose crawl priority and often also rankings over time (the more pages, the worse this gets, of course). Their recent "helpful content updates" play into this as well: sites with a lot of "unhelpful" pages rank worse. This is mostly related to user signals and how visitors interact with the sites/pages, but if 50% of the URLs are duplicates that are just "there" without getting visitors, it plays into that part of the algorithm too. Ref: https://developers.google.com/search/blog/2022/08/helpful-content-update
@kof Our experience in a heavily SEO-focused field is: Google's crawlers are both smart and extremely basic, and you cannot predict which facet you'll encounter. Businesses still see serious damage from duplicate content when staging.acme.com gets indexed (where one could also assume the bots were smart enough to skip subdomains with that name). I consider the duplicate-slash topic far too dangerous to take a risk on. I'd allow users either pattern (with or without slash); the opposite pattern should get redirected automatically, as in the sketch below.
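As a rough sketch of that redirect approach, assuming an Express-style server and a hypothetical per-project `preferTrailingSlash` setting (neither is Webstudio's actual setup), the non-preferred variant gets a 301 to the preferred one so only a single URL form ever returns 200:

```ts
import express from "express";

const app = express();

// Hypothetical per-project setting; in practice this would come from
// project settings rather than a hard-coded constant.
const preferTrailingSlash = false;

// 301-redirect the non-preferred slash variant to the preferred one.
app.use((req, res, next) => {
  const [path, query = ""] = req.originalUrl.split("?");
  if (path === "/") return next();
  const hasSlash = path.endsWith("/");
  if (hasSlash === preferTrailingSlash) return next();
  let normalized = preferTrailingSlash ? `${path}/` : path.replace(/\/+$/, "");
  if (normalized === "") normalized = "/";
  const suffix = query ? `?${query}` : "";
  return res.redirect(301, normalized + suffix);
});
```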
Currently both the trailing-slash and non-trailing-slash versions work as URLs, which means there will be links to both and search engines might think it's duplicate content (unlikely, but SEO tools complain like it's the 90s anyway).