Ensure SEO tools don't complain about trailing slash #4327

Open
kof opened this issue Oct 21, 2024 · 5 comments · May be fixed by #4329
Labels
complexity:medium Up to 1 week of work prio:2 Always look for prio:1 issues first before working on prio:2 type:bug Something isn't working

Comments

@kof
Member

kof commented Oct 21, 2024

Currently, URLs both with and without a trailing slash work, which means there will be links to both, and search engines might treat them as duplicate content (unlikely, but SEO tools complain like it's the 90s anyway).

@kof kof added type:bug Something isn't working prio:2 Always look for prio:1 issues first before working on prio:2 complexity:medium Up to 1 week of work labels Oct 21, 2024
@infinitnet

Although you're right about many SEO tools complaining about things that aren't important, this one actually is. Search engines do treat URLs with and without a trailing slash as duplicate content if they serve the same content. The best solution is to let users decide whether they want to use trailing slashes or not, and then 301-redirect to whichever form the user has set in project settings ("use trailing slashes" or "don't use trailing slashes").
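A minimal sketch of the redirect rule described above, assuming a single boolean project setting. The function name `trailing_slash_redirect` and the `use_trailing_slash` flag are hypothetical and do not reflect Webstudio's actual code. It also assumes that paths whose last segment looks like a file (e.g. `/sitemap.xml`) should never be rewritten.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url: str, use_trailing_slash: bool) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already in the
    preferred form. `use_trailing_slash` stands in for a hypothetical
    "use trailing slashes" project setting."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # The root path is identical in both styles, so never redirect it.
    if path == "/":
        return None

    # Assumption: file-like paths (last segment contains a dot) are left alone.
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    if "." in last_segment:
        return None

    if use_trailing_slash and not path.endswith("/"):
        new_path = path + "/"
    elif not use_trailing_slash and path.endswith("/"):
        new_path = path.rstrip("/") or "/"
    else:
        return None  # Already canonical: serve the page with a 200.

    # Preserve the query string and fragment when building the target.
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))
```

A server would call this per request and, when it returns a URL, answer with `301 Moved Permanently` and that value in the `Location` header, so only one variant ever returns a 200.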

The next best solution is to use a rel="canonical" tag on every page that links to the version of the URL (either trailing slash or non trailing slash) that search engines should consider the primary one that will get indexed. However, this doesn't solve the crawl budget/efficiency issue: if both versions return a 200, crawlers will crawl 2x the number of HTML pages just to get served duplicate content, which is far from optimal.
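As a rough illustration of the canonical-tag fallback, the sketch below builds the `<link rel="canonical">` tag for a page so that it always points at the preferred slash variant. The function name, its parameters, and the assumption that `path` carries no query string are hypothetical choices for this example.

```python
from html import escape
from urllib.parse import urljoin


def canonical_link_tag(site_origin: str, path: str, use_trailing_slash: bool) -> str:
    """Build a rel="canonical" tag for `path` (assumed to have no query string),
    normalized to the preferred trailing-slash style."""
    # Collapse leading/trailing slashes, then re-add them in the chosen style.
    normalized = "/" + path.strip("/")
    if use_trailing_slash and normalized != "/":
        normalized += "/"
    href = urljoin(site_origin, normalized)
    # Escape the URL so it is safe inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'
```

Both `/blog/post` and `/blog/post/` would emit the same tag, which tells crawlers which URL to index, but, as noted above, both variants still return a 200 and still get crawled.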

Ref: https://developers.google.com/search/blog/2010/04/to-slash-or-not-to-slash

If both slash and non-trailing-slash versions contain the same content and each returns 200, you can:
Consider changing this behavior (more info below) to reduce duplicate content and improve crawl efficiency.

@kof
Member Author

kof commented Nov 4, 2024

Although you're right about many SEO tools complaining about things that aren't important, this one actually is

@infinitnet is there any evidence you can link us to about this? I am genuinely curious. It sounds really dumb if they behave that way. I would think Google and co. are smarter than this.

@kof
Member Author

kof commented Nov 4, 2024

https://developers.google.com/search/blog/2010/04/to-slash-or-not-to-slash

Also Google:

Leave it as-is. Many sites have duplicate content. Our indexing process often handles this case for webmasters and users. While it's not totally optimal behavior, it's perfectly legitimate and a-okay. :)

Sounds like this is generally not a big deal. The only problem might be that Google spends twice the crawl budget on slash and non-slash links. If you have a couple dozen pages, this is a non-issue; if you have a giant site and are hitting the limits of crawl budget, then yes, it becomes a problem.

Anyways, we will fix it soon.

@infinitnet

infinitnet commented Nov 4, 2024

@infinitnet is there any evidence you can link us to about this? I am genuinely curious. It sounds really dumb if they behave that way. I would think Google and co. are smarter than this.

@kof A huge part of Google's communication is tailored for PR, i.e., they tend to claim that Google is smart enough to figure out X or Y when the reality often looks different. I don't have a case study on this particular issue because 95% of the websites I work with use WordPress, which redirects by default.

But I'm happy to reproduce this issue for the /blog section of your site so you can see it for yourself. On your docs subdomain you already seem to redirect from non-trailing-slash to trailing-slash URLs, but your /blog section doesn't, and it also lacks canonical tags. If you're okay with this, just let me know and I'll run an experiment. Within a few days you'll see "Duplicate Without User-Selected Canonical" errors pop up in Google Search Console, and the trailing-slash variations of your blog posts will be indexed in Google Search, where they should get some impressions as well.

If you have a couple dozen pages, this is a non-issue; if you have a giant site and are hitting the limits of crawl budget, then yes, it becomes a problem.

Yes, but a lot of sites have more than a few dozen pages, especially if you do programmatic SEO, for example, which Webstudio is useful for. And even with a few dozen pages you might not run into crawl budget limits where Googlebot stops crawling some pages, but duplication is still a site-wide quality signal. If 50% of the URLs are duplicates, it lowers the overall site quality scores. Google has had issues with scale and resources for years, and AI content has only made this worse, meaning that domains with a lot of "useless" pages lose crawl priority and often rankings over time (and the more pages, the worse this gets).

Their recent "helpful content updates" play into this as well: sites with a lot of "unhelpful" pages rank worse. This is mostly related to user signals and how visitors interact with the sites and pages, but if 50% of the pages are duplicates that are just "there" without getting visitors, they feed into that part of the algorithm as well.

Ref: https://developers.google.com/search/blog/2022/08/helpful-content-update

The signal is also weighted; sites with lots of unhelpful content may notice a stronger effect. In any case, for the best success, be sure you've removed unhelpful content [...]

@oslogrolls

oslogrolls commented Nov 12, 2024

@kof Our experience in a heavily SEO-focused field is that Google's crawlers are both smart and extremely basic, and you cannot predict which facet you will encounter.

Businesses still suffer real damage from duplicate content when staging.acme.com gets indexed (where one might also assume bots would be smart enough to skip subdomains with that name). I consider the duplicate-slash issue far too dangerous to take a risk on. I'd let users choose either pattern (with or without a trailing slash) and have the opposite pattern redirected automatically.


4 participants
@kof @infinitnet @oslogrolls and others