I came across an interesting problem the other day. As part of our URL normalization strategy at AlertMe, we have been adding a trailing slash to URLs without file extensions. We did a lot of research when deciding on this tactic and the general consensus around the web was to use trailing slashes for directories and (obviously) no slashes on filenames. See this article from the official Google Webmasters blog: https://webmasters.googleblog.com/2010/04/to-slash-or-not-to-slash.html (I know it’s old, but the concept is still relevant).
We even tested a number of publisher URLs to see what their redirection strategies were. Every one we tested responded correctly to both the slash and no-slash versions of the URL. Some redirected to a trailing slash, some redirected to no trailing slash, but they all returned (or redirected to) the intended page.
Well, the other day we came across a publisher that returned 200
status codes with an Unhandled request
message in the body. This first caused issues with our crawlers as the URLs were technically returning valid pages, but with no content. Then there was the issue of redirecting users to these URLs. They would get the Unhandled request
messages instead of the correct page. 🤦🏻♂️
Long story short, we had to change our strategy to remove trailing slashes from ALL URLs. As I mentioned previously, we haven’t come across any one who doesn’t support no trailing slashes. This required a bit of cleanup work on the database as well. For other publishers, URLs with the trailing slash wouldn’t matter. But we had a few hundred records from our new partner that needed the trailing slashes removed.
Of course, MySQL doesn’t have a native REGEX replace function, but luckily there was a pretty simple query that took care of the issue. So if you ever need to normalize the URLs in your database, here’s a quick fix.
1 2 3 4 5 6 7 |
UPDATE `articles` SET url = if( SUBSTRING(url, -1, 1)='/', SUBSTRING(url, 1, LENGTH(url)-1), url ) WHERE `url` LIKE '%some-domain.com%' |
If you want to check the output first, run this SELECT
query:
1 2 3 4 5 6 7 8 9 |
SELECT id, url, if( SUBSTRING(url, -1, 1)='/', SUBSTRING(url, 1, LENGTH(url)-1), url ) as new_url FROM `articles` WHERE `url` LIKE '%some-domain.com%' |
Not overly exciting, but this was a helpful pure MySQL workaround for cleaning up our URLs.
Tags: databases, mysql, normalization, redirection, urls
Did you like this post? 👍 Do you want more? 🙌 Follow me on Twitter or check out some of the projects I’m working on. You can sign up for my WEEKLY newsletter too. You'll get links to my new posts (like this one), industry happenings, project updates and much more! 📪