How To: Normalize URLs Stored in MySQL

After discovering an issue with a partner's URL redirection implementation, we had to change our strategy and clean up our database with a simple query to normalize URLs.

I came across an interesting problem the other day. As part of our URL normalization strategy at AlertMe, we have been adding a trailing slash to URLs without file extensions. We did a lot of research when deciding on this tactic and the general consensus around the web was to use trailing slashes for directories and (obviously) no slashes on filenames. See this article from the official Google Webmasters blog: https://webmasters.googleblog.com/2010/04/to-slash-or-not-to-slash.html (I know it's old, but the concept is still relevant).

We even tested a number of publisher URLs to see what their redirection strategies were. Every one we tested responded correctly to both the slash and no-slash versions of the URL. Some redirected to a trailing slash, some redirected to no trailing slash, but they all returned (or redirected to) the intended page.

Well, the other day we came across a publisher that returned 200 status codes with an Unhandled request message in the body. This first caused issues with our crawlers as the URLs were technically returning valid pages, but with no content. Then there was the issue of redirecting users to these URLs. They would get the Unhandled request messages instead of the correct page. 🤦🏻‍♂️

Long story short, we had to change our strategy to remove trailing slashes from ALL URLs. As I mentioned previously, we haven't come across any one who doesn't support no trailing slashes. This required a bit of cleanup work on the database as well. For other publishers, URLs with the trailing slash wouldn't matter. But we had a few hundred records from our new partner that needed the trailing slashes removed.

Of course, MySQL doesn't have a native REGEX replace function, but luckily there was a pretty simple query that took care of the issue. So if you ever need to normalize the URLs in your database, here's a quick fix.

mysql
UPDATE `articles` SET url = if( SUBSTRING(url, -1, 1)='/', SUBSTRING(url, 1, LENGTH(url)-1), url ) WHERE `url` LIKE '%some-domain.com%'

If you want to check the output first, run this SELECT query:

mysql
SELECT id, url, if( SUBSTRING(url, -1, 1)='/', SUBSTRING(url, 1, LENGTH(url)-1), url ) as new_url FROM `articles` WHERE `url` LIKE '%some-domain.com%'

Not overly exciting, but this was a helpful pure MySQL workaround for cleaning up our URLs.

Comments are currently disabled, but they'll be back soon.