SEO friendly URLs: 10 steps to the optimal site structure

By Julie Molloy, Head of Marketing 18 May, 2015

When developing a new site with SEO in mind, one critical aspect of the site design that is typically overlooked is the URL format.

URLs play an important role in achieving strong SEO performance from your site for a number of reasons. While keyphrases in a URL may no longer be weighted as heavily as they used to be, we all know that there is far more to a comprehensive SEO strategy than cramming in as many keywords as you can.

Your URL format is critical to the crawlability and comprehensibility of your site to search engines. If you build a site with a sub-optimal URL format and later realise the error of your ways, you will have great difficulty reversing your bad decisions once your site is live and indexed. Here at QueryClick, we like to get URLs right the first time. To help you do the same, I have put together a list of my top ten tips for creating the perfect, SEO-friendly URLs:

1. Be Guided By Your Keyphrase Research

Every URL slug for every page on your site should be informed by keyphrase research. You need to ensure that the way your pages are referred to in your page URLs is in line with the way people will be searching for those pages. Don’t rely on default page names created by your CMS to generate search-relevant URL slugs!

For example, if you run an ecommerce site and one of your products is a men’s black merino wool jumper, don’t have your page URL generate as something like:

example.com/merino-blk

Rather, do use something like:

example.com/mens-black-merino-sweater

2. Keep Your Slugs Human- (And Robot-) Readable

When crafting your slugs, you will frequently want to use more than one word in the slug. But how should you separate the words? The answer is simple: use hyphens! It’s that easy. For example:

example.com/example-page-name

Here is a brief list of things that I frequently see people use as separators that should never be used in URLs:

  • Underscores: example.com/example_page_name Not only does this look terrible in the address bar, but it also looks terrible on-page when users see it as a link, as the underlining used on most links hides the underscores.
  • Spaces: example.com/example page name Wow, this is an awful idea, but I still see it used occasionally. Not only do spaces expose you to a whole load of potential problems with broken links and incorrectly-parsed URLs, but they also look atrocious to users, because spaces are not a legal character in URLs and so will be escaped by browsers into the familiar ‘%20’ (see the tip below on the use of reserved characters).
  • Plus Characters: example.com/example+page+name Please, just don’t.
  • Nothing: example.com/examplepagename Again, this ugly format presents obvious usability issues. Beyond that, though, it is not guaranteed that Google or other search engines will be able to make sense of these long, unbroken strings of characters and reliably infer the keywords that they represent. Furthermore, it leaves you open to the risk of what I call the “Experts Exchange” issue where you could end up with unintentional readings, depending on where the user inserts the spaces.
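If you have inherited slugs that use the separators above, something along these lines can tidy them up. This is only a minimal sketch: the function name hyphenate_separators is my own invention, and lowercasing the result is an assumption rather than a rule. It can't rescue the "nothing" case, because there is no separator to find.

```python
import re

# Separators advised against above: underscores, spaces and plus signs,
# plus the "%20" that browsers produce when they escape a space.
_BAD_SEPARATORS = re.compile(r"(?:%20|[_+\s])+")


def hyphenate_separators(slug: str) -> str:
    """Replace each run of discouraged separators with a single hyphen."""
    # Lowercasing is an assumption for consistency, not a requirement.
    return _BAD_SEPARATORS.sub("-", slug.strip()).lower()


print(hyphenate_separators("example_page_name"))   # example-page-name
print(hyphenate_separators("example+page+name"))   # example-page-name
print(hyphenate_separators("example%20page name")) # example-page-name
```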

3. Use Hierarchical URLs To Describe Your Information Architecture

Most sites are naturally structured hierarchically, with the homepage linking to some form of category or listing page, and these category or listing pages linking to lower-level pages such as sub-category pages, product pages or articles. To make it easier for both users and search engine spiders to comprehend this hierarchy and to be able to infer from an individual URL where in that hierarchy a given page sits, your URLs should reflect this hierarchical nature.

While the exact structure of the URL will depend heavily on your site and what it contains, the following format is what I generally recommend:

[Image: diagram of the recommended hierarchical URL format]

By following this logical URL structure, the URL can serve as a ‘breadcrumb trail’ to aid with user navigation. Indeed, many users (I bet that includes you, dear reader) often navigate a site by directly editing the URL to, for example, return to the parent category page from a product page.
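To illustrate the breadcrumb idea, here is a rough sketch of how each level of a hierarchical URL maps to a page a user could step back to. The function name breadcrumb_trail and the example path are my own assumptions; it only works if every level of your hierarchy really does resolve to a page.

```python
from urllib.parse import urlsplit


def breadcrumb_trail(url: str) -> list[str]:
    """Return the URL of every level, from the site root down to the page itself."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    trail = [root + "/"]
    path = ""
    for segment in parts.path.split("/"):
        if segment:  # skip the empty pieces left by leading or trailing slashes
            path += "/" + segment
            trail.append(root + path)
    return trail


for level in breadcrumb_trail("https://example.com/category/sub-category/product"):
    print(level)
# https://example.com/
# https://example.com/category
# https://example.com/category/sub-category
# https://example.com/category/sub-category/product
```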

In addition, many sites will benefit from improved keyphrase targeting when including category page names in product URLs, particularly on sites where the product names themselves may not always be keyphrase-rich. For example, if you run an ecommerce site that sells bikes and one of the products is the ‘Raleigh RX Elite’, which of the following URLs would rank better for a search for “Raleigh RX Elite Bike”?

mybikestore.com/raleigh-rx-elite

mybikestore.com/bikes/raleigh-rx-elite

There are exceptions to this, however, such as the following:

4. Don’t Introduce Duplication Through Categories

Most sites over a certain size will contain some sort of category hierarchy, through which users can access different bottom-level pages such as product pages. If your products (or equivalent) can be listed in more than one category and the category appears in the URL, then you might end up with duplication of the type shown below:

example.com/clothes/menswear/blue-socks

example.com/clothes/underwear/blue-socks

example.com/sale/blue-socks

example.com/blue-socks

It’s worth reiterating that we need to design our URL format to help prevent duplication as far as possible, and having multiple URLs per product in this way would obviously not be ideal. If you have products that can appear in multiple categories then you need to choose one of the following options:

  1. Don’t include any of the category hierarchy in product URLs, so all product pages will have a URL of the format example.com/[product-name]
  2. Canonicalise all products to a single URL form

There’s not a lot to choose between the above two approaches, and generally either will work fine. I tend to prefer option 2, particularly if the category names are important for keyphrase relevance, but only if we can canonicalise to a chosen category for any given product (ie the canonical URL is example.com/clothes/underwear/blue-socks rather than example.com/blue-socks). That’s not often possible, however, and you will find most CMSs force you to canonicalise to the /[product-name] URL variant, which is no better or worse than option 1.
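As a sketch of option 2, the snippet below picks one canonical category per product and renders the canonical link element for the page head. The CANONICAL_CATEGORY mapping, the function names and the base URL are all hypothetical; in practice this data would live in your CMS, if it lets you set it at all.

```python
from html import escape

# Hypothetical data: each product slug mapped to the ONE category path
# chosen as canonical for it.
CANONICAL_CATEGORY = {
    "blue-socks": "clothes/underwear",
}


def canonical_url(product_slug: str, base: str = "https://example.com") -> str:
    """Build the single canonical URL for a product, however it was reached."""
    category = CANONICAL_CATEGORY.get(product_slug)
    if category is None:
        # No category chosen: fall back to the /[product-name] form (option 1).
        return f"{base}/{product_slug}"
    return f"{base}/{category}/{product_slug}"


def canonical_link_tag(product_slug: str) -> str:
    """Render the <link rel="canonical"> element for the page <head>."""
    href = escape(canonical_url(product_slug), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("blue-socks"))
# <link rel="canonical" href="https://example.com/clothes/underwear/blue-socks">
```

Every variant of the page (the /sale/ one, the /menswear/ one and so on) would carry the same tag.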

5. Keep Your URL Length Under Control

The HTTP/1.1 specification, RFC 2616, does not specify a maximum length for URLs; however, there are practical limitations. Different web servers and browsers behave differently when URLs start to get really long, but the bottom line is that your URLs should all be under 2048 characters in length in order to be compatible with all modern browsers. For this reason, Google also went on the record and confirmed that you should keep it under 2000 characters.

So that’s the technical limit, but for usability reasons you should probably keep all your URLs much shorter than that. Users are less likely to share massive URLs, so keep it compact where you can without sacrificing on any of the other considerations in this post!
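If you have a list of your site's URLs (from a crawl or a sitemap, say), a quick audit like the sketch below will flag the offenders. The 2000-character ceiling comes from the discussion above; the 100-character "share-friendly" threshold is purely my own assumption, so adjust it to taste.

```python
HARD_LIMIT = 2000     # the technical ceiling discussed above
SHARE_FRIENDLY = 100  # assumed usability threshold, not taken from any standard


def url_length_report(urls):
    """Group URLs into 'too_long', 'worth_shortening' and 'ok' by character count."""
    report = {"too_long": [], "worth_shortening": [], "ok": []}
    for url in urls:
        if len(url) >= HARD_LIMIT:
            report["too_long"].append(url)
        elif len(url) > SHARE_FRIENDLY:
            report["worth_shortening"].append(url)
        else:
            report["ok"].append(url)
    return report
```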

6. Don’t Use Reserved Characters

RFC 3986 defines a set of 18 reserved characters that can serve as delimiters, depending on the URI scheme. Although not all of them have special meanings in HTTP, some browsers and crawlers will request percent-encoded versions of these characters in the URL, potentially causing compatibility issues and definitely making your URLs look ugly if users ever try to copy and paste to share them. As such, they should never be used as part of a URL slug. These characters are as follows:

:   /   ?   #   [   ]   @   !   $
&   '   (   )   *   +   ,   ;   =

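Furthermore, some characters outside the reserved set are typically percent-encoded, or are otherwise best avoided, and so should not be used. (Strictly speaking, RFC 3986 classes the full stop, underscore and tilde as unreserved, so they are not percent-encoded; I still recommend leaving them out of slugs.) These are as follows:

"   %   [space]   .   <   >   \
^   _   `   {   |   }   ~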

If in doubt, a good rule of thumb is to only ever use alphanumeric characters and hyphens in your URL slugs and strip out all other punctuation and symbols.
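That rule of thumb is easy to turn into a check. The sketch below is one possible version: the name is_clean_slug is made up, and insisting on lowercase letters and rejecting doubled, leading or trailing hyphens are my own additions on top of the "alphanumerics and hyphens only" rule.

```python
import re

# One or more alphanumeric groups joined by single hyphens.
# Lowercase-only and "no doubled hyphens" are assumptions beyond the rule above.
_CLEAN_SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_clean_slug(slug: str) -> bool:
    """True if the slug contains only lowercase alphanumerics and single hyphens."""
    return _CLEAN_SLUG.fullmatch(slug) is not None


print(is_clean_slug("mens-black-merino-sweater"))  # True
print(is_clean_slug("example_page_name"))          # False
print(is_clean_slug("crayons?colour=red"))         # False
```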

7. Think About How You Clean Your Slugs

Given what we discussed in Tip 6 above, if you’ve got bits of text that you’re going to ‘clean’ to get rid of unwanted punctuation before turning them into URL slugs, you need to think carefully about how you’re going to do that. A common practice is to simply delete every disallowed character, but this won’t result in pretty or readable URLs. For example, say we had a page on our site about Ben & Jerry’s ice cream. If we cleaned the phrase “Ben & Jerry’s” to create the URL slug by just removing all non-alphanumeric characters, we’d be left with:

example.com/benjerrys/

Not great, is it? Another way is to replace all non-alphanum characters with hyphens, but our example would then become:

example.com/ben---jerry-s

Also terrible. Notice that triple-hyphen? That’s a result of replacing the ampersand character (&) and both spaces around it with hyphens. The hyphenated “s” looks quite bad, too.

Ok, so can we do better? What if we replace all ampersands with “and”, all spaces with hyphens and remove all other non-alphanums?

example.com/ben-and-jerrys

Well that’s much better, isn’t it? That replacement scheme is probably a good start, but you may find cases on your site where it doesn’t work well. So keep an eye out, be careful and come up with something that works for your site!
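For what it's worth, that last replacement scheme can be sketched in a few lines. The function name slugify is just my choice, and collapsing repeated hyphens at the end is an extra tidy-up I've added. Note that this version simply drops accented and other non-ASCII letters, which is one of those cases where it may not work well for your site.

```python
import re


def slugify(text: str) -> str:
    """'&' becomes 'and', spaces become hyphens, other non-alphanumerics are removed."""
    text = text.lower().replace("&", " and ")
    text = re.sub(r"\s+", "-", text.strip())  # each run of whitespace -> one hyphen
    text = re.sub(r"[^a-z0-9-]", "", text)    # drop everything else (apostrophes etc.)
    return re.sub(r"-{2,}", "-", text).strip("-")  # extra tidy-up: no doubled or edge hyphens


print(slugify("Ben & Jerry’s"))  # ben-and-jerrys
```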

8. Avoid Duplication

Your URL scheme must be built with avoiding or minimising duplication in mind. We already discussed one common cause of duplication in Tip 4, but there are many possible ways to unwittingly introduce duplication to your site through a poorly-planned URL format. Here are a couple of common gotchas to watch out for:

  • URLs with and without trailing slashes: Some URLs have trailing slashes at the end of the slug (eg example.com/some-page/) and some do not (eg example.com/some-page). While Google is quite good at understanding that both URLs probably refer to the same page, these two URLs are still technically distinct and so do constitute duplication. There is no preference for slashes or no-slashes from an SEO point of view, so choose one and be consistent! The best way to enforce consistency is to set a site-wide redirect rule to your preferred format. For example, if you decide that you want your URLs without trailing slashes, set a redirect rule that will match any URL request with a trailing slash and 301-redirect it to the no-slash equivalent.
  • Use of the www subdomain: Some sites like to use www as the subdomain for the main part of their site (examples include Google, Amazon and many others), whereas some like to use no subdomain for the main part of their site (examples are fewer, but stackoverflow.com and moneyweek.com are two that come to mind). From an SEO point of view, either is fine, but you must be consistent! If you have a wildcard DNS record configured for your domain, you may well be serving duplicate content from both yourdomain.com and www.yourdomain.com without realising it. Again, the solution is to either create a redirect rule to redirect all traffic to www.yourdomain.com (as advocated by the guys at www.yes-www.org) or create a redirect rule to send all traffic requesting www.yourdomain.com to yourdomain.com (as advocated by the guys at no-www.org).

On the whole, just be mindful of any aspect of your URL format that could lead to different URLs pointing to the same content.
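Redirect rules like these normally live in your web server or CMS configuration, but the logic behind them is simple enough to sketch. The example below assumes a preference for www and no trailing slash; the host names and the function name redirect_target are placeholders of mine, not a recommendation of one form over the other.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "example.com"           # placeholder domain
PREFERRED_HOST = "www.example.com"  # assumed preference; the reverse is just as valid


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if it is already in the preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"  # drop the trailing slash, but keep the root path
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


print(redirect_target("https://example.com/some-page/"))
# https://www.example.com/some-page
print(redirect_target("https://www.example.com/some-page"))
# None
```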

9. Watch Out For QueryStrings

Just as with Tip 8 above, you must be mindful of any aspect of your URL that can vary without changing the page contents. Querystrings are the most common cause of this. For the record, a querystring is a set of one or more name-value pairs appended to a URL to pass additional information to the web server. The querystring is separated from the base URL using a question mark character as follows:

www.example.com/crayons?colour=red

The part after the question mark (colour=red) is the querystring; the question mark itself is a separator and not part of the querystring. Querystrings can be used for all manner of things and can be processed by the web server in whatever way the developers of the underlying web application wish.

There are potentially lots of SEO considerations around the use of querystring parameters, but one principle that you should be careful to follow is to avoid the use of querystring parameters that do not affect the page contents such as tracking parameters, ID tags or other user-specific information. Session-specific data should be passed in cookies, form posts, sessions or any other container object supported by the web application framework.
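If such parameters have already crept into your URLs, one way to see which URLs are really the same page is to strip them out and compare what is left. The sketch below does that with Python's standard library. The parameter names in NON_CONTENT_PARAMS are only assumed examples, so replace them with whatever your own site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of parameters that do not change the page contents.
NON_CONTENT_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def strip_non_content_params(url: str) -> str:
    """Remove querystring parameters that have no effect on what the page shows."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_non_content_params(
    "https://www.example.com/crayons?colour=red&utm_source=newsletter"
))
# https://www.example.com/crayons?colour=red
```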

10. Avoid Separating Your Content Across Multiple Subdomains Where Possible

It’s often tempting, when integrating a new part of your website such as a blog or news section, to simply add it on as a new subdomain. This can simplify the implementation and integration process, because you can create a separate DNS record for that subdomain, meaning that the new part of your site can easily be on separate platforms and hosted at different physical locations. For example, you could create a new blog at:

blog.example.com

Or, if you went to the trouble, you could use a subdirectory on your main subdomain:

www.example.com/blog

Today I am recommending that you go with the latter configuration if you at all can. This is a bit of a contentious issue, as Google say that they treat subdomains as equivalent to subfolders.

However, my experience and the anecdotal experience of many other site owners and digital marketing consultants across the web is that separate subdomains perform poorly.

My best guess at the reasons for this is that Google is treating the separate subdomains as completely separate sites (contrary to its own statements, not for the first time!). This means that if you have a well-established domain with lots of history, backlinks and authority, your new content will not benefit from that authority if it is located on a separate subdomain (or at least it won’t benefit to the same degree).

So, unless you have a good reason to use separate subdomains, keep all your content consolidated in one place to maximise your performance.
