May 11, 2011 at 10:12 PM
Edited May 11, 2011 at 10:23 PM
We are currently addressing the feature requests and criticism we have received about how our URLs look: they are far too verbose, developers lack control over casing and over how characters get replaced, and the .aspx extension is not wanted. URLs
like http://contoso.fr/fr/Home/About.aspx should be
Extra path info
One area where we haven’t quite been able to find a silver bullet, and where we would like some input and ideas, is ‘extra path info’: a part of the URL that extends an existing page URL and can be consumed by functionality embedded on the page to control what it shows.
Today a URL with extra path info looks like http://contoso.com/Home/Products.aspx/books, and the .aspx part ensures that we can easily identify what is the ‘page part’ and what is the ‘extra part’ of the URL. We can also easily send a 404 etc. when something is wrong. Under a new URL regime this URL could become http://contoso.com/products/books, and we can no longer easily identify the page URL part.
A quick example: http://server/products is a page that contains a C1 Function that by default lists products, but if the C1 Function is called with a product id as “extra path info”, that particular product is shown by the function instead of the product list. So the URL
http://server/products/baconnaise would show the same Composite C1 page instance as
http://server/products, but the C1 Function embedded on that page would behave differently, showing either a list or a single product.
When intercepting the request http://server/products/baconnaise Composite C1 would find that “/products” is the page path (deepest match found in the page sitemap) and “/baconnaise” is extra
path info. The “/products” page would be rendered, leaving it up to functionality on that page to use (or not use) the extra path info.
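The deepest-match lookup described above can be sketched roughly as follows. This is a minimal Python illustration, not Composite C1 code: the `split_page_path` function and the `SITEMAP` set of page paths are hypothetical names invented for the example.

```python
from typing import Optional, Set, Tuple

# Hypothetical sitemap: the page paths the CMS knows about.
SITEMAP = {"/", "/products", "/about"}


def split_page_path(
    request_path: str, sitemap: Set[str] = SITEMAP
) -> Optional[Tuple[str, str]]:
    """Split a request path into (page path, extra path info).

    The page path is the deepest sitemap entry that matches the start of
    the request path. Returns None when no page matches at all.
    """
    segments = [s for s in request_path.split("/") if s]
    # Try the full path first, then drop one trailing segment at a time.
    for cut in range(len(segments), -1, -1):
        candidate = "/" + "/".join(segments[:cut])
        if candidate in sitemap:
            extra = "".join("/" + s for s in segments[cut:])
            return candidate, extra
    return None


page, extra = split_page_path("/products/baconnaise")
```

Here `page` is "/products" and `extra` is "/baconnaise", while a plain request for "/products" yields an empty extra path info. It is then up to functionality on the page to use the extra part or ignore it.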
The good part
Being able to add a C1 Function to any page and then use extra path info to ‘navigate inside the C1 Function’ is very easy and intuitive. You simply create a C1 Function that feasts on extra path info and then add it to a page where desired, and you get sweet URLs and can move things around if needed without digging into code or config. Yay!
The bad part
A very loose URL regime like this is something that SEO people have warned me never to allow. For instance, a page like http://server/about would serve the same content as http://server/about/add-anything-here, since the CMS would find “/about” to be the best matching page. No 301 redirect or 404 error would happen; a search engine would just find a potentially limitless number of URLs with duplicate content (just add /something to the page URL and you have a new URL with the same content).
“Bad links” could happen in situations like this:
- A page is moved or its URL title is renamed. Requests using this page’s old URL would yield the parent page’s content, not a 301 redirect or a 404 error. Content duplication could become an issue.
- Someone with evil intent creates links like http://server/about/crappy-company, which with some effort could actually show up higher in Google results than the original /about URL. Not cool.
- A link to a page is misspelled, causing the parent page to be rendered. Duplicate content.
- (other situations worth noting?)
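To make these situations concrete, here is a small Python sketch (not Composite C1 code) that groups a handful of request paths by the page that would actually be rendered. The `resolve_page` and `group_by_rendered_page` helpers and the sample sitemap are hypothetical, and the sketch assumes a page "/about/team" was recently renamed to "/about/staff".

```python
from collections import defaultdict

# Hypothetical sitemap after "/about/team" was renamed to "/about/staff".
SITEMAP = {"/", "/about", "/about/staff"}


def resolve_page(request_path, sitemap):
    """Return the deepest sitemap path that prefixes request_path, or None."""
    if not request_path.startswith("/"):
        raise ValueError("request_path must start with '/'")
    path = request_path.rstrip("/") or "/"
    while path not in sitemap:
        if path == "/":
            return None
        # Drop the last segment and try the parent page.
        path = path.rsplit("/", 1)[0] or "/"
    return path


def group_by_rendered_page(request_paths, sitemap=SITEMAP):
    """Group request paths by the page that would be rendered for them."""
    groups = defaultdict(list)
    for request_path in request_paths:
        groups[resolve_page(request_path, sitemap)].append(request_path)
    return dict(groups)


duplicates = group_by_rendered_page([
    "/about",
    "/about/team",            # old URL of the renamed page
    "/about/crappy-company",  # link created with evil intent
    "/about/staf",            # misspelled link
    "/about/staff",
])
```

In this sketch four different request paths end up under "/about", and without further handling each of them would be served as a normal page with the same content.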
Should we handle this?
Yes. Duplicate content issues are almost guaranteed to happen over time and slowly grow worse, and I have spent enough time with SEO people to know that duplicate content is something a CMS should combat, not promote.
How should we handle this?
I would like to quickly recap the good part: we have this nice and easy to use feature, where you as a dev can hook on to the URL of the page that hosts your C1 Function and use the extra path to route into your data. Our MVC Player works like this and it’s fairly sweet. An end user can add your C1 Function where they want it and this “just works”; no config or code changes are required, and URLs and routing adapt naturally.
Here are a few ways to prevent the duplicate content issue, listed in no particular order. Most of them break the “just works” goodness described above; the question is which approach is desirable, or whether better ways exist:
- Pages that accept extra path info must be ‘white listed’ first. A C1 Function that uses extra path info will not function as expected until a user explicitly allows it for the particular page. This could be a checkbox when editing the page or a config setting. A request to “/products/baconnaise” will yield 404 until “/products” is explicitly allowed to be requested with extra path info.
(Good: Pretty much fixes the issue. Bad: user action is required, and it will white list anything, including /products/crappy-company)
- Composite C1 will allow the extra path info request to execute, but unless code executed as part of the request explicitly notifies Composite C1 that it “used the extra path info”, the rendered page is thrown away and a 404 is returned. This is kind of like the white list idea, except that code does the “white listing” for the current request while executing, and bad extra path info may still yield a 404.
(Good: Allows devs to fix the problem with a high level of detail, letting bad URLs fail. Bad: Devs need to care about calling this. How would XSLT Function devs do it?)
- <link rel="canonical" … /> is used to combat duplicate content. By default the URL of the page being rendered (i.e. /products) would be specified as the canonical URL. Code that actively uses the path info is responsible for delivering a more exact canonical URL. By default pages will render just fine with /extra/stuff; the canonical URL will contain the current page’s short form URL.
(Good: things just work. Bad: requires a canonical link element regime by default, and devs must expand on it or lose Google indexing of the URLs inside their C1 Function)
- Introduce a “URL validation provider” feature: a dev can write a plug-in that gets called with seemingly invalid URLs and then okays them at request time (perhaps passing them to a specific page). If no provider okays the URL, the request yields 404.
(Good: You can write C# code that gets to okay a URL. Bad: We didn’t solve the problem, just moved the headache to a provider).
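To help compare the options, here is a rough Python sketch of how the second and third ideas might look from a function developer's point of view. It is only an illustration under stated assumptions, not a proposed Composite C1 API: `RequestContext`, `use_path_info`, `canonical_link`, `render` and the two sample functions are all hypothetical names. The sketch combines both ideas for brevity; under the pure canonical link option the 404 check in `render` would simply be dropped.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Optional, Tuple


@dataclass
class RequestContext:
    """Hypothetical per-request state shared by the CMS and page functions."""
    page_path: str
    extra_path_info: str = ""
    path_info_used: bool = False
    canonical_path: Optional[str] = None

    def use_path_info(self, canonical_path: Optional[str] = None) -> str:
        """Called by a function to tell the CMS it consumed the extra path info."""
        self.path_info_used = True
        if canonical_path is not None:
            self.canonical_path = canonical_path
        return self.extra_path_info


def canonical_link(ctx: RequestContext) -> str:
    # Default to the page's own short URL unless a function gave a more exact one.
    href = ctx.canonical_path or ctx.page_path
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'


def render(ctx: RequestContext,
           page_function: Callable[[RequestContext], str]) -> Tuple[int, str]:
    body = page_function(ctx)
    # Extra path info that nobody claimed: throw the page away and return 404.
    if ctx.extra_path_info and not ctx.path_info_used:
        return 404, ""
    return 200, canonical_link(ctx) + body


PRODUCTS = {"baconnaise": "Baconnaise"}  # sample data for the sketch


def product_function(ctx: RequestContext) -> str:
    product_id = ctx.extra_path_info.strip("/")
    if not product_id:
        return "<ul>" + "".join(f"<li>{n}</li>" for n in PRODUCTS.values()) + "</ul>"
    if product_id in PRODUCTS:
        ctx.use_path_info(canonical_path=f"{ctx.page_path}/{product_id}")
        return f"<h1>{PRODUCTS[product_id]}</h1>"
    return ""  # unknown id: the path info is not claimed, so render() returns 404


def about_function(ctx: RequestContext) -> str:
    return "<p>About us</p>"  # never looks at extra path info


status_ok, html_ok = render(RequestContext("/products", "/baconnaise"), product_function)
status_bad, _ = render(RequestContext("/about", "/add-anything-here"), about_function)
```

In this sketch "/products/baconnaise" renders with its own canonical URL, while "/about/add-anything-here" and "/products/crappy-company" fail with 404 because no code claimed the extra path info. The cost is the one noted above: every function that consumes path info has to remember to call something like `use_path_info`.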
Here is a relevant video if you are not familiar with duplicate content or the canonical link element:
Sorry if this post became long and murky. I hope it makes sense and that you either have ideas to share or can identify a model you would prefer.