John Topley’s Weblog

Database IDs Have No Place In URIs

I’ve been beta testing Jeff Atwood’s and Joel Spolsky’s latest venture, Stack Overflow. In case you haven’t heard, Stack Overflow is a new site where programmers can go to get their programming questions answered by other programmers. It’s a similar idea to sites like Expert Sexchange [sic], but without the pay wall and general crapness.

I really like what I’ve seen so far. I think Stack Overflow is very well put together and even during the current beta stage it’s pretty slick. The reputation and badges system whereby you gain points and virtual awards for the contributions you make, is fun and strangely addictive. Most importantly though, it’s already genuinely useful. However (and you knew this was coming), one aspect of the implementation of Stack Overflow troubles me somewhat: it appears the site is using numeric database IDs within URIs.

Note that I use URI rather than URL because one of the stated aims for Stack Overflow is that its content be readily indexable by search engines, so that people can enter their specific programming question into Google and hopefully amongst the top results should be the definitive answer on Stack Overflow. It’s known that Google does place importance on URIs, so you want them to be meaningful and immutable.

URIs in Stack Overflow look like this:

http://stackoverflow.com/questions/13204/why-doesnt-my-cron-job-work-properly

—As you can see, there’s a number, followed by a Google-friendly version of the question title, which is often called a “slug”.

It’s generally considered A Bad Idea to expose your database IDs in URIs and here are several reasons why:

  • If you have to move the site to a different box, can you guarantee that those database IDs will remain the same?
  • If you have to restore the site’s data from a backup, can you guarantee that those database IDs will remain the same?
  • If you have to switch to a different database server (say from Microsoft SQL Server to Oracle), can you guarantee that those database IDs will remain the same?

If you answered “no” to any of the questions above, then you’ve broken a fundamental rule of URIs (permalinks), which is that they’re supposed to stay the same forever! Including database IDs exposes implementation detail to the world. It might give me an idea of how many questions or answers there are and I might feel inclined to start hacking around, putting different numbers in to see if anything interesting happens.

URIs should be meaningful, not cluttered with meaningless information. Another example of this is URIs that include pseudo file extensions. For example, .do (Struts) or .aspx (ASP.NET). Why should a site’s visitors care what technology it’s implemented in? Think about what information you’d want your URIs to divulge if you were able to look at them in a hundred year’s time and discard everything else. Keep them meaningful and clean.

I did question the use of database IDs using the Stack Overflow Feedback Forum and got the following official response:

“without the numeric ID, it’d be nearly impossible to map text to database IDs.”

Now unless I’m fundamentally misunderstanding a key part of how Stack Overflow is implemented—which is entirely possible for I have no access to the project team or source code—I find it difficult to understand this statement. Surely all that’s required is an indexed database column that stores the question slug that forms the end of the URI. The slug itself can easily be automatically generated when a new question is saved. Then you can simply retrieve a question by its slug. Using the ActiveRecord dynamic finders in Ruby on Rails it might look like this:

question = Question.find_by_slug params[:id]

—Now you have a fully-formed Question instance and you can go off and get its answers etc. Of course in production code you’d want to make sure that the incoming request parameter is trustworthy rather than passing it straight to the finder method, but that’s not the point of the example. The key point is that the slug in the URI is the key that’s used to get hold of your model object; everything can still be stitched together using numeric database IDs under the hood, just don’t expose that to your site’s visitors. Please don’t pollute your URIs with implementation specifics.

Comments

There are 9 comments on this post. Comments are closed.

  • avatar John Conners
    19 August 2008 at 21:06

    Yeah, I'd have to agree with you there John - exposing the implementation of a web app to the outside world is just asking for trouble if you ever want to change the implementation - or at least makes it harder. Take Wordpress for example, there's no need to expose the post id through the url, the slug is stored in the table and that's what gets exposed. You could port this site to another platform (perish the thought) and it wouldn't cause a problem as you're not showing that primary key.

  • avatar Ian Nelson
    19 August 2008 at 21:19

    In the grand scheme of things that could be "wrong" with a site, I think this is fairly low on the list. If this is all that you have to complain about, then it raises my expectations of the quality of Stack Overfow even higher (my beta invite hasn't come through yet!).

    But I am very surprised that people of Atwood's calibre would come out with a response like the "nearly impossible to map" comment above. Your site is powered by Wordpress and doesn't include database IDs in the URL. Mine is running on Graffiti, itself developed on ASP.NET - the same stack as Stack Overflow, and it too has meaningful URLs devoid of database IDs. So, this is obviously a solved problem and I think it's a shame that Stack Overflow haven't taken the time to tidy up their URLs. Especially as Jeff makes such a big deal over whether to drop the "www" prefix or not.

  • avatar Joe Grossberg
    19 August 2008 at 21:57

    Or use GUIDs. That also prevents users from viewing different rows by incrementing a decrementing IDs (which may or may not matter). They're also a fixed length, which makes indexing easier, IIRC.

  • avatar Duncan Smart
    19 August 2008 at 22:36

    I agree that the IDs could probably be left out and unique "slugs" could be generated (much like Digg.com), but *all* of your bullet points can be answered "yes", and without much difficulty (e.g. SET IDENTITY_INSERT tablename ON).

  • avatar John Topley
    20 August 2008 at 08:02

    @Ian

    I actually think it's more than a minor point because what goes in the browser address bar is an important and often overlooked part of the experience of using a web application.

    @Joe

    I hope you're not advocated GUIDs in URLs - that would be horrible!

    @Duncan

    That's fair enough, but I think I'd prefer to completely avoid the problem in the first place!

  • avatar Ranju V
    20 August 2008 at 21:02

    I agree. This is exactly the strategy I adopted for my blog, i.e., dynamically construct a slug that refers to blog entries. I couldn't get rid of the "aspx" extension though as it involves messing around with the web server and my hosting provider doesn't let me do that.

  • avatar Ian Nelson
    21 August 2008 at 21:01

    I think you've just been proved right:

    http://beta.stackoverflow.com/questions/21064/massive-reputation-jump

  • avatar John Topley
    21 August 2008 at 21:13

    Yeah, I saw that. I must admit, I am enjoying the extra reputation (while it lasts!)

  • avatar Anders Sandvig
    22 August 2008 at 09:14

    I agree with you in principle, but I don't think the IDs used in Stack Overflow is a problem. If they decide to move databases or switch technology it will be quite easy to create a mapping between the current database IDs and the new keys--a mapping you are saying they would have to create anyway. If they can't rely on the slug strings being unique, they would still have to generate some kind of unique identifier, so why not let the database engine do it for them?

    You are making an assumption that the number you see in the URI is the primary key of the table containing the questions. After all, this doesn't have to be true. And even if it is today, it need not be so in the future.

    I think the naming scheme of http://hostname//describing-text/ is much better than the all-too-often seen http://hostname/?id=.

Think about what information you’d want your URIs to divulge if you were able to look at them in a hundred year’s time and discard everything else. Keep them meaningful and clean.


Archives

  • Jan
  • Feb
  • Mar
  • Apr
  • May
  • Jun
  • Jul
  • Aug
  • Sep
  • Oct
  • Nov
  • Dec
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014

More Archives


Sign In