A measurement of link rot: 57%

I submitted my PhD on the 31st August 2005 (9 months before Twitter started, almost two years before the first iPhone). The easiest version to find contains the minor revisions requested by my examiners and some typographical changes to fit it into the Computer Lab’s Technical Report series.

Since it seemed like a good idea at the time, my thesis has an annotated bibliography (so you can read a brief precis of what I referenced, which could assist you in deciding whether to follow it up). I also went to some effort to identify online versions of everything I cited, because it is always helpful to just click on a link and immediately see the paper, news article or other material.

The thesis has 153 references. In two cases I provided two URLs, and in three cases I could not provide any URL, though I did note that the three ITU standards documents I cited were available from the ITU bookshop and that it was possible to download a small number of standards without charge. That is, the bibliography contained 152 URLs (153 - 3 + 2 = 152).

I did note:
Unfortunately, URLs rot away and lead to abandoned web servers, bankrupt companies or sometimes to shiny new content that omits the interesting features I have noted within the older documents that I viewed. If this happens to you when a URL fails to function, then a search engine will probably locate the document’s new home; or maybe preservation systems like archive.org will still be functional so that you can look at the web as it used to be.

In fact, one of the documents I cited (from 2001) was already unavailable in August 2005, so my URL already pointed at archive.org.

This week, to mark the fifteenth anniversary, I have checked the status of the URLs in my thesis bibliography. Sadly, I find that 87 of them (57%) no longer work.
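The post does not say how the check was carried out, so what follows is only a minimal sketch of one way such a check could be automated, using nothing beyond Python's standard library. The names `is_rotten`, `fetch_status` and `rot_rate` are made up for illustration, and the rule that anything other than a 2xx response counts as dead is an assumption. Note that a 2xx response does not prove the original document is still there: a URL that now serves "shiny new content" would pass this test, so a careful check still needs a human to look at the page.

```python
import http.client
import urllib.error
import urllib.request


def is_rotten(status):
    """Assumed rule: no response, or any non-2xx status, counts as a dead link."""
    return status is None or not 200 <= status < 300


def fetch_status(url, timeout=10.0):
    """Return the HTTP status after redirects, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": "link-rot-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed URL and so on.
        return None


def rot_rate(urls, fetch=fetch_status):
    """Fraction of urls judged dead; fetch can be swapped out for offline use."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if is_rotten(fetch(url)))
    return dead / len(urls)
```

Calling `rot_rate(list_of_urls)` on a bibliography's URLs gives the dead fraction under these assumptions; passing a different `fetch` function makes it possible to replay previously recorded results without touching the network.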

It’s not just random blog posts or GeoCities (remember that?) sites that don’t work. Major companies, industrial research labs and even a number of computer science departments have reorganised their web presence and decided that maintaining old URLs as a courtesy to others is not worth their effort.

I fully expect that search engines would find the new location in some cases, but many of the documents are gone, as indeed is the whole of the site that used to host them. In passing I will note that for $5000 you can buy the domain that used to hold policy documents issued by the Indian Government!

Pleasingly, my archive.org URL still works, and I expect that archive.org will be the best way to find most of the missing documents. Of course, for proper academic papers (and that’s a fair proportion of what I cited) you could instead head off to that ancient institution “the library” and see if you can find some of those flattened sheets of wood pulp on which “academic papers” used to be exclusively printed.
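For anyone wanting to hunt for archived copies in bulk, here is a hedged sketch of how the lookup might be scripted against the Wayback Machine's availability endpoint. The response layout assumed here (an `archived_snapshots` object containing a `closest` entry with `available` and `url` fields) reflects my understanding of that API and should be checked against the Internet Archive's own documentation. The function names are invented for illustration, and the default timestamp is simply the thesis submission date written as YYYYMMDD.

```python
import json
import urllib.parse
import urllib.request

WAYBACK_API = "https://archive.org/wayback/available"


def availability_query(url, timestamp="20050831"):
    """Build the lookup URL, asking for the snapshot closest to timestamp."""
    params = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    return WAYBACK_API + "?" + params


def closest_snapshot(payload):
    """Pull the snapshot URL out of a decoded response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp="20050831", timeout=10.0):
    """Query the endpoint over the network and return a snapshot URL or None."""
    query = availability_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Keeping the URL building and the response parsing in separate functions means both can be exercised without network access; only `find_archived_copy` actually contacts archive.org.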

2 thoughts on “A measurement of link rot: 57%”

  1. There aren’t any DOIs … although there were national standards in 2000, DOIs were not an international standard until 2010. They took a while to catch on at all; even then, DOIs for RFCs (I cite 13, and all of those links (9%) still work) were not introduced until 2015.
