Web content labelling

As we all know, the web contains a certain amount of content that some people don’t want to look at, and/or do not wish their children to look at. Removing the material is seldom an option (it may well be entirely lawfully hosted, and indeed many other people may be perfectly happy for it to be there). Since centralised blocking of such material just isn’t going to happen, the best way forward is the installation of blocking software on the end-user’s machine. This software will have blacklists and whitelists provided from a central server, and it will provide some useful reassurance to parents that their youngest children have some protection. Older children can of course just turn the systems off, as has recently been widely reported for the Australian NetAlert system.

A related idea is that websites should rate themselves according to widely agreed criteria, and this would allow visitors to know what to expect on the site. Such ratings would of course be freely available, unlike the blocking software which tends to cost money (to pay for the people making the whitelists and blacklists).

I’ve never been a fan of these self-rating systems, whose criteria always seem to be based on a white, middle-class, Presbyterian view of wickedness and, at least initially, were hurriedly patched together from videogame rating schemes. More than a decade ago I lampooned the then widely hyped RSACi system by creating a site that scored “4 4 4 4”, the highest (most unacceptable) score in every category: http://www.happyday.demon.co.uk/awful.htm. Just recently, I was reminded of this in the context of an interview for an EU review of self-regulation.

However, leaving aside the value of rating, the key problem with the notion of self-rating websites (and this was quickly obvious even in the first flush of enthusiasm in the mid-1990s) is that it is really very difficult for a webmaster to rate their site in an honest and helpful manner without going to considerable effort. This means that it is just not economic to spend your money so as to assist the small minority of visitors who might be offended by some small aspect of your content.

So The Sun cheerfully put a correct “may contain partial nudity” onto their “page 3” pictures of topless women, but to avoid having to think about such things they scored all their news pages “1 1 1 1”, even on the day when the headline was “Boy 12 rapes Girl 11” and the story gave some of the details…

For webmasters who wanted to make an honest attempt at correctly rating every single one of their pages, and one was Sylvia Spruck Wrigley at Demon Internet (where I worked in the 1990s), the whole process was extremely time consuming. I recall her spending ages trying to work out how to rate the pages within a Guy Fawkes themed section of the website; what was a suitable rating for a page that mentioned 1605 interrogation techniques and the punishment for treason?

The RSACi scheme was eventually swept up into ICRA, and PICS was developed as a meta-system for handling the “algebra” when multiple ratings schemes exist in parallel. However, because it was hard work to do it properly, and only a minority of webmasters could be bothered to even produce generic “1 1 1 1” labels for their corporate sites, the idea of webmasters self-labelling their sites has almost died away (in dot-com speak, it no longer has mindshare). Nevertheless, despite continued evidence of its failure, the approach continued to get support from EU funds (so here’s a relaunch in late 2000), and even in 2007 Microsoft continue to ship a Content Advisor so that parents can set the particular rating levels that their offspring may visit.

An easy category to understand is “bad language”, so the ICRA rating scheme has the categories:

  • 4: Abusive or vulgar terms
  • 3: Profanity or swearing
  • 2: Mild expletives
  • 1: None of the above

and a parent can select what their kids are permitted to view, perhaps with a different value for the 8 year old and the 10 year old (older children are generally considered to be more mature).
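As a rough illustration of the per-child setting described above, here is a minimal sketch in Python. The level numbers and descriptions are copied from the list; the function name `is_allowed` and the example limits are my own invention for illustration, and this is not how Content Advisor or any real filter is actually implemented.

```python
# Language levels as listed above; a higher number means stronger language.
LANGUAGE_LEVELS = {
    4: "Abusive or vulgar terms",
    3: "Profanity or swearing",
    2: "Mild expletives",
    1: "None of the above",
}


def is_allowed(page_level, max_level):
    """Return True if the page's self-declared level is within the limit.

    Hypothetical helper: it trusts the label completely, which is exactly
    the weakness of any self-rating scheme.
    """
    return page_level <= max_level


# Hypothetical limits a parent might choose for two children.
limits = {"8 year old": 1, "10 year old": 2}

page_level = 2  # a page that labels itself "Mild expletives"
for child, limit in limits.items():
    verdict = "allowed" if is_allowed(page_level, limit) else "blocked"
    print(f"{child}: {LANGUAGE_LEVELS[page_level]} -> {verdict}")
```

The comparison itself is trivial; everything depends on the webmaster having chosen the right number in the first place.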

In passing one might note that modern approaches to labelling such as this W3C effort are looking at much more complex scenarios and subtlety of labelling, going way beyond 1,2,3,4. If they are ever implemented, they’ll pose a considerable challenge to user interface designers!

One of the groups who still think that self-rating is a Good Idea is politicians (it’s an easy out — they don’t need to consider censoring anything; webmasters will self-label their pages and thereby quietly censor themselves from being viewed by those who set Content Advisor to show that they care). Of course politicians get to tell people what to do, so they have told their civil servants to put ICRA ratings onto UK Government websites.

You can see Content Advisor in action at the Department for Culture, Media and Sport. They proudly display an ICRA logo on their front page, and they have labelled their gambling pages (they look after gaming, horse racing etc) to show that there is indeed discussion of gambling. Some parents may have thought that they were only blocking poker websites, or discussions of horse-racing odds, but the DCMS webmaster has made sure they can’t find out about what the Gambling Act 2005 does either.

Surprisingly perhaps, there’s a certain amount of “bad language” on Government websites and you might think that this would be easy for the webmasters to get right, if they cared (and had the time to care). However, the evidence is that even getting this right is too much trouble.

For example, looking at the Home Office website, it currently hosts 14 documents that contain the word “fuck” and 4 that contain the word “cunt” (their research studies and reports of prison inspections often report witnesses in their own words). However, they are all in PDF files — so Content Advisor won’t prevent them being viewed 🙁

The Department of Health had five “fucks” when I last looked (though mysteriously Google doesn’t seem to fully index their site at the moment) and these are in HTML documents, and so it is perfectly possible to label them correctly. However, to take one example, Chapter 19 of the Inquiry into Child Abuse in North Wales is viewable here, but when one inspects the ICRA tags for the page:

<meta http-equiv="pics-label" content='(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for "http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/Browsable/DH_4927518" r (nz 1 vz 1 lz 1 oz 1 cz 1) "http://www.rsac.org/ratingsv01.html" l gen true for "http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/Browsable/DH_4927518" r (s 0 n 0 v 0 l 0))'/>

or, rather more simply, feeds the page’s URL into the ICRA label checker, one discovers that there is no “potentially offensive language” on the page. However, when one reads it one finds:
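For anyone who would rather read the label programmatically than by eye, here is a small Python sketch that pulls the rating pairs out of the PICS-1.1 string quoted above. The function name `parse_pics_ratings` is made up for this example, and the regular expression only handles the simple shape of this particular label; it is not a full PICS parser. I am also assuming that `lz` and `l` are the language descriptors in the ICRA and RSACi vocabularies respectively, which is what the label checker result suggests.

```python
import re

# The label quoted above, with the line-wrap spaces removed from the URLs.
LABEL = (
    '(pics-1.1 "http://www.icra.org/ratingsv02.html" l gen true for '
    '"http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/'
    'PublicationsPolicyAndGuidance/Browsable/DH_4927518" '
    'r (nz 1 vz 1 lz 1 oz 1 cz 1) '
    '"http://www.rsac.org/ratingsv01.html" l gen true for '
    '"http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/'
    'PublicationsPolicyAndGuidance/Browsable/DH_4927518" '
    'r (s 0 n 0 v 0 l 0))'
)

# One section per rating service: "service-url" l <options> r (name value ...)
SECTION = re.compile(r'"([^"]+)"\s+l\b[^()]*?\br\s*\(([^)]*)\)')


def parse_pics_ratings(label):
    """Return {rating-service URL: {category: value}} for a simple label."""
    ratings = {}
    for service, body in SECTION.findall(label):
        tokens = body.split()
        # Tokens alternate: category name, then its integer value.
        ratings[service] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return ratings


ratings = parse_pics_ratings(LABEL)
for service, values in ratings.items():
    print(service, values)
```

Under those assumptions, the page declares `lz 1` to ICRA and `l 0` to RSACi, in other words nothing to report under language in either scheme.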

B’s only complaint about Ysgol Talfryn is that he was assaulted by a teacher there at about 10.10 am on 22 February 1989. His account of this was that the teacher (Y) asked him, unusually, to read aloud, whereupon B told Y to “fuck off”.

My! How standards have fallen at the Department of Health. This text is apparently not “abusive or vulgar”, “profanity or swearing” or even a “mild expletive”… Why, back in my grandmother’s day, even the now innocuous “bloody” would have frightened the horses!