WikiLeaks have decided to save other people some bandwidth and make some of my PowerPoint slides available on their site. Since they usually publish censored or confidential information, they’re presumably completely unaware that these slides have been available to the public from two different websites since the day of the talk.
Remarkably similar slides (I’m often lazy!) were also presented in talks I have given this year to INEX (the Irish Internet Exchange Point) [slides here], to EuroBSDCon [slides here] and the BCS (Herts branch) [slides here].
These talks have covered various technical aspects of the blocking of child sexual abuse images for sites that appear on the IWF list. I’ve been mentioning the blocking of Wikipedia just over a year ago, and the blocking of archive.org up to last February. However, I’ve also thrown in a couple of slides about some more recent research, yet to be published, which explores a different way of determining what is on the IWF list. That seems to have been what has interested WikiLeaks.
The new method I am using is far from elegant. Essentially I check whether an ISP that uses “DNS poisoning” to redirect traffic to a filtering machine is, or is not, incorrectly resolving a particular hostname.
The algorithm is:
for hostname in (list of all valid hostnames)
    if (resolve(hostname) == cache-IP-address)
        print "hostname is blocked"
where the only “clever” aspect is that I’ve realised that I can source a “list of all valid hostnames” (or at least a good approximation to that) from the ISC’s passive DNS database.
I should point out in passing that the ISC data is not generally available, so duplicating this work is not generally possible.
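For readers who prefer something runnable, here is a minimal Python sketch of the loop described above. It is an illustration under stated assumptions, not the code used for this research: the names `find_blocked`, `resolve_with_system_resolver` and `filter_ips` are hypothetical, the addresses of the filtering machine are assumed to be known in advance, and the lookup is assumed to go through the ISP's own (poisoned) resolver. The hostname list is simply passed in, since the ISC data is not generally available.

```python
import socket
from typing import Callable, Iterable, Iterator, Set


def resolve_with_system_resolver(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the machine's configured resolver gives.

    Assumption: the machine is configured to use the ISP's resolver,
    so a poisoned answer would show up here.
    """
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except (OSError, UnicodeError):
        # Lookup failures (no such host, timeout, malformed name)
        # are treated as "no answer" rather than as a block.
        return set()
    return set(addresses)


def find_blocked(
    hostnames: Iterable[str],
    filter_ips: Iterable[str],
    resolve: Callable[[str], Set[str]] = resolve_with_system_resolver,
) -> Iterator[str]:
    """Yield each hostname that resolves to one of the filter addresses."""
    filter_set = set(filter_ips)
    for hostname in hostnames:
        # A hostname counts as blocked if any returned address belongs
        # to the filtering machine.
        if resolve(hostname) & filter_set:
            yield hostname


if __name__ == "__main__":
    # Invented data using documentation-only names and addresses.
    fake_answers = {
        "blocked.example": {"192.0.2.1"},
        "ordinary.example": {"198.51.100.7"},
    }

    def fake_resolve(hostname: str) -> Set[str]:
        return fake_answers.get(hostname, set())

    for name in find_blocked(fake_answers, {"192.0.2.1"}, fake_resolve):
        print(name, "is blocked")
```

Passing the resolver in as a parameter keeps the sketch testable without touching the network; a real run would need the ISP's resolver and a very large hostname list, which is where all the difficulty lies.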
Anyway, what I’ve been discussing in my various talks is how this method enables me to reverse-engineer about 60% of the IWF list (the other hostnames are too obscure to appear in the ISC data). This 60% comes in two parts. 35% are sites that are intentionally hosting illegal material, and 25% are reputable, and entirely legal, sites that host a great deal of legal material.
Since I would commit a criminal offence by visiting any of these sites and viewing a child sexual abuse image, my 35/25 split is achieved by making deductions from secondary sources obtained by doing Google searches on the domain names.
I’ve chosen to publish the legal site names on my slides because knowing the sitename will not enable anyone to locate the specific URL that leads to illegal material. I have NOT of course published any domain names from the 35%. The general point I’ve been making in my talks is that it is disappointing that the IWF has not chosen to get in prompt communication with the legal sites, since they will doubtless be horrified to learn what they are inadvertently hosting and (assuming it is illegal in their jurisdiction) immediately remove it.
Anyway, you can see all of this on WikiLeaks here; or you can find all of the slides from my various talks here on my own pages. Those who are really keen can hear podcasts of the versions of this talk that I gave to the BCS (Herts) here; at EuroBSDCon here; or you can even watch a video of me when I talked to INEX here.
It seems we must now add “and WikiLeaks republishes widely available material merely because it appears to someone to be a little salacious”.