Mapping the Privila network

September 3rd, 2007 at 20:42 UTC by Steven J. Murdoch

Last week, Richard Clayton described his investigation of the Privila internship programme. Unlike link farms, Privila doesn’t link to its own websites. Instead, it apparently depends solely on the links made to each site before Privila took over the domain name, and on new ones solicited through spamming. This means that the normal mapping technique of just following links will not uncover Privila sites. This might be one reason Privila took this approach, or perhaps it was just to avoid being penalized by search engines.

The mapping approach I implemented, as suggested by Richard, exploits the fact that Privila authors typically write for several websites. Starting with one seed site, you can find more by searching for the names of its authors. I used the Yahoo search API to automate this process, since the Google API has been discontinued. From the new set of websites discovered, the list of authors is extracted, allowing yet more sites to be found. These steps are repeated until no new sites are discovered (effectively a breadth-first search).
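The loop described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the actual mapping: the function map_network and its two callables, authors_of and sites_for_author, are hypothetical names, and the toy dictionary stands in for both the author extraction and the search API queries.

```python
from collections import deque


def map_network(seed_site, authors_of, sites_for_author):
    """Breadth-first expansion over sites, linked through shared authors.

    authors_of(site) and sites_for_author(author) are hypothetical
    callables: the first would scrape author names from a site, the
    second would query a search engine for an author's name.
    """
    seen_sites = {seed_site}
    seen_authors = set()
    queue = deque([seed_site])

    while queue:
        site = queue.popleft()
        for author in authors_of(site):
            if author in seen_authors:
                continue  # each author only needs to be searched once
            seen_authors.add(author)
            for found in sites_for_author(author):
                if found not in seen_sites:
                    seen_sites.add(found)
                    queue.append(found)

    return seen_sites, seen_authors


# Toy data (invented for illustration): site -> authors writing for it.
SITE_AUTHORS = {
    "seed.example": ["Alice", "Bob"],
    "b.example": ["Bob", "Carol"],
    "c.example": ["Carol"],
    "d.example": ["Dave"],  # disjoint author set, so never reached
}


def toy_authors_of(site):
    return SITE_AUTHORS.get(site, [])


def toy_sites_for_author(author):
    return [s for s, names in SITE_AUTHORS.items() if author in names]


sites, authors = map_network("seed.example", toy_authors_of, toy_sites_for_author)
```

In the toy data, d.example is never found because none of its authors write for a site reachable from the seed. Any real search results would also need filtering, since an author's name can match unrelated pages.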

The end result was that, starting from bustem.com, I found 294 further sites, with a total of 3 441 articles written by 124 authors (these numbers are lower than the ones in the previous post since duplicates have now been properly removed). There might be even more undiscovered sites with a disjoint set of authors, but the current network is impressive in itself.

I have implemented an interactive Java applet visualization (using the Prefuse toolkit) so you can explore the network yourself. Both the source code and the data used to construct the graph can also be downloaded.

Screenshot of PrivilaView applet

Entry filed under: Security economics
