Mapping the Privila network

Last week, Richard Clayton described his investigation of the Privila internship programme. Unlike link farms, Privila doesn’t link to its own websites. Instead, they apparently solely depend on the links made to the site before they took over the domain name, and new ones solicited through spamming. This means that normal mapping techniques, just following links, will not uncover Privila sites. This might be one reason they took this approach, or perhaps it was just to avoid being penalized by search engines.

The mapping approach which I implemented, as suggested by Richard, was to exploit the fact that Privila authors typically write for several websites. So, starting with one seed site, you can find more by searching for the names of authors. I used the Yahoo search API to automate this process, since the Google API has been discontinued. From the new set of websites discovered, the list of authors is extracted, allowing yet more sites to be found. These steps are repeated until no new sites are discovered (effectively a breadth first search).

The end result was that starting from bustem.com, I found 294 further sites, with a total of 3 441 articles written by 124 authors (these numbers are lower than the ones in the previous post since duplicates have now been properly removed). There might be even more undiscovered sites, with a disjoint set of authors, but the current network is impressive in itself.

I have implemented an interactive Java applet visualization (using the Prefuse toolkit) so you can explore the network yourself. Both the source code, and the data used to construct the graph can also be downloaded.

Screenshot of PrivilaView applet

Leave a Reply

Your email address will not be published. Required fields are marked *