Fashion crimes: trending-term exploitation on the web

News travels fast. Blogs and other websites pick up a news story only about 2.5 hours on average after it has been reported by traditional media. This leads to an almost continuous supply of new “trending” topics, which are amplified across the Internet before fading away relatively quickly. Many web companies track these terms across search engines and social media.

However narrow, these first moments after a story breaks present a window of opportunity for miscreants to infiltrate web and social-network search results. The motivation for doing so is primarily financial. Websites that rank high in response to a search for a trending term are likely to receive considerable amounts of traffic, regardless of their quality.

In particular, the sole goal of many sites designed in response to trending terms is to produce revenue through the advertisements that they display in their pages, without providing any original content or services. Such sites are often referred to as “Made for AdSense” (MFA), after the Google advertising platform they frequently target. Whether such activity is deemed to be criminal or merely a nuisance remains an open question, and largely depends on the tactics used to prop the sites up in the search-engine rankings. Other sites built around trending terms have more overtly sinister motives. For instance, some serve malware in hopes of infecting visitors’ machines, while others peddle fake anti-virus software.

Together with Nektarios Leontiadis and Nicolas Christin, I have carried out a large-scale measurement and analysis of trending-term exploitation on the web, and the results are being presented at the ACM Conference on Computer and Communications Security (CCS) in Chicago this week. Based on a collection of over 60 million search results and tweets gathered over nine months, we characterize how trending terms are used to perform web search-engine manipulation and social-network spam. The full details can be found in the paper and presentation.

We found that 18% of the trending terms included at least one search result flagged as malware within 72 hours of the term appearing in Google’s list of trending terms. At any point in time, around 4% of the currently “hot” terms include results pointing to malware that has already been detected by Google. On average, a further 2% of “hot” terms link to malware that has not yet been detected. For consistently popular terms, the figures are considerably lower: 2% of such terms include links to detected malware, and only 0.2% have links to malware not yet appearing in Google’s blacklist.
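To make the 72-hour statistic concrete, here is a minimal sketch of how such a fraction could be computed. It is not our actual measurement pipeline: it assumes each trending term is paired with the time it first appeared in the trending list, and that each crawled search result records the term, the crawl time, and whether the result was flagged as malware. The record type ResultObservation and the function fraction_with_early_malware are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ResultObservation:
    """One crawled search result (hypothetical record layout)."""
    term: str
    seen_at: datetime
    flagged_malware: bool


def fraction_with_early_malware(first_trending, observations,
                                window=timedelta(hours=72)):
    """Share of trending terms with at least one malware-flagged result
    observed within `window` of the term first becoming trending.

    first_trending: dict mapping each term to the time it first trended.
    observations: iterable of ResultObservation records.
    """
    terms_hit = set()
    for obs in observations:
        start = first_trending.get(obs.term)
        if start is None or not obs.flagged_malware:
            continue
        # Count only flags seen inside the window after the term started trending.
        if start <= obs.seen_at <= start + window:
            terms_hit.add(obs.term)
    if not first_trending:
        return 0.0
    return len(terms_hit) / len(first_trending)
```

For example, with four trending terms where one has a flagged result five hours in, another only after 80 hours, and a third has no flagged results at all, the function returns 0.25: the late flag falls outside the window and is not counted.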

We also encountered many low-quality MFA sites such as eworldpost.com, which appeared high in Google’s search results for 549 distinct trending terms between July 2010 and March 2011. In all, around 40% of trending terms included MFA sites such as eworldpost.com in their results.

Looking at the terms themselves, we found that less popular terms attract more malware and ads. One third of terms whose peak popularity was under 1,000 searches per month included malware in their results, compared to under 10% of terms attracting more than 100,000 monthly searches. We observed a similar effect for MFA sites. This suggests that, for the more popular and lucrative terms, search engines have more legitimate results to choose from than they do for “long-tail” search terms.
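The popularity comparison amounts to bucketing terms by peak monthly search volume and computing the share of terms in each bucket with malware in their results. A minimal sketch follows; the band boundaries reuse the 1,000 and 100,000 thresholds quoted above, while the middle band and the names band_of and malware_rate_by_band are our own hypothetical choices.

```python
from collections import defaultdict

# Popularity bands by peak monthly searches, as half-open intervals [low, high).
# The outer thresholds follow the text; the middle band is an assumption.
BANDS = [
    (0, 1_000, "under 1,000"),
    (1_000, 100_000, "1,000 to 100,000"),
    (100_000, float("inf"), "100,000 and above"),
]


def band_of(peak_searches):
    """Return the label of the band containing peak_searches."""
    for low, high, label in BANDS:
        if low <= peak_searches < high:
            return label
    raise ValueError(f"negative search volume: {peak_searches}")


def malware_rate_by_band(terms):
    """terms: iterable of (peak_monthly_searches, has_malware_result) pairs.

    Returns {band label: fraction of terms in that band with malware}.
    Bands containing no terms are omitted.
    """
    totals = defaultdict(int)
    infected = defaultdict(int)
    for peak, has_malware in terms:
        label = band_of(peak)
        totals[label] += 1
        infected[label] += int(has_malware)
    return {label: infected[label] / totals[label] for label in totals}
```

The same function works for MFA sites by passing a has_mfa_result flag instead of has_malware_result.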

We then estimated the number of visitors exposed to malware and MFA sites via trending search terms by linking our results to Google’s own estimates of visits per search term. We estimate that over 4 million users are exposed to low-quality MFA sites each month when searching for trending terms, compared to around 50,000 visits to malware-distributing sites. We further estimate that these visits translate to monthly revenues of around $100,000 for MFA sites and $60,000 for malware-distributing sites. This is certainly a lower bound on the revenues available to miscreants who poison search results, since there are many search terms to target beyond those currently trending. Nonetheless, I do think these calculations provide additional empirical support to the argument that many estimates of cyber-criminal revenues are overblown.
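The general shape of such an estimate is: monthly searches for a term, times the probability that a searcher clicks on each position holding a bad result, summed over terms; revenue is then visits times a per-visit rate. The sketch below illustrates that shape only. The click-through rates in CTR_BY_POSITION are placeholder assumptions, not figures from our study, and the function names are hypothetical. The last two lines simply divide the headline revenue figures above by the headline visit figures, treating the 4 million MFA exposures as visits.

```python
# Placeholder click-through rates by result position (1-indexed).
# These values are illustrative assumptions, not measured data.
CTR_BY_POSITION = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.03, 9: 0.02, 10: 0.02}


def estimated_monthly_visits(terms):
    """terms: iterable of (monthly_searches, positions_of_bad_results).

    Expected visits = searches x sum of CTRs at the positions holding bad
    results. Positions beyond the first page are assumed to get no clicks.
    """
    total = 0.0
    for searches, positions in terms:
        total += searches * sum(CTR_BY_POSITION.get(p, 0.0) for p in positions)
    return total


def revenue_per_visit(monthly_revenue, monthly_visits):
    """Average revenue earned per visit."""
    return monthly_revenue / monthly_visits


# Per-visit revenue implied by the headline estimates in the text.
mfa_rate = revenue_per_visit(100_000, 4_000_000)  # roughly $0.025 per visit
malware_rate = revenue_per_visit(60_000, 50_000)  # roughly $1.20 per visit
```

Under these assumptions, a malware visit is worth roughly 48 times an MFA visit, which is why far fewer malware visits can still produce revenue of the same order. Because the MFA visit count is stated as “over 4 million”, the implied MFA rate is best read as an upper bound.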

Furthermore, when combined with our earlier finding that malware and MFA sites both target the search results of less popular terms, these revenue estimates suggest that MFA and malware could be viewed as economic substitutes by the purely profit-motivated adversary. Consequently, any crackdown on one monetization vector could make the other more attractive. This is important, because Google initiated a crackdown on low-quality ad sites in February 2011, in the middle of our data collection. This fortunate timing allowed us to measure the impact of Google’s intervention. We found that traffic to MFA sites from trending terms fell by around half after the algorithm change, likely reducing the profitability of MFA sites.
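A simple way to quantify an intervention like this is a before/after comparison of average daily traffic around the date of the change. The sketch below is a hypothetical illustration, not the analysis in the paper: relative_change is our own name, the daily visit estimates are assumed inputs, and the cutoff date is a parameter rather than the exact date of Google’s update.

```python
from datetime import date
from statistics import mean


def relative_change(daily_visits, cutoff):
    """daily_visits: dict mapping a date to estimated MFA visits that day.

    Returns (mean visits on/after cutoff - mean before) / mean before,
    so -0.5 means traffic fell by half.
    """
    before = [v for d, v in daily_visits.items() if d < cutoff]
    after = [v for d, v in daily_visits.items() if d >= cutoff]
    if not before or not after:
        raise ValueError("need observations on both sides of the cutoff")
    baseline = mean(before)
    return (mean(after) - baseline) / baseline
```

For instance, daily estimates of 100 and 120 visits before a placeholder mid-February cutoff, followed by 55 and 55 after it, give a relative change of -0.5. A naive comparison like this does not control for seasonality or shifts in which terms are trending, so it should be read as a first approximation.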

What might this mean for the future? Perhaps miscreants will come to see malware distribution as more financially attractive, in which case we could see more malware distribution targeting trending terms. Such a shift in strategy is not without precedent. Several years ago, typosquatting was used to direct visitors to pornographic websites and carry out phishing attacks. Following a crackdown on such practices, domain squatters settled on a more lucrative model: syndicating pay-per-click ads. Now, at least a million typo websites are in use, and the vast majority simply host ads, drawing in hundreds of millions of dollars of revenue annually.

The open question is whether a significant crackdown on low-quality ad sites might simply shift the economics in favor of distributing malware. However, search engines have already demonstrated a willingness to fight malware distribution, in addition to combating MFA sites. Consequently, we remain optimistic that search engines might be willing to crack down on all abuses of trending terms.