February 17th, 2010 at 13:00 UTC by Tyler Moore
For more than a decade, aggressive website registrants have been engaged in ‘typosquatting’, the intentional registration of misspellings of popular website addresses. Uses for the diverted traffic have evolved over time, ranging from hosting sexually explicit content to phishing. Several countermeasures have been implemented, including outlawing the practice and developing policies for resolving disputes. Despite these efforts, typosquatting remains rife.
But just how prevalent is typosquatting today, and why is it so pervasive? Ben Edelman and I set out to answer these very questions. In Measuring the Perpetrators and Funders of Typosquatting (appearing at the Financial Cryptography conference), we estimate that at least 938,000 typosquatting domains target the top 3,264 .com sites, and we crawl more than 285,000 of these domains to analyze their revenue sources.
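To give a feel for how many misspellings a single popular domain can attract, here is a minimal sketch that enumerates candidate typos at edit distance one (a dropped, swapped, replaced, or extra character). This post does not spell out how candidates were enumerated, so treat the approach, the function name `typo_candidates`, and the character set as assumptions made for illustration. Checking which candidates are actually registered would require DNS or WHOIS lookups, which are omitted here.

```python
import string

# Characters allowed in a hostname label (assumed set for this sketch).
ALPHABET = string.ascii_lowercase + string.digits + "-"


def typo_candidates(domain: str) -> set[str]:
    """Return all single-edit misspellings of the first label of `domain`."""
    label, dot, rest = domain.partition(".")
    candidates = set()

    # Deletion: drop one character.
    for i in range(len(label)):
        candidates.add(label[:i] + label[i + 1:])

    # Transposition: swap two adjacent characters.
    for i in range(len(label) - 1):
        candidates.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])

    # Substitution: replace one character.
    for i in range(len(label)):
        for ch in ALPHABET:
            candidates.add(label[:i] + ch + label[i + 1:])

    # Insertion: add one character at any position.
    for i in range(len(label) + 1):
        for ch in ALPHABET:
            candidates.add(label[:i] + ch + label[i:])

    # Remove the correct spelling, the empty label, and labels that
    # start or end with a hyphen (not valid hostnames).
    candidates.discard(label)
    candidates.discard("")
    return {
        c + dot + rest
        for c in candidates
        if not c.startswith("-") and not c.endswith("-")
    }


typos = typo_candidates("yellowpages.com")
print(len(typos), "candidate misspellings")
```

Even a short name yields hundreds of candidates, which is why counting registered typos across thousands of popular sites quickly reaches large numbers.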
We find that 80% of typo domains are supported by pay-per-click ads. Often, the typo domains show ads that promote the correctly spelled site, along with the site’s competitors. Screenshots of selected examples are available.
Another 20% of typo domains include static redirects to other sites. For example, 156 misspellings of yellowpages.com redirect to the competing website yellowpagesoftheworld.com. We devised an automated technique that uncovered 75 otherwise legitimate websites which benefited from direct links and redirects from thousands of misspellings of competing websites.
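The grouping step behind a finding like this can be sketched in a few lines. The following is only an illustration, not the automated technique used in the paper: it assumes crawl results are already available as (typo domain, intended site, redirect destination) triples, and the function name `competing_beneficiaries`, the `min_typos` threshold, and the sample typo domains are all hypothetical.

```python
from collections import defaultdict


def competing_beneficiaries(observations, min_typos=2):
    """Group typo domains by the site they redirect to.

    `observations` is an iterable of (typo_domain, intended_site, destination)
    triples. Destinations equal to the intended site are ignored, since those
    redirects send the visitor where they meant to go.
    """
    sources = defaultdict(set)
    for typo, intended, destination in observations:
        if destination != intended:
            sources[destination].add(typo)
    # Keep only destinations fed by at least `min_typos` distinct typo domains.
    return {
        dest: sorted(typos)
        for dest, typos in sources.items()
        if len(typos) >= min_typos
    }


# Hypothetical crawl results for illustration only.
crawl = [
    ("yelowpages.com", "yellowpages.com", "yellowpagesoftheworld.com"),
    ("yellowpagse.com", "yellowpages.com", "yellowpagesoftheworld.com"),
    ("yellowpagess.com", "yellowpages.com", "yellowpages.com"),
]
print(competing_beneficiaries(crawl))
```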
So what’s the harm in typosquatting? First, typosquatting confuses consumers, causing them to visit sites different from the ones they intended to visit. Second, site operators must pay large sums of money to ad platforms such as Google AdWords in order to reach the users who specifically requested the corresponding sites. Third, we found evidence that ad platforms exacerbate typosquatting. Using regression analysis, we determined that websites in categories with higher pay-per-click ad prices face more typosquatting than websites whose keywords fetch lower ad prices.
Just how much revenue comes from ads on typo sites? It is difficult to know for certain, since Google and others do not disclose revenue figures at the granularity of particular advertising programs such as AdSense for Domains. We attempt a back-of-the-envelope estimate using Alexa reports of website popularity. We estimate that typo domains matching the top 100,000 websites collectively receive at least 68.2 million daily visitors. If these typo domains were treated as a single website, that site would be ranked by Alexa as the 10th most popular website in the world. It would be more popular, in unique daily visitors, than twitter.com, myspace.com, or amazon.com!
According to our analysis, 57% of typo sites include Google pay-per-click ads. Combining our observations with financial reports and others’ estimates, we conclude that Google’s revenue from typosquatting on the top 100,000 sites is $497 million per year. This is significant, and not only for the advertisers who are losing out by paying to get their ads placed on typo sites. It matters also because Google’s competitors rely on typosquatting to a much smaller extent: In our testing, Yahoo’s ads appear on 21% of typo sites, and we did not find a single Microsoft ad on any typosquatting site. Looking at Google’s ever-growing share of online search and search advertising, we are struck by the role of typosquatting — making Google look that much larger, to advertisers and to analysts, when in fact this typosquatting traffic is entirely ill-gotten.
However, other findings leave us optimistic about the feasibility of significantly reducing typosquatting. Google’s ad click links indicate which Google partner is paid for clicks at a given typo domain. We found high concentration among Google partners engaged in typosquatting: Of typo domains showing Google ads, 63% use one of five Google advertising IDs. So while the sheer number of typo sites remains high, the number of key perpetrators is small.
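A concentration figure of this kind is simple to compute once each typo domain has been mapped to the advertising ID found in its ad click links. The sketch below assumes that mapping has already been extracted; the function name `top_k_share` and the sample IDs are hypothetical and do not come from our data.

```python
from collections import Counter


def top_k_share(ad_ids, k=5):
    """Fraction of typo domains whose ad ID is among the k most common IDs.

    `ad_ids` holds one advertising ID per typo domain.
    """
    ad_ids = list(ad_ids)
    if not ad_ids:
        return 0.0
    counts = Counter(ad_ids)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(ad_ids)


# Hypothetical IDs for ten typo domains, for illustration only.
ids = ["pub-A"] * 5 + ["pub-B"] * 3 + ["pub-C", "pub-D"]
print(f"Top 2 IDs cover {top_k_share(ids, k=2):.0%} of domains")
```

A high share for a small `k` means that action against a handful of accounts would affect most of the typo domains carrying those ads.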
Our web appendix details many specific typosquatting domains — including the registrars and hosting companies who support those domains and, crucially, the ad networks whose payments put the system in motion.
Our full posting: Measuring the Perpetrators and Funders of Typosquatting and web appendix.
UPDATE (2010-02-17): New Scientist has published an article.