April 7th, 2006 at 17:12 UTC by Richard Clayton
Last October I was approached by Poul-Henning Kamp, a self-styled “Unix guru at large”, and one of the FreeBSD developers. One of his interests is precision timekeeping and he runs a stratum 1 timeserver which is located at DIX, the neutral Danish IX (Internet Exchange Point). Because it provides a valuable service (extremely accurate timing) to Danish ISPs, the charges for his hosting at DIX are waived.
Unfortunately, his NTP server has been coming under constant attack by a stream of Network Time Protocol (NTP) time request packets coming from random IP addresses all over the world. These were disrupting the gentle flow of traffic from the 2000 or so genuine systems that were “chiming” against his master system, and also consuming a very great deal of bandwidth. He was very interested in finding out the source of this denial of service attack — and making it stop!
Poul-Henning had already identified that the bad packets were in the ancient NTPv1 format, whereas the traffic he wanted to handle was exclusively NTPv4. He supplied me with some extensive packet dumps and asked me to try and find out the cause of his problems. There was quite a lot of material (about 1.4GB of .gz files) to look at — on a typical day he’d receive 3.2 million bad packets (that’s 37 a second!). Analysing the patterns (and lack of patterns) within these dumps strongly suggested that the traffic wasn’t synthetic: either it was coming from where it said it was, or it was being spoofed to look as if it had come from real addresses using algorithms that were an order of magnitude better than any I had seen before.
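Telling the two kinds of traffic apart needs nothing more than the first octet of each packet, because the version number sits in the same three bits of the NTP header in every version of the protocol. What follows is only a minimal sketch of that check, not the tooling actually used on the dumps; the function names and the two sample packets are made up for illustration.

```python
def ntp_version(packet: bytes) -> int:
    """Return the version number (VN) carried in an NTP packet header."""
    if not packet:
        raise ValueError("empty packet")
    # First octet: leap indicator (2 bits), version number (3 bits), then
    # three further bits whose meaning depends on the protocol version
    # (they hold the mode in NTPv3 and NTPv4).
    return (packet[0] >> 3) & 0x07


def is_unwanted(packet: bytes, wanted_version: int = 4) -> bool:
    """True if the packet is not in the version the server wants to serve."""
    return ntp_version(packet) != wanted_version


# Hypothetical 48-byte packets: only the first octet matters here.
v1_request = bytes([0b00_001_000]) + bytes(47)  # LI=0, VN=1
v4_request = bytes([0b00_100_011]) + bytes(47)  # LI=0, VN=4, mode=3 (client)

print(ntp_version(v1_request), is_unwanted(v1_request))
print(ntp_version(v4_request), is_unwanted(v4_request))
```

A filter like this says nothing about where the packets come from, of course, which is why the patterns in the source addresses still had to be analysed separately.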
If the source were genuine, then it was worthwhile contacting some of the endpoints. I was quickly able to isolate the traffic coming from AS2529 which is used by Demon Internet, a UK ISP for whom I regularly consult. I approached their “abuse team” to determine if any of the source addresses within their network were known to be sources of email spam, had been reported as being a part of a botnet, or had been known to have been infected with a virus or worm in the past. A handful had been implicated, but most of the rest were “clean”. So it didn’t look as if I was looking for a botnet herder with a grudge against the Danes!
At that point I realised that one of the sources was known to me. A decade ago, in another life, developing Turnpike, I’d dealt with a problem Nick Wedd was having with his machine corrupting files. I hoped that he’d remember me, and so I sent him an email to ask if he knew of any reason why his system would suddenly be sending NTPv1 packets to a machine in Denmark.
To cut a very long story short, it turned out that he did remember me, but that he had a complex network with four or five machines, routers, wireless links and all sorts. Over a week or so, by a process of elimination (he kept track of which of his machines were switched on) and monitoring (his stepson David Prime ran a copy of Ethereal to dump any of the v1 packets), we eventually narrowed the packet source down to his D-Link DI-624 wireless router.
I purchased a DI-624 on eBay and found that its exact operation depended upon the particular firmware version it was running. Typically the firmware contains a list of 50 or so NTP time servers and it will choose one at random and ask it for the time – by sending out a naively constructed NTPv1 packet. If there is no answer (because the remote server doesn’t reply, or the response is firewalled off) then after 30 seconds it will choose another server and try again. If it does get a reply then it won’t ask again for an hour or so.
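To get a feel for what that retry rule does to the traffic, here is a rough model of a single router. It is a sketch under assumptions, not the firmware's actual logic: the 30-second retry, the roughly hourly re-poll and the list of about 50 servers come from the description above, while the function name, the exact one-hour figure and the choice of a fresh random server after every query are my own simplifications.

```python
import random

RETRY_SECONDS = 30      # wait after an unanswered query (from the text)
REPOLL_SECONDS = 3600   # "an hour or so" after a reply (assumed exactly 1 hour)
SERVER_COUNT = 50       # "50 or so" servers in the firmware list (assumed 50)


def simulate(duration, answers, rng):
    """Count the queries one router sends to each server in `duration` seconds.

    `answers(server)` says whether a query to that server index gets a reply.
    """
    sent = [0] * SERVER_COUNT
    now = 0
    while now < duration:
        server = rng.randrange(SERVER_COUNT)  # pick a server at random
        sent[server] += 1
        now += REPOLL_SECONDS if answers(server) else RETRY_SECONDS
    return sent


day = 24 * 60 * 60
blocked = simulate(day, lambda server: False, random.Random(1))
healthy = simulate(day, lambda server: True, random.Random(1))
print(sum(blocked), sum(healthy))  # queries per day: never answered vs always answered
```

In this model a router that never hears a reply sends 2880 queries a day against 24 for one that always does, a factor of 120, which is why the unanswered enquiries matter so much.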
And it’s not just the DI-624. Many other D-Link products have the same behaviour, and over the years they have shipped tens of millions of devices. So all of these enquiries add up (especially the unanswered ones)… to about 37 packets a second on each of the world’s stratum 1 timeservers!
This isn’t how NTP is meant to work. Consumer devices should ask one of their ISP’s time service machines (probably running at stratum 3), the ISP will synchronise these to a stratum 2 device that is firewalled off from customers, and that machine will chime with some nearby (same continent) stratum 1 machines. Leaving aside the denial-of-service issues there’s not much point in consumers sending packets half-way across a continent to a stratum 1 machine — network variability will mean that they get as good or better results from a nearby box.
Now D-Link do provide a way of configuring a DI-624 to contact a nearby NTP server — and setting this will mean no extraneous stratum 1 traffic. However, the default configuration is to access the stratum 1 timeservers and hence the stage is set for a DDoS attack on part of the key infrastructure of the Internet.
If this story sounds familiar then it is. Back in May 2003 the University of Wisconsin – Madison found itself under a DDoS attack of hundreds of thousands of packets a second. In that case it was Netgear routers that were configured to send Simple NTP (SNTP) packets to a single server. To make matters worse, if they didn’t get an answer within 5 seconds they tried again. Hence, after a network outage (or perhaps a widespread power outage) there would be so much network congestion that answers would be lost — and so the congestion would continue for hours.
Although no-one discusses legal negotiations in public, shortly after Dave Plonka worked out what was causing the incoming traffic, Netgear spontaneously made a generous gift of $375,000 over three years “to improve wireless security on campus and to build out our campus network”. Which was nice.
However, in the current case, D-Link don’t seem to be feeling quite so generous. Poul-Henning reports in an open letter to D-Link (which means that I can finally report the material above) “I have been accused of extortion. I have been told that I have no claim, been told that I exaggerate the claim.”
In my own opinion, shipping equipment that generates 37 packets a second to Poul-Henning (and hence about 2K packets/second to all of the stratum 1 servers as a whole — that’s about a T1 of traffic) is hardly trivial. If D-Link were running their own time servers, as in my opinion they should be, it would cost them about $1000/month for the bandwidth alone.
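For anyone who wants to check the arithmetic, the figures can be reproduced in a few lines. This is a back-of-the-envelope sketch: the packet counts come from the text, while the packet size is an assumption (a plain 48-byte NTP message in UDP over IPv4 with no options, ignoring link-layer framing).

```python
SECONDS_PER_DAY = 24 * 60 * 60

bad_packets_per_day = 3_200_000                          # from the packet dumps
per_server_pps = bad_packets_per_day / SECONDS_PER_DAY   # about 37 a second

total_pps = 2_000                                        # "about 2K packets/second"
implied_servers = total_pps / per_server_pps             # servers sharing the load

# Assumed on-the-wire size: NTP message + UDP header + IPv4 header, in bytes.
NTP_BYTES, UDP_BYTES, IPV4_BYTES = 48, 8, 20
packet_bits = (NTP_BYTES + UDP_BYTES + IPV4_BYTES) * 8

total_mbps = total_pps * packet_bits / 1_000_000
T1_MBPS = 1.544                                          # T1 line rate

print(f"{per_server_pps:.1f} packets/s per server")
print(f"{implied_servers:.0f} servers implied")
print(f"{total_mbps:.3f} Mbit/s, {total_mbps / T1_MBPS:.0%} of a T1")
```

Under those assumptions the total comes to about 1.2 Mbit/s, roughly four-fifths of a T1 before any link-layer overhead, and the implied number of servers (54) agrees with the list of 50 or so found in the firmware.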
Poul-Henning has a particularly strong case because in the canonical list of NTP stratum 1 servers his machine is listed as "Service Area: Networks BGP-announced on the DIX; Access Policy: open access to servers, please, no client use;" so access by random consumers who own D-Link routers is clearly not permitted. Indeed, many of the other stratum 1 NTP servers used by D-Link have similar rules — either setting geographic limits or, often, requiring that only stratum 2 servers should connect. So D-Link, who must have consulted a similar list when they were writing their firmware, appear to be entirely ignoring well-publicised access restrictions.
Of course, even if D-Link immediately saw the error of their ways, they’ve nailed these lists of time servers into the firmware of all the devices they’ve shipped. Consumers don’t generally re-flash their kit (the instructions tend to be pretty scary) and so it will be years before the traffic to Poul-Henning starts to die down even if, as he may be forced to, he removes his server from the DIX.
Those dumb little D-Link boxes in the corners of lounges all over the world (I’ve heard them called “clocksuckers”) will still be demanding, whether or not they get an answer, to know what time it is. And this is just so that they can timestamp their logs correctly, even though no-one will ever look at their contents… What a waste, quite literally, of time!
Entry filed under: Security economics