I’m at IMC 2017 at Queen Mary University of London, and will try to liveblog a number of the sessions that are relevant to security in followups to this post.
11 thoughts on “Internet Measurement Conference”
Vasileios Giotsas kicked off with a talk on inferring BGP blackholing carried out by ASes to deal with DDoS traffic. This has been going on for years but there’s little open data on who does it, how and for whom. The AS of a target (the blackhole user) can use a BGP update to notify others of a blackholed prefix under RFCs 5635 and 7999, so they can drop traffic. The blackhole providers are not just the upstream ASes but also IXPs that provide interfaces for their community. However, communities are not standardised. Vasileios used data from four BGP collectors round the world, peered with over 2,700 ASes, assembling the data for each blackhole event. Over the past three years, the number of providers has doubled from about 40, and over 160,000 prefixes have been blackholed in the period, with visible spikes for events ranging from the Turkish coup to Mirai. Providers include both IXPs and transit providers; their services are most popular among content providers and hosters, often serving ephemeral or low-ranked domains.
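To make the tagging idea concrete, here is a minimal sketch (mine, not the speaker's code) of spotting blackhole announcements in a list of BGP updates. The only standard value used is the well-known BLACKHOLE community 65535:666 from RFC 7999; the update dictionaries and the `provider_specific` tags are hypothetical inputs, since each provider documents its own communities.

```python
# Well-known BLACKHOLE community defined in RFC 7999 (65535:666).
RFC7999_BLACKHOLE = (65535, 666)


def is_blackhole_announcement(communities, provider_specific=frozenset()):
    """Return True if any community on a BGP update signals blackholing.

    communities: iterable of (asn, value) tuples attached to the update.
    provider_specific: extra (asn, value) tags that a given provider uses
    for blackholing (hypothetical; these are not standardised).
    """
    tags = set(communities)
    return RFC7999_BLACKHOLE in tags or bool(tags & set(provider_specific))


def blackholed_prefixes(updates, provider_specific=frozenset()):
    """Collect the prefixes carried by updates tagged for blackholing."""
    return sorted({
        u["prefix"] for u in updates
        if is_blackhole_announcement(u["communities"], provider_specific)
    })


# Made-up updates using documentation prefixes and private-use ASNs.
updates = [
    {"prefix": "192.0.2.1/32", "communities": [(65535, 666)]},
    {"prefix": "198.51.100.0/24", "communities": [(64500, 100)]},
    {"prefix": "203.0.113.7/32", "communities": [(64500, 9999)]},
]
result = blackholed_prefixes(updates, provider_specific={(64500, 9999)})
```

Without the provider-specific dictionary only the first prefix is found, which is roughly why the lack of standardisation makes this kind of inference hard.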
Romain Fontugne was next on pinpointing delay and forwarding anomalies. When a customer complains to their ISP that a website is too slow, the usual approach is to use traceroute, ping and operator mailing lists – which being manual is slow, expensive and doesn’t scale. Romain’s solution is to exploit the existing measurement data from an existing public source, the RIPE Atlas. Particularly useful are “builtin”, which sends a traceroute every 30 minutes from 500 servers to all DNS root servers, and “anchoring” which sends traceroutes every 15 minutes to 189 collaborating services. The core of the work is finding smart ways to analyse noisy data, traffic asymmetry and packet loss. With enough probes, you can monitor the health of many intermediate links in real time; in fact the public data suffice to monitor hundreds of thousands of links in real time with no extra burden on the network. In addition, Romain has packet loss models to spot router failure and a health metric for ASes. Case studies include the June 2015 Telecom Malaysia route leak which affected 170,000 prefixes, and related congestion in Level 3 in London.
Yves Vanaubel is working on Tracking Invisible MPLS Tunnels. Routing data collected by organisations like CAIDA are often inaccurate because opaque MPLS clouds made up of many invisible tunnels lead naive scans to infer a small number of nodes with an unrealistically large degree (e.g., over 128). How can routes be inferred more accurately? Yves has developed two techniques: direct path revelation, for networks not using MPLS internally, and mostly relevant to Juniper devices; and recursive path revelation for networks using all MPLS, mostly relevant to Cisco. In the former you try to run a trace to an internal prefix and see if routers reveal themselves; in the latter, you try to run a trace to the egress router, and try internal prefixes. He’s run experiments on PlanetLab with 91 nodes, selecting high-degree nodes from the CAIDA dataset, and mapping the length of invisible tunnels found.
Amogh Dhamdhere is working on inferring Internet congestion. This is a complex subject, tied up with commercial manoeuvring (as in the Comcast/Netflix peering negotiations); how can we get good data? M-lab started using throughput-based measurements based on NDT data in 2014-15, and Amogh has been thinking about the best methodology to use. Simple network tomography – inferring congested links from end-to-end throughput data – is harder than it looks; see for example the previous paper. But various things can be done. In the top five US ASes, the client and server were not more than one hop away in 82% or more of the NDT tests. As for link diversity, the M-lab server in Atlanta (hosted by Level 3) found 1 or 2 links to Comcast and Frontier, but 14 to AT&T. He also uses bdrmap to identify interconnects between ASes, but only a small fraction of them may be testable by M-lab; speedtest.net covers more as it has thousands of servers rather than hundreds.
Rodérick Fanou is investigating the causes of congestion in African IXPs. Recent work in South Africa and elsewhere has shown that consumers in Africa often don’t get the advertised broadband performance. Roderick believes his is the first congestion study in the continent. He has been deploying vantage points in five countries and did time-series latency probes to perform network tomography for a year till February 2016-7. He validated his results via IXP operator interviews. Sustained congestion cases include GIXA, hosted by Ghanatel; this was providing free transit to a content network through a 100Mbps link, which was congested, while serving its own paying customers through a 1Gbps link which was not. Another ISP, Netpage, ended up paying for an upgrade so its customers could get Google traffic without congestion. Roderick concludes that the IXP ecosystem is highly dynamic in Africa, so longitudinal measurement and monitoring would be valuable. In questions, someone remarked that Google should put their servers at the exchange rather than at the monopoly telco.
Amogh Dhamdhere spoke again, on TCP Congestion Signatures. Conventional speed tests don’t tell us much about the nature of congestion: was it self-induced, as with last-mile flows, or did the flow hit an already-congested link upstream? These induce different TCP retransmit behaviour after all. He’s found that self-induced congestion can be detected by looking at the covariance, and the difference between the maximum and the minimum, of the round-trip time. These can be fed into a classifier. For example, a strong correlation between throughput and TSLP latency suggests that congestion is external. He has done controlled experiments in a testbed to validate the method, getting 100% accuracy in detecting self-induced congestion and 75% for the external variety.
The morning’s last speaker was Qiao Zhang, talking about measuring data-centre microbursts. He designed a high-resolution counter collection framework that can sample at 25 microseconds while keeping sampling loss below 1%, and deployed it at Facebook. He sampled a random 2-minute sample per hour over 30 racks for 24 hours. He defined a microburst as when utilisation goes over 50% for a short period. He looked at three levels: web, cache and Hadoop, and found that the 50% threshold could have been as much as 80% with little change. Median inter-burst time is about a millisecond; the bursts themselves are correlated with application behaviour such as scatter-gather. The directionality is down for web and Hadoop (because of fan-in), but up for cache (as answers are bigger than queries).
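The burst definition is simple enough to sketch. The following is my own illustration, assuming a list of per-interval link utilisation values (fractions of line rate) sampled every 25 microseconds; the function names and the sample data are made up, and the real framework works on switch counters rather than a Python list.

```python
def find_microbursts(utilisation, threshold=0.5, interval_us=25):
    """Return (start_us, duration_us) for each run of samples above threshold."""
    bursts = []
    start = None  # index where the current burst began, if we are in one
    for i, u in enumerate(utilisation):
        if u > threshold and start is None:
            start = i
        elif u <= threshold and start is not None:
            bursts.append((start * interval_us, (i - start) * interval_us))
            start = None
    if start is not None:  # trace ended while still in a burst
        bursts.append((start * interval_us,
                       (len(utilisation) - start) * interval_us))
    return bursts


def inter_burst_gaps(bursts):
    """Gaps in microseconds between the end of one burst and the next start."""
    return [b[0] - (a[0] + a[1]) for a, b in zip(bursts, bursts[1:])]


# Made-up utilisation trace, one value per 25 microsecond sample.
samples = [0.1, 0.6, 0.9, 0.2, 0.1, 0.7, 0.3, 0.8]
bursts = find_microbursts(samples)
gaps = inter_burst_gaps(bursts)
```

Rerunning with `threshold=0.8` on real data is the kind of check behind the remark that the threshold choice makes little difference.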
Franziska Lichtblau has been studying attacks that rely on spoofed IP addresses. She’s been collecting data on actual spoofed interdomain traffic. She first had to figure out how to detect spoofing efficiently, which she does by parsing ASes’ prefix lists; rather than using the existing CAIDA “customer cone”, which doesn’t account for peering relationships, she developed her own system called Fulcrum to do this better. It’s not trivial because of multi-AS organisations, hidden AS relationships, and stray traffic. She tuned her system to be conservative, with a low false positive rate. Applying this to a large European ISP, she found 0.012% of the bytes were spoofed – much of it being trigger traffic for DDoS attacks, which give rise to much larger total traffic volumes. She also found that 30% of IXP members don’t filter traffic at all. As for who’s being spoofed, it’s mostly Chinese ISPs, and the traffic is mostly going to NTP servers.
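The basic check can be sketched as follows. This is a toy version under my own assumptions, not the system described in the talk: each sending AS has a list of prefixes it may legitimately source traffic from (the hard part, which is building that list), and a flow counts as spoofed if its source address falls outside it. The AS names, prefixes and flow records are invented.

```python
import ipaddress


def build_valid_space(prefixes_by_as):
    """Parse each AS's prefix list into ip_network objects."""
    return {
        asn: [ipaddress.ip_network(p) for p in prefixes]
        for asn, prefixes in prefixes_by_as.items()
    }


def is_spoofed(src_ip, sending_as, valid_space):
    """True if src_ip is outside every prefix the sending AS may use.

    Note: an AS with no known prefixes has all its traffic flagged here;
    a conservative system would skip such ASes instead.
    """
    addr = ipaddress.ip_address(src_ip)
    return not any(addr in net for net in valid_space.get(sending_as, []))


def spoofed_byte_share(flows, valid_space):
    """Fraction of bytes whose source address fails the prefix check."""
    total = sum(f["bytes"] for f in flows)
    spoofed = sum(f["bytes"] for f in flows
                  if is_spoofed(f["src"], f["as"], valid_space))
    return spoofed / total if total else 0.0


valid = build_valid_space({"AS64500": ["192.0.2.0/24", "198.51.100.0/24"]})
flows = [
    {"as": "AS64500", "src": "192.0.2.10", "bytes": 900},
    {"as": "AS64500", "src": "203.0.113.5", "bytes": 100},
]
share = spoofed_byte_share(flows, valid)
```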
Mattijs Jonker was next, explaining how a third of the Internet is under attack. He’s been characterising the DDoS ecosystem, which has grown hugely of late with booter services doing DDoS for hire for a few dollars. In order to understand targets, attacks and protection services, he uses the UCSD network telescope (a /8 darknet), amplification honeypots (which offer large amplification, to attract attackers), active DNS measurement services and analyses of protection offerings. In two years’ data, up to February 2017, they saw almost 21m attacks, or 30,000 a day, targeting a third of the actively used /24 blocks; about half targeted http and a further 20% https. A total of 210m web sites were attacked over the two years, or about 2/3 of the total. The traffic peaks correspond to attacks on the big hosters. Finally, there are a lot of attacks targeting web infrastructure.
Zhongjie Wang has been looking at evading stateful Internet censorship. Like a standard network intrusion detection system, a censorship system may be vulnerable to hacks that mislead it about TCP and other state. Zhongjie has been playing with the Great Firewall of China and has got a 95% success rate with his techniques. The basic idea is to desynchronise TCP states and program states. He has insertion packets that are accepted by the GFW but dropped by the server, and evasion packets the other way round. Some of their packets were designed after careful study of the behaviour of the Linux and Windows kernels. However the GFW has been getting smarter, and now resynchronises on seeing multiple SYN or multiple SYN/ACK packets – or even SYN/ACK with an incorrect ACK number. Then it updates its SEQ number. It’s hard for the GFW to be fully immune to such attacks but its black-box nature and changing behaviour make measuring it hard.
Fangfan Li works on exposing (and avoiding) traffic-classification rules. There has been much work on obfuscation, domain fronting and the like, but such methods can be fragile to changes of classification methods. Fangfan’s project, “liberate”, is about automatically detecting the classification rules, characterising them, and selecting evasion techniques from a suite; these are tried iteratively until some combination works, or the repertoire is exhausted. Detection uses the “record and replay” mechanism he presented at IMC 2015. The basic evasion techniques are match and forget, split and reorder, and flushing the classifier state. He found, for example, that he could defeat the Great Firewall by cache flushing, but the required delay varied by time of day from 50 seconds to 250 seconds.
Panagiotis Papadopoulos explained to us how “If you are not paying for it, you are the product”. The authors infer the amount that advertisers are paying for you by participating in the advertising ecosystem and tapping the real-time bidding mechanism. They conducted two probing ad-campaigns in 2016, using various features: cities in Spain, time of day, and day of the week. They also got 1594 mobile users to volunteer to use their proxy. Their probes enabled them to estimate the value of encrypted winning bids. It turned out that iOS users are worth more than Android users; time of day and location affect the price too. Notably, encrypted prices are 1.7x higher than cleartext prices: advertisers are paying real money to hide what they’re paying. There are high-value users: 2% of them cost 10-100x more than the median, with a median user costing only 25 Euro per thousand impressions.
Next was Joe DeBlasio, exploring the Dynamics of Search Advertiser Fraud. There has been little previous work on search ad fraud; this study was done with people from Microsoft. They analysed two years’ data, excluding account compromise and deceptive ads (which don’t move the needle); Bing typically gets 6 million fraudulent clicks a month detected within six months, and a further 1–2m later. About 25% of new advertiser accounts are closed within a day; many fraudsters are just too loud, trying to push as many ads as they can before they get shut down; but the good fraudsters mimic normal business. The top 1% account for 60% of clicks received, and the top 10% get 90% of them. Bing’s auctions, like Google’s, go on expected payout, which includes past performance. Fraudsters prefer broad to narrow matching, and are even more likely than genuine advertisers to just submit the modal bid for the market. Targeted verticals include third-party tech support (“install printer”) though this is down now as Bing’s trying to kill this entire vertical; weight loss; and impersonation from phishing to predatory streaming sites (but it’s hard for Bing to tell genuine predatory businesses from the outright crooks). The impact on genuine advertisers is low, as they compete with fraudulent advertisers less than 1% of the time; but fraudsters compete with each other 90% of the time. Overall, fraudsters are forced to behave like normal advertisers, there are only a few elite fraudsters, and targeted interventions work. In questions, he said he has not replicated Edelman’s work on the relative quality of ads and organic search results, and that the fraudsters who bid top dollar are generally using stolen credit cards, and get caught after a while.
Jeff Kline was the last speaker in the ad session, discussing user agent strings. These appear in http requests and enable content to be customised for the display device. He’s from comScore, which logs billions of http and https requests; and while the big browsers account for most user agents, there are millions of them with a power-law distribution. Even the millionth most popular UA is seen thousands of times a day. Writing regular expressions to get only tablets, or only some IoT species, is surprisingly hard, and long-term maintenance is hard.
John Rula has been studying the role of cellular networks in the Internet. Mobile subscriptions have grown to 7.3 billion and now generate 50% of web traffic, with mobile traffic growing 45% annually. Mobile and cellular are different; mobile is devices, while cellular is an access technology – the best path to connect the rest of the world. 50% of mobile traffic is wifi. We generally can’t tell which traffic is cellular, we don’t know how cellular networks are organised, and can’t see global usage trends. This work therefore labels cell networks and traffic at a global scale; the data are validated against ground truth and then used to look at global trends. The methodology is that “a large CDN” uses network information API to establish whether the device is on cellular and then uses this to label IP addresses (wifi hotspots that connect over cellular can get mislabeled). Traffic to Akamai is a good enough proxy for growth. John calculates the cellular ratio for each /24 v4 or /48 v6 address range; if it’s >50% cellular he labels that IP range as cellular. This has better than 98% precision, but lower recall if they don’t have coverage for some /24s. He finds 350,000 /24 subnets and 23,000 /48 cellular subnets. Africa is 53% cellular, as is 49% of all IPv6 demand and 16.2% of global demand, and 26% of Asia/Africa traffic. The highest is Ghana at 94%. The USA is not very representative; there’s lots of public DNS usage on cellular outside the US. The top 10 ASs make up 38% of cellular demand, and most networks that do cellular also serve fixed-line traffic. Cellular addresses are much more concentrated, thanks to carrier-grade NAT.
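The labelling rule is easy to sketch. This is my own toy version, assuming a list of (client IP, reported-as-cellular) observations such as a CDN might derive from the browser's network information; the observations below are invented, and the real pipeline obviously works at a very different scale.

```python
import ipaddress
from collections import defaultdict


def label_cellular_blocks(hits, threshold=0.5):
    """Label each /24 (IPv4) or /48 (IPv6) block by its cellular ratio.

    hits: iterable of (ip_string, is_cellular) observations.
    Returns {block: True/False}, True when the cellular share of
    observations in the block is strictly above the threshold.
    """
    counts = defaultdict(lambda: [0, 0])  # block -> [cellular, total]
    for ip, is_cellular in hits:
        addr = ipaddress.ip_address(ip)
        plen = 24 if addr.version == 4 else 48
        block = ipaddress.ip_network(f"{ip}/{plen}", strict=False)
        counts[block][0] += int(is_cellular)
        counts[block][1] += 1
    return {str(b): (c / t) > threshold for b, (c, t) in counts.items()}


# Made-up observations using documentation address ranges.
hits = [
    ("192.0.2.5", True), ("192.0.2.9", True), ("192.0.2.77", False),
    ("198.51.100.1", False), ("198.51.100.2", True),
    ("2001:db8:1:2::1", True),
]
labels = label_cellular_blocks(hits)
```

A block with exactly half its observations on cellular stays unlabelled here, matching the ">50%" rule; blocks with no observations never appear at all, which is where the lower recall comes from.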
Apurv Bhartia has been looking at practical techniques to improve 802.11ac performance. Wifi is becoming more dense with more competing networks and new standards, so we need new measurement techniques. Apurv is looking at the techniques Cisco Meraki have introduced: the TurboCA channel planning algorithm, which reduces wireless TCP latency by up to 40%; the FastACK TCP wireless enhancement; and 802.11ac. This has a higher data rate at 3.39 Gbps (10 times more than before), and all new devices are using it: there’s 50% penetration for APs and 45% for clients. Using data from more than 1 million Meraki networks, each with at least 1 AP; from their cloud management platform; and from 100K active APs with 1.7M devices, he finds that 45% of devices now support 80MHz channel width. 2.4G is 10x more congested than the 5G band; on 2.4G management frames are sent slowly and take up a lot of spectrum. 35% of Meraki APs have their channel width manually tuned down; this can increase range and reduce interference. Traffic is variable and so channels might need to be reassigned but changing them disrupts traffic flow. This was evaluated in a museum (which it helped) and a university (which was already maxing out its uplink so better wifi made no difference). There was 40% lower TCP latency with TurboCA. TCP was not designed for wireless and does not understand that packets get batched up, particularly when there is congestion. So the AP forges the ACK and then drops the real ACK that comes later.
Shichang Xu has been dissecting VOD services for cellular. It’s becoming important to understand how streaming video on demand works over cellular networks, as 60% of mobile data traffic is streaming video, and this is predicted to be 78% by 2021. Over a quarter of users have problems with it daily. So Shichang did a measurement study of 12 popular VOD services, using HTTP Live Streaming (HLS), Dynamic Adaptive Streaming over HTTP (DASH) or Smooth-Streaming. Clients download a manifest and then choose from tracks with different bandwidth characteristics; this can be complex as there can also be different language versions etc. Many services use VBR, where the peak bitrate may be twice the median; some services declare the median and others the peak. The actual bitrate should be made available as it can result in better track selection. Some services defensively select tracks using less than 50% of the available network bandwidth, and some clients re-download some segments when bandwidth improves (but this can cause congestion). Shichang looked at Exoplayer, which cannot easily use segment replacement as it has a double-ended queue and dropped support for segment replacement as a result. He evaluated an improved segment replacement algorithm, which replaced segments independently.
Carlos Andrade has been studying connected cars in a cellular network. Cars are a new kind of connected device, whose relevant characteristics are emergency response, infotainment, mobile device traffic, firmware over the air upgrades (FOTA), self-driving and high-resolution navigation. What will be the impact on the network? FOTA is his main example. Car firmware updates are 10s of megabytes to gigabytes; there are 250 million cars in the US and 90% of new cars will be connected by 2020. As software recalls cost $100 per car, FOTA would be cheaper – but can be life-critical, so must be done right. Cars are only on the network for about an hour per day (the 99.5th percentile is 3.6 hours), much less than mobile devices; they last 11 years on average, not the 2 years of mobile phones. Cars currently only do downloads when the engine is running. Bulk downloads can push cells up to 100% utilisation from their normal max of 80%. So Carlos analysed 1 billion radio connections from 1 million connected cars from 90 days in the US. Cars appear more on weekdays than on weekends, 80% on weekdays and 65% on weekends. 60% show up in non-busy cells that will not be impacted by bulk downloads, but lots of handovers make scheduling harder.
Austin Murdock’s problem is Target Generation for Internet-wide IPv6 Scanning. There are too many IPv6 addresses for exhaustive scanning so all sorts of tricks have to be used by the scanner. You start off from known addresses as seeds and look for patterns that can be extended; there are many of these to try. He gives a number of examples of range extension algorithms and heuristics. 6 billion probes brought 55 million responses; it turns out that CDNs such as Akamai have /96 blocks of address space that respond to all probes. Once this was understood, 98% of the responses went away, leaving about a million responses corresponding to 3000 routing prefixes.
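One way to filter out the respond-to-everything blocks might look like the sketch below. This is my guess at the general idea, not the method from the talk: group responsive addresses by /96, probe a few random addresses in each block, and treat a block as aliased if every one of them answers. The `probe` callable is a stand-in for a real scanner, and the fake network in the example is invented.

```python
import ipaddress
import random


def is_aliased(prefix, probe, samples=8, rng=None):
    """Guess whether every address in prefix answers, via random probes."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    net = ipaddress.ip_network(prefix)
    host_bits = net.max_prefixlen - net.prefixlen
    for _ in range(samples):
        candidate = net.network_address + rng.getrandbits(host_bits)
        if not probe(str(candidate)):
            return False  # one silent random address is enough to clear it
    return True


def drop_aliased(responsive, probe, prefix_len=96):
    """Keep only responsive addresses that are not in an aliased block."""
    by_block = {}
    for ip in responsive:
        block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        by_block.setdefault(block, []).append(ip)
    kept = []
    for block, ips in by_block.items():
        if not is_aliased(str(block), probe):
            kept.extend(ips)
    return kept


# Fake network: one /96 answers everything, plus a single real host.
ALIASED_NET = ipaddress.ip_network("2001:db8:a::/96")
LIVE_HOSTS = {"2001:db8:b::1"}


def fake_probe(ip):
    return ipaddress.ip_address(ip) in ALIASED_NET or ip in LIVE_HOSTS


responsive = ["2001:db8:a::1", "2001:db8:a::2", "2001:db8:b::1"]
kept = drop_aliased(responsive, fake_probe)
```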
Mark Allman has been developing a universal measurement endpoint interface. His PacketLab saves measurement researchers from having to rewrite their code for every different endpoint. You can write an experiment once and run it anywhere, cutting the costs – particularly to operators. Its key technical ideas are to move key logic from the endpoint to a control server and to have programs on the endpoint that define allowed experimenter behaviour and can be audited by the operator, who can issue a certificate authorising what can be done. The experimental packets are crafted by the control server. This will be in addition to their existing Archipelago interface.
Joel Sommers has been working on automatic metadata generation for active measurement. Experimenters need to document what they collected so they can understand later what might have been happening, for calibration against other tools, and to bake in reproducibility from the beginning. Their tool, SoMeta, has been evaluated on constrained devices such as two versions of Raspberry Pi (on version 1b it took 12% of CPU in the worst-case scenario); and also to check that it can identify local network effects in a controlled lab environment as well as a home broadband environment.
Pryce Bevan has developed a high-performance algorithm for identifying frequent items in data streams. His algorithm takes a pass over a stream and computes a summary to answer queries quickly later; it optimises the existing weighted-update algorithms of Misra and Gries. The key idea is to decrement by the median counter rather than the minimum, together with a careful choice of hash table.
Kenjiro Cho’s topic is hierarchical heavy hitters – significant traffic clusters across multiple layers. He provides a recursive lattice search algorithm based on Morton’s z-ordering. This runs orders of magnitude faster than bitwise partitioning. He provides an open-source tool and an open dataset.
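As background, Morton (z-order) encoding interleaves the bits of two coordinates so that a shared prefix of the code corresponds to shared prefixes in both dimensions. The sketch below shows only that encoding, with function names of my own choosing; it is not the recursive lattice search from the talk.

```python
def morton_encode(x, y, bits=32):
    """Interleave the low `bits` bits of x and y (x in the even positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code, bits=32):
    """Inverse of morton_encode: split even and odd bits back into x, y."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# x = 0b011, y = 0b101 interleave to 0b100111.
code = morton_encode(3, 5, bits=4)
```

With x and y taken as, say, source and destination addresses, sorting flows by this code keeps flows that share both prefixes next to each other.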
Samuel Jero has been taking a long look at QUIC, a new UDP-based Internet protocol designed by Google for http traffic. QUIC is designed for modern web browsing and has seen significant deployment, having appeared in Chrome in 2013; it has 7% of Internet traffic and has a vigorous standardisation community. Google tells us it improves page load time 3%, reduces search latency 8% and cuts YouTube buffer time 18%. However, independent evaluations are in their infancy. Samuel has been developing a methodology and testbed network for automatic head-to-head comparisons of QUIC with TCP (+ TLS + http/2), including root-cause analysis, over multiple platforms and channels. QUIC performs better than TCP on the desktop, except in the presence of reordering; Samuel reported possible tweaks to Google who’re working on them. On mobile platforms performance tends to be application-limited; on cellular networks QUIC performs better on LTE but there’s no difference on 3G because of the reordering issues. QUIC’s congestion window increases faster, for reasons not yet understood; but it’s unfair to TCP.
Jan Rüth has been trying to understand TCP’s initial window. The first measurements of window size effects were in 2001 and 2005; increasing page sizes and better networks led to proposals to increase it, and then we got a standard. But what do people actually do? Jan has followed the 2001 work, starting a lot of connections with a small maximum segment size and a large receive window. With no ACKs, the number of bytes received is the window size, modulo losses. There’s a whole lot of technical details and problems to be dealt with; see the paper for details.
Brian Goodchild argued that the record route option is not an option. Measuring Internet routing is hard as the protocols weren’t designed for it. The record route IP option allocates extra space in each packet header and instructs routers to record their addresses; it can handle up to 9 hops. It has established use cases, such as reverse traceroute. However, roughly half the paths between PlanetLab nodes dropped IP-options packets. But do we need ubiquitous support? Brian set out to measure the effectiveness of record route. He found that most destinations do respond to record route, from at least one vantage point; two-thirds are within 9 hops of his closest vantage point; and the world’s getting smaller – that would have been 6% in 2011. This may be due to increased peering.
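The 9-hop limit falls straight out of the IPv4 header format, and the arithmetic is short enough to write down. The constants are the standard IPv4 ones; the names are mine.

```python
IPV4_MAX_HEADER = 60    # IHL is a 4-bit count of 32-bit words: 15 * 4 bytes
IPV4_FIXED_HEADER = 20  # header size without any options
RR_OVERHEAD = 3         # record route option: type, length, pointer octets
ADDR_LEN = 4            # one IPv4 address per recorded hop


def record_route_slots():
    """Number of router addresses that fit in a record route option."""
    option_space = IPV4_MAX_HEADER - IPV4_FIXED_HEADER   # 40 bytes
    return (option_space - RR_OVERHEAD) // ADDR_LEN      # 37 // 4


slots = record_route_slots()
```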
Tuesday’s last talk was by Will Scott, who has been studying the Internet in Cuba. People there don’t get access at home; there are about 400 wifi hotspots in the country, with 60 in Havana, with access costing about a dollar an hour. This is supplemented by secondary markets not just around wifi hotspots but hard drives that get passed around and offline Internets that people use to chat to friends and family. For example, a street network in Havana, run by volunteers, serves 50,000 people. Scott has been down a couple of times to study this phenomenon, Snet. He started with snowball sampling, hardware observation, then ping scans, traceroute and content sampling. The backbone consists of directional wifis that connect to others and local wifis to the housing block, with some ethernet for LAN gaming. The routing is anchored on about ten pillars, and flaps occasionally, but sort of works, based on open shortest path first. The most popular sites are gaming, forums, blogs and live chats; there are also dumps of stackexchange and wikipedia for reference. They have a github and an MPM mirror to support developers. A lot of software is (very) out of date, and often the only way to do account recovery is to talk to the person running the service. Teamspeak is used to administer everything, including the shared zone file; different communities have their own /16. Curiously, although this is all strung together by volunteers, with no item of kit costing more than $100, the pain points are much the same as on the proper Internet.
The security session (which contained all the award-winning papers) started with Johanna Amann surveying https security after DigiNotar. Since the big CA compromises a few years ago, a number of mitigations have been proposed; but are they being used well, or at all? Johanna did a number of large-scale active and passive scans, including 192 million domains from the .com registry. The biggest is certificate transparency (CT), promoted by Google, which has semi-trusted publicly-auditable append-only logs, with browsers such as Chrome verifying proofs of inclusion. These proofs can be provided by the server or the CA, and in the latter case with the cert or via OCSP. About 10% of websites use CT and almost all send the proof with the cert rather than using the OCSP route. Most of the cases where a TLS extension is used are instances of Let’s Encrypt. Some interesting misconfigurations were found. Other mitigations include SCSV (used by 96% as it’s deployed with server upgrades); HSTS (strict transport security) found on 3.5% of domains, and generally used correctly; HPKP (public-key pinning, which limits certs from other domains) is used on only 0.2%. The overall lesson is that ease of deployment – including the cost, complexity and risk of deployment – is all-important.
Joe DeBlasio is interested in password reuse attacks. These involve using passwords leaked by site A to log on to site B – usually an email provider, as that is often the key to many further exploits via password reset. Many compromises have become public, but what can be done to detect compromises in progress? He’s developed TripWire, whose proof-of-compromise involved setting up accounts on 2300 sites, leading to the detection of 19 new compromises over 24 months. They did not get consent from the monitored sites as this is impractical at scale but took counsel’s advice and do not disclose the names of compromised sites. They pre-manufacture email addresses with either easy or hard passwords and register both on each monitored site to detect whether there was a breach of plaintext or near-plaintext; they got accounts on 2,300 sites out of 30,000 attempted. Their registration bot is best-efforts, blocked by good CAPTCHAs or demands for credit cards. An email provider supplied 100,000 accounts and monitored them. Ten of the 19 compromised sites had hard passwords compromised, so there was a plaintext breach; four of these are Alexa top 500. Most compromised accounts were not visibly abused; a quarter were used for spam, and one had a password changed, but three-quarters would have got no indication. The 1750 logins came from 1300 IPs, mostly residential; access was bursty, suggesting a botnet proxy. Notifying compromised sites is hard; two-thirds didn’t respond at all, and none notified their users. Some owned up to old software, plaintext passwords etc. – with small sites typically saying they knew about problems but didn’t have time to fix them yet. Large sites typically responded within an hour with engineers and lawyers; one of them had had complaints from Twitter users, and denied them. Not one site could pinpoint the compromise.
In questions, Joe admitted not populating the email accounts with interesting material, as this is hard; future work might involve putting credentials for other sites in there and seeing if they’re used. For small sites, the prize of compromise is password reuse; for email compromise it’s interesting content turning up (some of the accounts were visited regularly).
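The easy/hard password trick boils down to a small decision rule, sketched below. The wording of the verdicts is my own reading of the talk: a login to the hard-password account implies the site leaked plaintext or near-plaintext, while a login only to the easy-password account is consistent with a leaked hash that was cracked. The function and site names are hypothetical.

```python
def classify_site(easy_login_seen, hard_login_seen):
    """Infer the kind of breach from which honey accounts were accessed.

    easy_login_seen / hard_login_seen: whether the email provider saw a
    successful login to the account registered with the easy / hard
    password at this site.
    """
    if hard_login_seen:
        # A hard password is unlikely to be cracked from a decent hash.
        return "plaintext or near-plaintext breach"
    if easy_login_seen:
        return "breach, possibly of hashed passwords"
    return "no compromise detected"


# Made-up observations for three monitored sites.
observations = {
    "site-a.example": (True, True),
    "site-b.example": (True, False),
    "site-c.example": (False, False),
}
verdicts = {site: classify_site(*seen) for site, seen in observations.items()}
```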
Shehroze Farooqi has been investigating OAuth access token abuse by collusion networks. Many popular Facebook apps are vulnerable to access token leakage and abuse. As an example, you can log into Spotify directly, or using your Facebook account; in the latter case Spotify gets a token authorising it to access certain information. RFC 6819 already discusses leakage at the client side; Shehroze was interested in how many apps were vulnerable, so scanned the 100 most popular and found that nine were susceptible, all with about 50m active users. Leaked tokens can be used in passive attacks (to steal email, location, birth date etc) and active attacks too (from fake likes and comments to spreading malware). Collusion networks are websites to which users deliberately submit tokens in return for likes and comments using vulnerable apps such as HTC Sense; Shehroze found over 50 of them. He milked them to observe cumulative users (increasing, but tailing off) and generated likes (increasing steadily). Having submitted 11k posts, he got 2.7m likes. He proposed and implemented, with Facebook, some mitigations: access token rate limits, honeypot-based access-token invalidation, temporal clustering to detect bursty activity, and IP rate limits. He collected data on the effectiveness of these measures; they have been effective for several months, though an arms race is to be expected. Botnets could be used to defeat IP rate limits, but honeypots can mitigate that to some extent; there are controls on app developers, and action might also be possible against the offending websites.
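To give a flavour of the first mitigation, here is a generic sliding-window rate limiter keyed by access token. It is only an illustration of the concept under my own assumptions; the class name, limits and window are invented, and it says nothing about how Facebook actually implements its limits.

```python
from collections import defaultdict, deque


class TokenRateLimiter:
    """Allow at most max_actions per token within a sliding time window."""

    def __init__(self, max_actions, window_seconds):
        self.max_actions = max_actions
        self.window = window_seconds
        self.history = defaultdict(deque)  # token -> timestamps of actions

    def allow(self, token, now):
        """Record and permit an action at time `now`, or refuse it."""
        q = self.history[token]
        # Forget actions that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_actions:
            return False
        q.append(now)
        return True


limiter = TokenRateLimiter(max_actions=2, window_seconds=60)
decisions = [
    limiter.allow("token-1", 0),
    limiter.allow("token-1", 10),
    limiter.allow("token-1", 20),   # third like inside a minute: refused
    limiter.allow("token-1", 61),   # oldest action has expired: allowed
]
```

The same structure keyed by source IP instead of token gives the IP rate limit, which is the one a botnet can spread itself thinly enough to evade.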
The last speaker of the first session was Taejoong Chung, talking about why DNSSEC deployment is rare. To sign your .com domain you need to send a DS record with your public key to the .com registry. However, you bought the domain from GoDaddy, not Verisign, so you need to work through them; there may be further steps if you went through a reseller to GoDaddy, and use CloudFlare too. There are too many manual steps where things can go wrong. For example, people can upload DS records from unregistered email addresses and get them signed, or send them from the right address and get the key bound to the wrong domain. Typos can also break the chain of trust. So Taejoong looked at the top 20 registrars and found that only three support DNSSEC on their own servers; only two check whether they got an upload right. Each registrar has a different policy and nine don’t support DNSSEC at all. The top 12 cover 88% of domains; all but two will generate a DS record for you, but as noted only two will check it. Free DNSSEC support encourages people to deploy it, and there are some interesting observations: for example, the .nl registrar has almost universal DNSSEC deployment, but is also a reseller for .com, .org and .net where uptake is about 60%. Third-party operators such as Cloudflare have somewhat complex procedures; about 40% of the people who tried to set it up appear to have failed.
Yuanshun Yao has been looking into the dependability of machine learning as a service. A bunch of companies are offering AI/ML in the cloud; you upload your training data, then specify the type of model to be tried, which parameters you want to tune, and so on. However, this user control affects performance significantly, which opens up security issues. Theoretical modelling is hard, so Yuanshun did an empirical study of the three big platforms and three startup offerings, by uploading training data and tuning all the available control dimensions. There’s quite some variance in how much the tweaking affects performance. But does this enable us to reverse-engineer the optimisations using different datasets? Yuanshun created some datasets specifically to have corner cases that would elicit information about algorithms, such as overlapping point clouds and concentric rings. Google, for example, switched between linear and nonlinear classifiers.
Xianghang Mi has been busy characterising IFTTT. “If this then that”, or IFTTT, is a trigger-action platform supporting expressive applets in networks of IoT devices, which it links to each other and to web services. Six months of measurements show that it has been growing rapidly as a way of automating applets (now 16%) and IoT-related tasks. He also has a testbed. Xianghang observes that the scripts used to automate tasks are actually rather sensitive but does not yet have a methodology for evaluating the security exposure.
Savvas Zannettou has been studying the dissemination of fake news between communities. There are now major “alternative” news sources such as 4chan’s politics board /pol/ with its high volume of hate speech and racism, and reddit’s pro-Trump board. How can this phenomenon be analysed quantitatively? Savvas compiled a list of 99 mainstream and alternative news sites, and hundreds of thousands of tweets. He then looked at how URLs propagated from one community to another by graph analysis, and in particular by Hawkes processes. This enables him to estimate the influence of different sources; bridge communities can have high influence. NLP holds out the prospect of more nuanced analysis as you can look at the actual text rather than just URL chains; but a lot of the influence actually works through the sharing of images.
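For readers who haven’t met Hawkes processes: they are self-exciting point processes in which each event temporarily raises the rate of further events. The sketch below computes the conditional intensity for the simplest univariate case with an exponential kernel; the kernel choice and parameter names are my assumptions for illustration, and the talk’s model (with one process per community) would be a multivariate version.

```python
import math


def hawkes_intensity(t, past_events, mu, alpha, beta):
    """Conditional intensity mu + sum(alpha * exp(-beta * (t - ti))) over ti < t.

    mu is the background rate, alpha the jump caused by each earlier event,
    and beta the rate at which that excitation decays.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in past_events if ti < t
    )
    return mu + excitation


# With no earlier events, only the background rate remains.
baseline = hawkes_intensity(1.0, [], mu=0.5, alpha=1.0, beta=1.0)
# One event at t=0 still contributes exp(-1) at t=1.
excited = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=1.0, beta=1.0)
```

In the multivariate setting the single alpha becomes a matrix, and an entry of that matrix can be read as how strongly a URL posted in one community raises the rate of the same URL appearing in another.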
Janos Szurdi started the abuse and ethics session with a talk on email typosquatting. We now have 14 years of research on typosquatting, which feeds a lot of tech support scams. Janos’s work has been on what user mistakes can benefit typosquatters (typos in email domains such as email@example.com); how often they’re made (they squatted some domains, with ethics approval); how many emails the domains in the wild receive (the yield depended on the popularity of a target nearby on the qwerty keyboard; 45% of suitable domains were owned by 1% of registrants); and what happens when mail is sent to actually squatted domains (2/3 of plausible domains can receive email; honey emails were sent to 50,000 domains of which 40% accepted but only 33 emails were actually read, with two attempts to access the sensitive information referred to). From this he estimates that the squatters’ 1211 domains get 850,000 emails a year at a cost of a penny each. The emails sent to their domains could be used to launch spearphishing campaigns as they contained significant amounts of sensitive information.
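As a rough illustration of what “nearby on the qwerty keyboard” means in practice, the sketch below enumerates candidate typo domains by swapping each character for a same-row neighbour or dropping it. The adjacency map (same row only) and the two typo models are simplifying assumptions of mine; the paper’s own typo models may well differ.

```python
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def adjacent_keys(ch):
    """Return the left and right neighbours of ch on its keyboard row."""
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
    return []  # digits, hyphens etc. are ignored in this sketch


def typo_domains(domain):
    """Candidate typos of the first label: neighbour substitution and omission."""
    name, dot, rest = domain.partition(".")
    candidates = set()
    for i, ch in enumerate(name):
        for sub in adjacent_keys(ch):
            candidates.add(name[:i] + sub + name[i + 1:] + dot + rest)
        if len(name) > 1:
            candidates.add(name[:i] + name[i + 1:] + dot + rest)
    candidates.discard(domain)
    return sorted(candidates)


variants = typo_domains("ab.com")
```

Running this over a list of popular mail domains gives the set of names a squatter (or a defensive registrant) would want to check for registration and MX records.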
Peter Snyder was next, and has been gathering data on doxing. This consists of collecting information on target individuals and then placing it online on sites such as pastebin or 4chan to embarrass them. This can lead to further real-world harms such as SWATting. What’s going on at scale? Peter has worked with public data, automating the collection of files from doxing sites in the summer and winter of 2016 and building a classifier to spot dox files based on hand-labelled samples. Some 1.7 million text files yielded 5500 dox files from 4,328 dox incidents; victims’ ages range from 10 to 74 with a median of 22, and they were mostly in the USA; released data often included names, addresses, and online accounts, but also credit card numbers and criminal records; the perpetrators’ motives (according to the “why I did it” text) ranged from competition (doxing elite security people) through vengeance and justice (the most common) to politics. Twitter handles suggested some coordination but in small cliques. Peter compared doxed with random control Instagram accounts; many accounts went private after doxing, although there has been significantly less account status change since Instagram and Facebook improved abuse controls in 2016. He suggests a doxing notification system like “Have I been pwned”; an anti-SWAT-ting list for law enforcement; and takedown agreements with pastebin.
The third speaker was Daniel Thomas discussing the ethical issues of research using datasets of illicit origin (full disclosure: Daniel’s one of my postdocs). The previous paper was an example of research with data of illicit origin; there are many more, and when using illicit data there may be many human participants identifiable from it by others, whom we must not expose to harm. This research was motivated by research on DDoS which used leaked booter databases for ground truth – which identify high-school kids who’ve committed crimes. Under what circumstances can such data ethically be shared? Daniel looked at data obtained using malware, password dumps, leaked databases, classified data (like Snowden) and financial data (like the Panama papers). The Menlo Report is a default starting point for security research. One has to identify stakeholders and harms, get consent where possible, and think about safeguards, justice and the public interest. Possible legal issues range from computer misuse and data protection to indecent images or terrorism, either of which could make it illegal to hold the material. The paper surveys 28 cases where research papers discuss such issues. It transpires that safeguards are often ignored, as are legitimate reasons for research; both harms and positive benefits remain unidentified. Few papers record explicit ethics approval. In short, there appear to be significant deficiencies in the ethics process. Similar issues arise in economics, journalism and elsewhere; many researchers use data of illicit origin.
The last session was started by Bradley Huffaker who likes digging into geolocation databases. He wants to measure coverage, consistency and accuracy across four databases and has been using a number of techniques such as round trip time to cross-check accuracy; this showed that 2.2% of Atlas probes may be mislocated. He uses a 40km notional city radius to check consistency. NetAcuity was best on accuracy with 89.4% of IPs in the right country and 73% in the right city; much of its advantage comes from being better than ARIN.
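The 40km consistency check can be sketched as a great-circle distance comparison between the coordinates two databases report for the same IP address. The haversine formula and the coordinates below are my own choices for illustration; I don’t know exactly how the authors computed distances.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
CITY_RADIUS_KM = 40.0      # the notional city radius from the talk


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_city(loc_a, loc_b, radius_km=CITY_RADIUS_KM):
    """Treat two database answers as consistent if they lie within radius_km."""
    return haversine_km(*loc_a, *loc_b) <= radius_km


# Approximate city-centre coordinates, used only as example inputs.
london = (51.5074, -0.1278)
paris = (48.8566, 2.3522)
consistent = same_city(london, paris)
```

Two databases that place the same address in London and Paris respectively would be flagged as inconsistent at city level, even though a coarser continent-level comparison would call them consistent.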
Vasileios Kotronis has been studying shortcuts through colocation facilities. Overlay networks can cut latency; but where’s the best place to put a small number of relays for that, or for resiliency? He spent a month selecting PlanetLab relay sets and testing round trip times, using 4,500 relays and 1,000 endpoints, starting with the assumption that they should be close to eyeballs.
Wouter de Vries’s subject is load-aware anycast mapping. Users are routed to the nearest instance of an anycast service, in theory; but the catchments are messy for the same reasons BGP is. Can this be measured better than staring at log files or probing? There are 563 RIPE Atlas probes in the Netherlands but only 19 in China. His system, Verfploeter, uses passive vantage points with ICMP. He presents a case study of the b-root DNS server and compares it with RIPE Atlas, and also has a testbed called Tangled with nine sites for testing and measuring anycast. In effect, Verfploeter sees 460 times as many vantage points as Atlas.
The last paper of IMC 2017 was by Moritz Müller, talking about recursives in the wild. A typical authoritative name server has a mix of unicast and anycast servers, accessed through recursive resolvers. The resulting behaviour can be opaque; so how do recursive resolvers behave in the wild, and how can we use this knowledge to improve name server performance? He set up a test domain with seven servers worldwide and hammered them with RIPE Atlas probes. Resolvers in 2,500 ASes had access; query shares and RTTs were measured. Resolvers generally favoured the faster service but some queried slower servers occasionally and a few always queried one server; some changed preference slowly, perhaps because of caching. The main outcome of the study was a recommendation that all servers be anycast, so long as you can put them in sites with very good peering.