July 30, 2009

Why I have a headache today

Enemieslist started out as a loosely defined effort to stop spammers using botnets on hosts with generic names. This was sort of an innovation at the time, because most postmasters were only worried about "dynamics". Blocking mail from "generic" hosts is a fairly straightforward concept, though, and back in the dark ages of 2003, the effort was inspired by two related issues.

Things had gotten to the point where we got so much spam that reporting it was a massive waste of time. This was especially true when dealing with entities big enough that their large customer base came with a correspondingly large number of customers with botted PCs. I'd received enough auto-acks by then, and was tired of policing ISPs' networks for them. It's one thing to report an open relay to postmaster on a single MTA; it's quite another to report all of the spam you get from, say, BellSouth customers. Every day. For years.

The other thing that contributed to my overall concept of "generic" at the time was that a spammer named Brian Westby had used a forged address in our domain as the sender for a massive spam run, lasting several weeks and only ending when the FTC cut him off. We got many tens of thousands of what would come to be called "outscatter" messages: bounces from sites that had accepted the original spam and only later decided to send it "back" to the purported sender, many of them including the headers, if not the body, of the original message. It was obvious that the original messages had been sent directly from end-user nodes; back then, a fair number of them were dialups, though that has changed with the worldwide broadband rollout. And they all had generic, provider-assigned names. Often, those names included some form of the host's IP address, or a token indicating what kind of service it was on: dialup, DSL, cable, and other typical end-user market services (as contrasted with leased lines and the like).

Reasoning that if we didn't want the outscatter from spam that had been received from these hosts, we probably didn't want messages directly from them, either, and drawing on a background that included experience with regular expressions, I started building a database of patterns that matched the names I'd seen in the headers of the outscatter messages. At first, I didn't classify anything - simply being generic was enough - but over time I started to distinguish between dynamic and static and various subcategories such as NATs and proxies and webhosts and resnets. As the data set grew, I found myself tracking other subsets, such as "outmx", simply to prevent myself from making a mistake classifying some of the more weirdly named legitimate mail sources. And with that, the definition of "generic" was stretched, perhaps beyond redemption.
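To make the idea of a classified pattern database a little more concrete, here is a minimal sketch in Python, assuming a simple ordered list of fully anchored regular expressions, each tagged with a class. The hostnames, patterns, and the `classify` helper are hypothetical illustrations, not actual Enemieslist entries or code.

```python
import re
from typing import Optional

# Hypothetical patterns for an imaginary provider; not real Enemieslist data.
# Order matters: the first match wins, so narrow classes such as "outmx"
# are listed before broad catch-all patterns.
PATTERNS = [
    (re.compile(r"^outbound\d*\.example-isp\.net$"), "outmx"),
    (re.compile(r"^dsl-\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.dyn\.example-isp\.net$"), "dynamic"),
    (re.compile(r"^static-\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}\.example-isp\.net$"), "static"),
    (re.compile(r"^nat\d+\.resnet\.example\.edu$"), "resnet"),
    (re.compile(r"^[a-z]+\d+\.example-isp\.net$"), "generic"),
]


def classify(hostname: str) -> Optional[str]:
    """Return the class of the first pattern matching the whole name, or None."""
    # DNS names are case-insensitive; also drop a trailing root dot if present.
    name = hostname.lower().rstrip(".")
    for pattern, label in PATTERNS:
        # fullmatch enforces anchoring even if a pattern lacked ^...$
        if pattern.fullmatch(name):
            return label
    return None
```

With this ordering, `classify("outbound3.example-isp.net")` returns `"outmx"` even though the catch-all "generic" pattern would also match, which is one way to keep a weirdly named legitimate mail source from being misclassified; `classify("mail.example.com")` returns `None`.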

A recent experience we had serves as an illustration of how weird this can get. We got some spam from a host with a dyndns.org HELO. As DynDNS is a service that allows people to set up static DNS labels pointing to (probably) dynamic IPs, it seemed a pretty clear case of "this IP is probably dynamic", so we classified everything under dyndns.org as "dynamic". But we quickly realized that there is significantly more clue allocated to the sorts of folks who know to get a dyndns.org hostname, and they're likely the sorts who like to run personal mail servers and so forth. Disregarding the issue of whether their ISP forbids such practices, or charges more for static IPs and custom PTRs, the bottom line is that the hostname is static, even if the IP underneath it is dynamic.

So, how should Enemieslist classify such a host?

The question is what my old college professor would have called "transgressive"; it isn't easily answered in the simpler context in which it is asked, and in fact tends to disrupt the categories and framework of concepts that the question relies on. Because DynDNS allows you to choose any label you wish for your hostname, it's not technically "generic" - defined as "relating to or common to or descriptive of all the members of a genus (or set)". You can make the argument that all of the members of that set have a common characteristic, namely, they're masks or aliases for other names, but that's pretty weak.

I've already talked a bit about why the concept of "dynamic" versus "static" is wiggly (namely, it's merely a matter of intention rather than of duration), but this goes further. It's actually a problem we have with several of our subclasses, such as NATs and resnets, and it boils down to this: in the context of reputation, names are truly attached only to their IPs, not necessarily to the hosts that occupy those IPs, or that use those IPs as injection points for their traffic, at any given time. This is, of course, true in a sense for all botted hosts, whose activity is controlled from afar via a "command and control" or C&C host, after all.

So when we judge a naming convention, we're making assumptions about the hosts that will be assigned the names' corresponding IPs. And even then, we're making assumptions about the traffic that will come out of those IPs - in the case of an insecure NAT, that traffic comes from a host on the LAN or VPN behind that interface, for example. Resnet IPs may be statically assigned to drops in dorm rooms, but if the computers using those drops change every semester, the reputation of those hosts may change as well.

So we are forced to make judgements on hosts as a set, or class, based on what we know about their names and the netblocks that have hosts with those names. Usually, the host part of the label is sufficient to judge dynamicity; we don't often need to examine the domain name itself (though all of our patterns are fully anchored, and domains sometimes provide clues about technology). "Genericity" is determined through observation (for example, when the IP is part of the name) or by having multiple examples fitting the same pattern. But in the case of DynDNS, it's the domain itself that indicates "non-static", because that's the whole point of the service.
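The two signals just described, an IP embedded in the name and a domain that itself implies "non-static", could be checked along the following lines. This is a minimal Python sketch under stated assumptions: the helper names and the `DYNAMIC_DNS_DOMAINS` list are hypothetical, and the IP check only catches decimal octets (not hex or other encodings some providers use).

```python
import ipaddress
import re

# Hypothetical list of dynamic-DNS domains; illustrative only.
DYNAMIC_DNS_DOMAINS = ("dyndns.org",)


def _contains_run(seq: list[int], run: list[int]) -> bool:
    """True if `run` appears as a contiguous slice of `seq`."""
    n = len(run)
    return any(seq[i:i + n] == run for i in range(len(seq) - n + 1))


def ip_in_name(hostname: str, ip: str) -> bool:
    """Heuristic: does the name embed the IPv4 address, forward or reversed?"""
    octets = list(ipaddress.IPv4Address(ip).packed)  # e.g. [192, 0, 2, 10]
    # int() normalizes zero-padded forms such as "192-000-002-010".
    numbers = [int(d) for d in re.findall(r"\d+", hostname)]
    return _contains_run(numbers, octets) or _contains_run(numbers, octets[::-1])


def in_dynamic_dns_domain(hostname: str) -> bool:
    """True if the name is one of the listed domains or a subdomain of one."""
    name = hostname.lower().rstrip(".")
    return any(name == d or name.endswith("." + d) for d in DYNAMIC_DNS_DOMAINS)
```

For example, `ip_in_name("10.2.0.192.dsl.example.net", "192.0.2.10")` is true via the reversed octets, while `in_dynamic_dns_domain("myhost.dyndns.org")` flags the name purely on its domain, which is exactly the case a host-label pattern would miss.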

The final resolution to this question, in the case of DynDNS, is unclear. One saving grace is that any host that HELOs with a dyndns.org hostname is also likely to have a dynamic, or at least generic, PTR, or no PTR at all, so all is not lost. But we may well need to create a new class, as we have had to do to describe "cloud computing" services, and assign it to all of the domains used by DynDNS and other such services. Another option is to simply not classify them at all. But neither of these is satisfying, given that our purpose here is to help evaluate the risk that a specific host may be part of a botnet, on a sliding scale with dynamics at the "bad" end.
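One way to picture the "saving grace" and the sliding scale together is to score the HELO name and the PTR separately and keep the worse result, so that an unclassified dyndns.org HELO is still caught by a dynamic or missing PTR. The numeric scores, the contents of `CLASS_RISK`, and the max-of-both rule below are assumptions made for illustration; this is a sketch, not how Enemieslist actually evaluates hosts.

```python
from typing import Optional

# Illustrative 0-10 risk scale with "dynamic" at the bad end.
# All numbers here are assumptions, not Enemieslist values.
CLASS_RISK = {
    "outmx": 0,
    "static": 3,
    "webhost": 4,
    "nat": 7,
    "resnet": 7,
    "dynamic": 10,
}
UNCLASSIFIED_RISK = 5  # unknown or unlisted classes land mid-scale
NO_PTR_RISK = 8        # a missing PTR is treated as nearly as bad as dynamic


def class_risk(label: Optional[str]) -> int:
    """Map a class label (or None for unclassified) to a risk score."""
    if label is None:
        return UNCLASSIFIED_RISK
    return CLASS_RISK.get(label, UNCLASSIFIED_RISK)


def host_risk(helo_class: Optional[str], ptr_class: Optional[str], has_ptr: bool) -> int:
    """Score a connecting host as the worse of its HELO and PTR scores."""
    ptr_score = class_risk(ptr_class) if has_ptr else NO_PTR_RISK
    return max(class_risk(helo_class), ptr_score)
```

Under these assumptions, an unclassified dyndns.org HELO paired with a dynamic PTR, `host_risk(None, "dynamic", True)`, scores 10, and the same HELO with no PTR at all, `host_risk(None, None, False)`, scores 8.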

So, a headache.

Posted by schampeo at July 30, 2009 12:06 PM
