SURBLs need to be used with programs that can extract domains from message body URIs, which so far are new and few, at least in the context of traditional RBL usage.
In contrast, most other RBLs list the IP addresses or domain names of spam operations, spam mail servers, open relays, open proxies, etc. Programs that use these traditional RBLs generally check those list entries against message headers. They generally do not check message bodies, i.e., the content of messages. Most prior efforts to look at message body URIs have converted the domains they find into IP addresses for comparison to a numeric RBL.
We feel that SURBLs directly address the problem of spam-advertised domains. We also feel that there is a significant performance advantage in comparing SURBL domain names to found message body URI names, since there is no delay needed to resolve the names into numbers.
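As a rough illustration of the first step, pulling host names out of message body URIs without resolving them, here is a minimal Python sketch. The regular expression and the function name extract_uri_hosts are assumptions made for this example and are not taken from any SURBL client; real programs recognize many more URI forms.

```python
import re
from urllib.parse import urlsplit

# Simplified pattern (an assumption for this sketch): http/https URIs only,
# ending at whitespace, angle brackets or quotes.
URI_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_uri_hosts(body: str) -> list[str]:
    """Return the unique host names found in body URIs, in order of appearance."""
    hosts: list[str] = []
    for match in URI_PATTERN.finditer(body):
        try:
            # urlsplit lower-cases the host and drops any port or userinfo.
            host = urlsplit(match.group(0)).hostname
        except ValueError:
            continue  # malformed URI, for example a broken IPv6 literal
        if host:
            host = host.rstrip(".")
            if host and host not in hosts:
                hosts.append(host)
    return hosts
```

No DNS lookup happens here; the names are only collected so they can be compared against a list later.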
Important Note: Matt Kettler says: DO NOT run SA 2.63 on a production server. Upgrade to 2.64 or 3.x because 2.63 has a MIME parsing bug that can be used to DoS your server.
For SpamAssassin 3.X, there is a suite of programs contained in the plugin URIDNSBL including some that can be used with SURBLs and some that can't:
For more information, please see the SURBL Quick Start.
A very popular and fast name server specifically meant for serving up RBLs is rbldnsd. SURBL zone files are available in rbldnsd format. (They are also available in BIND format, though rbldnsd is recommended as significantly leaner and faster.) There are links and instructions for using rbldnsd with rsync in the Links section.
Then arrange with the RBLs to get rsync access to their zone files. Since rsync only transmits differences, the zone files are kept updated in a very efficient manner. To get rsync access to SURBL zone files, please fill out our rsync access form. Other RBLs may have similar procedures for gaining rsync access.
Then configure your mail or SpamAssassin servers using RBLs to do the lookups on your local RBL name server. Many people run the local DNS for their RBLs on their mail server(s), which tends to work well since it keeps everything on the same box. If your mail server is separate from your RBL name server, then set up DNS on the mail server to resolve using that RBL name server. The following documents may be helpful in setting up local caching:
This may work better with applications that decode the bits into individual list results, but it is a change from before and it may break other applications' use of the testpoints.
The multi.surbl.org BIND zone file contains:
test.surbl.org 604800 IN A 127.0.0.126
    604800 IN TXT "multi.surbl.org permanent test point"
test.multi.surbl.org 604800 IN A 127.0.0.126
    604800 IN TXT "multi.surbl.org permanent test point"
surbl-org-permanent-test-point.com 604800 IN A 127.0.0.126
    604800 IN TXT "multi.surbl.org permanent test point"
2.0.0.127 604800 IN A 127.0.0.126
    604800 IN TXT "multi.surbl.org permanent test point"
The multi.surbl.org rbldnsd zone file contains:
test.surbl.org :126:multi.surbl.org permanent test point
test.multi.surbl.org :126:multi.surbl.org permanent test point
surbl-org-permanent-test-point.com :126:multi.surbl.org permanent test point
2.0.0.127 :126:multi.surbl.org permanent test point
Those resolve into the following Address records:
Name: test.surbl.org.multi.surbl.org
Address: 127.0.0.126
Name: test.multi.surbl.org.multi.surbl.org
Address: 127.0.0.126
Name: surbl-org-permanent-test-point.com.multi.surbl.org
Address: 127.0.0.126
Name: 2.0.0.127.multi.surbl.org
Address: 127.0.0.126
But note that only the last, two-level domain surbl-org-permanent-test-point.com will work as the base domain for a URI in a test message for SpamCopURI or urirhsbl. This is because URIs with test.multi.surbl.org.multi.surbl.org, etc., won't be detected by most SURBL-using programs because they're supposed to be reduced down to a two-level domain which would be surbl.org for those.
http://surbl-org-permanent-test-point-MUNGED.com/
or:
http://127.0.0.2-MUNGED/
without the "-MUNGED"s. So if you send yourself a message with any of those unmunged testpoints as URIs, the messages should match any SURBLs you have installed. (The name of the list, in the earlier examples sc.surbl.org, is only added to DNS queries on the RBL.)
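To make the query construction concrete, here is a hedged Python sketch of how a URI host could be turned into the name that is looked up on a list. It reduces host names to their last two labels and reverses the octets of IP addresses, matching the test point records for multi.surbl.org. The function name surbl_query_name is hypothetical, and the plain two-level reduction is a simplification: it ignores country-code domains that need three levels.

```python
import ipaddress


def surbl_query_name(host: str, zone: str = "multi.surbl.org") -> str:
    """Build the DNS name to query for a URI host (simplified sketch)."""
    host = host.lower().rstrip(".")
    try:
        address = ipaddress.IPv4Address(host)
    except ValueError:
        # Not an IPv4 address: keep only the last two labels (the base domain).
        base_domain = ".".join(host.split(".")[-2:])
        return f"{base_domain}.{zone}"
    # IPv4 address: octets are reversed, as in the 2.0.0.127 test point.
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"
```

With this reduction, a URI host of test.multi.surbl.org yields a query for surbl.org, which is why only the two-level test domain and the IP test point are useful in test messages.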
http://spamdomain.com/
can be rewritten as:
http://spamdomain-MUNGED.com/
That would require some awareness on the part of the person forwarding a spam or discussing a listed spam site, but it's just as doable and humanly readable as munged email addresses, which people do all the time.
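For anyone who wants to automate the munging, the following Python sketch shows one possible way to do it. The helper names munge_uri and unmunge_uri are made up for this example, and the sketch assumes simple URIs without a port or user information.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

MARK = "-MUNGED"


def munge_uri(uri: str) -> str:
    """Insert the marker so the URI no longer matches a listed host."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    try:
        ipaddress.IPv4Address(host)
        munged = host + MARK  # IP address: append the marker
    except ValueError:
        name, dot, tld = host.rpartition(".")
        # Domain name: place the marker just before the top-level domain.
        munged = f"{name}{MARK}.{tld}" if dot else host + MARK
    return urlunsplit(parts._replace(netloc=munged))


def unmunge_uri(uri: str) -> str:
    """Remove the marker to recover the original URI."""
    return uri.replace(MARK, "")
```

The marker is easy for a human reader to strip by eye, which is the point of the convention.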
It's a good practice to use little or no filtering on your anti-spam mailing list messages and abuse contact addresses, or to whitelist them around spam checking.
For example, if you're using spamd, make sure it's started without the -L or --local flags, which force local tests only.
If you are running Amavis, make sure amavisd.conf has $sa_local_tests_only = 0. (Uncomment this line if it was commented out before, then set the value to zero to enable network tests.)
If you are using MIMEDefang, make sure you set $SALocalTestsOnly to zero:
# If boolean true, skip SA network tests
$SALocalTestsOnly = 0;
to enable SpamAssassin network tests from your mimedefang-filter.
Also make sure that you have a recent Net::DNS installed. An outdated version of Net::DNS seems to be a common reason for RBL checks not working, especially when upgrading from an older version of SpamAssassin.
urirhssub URIBL_JP_SURBL multi.surbl.org. A 64
body URIBL_JP_SURBL eval:check_uridnsbl('URIBL_JP_SURBL')
describe URIBL_JP_SURBL Contains a URL listed in the JP SURBL blocklist
tflags URIBL_JP_SURBL net
score URIBL_JP_SURBL 3.0
Where body above was previously header.
Here is the changelog reference:
r54022 | felicity | 2004-10-07 22:21:30 +0000 (Thu, 07 Oct 2004) | 1 line
bug 3734: uridnsbl rules work on body data, not header data, so change the rule type from header to body
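The trailing 64 in the urirhssub rule is a bitmask applied to the last octet of the address returned by multi.surbl.org. The Python sketch below shows how such an answer could be decoded. Only the JP bit (64) is named, because that is the only assignment given in the rule; the function names are hypothetical and this is not how SpamAssassin itself is implemented.

```python
import ipaddress

# Only the JP bit is stated in the rule; other bits are left unnamed here.
KNOWN_BITS = {64: "JP"}


def set_bits(address: str) -> list[int]:
    """Return the bit values set in the last octet of a 127.0.0.x answer."""
    packed = ipaddress.IPv4Address(address).packed
    if packed[:3] != bytes([127, 0, 0]):
        raise ValueError(f"not a 127.0.0.x list answer: {address}")
    return [1 << i for i in range(8) if packed[3] & (1 << i)]


def matches_mask(address: str, mask: int) -> bool:
    """True if the answer has at least one of the bits in mask set."""
    return any(bit & mask for bit in set_bits(address))


def describe(address: str) -> list[str]:
    """Name each set bit, falling back to 'bit N' for bits not named here."""
    return [KNOWN_BITS.get(bit, f"bit {bit}") for bit in set_bits(address)]
```

The permanent test point answer 127.0.0.126 has every list bit from 2 through 64 set, so it matches a rule for any single list.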
However the act of resolving the domain can confirm for spammers that your specific address was reachable, that you've opened their spam, etc. Name resolution can also significantly increase the time it takes for your provider's servers to process each message. (The delay is on the order of a few seconds to a few minutes; not a big deal to an end user, but a major bottleneck to the provider's servers which typically need to handle thousands of messages much more quickly.)
In contrast, SURBLs contain mostly domain names that have appeared in spam message body URIs. Typically those are the web sites that the spammers are trying to advertise. Using SURBLs doesn't require resolving the domains that appear in spams. That's safer, more private and much faster.
While some SURBL lists such as SC and AB use data from SpamCop, it's the Spamvertised sites that they use and not the sender IP addresses. Those are the web sites advertised in the message bodies of spams which have been reported to SpamCop.
The difference is between blocking spam senders based on message headers, as the SpamCop Block List is commonly used for, versus blocking based on URIs advertised in spam message bodies, which is what SURBLs are used for. This is useful because spammers frequently shift the IP addresses they send spam from, but they tend to advertise the same sites ad nauseam.
Instead we list only the domains and occasional IP addresses that actually appear in spam URIs. That way if a web server has one spammer and 499 non-spammers, only the spammer's domain will be listed, leaving the 499 non-spammers unaffected. Our efforts are thus focused only on the abusers.
One of our goals is to arrive at "set it and forget it" data that an ISP can use at the MTA level with little fear of blocking legitimate messages. That requires very low false positive rates. Creating collateral damage would be moving away from that goal.
If we hear that any false positives (legitimate domains) appear on SURBL lists, then we research them and if appropriate remove them quickly.
If it's the case that domains expire out of the SpamCop URI data sooner than the particular spam domains remain a problem, then we can definitely see a need for a longer expiration.
The idea is that if a domain continues to appear in spam, people will continue to report it, and it will therefore continue to show up in our SURBL data.
Other techniques used in conjunction with SURBL, such as other RBLs, may be able to catch spams which we miss, and we may be able to improve our process to catch more spams our way. Our spam detection percentage will remain high if the SpamCop reports represent a good cross-section of actual spams at any given time, as we believe they do.
http://drs.yahoo.com/covey/parr/*http://spammer.address/
SpamCop itself seems to disambiguate (most of) the redirection. If someone is using a redirector to send traffic to spamdomain.com, SpamCop seems to detect and resolve it correctly to spamdomain.com most of the time. So the data that's used as input to sc.surbl.org already has redirectors correctly handled to some extent. In other words, we're protected on the data input side by the processing that happens at SpamCop to take out the redirection in reported URIs.
SpamAssassin programs such as SpamCopURI and urirhsbl that use SURBLs are capable of handling redirections to differing degrees. SpamCopURI 0.14 uses LWP to get Location information to untangle up to four levels of redirection sites without actually visiting the sites. URIDNSBL's urirhsbl includes patterns to extract the final domains from some redirection URIs. Further development will probably improve the handling of redirection sites.
The big picture solution is for the redirection sites to block spam domains on their own. In other words, they should not let spammers redirect through their sites. Until they do so, their services can be abused by spammers. Some, such as tinyurl.com, reportedly actively block and report spammers who abuse their site. Others such as Metamark and SnipURL are using SURBLs to deny spammers access to their redirection services. Here is an Open Letter to Redirection Sites that may be used or modified to contact them.
We've seen quite a few randomized or customized (to a username for example) host names in some of the top pharmaspam sites. There are different possible reasons for the randomization: to add chaos to the names to throw off message body checkers, or perhaps to "key" spam site web visits to specific mailings in order to build a confirmed mailing list. (Such confirmed mailing lists themselves are probably a valuable commodity to sell to other spammers.) Randomization doesn't throw us off though; we catch them from the base domain part, which can't change.
sc.surbl.org is meant to be a record of the most frequently reported domains in spam message bodies that SpamCop users choose to report. In this sense it's like a broadly-based, hand-tuned black list of domains commonly found in spam. Because quite a few reports need to be received for a domain to get added to sc.surbl.org, it effectively represents a consensus voting system about which URI domains are spammy. One improvement might be to encode the frequency data in the RBL so that more frequently reported domains could be used to give higher scores.
Future versions of the data engine behind sc.surbl.org will probably have a longer default expiration time of 10 days, and will probably also set a lower threshold and longer expiration for professional spam operations and for domains hosted at spam-friendly ISPs. We may also adjust the expirations to be longer for domains that receive very many spam reports. In essence, each spam report would describe a "crime" punishable by additional expiration days (a "prison sentence"), with the sentences served consecutively. With the current data one additional day per ten reports looks about right.
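The arithmetic of that proposal can be sketched in a few lines of Python. Adding the extra days on top of the 10 day default is our reading of the paragraph above rather than a description of the actual data engine, and the function and parameter names are invented for the example.

```python
def expiration_days(report_count: int,
                    base_days: int = 10,
                    reports_per_extra_day: int = 10) -> int:
    """Days a domain stays listed: the default plus one day per ten reports."""
    if report_count < 0:
        raise ValueError("report_count must not be negative")
    return base_days + report_count // reports_per_extra_day
```

Under these assumptions a domain with 35 reports would stay listed for 13 days, and one with 100 reports for 20 days.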
Some sort of hysteresis mechanism could also be useful to prevent spam domains which get a low level of reporting from coming off the list and getting back on it repeatedly as sometimes happens with the original engine. This apparent "recidivism" is caused by the reports expiring and the count dropping back below the inclusion threshold. Some fresh reports then come in and raise the count back above the threshold. Longer expiration times automatically help with this to some extent as do lower thresholds.
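One common form of hysteresis uses two thresholds: a higher one for getting onto the list and a lower one for staying on it. The Python sketch below illustrates the idea. The threshold values 20 and 10 are invented for the example, and nothing here describes the real sc.surbl.org engine.

```python
def update_listing(listed: bool, report_count: int,
                   add_threshold: int = 20, drop_threshold: int = 10) -> bool:
    """Return the new listing state for a domain given its current report count."""
    if listed:
        # Already listed: stay on until the count falls below the lower threshold.
        return report_count >= drop_threshold
    # Not listed: the count must reach the higher threshold to get on.
    return report_count >= add_threshold


def simulate(counts: list[int]) -> list[bool]:
    """Apply update_listing to a sequence of report counts, starting unlisted."""
    states: list[bool] = []
    listed = False
    for count in counts:
        listed = update_listing(listed, count)
        states.append(listed)
    return states
```

With a single threshold of 20, a domain whose count hovers between 12 and 21 would drop off and reappear repeatedly; with the gap between the two thresholds it stays listed until reports fall off substantially.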
As message body spam domain blocking becomes more prevalent, reports will tend to decrease rapidly after the flurry of initial ones, and a longer memory of those will become important.
Certainly the existing SURBL whitelist could be used to prevent Joe Jobs (false reporting or blocking of legitimate domains). We've already added some of the common domains like yahoo, hotmail, ebay and amazon, etc. These seldom appear above the threshold yet, however, so the law of averages and careful reporting seem to be on our side so far.
(Note that the above comments apply to the handling of SpamCop URI data that goes into sc.surbl.org. However the global whitelist applies to all SURBLs, including sc. Once a domain or IP address is whitelisted, it's excluded from all SURBLs.)
Update: Our whitelist, which we use only to exclude domains from SURBLs, not to "allow" messages, is growing but doesn't hit data from the various SURBLs too often. The goal is to keep whitehats off the lists in the first place by being careful with the input data. The whitelists are intended to be a safety backstop to make sure domains with legitimate uses don't get added.
URIDNSBL rules: (Please see this link for the full list of domains to exclude from checking.)
# Top 125 domains whitelisted by SURBL
uridnsbl_skip_domain yahoo.com w3.org msn.com com.com yimg.com
uridnsbl_skip_domain hotmail.com doubleclick.net flowgo.com ebaystatic.com aol.com
[...]
SpamCopURI also has a built-in whitelist function:
whitelist_spamcop_uri *.yahoo.com
Other SURBL applications may have similar exclusion features. If not, their authors may want to consider adding local whitelisting.
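For authors adding local whitelisting to their own applications, a minimal Python sketch of the check might look like the following. The set holds only the ten domains shown in the excerpt above, and the function name should_query is hypothetical.

```python
# The ten domains shown in the excerpt above; the full list is longer.
SKIP_DOMAINS = {
    "yahoo.com", "w3.org", "msn.com", "com.com", "yimg.com",
    "hotmail.com", "doubleclick.net", "flowgo.com", "ebaystatic.com", "aol.com",
}


def should_query(host: str, skip_domains: set[str] = SKIP_DOMAINS) -> bool:
    """Return False if the host is a skipped domain or one of its subdomains."""
    host = host.lower().rstrip(".")
    return not any(host == domain or host.endswith("." + domain)
                   for domain in skip_domains)
```

Matching on a leading dot keeps look-alike names such as notyahoo.com from being skipped by accident.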
Notes:
A solution is to disable such site correction or modification features on servers or clients doing SURBL queries. Alternatively, consider using regular (non-modifying) nameservers for those systems.
Note also that SURBL applications may be incompatible with DNS modification or proxy services that change the DNS query results of non-matches (NXDOMAIN results) for non-existent sites.
Note that as of 1/25/07, OpenDNS no longer modifies results for SURBL lists. It should now be safe to use OpenDNS with SURBL applications. In fact, if you find you are behind a firewall or proxy that is modifying SURBL DNS queries incorrectly, one solution is to change your DNS resolvers to use the OpenDNS nameservers. It has been reported that this corrects the problem.
Additionally some ISPs such as Verizon and others are now modifying some DNS NXDOMAIN responses in a way that causes what look like false positives on domains that are not blacklisted. They appear to be doing this to drive search traffic to other sites, but unfortunately it breaks DNS responses for SURBLs and other blacklists. Please check with your ISP if you are seeing DNS responses modified in this way. Verizon has an opt-out procedure with instructions on switching to DNS servers that do not change NXDOMAIN responses. Others such as Charter have opt-out nameservers that reportedly do not support NXDOMAIN. If so, then none of their nameservers may be compatible. One solution is to not use their nameservers.
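One way to check whether a resolver is rewriting NXDOMAIN responses is to look up a name that should not exist and see whether an address comes back anyway. The Python sketch below does this with the standard library. It assumes that genuine list answers fall inside 127.0.0.0/8, as the 127.0.0.126 test point does; the function names and the random probe label are made up for the example.

```python
import ipaddress
import socket
import uuid


def system_resolve(name: str):
    """Resolve a name with the system resolver; return None on any failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None  # NXDOMAIN, timeout or other resolver error


def is_plausible_list_answer(address: str) -> bool:
    """Assumption: real list answers are always inside 127.0.0.0/8."""
    return ipaddress.IPv4Address(address) in ipaddress.IPv4Network("127.0.0.0/8")


def nxdomain_looks_rewritten(resolve=system_resolve,
                             zone: str = "multi.surbl.org") -> bool:
    """Query a random name that should not be listed; any answer is suspicious."""
    probe = f"nx-{uuid.uuid4().hex}.com.{zone}"
    return resolve(probe) is not None
```

Because resolver failures also count as "no answer", a False result only means that no rewriting was observed, not that the resolver is working correctly.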
There are also some security or privacy concerns about resolving a keyed domain name, since that could give out information about the success of a spam, for example if the recipient is keyed in the full domain name as in:
http://resolving-this-confirms-specific-recipient.spammerdomain.com/
The concern here is that the act of performing the resolution itself could be used as a confirmation of a delivery attempt given a URI customized (keyed) to a specific recipient. Such a confirmation could be used to build additional spam lists, even if it just helps narrow down messages (and therefore recipients) which made it through the gauntlet of other filtering methods. In other words name resolution can potentially provide useful information to the senders.
Name resolution also adds a significant performance penalty, especially on high volume mail servers. For domains that don't resolve, a timeout of tens of seconds can result. These kinds of delays can make resolution of URI message body domains impractical for busy mail servers.
That said, if an IP address does appear in a spam URI, then it can appear in SURBLs as an IP address. The principle is to accurately record what's in spam URIs. If there's a domain name in the spam then that name can get onto a SURBL. If there's a number it too can get onto the SURBL.
Creating a list of the resolved addresses is something we considered. Doing so would be too similar to existing number-based approaches such as using the sbl.spamhaus.org RBL with the SpamAssassin command uridnsbl, whose domain-based twin is the SURBL-using urirhsbl. Another way to use number-based RBLs to check message body domains resolved into numbers is the sendmail milter which the SpamHaus site mentions: http://www.five-ten-sg.com/dnsbl.html .
However the current version of the sc.surbl.org data engine is a hybrid name and number approach, where if a domain resolves into an IP address commonly used with spamvertised sites, then that domain will get added to sc.surbl.org probably with the first report. (Note that this still requires at least one report, but the threshold for inclusion will be radically lower for major spam operators who repeatedly use the same IP address for their hosting.) This hybrid approach moves sc.surbl.org much closer towards the behavior of a number-based approach, though domains will still need that initial report, whereas a numbered list would catch the entire server by its IP address.
Of course a downside of using numbers is that they can cause false positives on any legitimate domains that happen to be hosted on the same IP address as a spam site. That could be disastrous for a large web hosting company that had one bad apple. That's another major reason why we went with names and not numbers. Numbers can be overly broad, whereas names are highly specific to the advertised site. To us names are a finer tool: if 30% of the domains on a given IP address are used by spammers, we could list all of them and not affect the 70% non-spam domains that unfortunately happen to share the same IP address. That specificity is a strong benefit of using domain names.
The SpamAssassin 3 plugin URIDNSBL now has support for SURBL using its urirhsbl and urirhssub commands (see Quick Start and Usage sections), so that release of SA is covered.
Devin Carraway has written a plugin for the Perl-based MTA qpsmtpd to compare domains from message body URIs to SURBL domain lists. Here's his announcement on perl.qpsmtpd, and a link to his uribl plugin. Devin's was the first MTA implementation.
Exim, sendmail, postfix, qmail, qmail-ldap and Exchange programs and plugins have been written to support SURBLs in those MTAs. (Please see the Links and News pages for more information.) Support of SURBL directly in other MTAs would also be useful. MTA support in general can be a very good thing, since it can reject messages directly back to the sending server. MTA support would have the added advantage of pre-empting other, more expensive processing of messages, for example in SpamAssassin.
And we are getting nearly daily reports from people writing SURBL support into various and sundry mail handling programs, which shows there is still code to write. :-)