This page illustrates a simple yet effective technique to hide an email address from spam webrobots.
The problem of spam
Spam - the unsolicited, mass-mailed email messages such as commercial ads, Make Money Fast letters, scams, and all other kinds of junk - has become a real plague of the Internet: it is estimated that the vast majority of all email sent nowadays is spam.
Spammers have several ways to collect valid email addresses, and one of their sources is Web pages. They use spamrobots that crawl over all the web pages they can reach, scanning them for email addresses, just like webspiders do in order to collect data for search engines. Any email address published on a webpage risks being collected and used as a target for mass mailing.
This page explains a simple way to prevent this from happening.
Address obfuscation to confuse spam robots
Sometimes webmasters obfuscate email addresses in order to make them invisible to spambots, or to have the spambot pick up an invalid address. The most commonly used techniques are:
1. Simple CAPTCHA
Examples:
This technique has several disadvantages: it is not standards-compliant (the advertised email address is effectively invalid), it is unattractive, it is annoying for visitors, who need to edit the address by hand before sending a mail, it is error-prone (less tech-savvy visitors might send mail to the wrong address), and it may be ineffective, as some spambots are now able to parse such addresses correctly.
2. Advanced CAPTCHA
Examples:
email@example.com (please remove all numbers)
user.domain.com (please replace first dot with @)
Unlike the previous technique, this protection is very effective, but it still has all the other disadvantages.
This solution is effective but still non-standard, as it makes the address invisible if the visitor is using a text-only browser or has chosen not to load remote images. Also, it makes it impossible to add a mailto link, as that would defeat the protection.
An elegant, effective, and fully standard technique exists instead, and it relies on the HTML standard itself:
5. HTML encoding
Any HTML character can be expressed by its numeric character reference or entity. That is, any character can be specified as &#n;, where n is the character's Unicode code point in decimal (for the characters used in email addresses this coincides with the 7-bit ASCII value). Therefore &#97; in the source of an HTML page is displayed as an a when the page is rendered in a browser.
Hence, to hide an email address, one can just replace one or more letters with their numeric references, e.g. user@domain.com may be written as user@dom&#97;in.com. For human visitors this makes no difference, because the browser displays the decoded address. However, because spambots parse the HTML source of the page, they pick up the obfuscated (munged) address, which is invalid.
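The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the script offered on this page: the function name mung and its every parameter are assumptions made for the example.

```python
import html


def mung(address, every=1):
    """Replace characters of `address` with HTML numeric character references.

    `every` is a hypothetical knob for this sketch: 1 encodes every
    character, 2 encodes every second character, and so on.
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    pieces = []
    for index, char in enumerate(address):
        if index % every == 0:
            # &#n; where n is the decimal Unicode code point of the character
            pieces.append(f"&#{ord(char)};")
        else:
            pieces.append(char)
    return "".join(pieces)


munged = mung("user@domain.com")
print(munged)                 # what goes into the HTML source
print(html.unescape(munged))  # what a browser would display
```

Encoding every character (the default here) also hides the @ sign, which is the marker a harvester is most likely to search for. html.unescape is used only to mimic the decoding a browser performs when it renders the page.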
This idea was suggested by Liam Quinn of HTML Help, who posted it on comp.infosystems.www.authoring.html. As it is based on the HTML language specifications (RFC 1866), it works with any browser and any OS.
It has been argued that it would be easy to program a spam robot to convert the encodings back into text, hence thwarting the protection offered by munging. This is true in theory, but in practice spam robots never do this (not yet anyway). Taking the time to parse the text and interpret all the encodings would slow the spam robot down considerably; as the majority of email addresses published on the Internet are in plaintext format, from the spammer's point of view it is simply not worth the trouble.
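The argument can be made concrete with a small Python sketch. It is not modelled on any real spam robot: the harvest function, its decode flag, and the deliberately simple address pattern are all assumptions made for illustration.

```python
import html
import re

# A deliberately simple address pattern; real harvesters vary.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(source, decode=False):
    """Return the address-like strings found in an HTML source string.

    With decode=True the character references are resolved first,
    which is the extra step the objection above refers to.
    """
    text = html.unescape(source) if decode else source
    return ADDRESS_RE.findall(text)


page = "<p>Plain: user@domain.com</p><p>Munged: other@dom&#97;in.com</p>"

print(harvest(page))               # scan of the raw source
print(harvest(page, decode=True))  # scan after decoding the references
```

On this sample the raw scan finds only the plaintext address, while the decoding scan finds both. Decoding is technically a single call, so the protection rests on harvesters not bothering to do it, as discussed above.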
A research report ("A CDT Report on Origins of Spam") published in 2003 by the U.S. Center for Democracy and Technology confirmed at that time that this technique was effective.
Testing
To verify the effectiveness of HTML munging, I ran an informal empirical test over 15 years.
Two email accounts (mailing lists) were opened on the Yahoo! Groups platform and published here (on November 8, 2004):
Although they look the same, the first one is in plaintext and the second one is obfuscated by munging; you can easily check this with your browser's View Page Source command. The two login names were chosen to be long enough that the odds of their being found by a generator of random email addresses (a common spammer tool) were low.
No message was ever sent from these addresses. However, just because they were publicly visible on the World Wide Web, they started (on November 27, 2004) receiving spam.
Here are the figures of the received spam per month over the whole 15-year period, from 2005 to 2019.
(The test ended in December 2019 following the changes to the Yahoo! Groups platform.)
The spam received by the plaintext address is marked in red, and the spam received by the obfuscated address is marked in green:
Monthly figures are available in this PDF.
The test showed that munging greatly reduces the amount of spam received: by 90% to 100%, depending on the year.
The tool
Nowadays there are several websites devoted to email munging (much more complete and detailed than this page) that offer scripts to obfuscate an email address automatically.
- Enter your email address in the field, then click the Mung button. If the Link checkbox is selected, a mailto link to your email address will be included within the code.
- Select and copy the code that appears in the field below: this is your munged email address.
- Paste the code in the HTML source of your webpages, wherever it is needed.
This script runs client-side; all it does is convert your email address on your local machine. Your email address is not transmitted to anyone, recorded, or used in any other way.
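For readers who prefer to see the conversion spelled out, here is a rough Python sketch of what a tool of this kind might produce, with and without the Link option. It is not the code of the script on this page: the names munged_html and link are hypothetical, and encoding every character (including the mailto: prefix) is an assumption of this sketch.

```python
import html


def mung(text):
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(char)};" for char in text)


def munged_html(address, link=False):
    """Return HTML code for a munged address, optionally as a mailto link."""
    encoded = mung(address)
    if not link:
        return encoded
    # Character references are also resolved inside attribute values,
    # so the href still points to mailto:<address> once parsed.
    return f'<a href="{mung("mailto:")}{encoded}">{encoded}</a>'


code = munged_html("user@domain.com", link=True)
print(code)                 # paste this into the HTML source
print(html.unescape(code))  # the link as a browser would interpret it
```

Encoding the mailto: prefix as well keeps that keyword out of the page source, which may matter if a harvester looks for it rather than for the @ sign.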
(the same included at the top of this page), a Java applet, and a Perl script;
you can download it from here.
A Python script is also available.
To run the tool on your local machine or your web site, unzip the archive in a directory of your choice and use the HTML files included.
This page was inspired by Michael Fleming's webpage MAILTO: Munging (dead link).
The Email Munger has been cited by: "L'acchiappavirus" by Paolo Attivissimo (p. 189), WebJuice, Web Design Guide, The Center For Civic Engagement at UTB/TSC, Nadeau software consulting, the documentation for the mungeMailAddress() function in the Seagull PHP framework, IS Web Designs, and the University of East Anglia IT Faqs.