Hiding e-mails from bots but not users
May 04, 2008
As most of you have probably seen somewhere on the Internet, web sites have started showing e-mail addresses with the period and at-sign replaced with "[dot]" and "[at]", "(dot)" and "(at)", or some variation of that.
Anyway, I recently came up with a compromise that protects e-mail addresses from bot scraping and still shows them normally to the user. Using the Behaviour JavaScript library and PrototypeJS, I simply replace the contents of any span marked with the email class with the appropriate human-usable link.
Imagine we had this in our source:
    <span class="email">joesmith <em>[at]</em> gmail <em>[dot]</em> com</span>
and we used this JavaScript to change it to something more user-friendly:
    var email_rules = {
      'span.email': function(el) {
        // Reduce the span to plain text, e.g. "joesmith [at] gmail [dot] com"
        var email = el.innerHTML.stripTags().strip();
        // Remember a sentence-ending period so it can go back outside the link
        var hasPeriod = email.endsWith(".");
        // change these expressions to match your e-mail hiding scheme
        email = email.sub(/ \[at\] /, "@").sub(/ \[dot\] /, ".").sub(/\.$/, "");
        el.innerHTML = '<a href="mailto:' + email + '">' + email + "</a>" + (hasPeriod ? "." : "");
      }
    };

    Behaviour.register(email_rules);
This changes the contents of the span to:
    <a href="mailto:joesmith@gmail.com">joesmith@gmail.com</a>
And that's it! Bots see the original markup, and users with JavaScript enabled see the real e-mail address as a link.
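For anyone who wants the same conversion without Behaviour or PrototypeJS, here is a rough sketch of the text handling as plain functions. The names `deobfuscateEmail` and `emailLinkHtml` are made up for illustration, and the sketch assumes you pass in the span's visible text (tags already stripped). Unlike the rule above, it replaces every [dot], so multi-part domains are covered too.

```javascript
// Hypothetical helper: turn "joesmith [at] gmail [dot] com" into a real address.
// Returns the address plus a flag saying whether a trailing period was removed.
function deobfuscateEmail(text) {
  var trimmed = text.replace(/^\s+|\s+$/g, "");
  var hasPeriod = /\.$/.test(trimmed);
  var address = trimmed
    .replace(/\.$/, "")                 // drop a sentence-ending period
    .replace(/\s*\[at\]\s*/, "@")       // first [at] becomes @
    .replace(/\s*\[dot\]\s*/g, ".");    // every [dot] becomes .
  return { address: address, hasPeriod: hasPeriod };
}

// Hypothetical helper: build the same markup the Behaviour rule writes into the span.
function emailLinkHtml(text) {
  var result = deobfuscateEmail(text);
  return '<a href="mailto:' + result.address + '">' + result.address + "</a>" +
    (result.hasPeriod ? "." : "");
}
```

In a page you would call `emailLinkHtml` with each span's text and assign the result to its `innerHTML`, which is what the Behaviour rule does for you automatically.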
Update: Considering bots have added things like [at] and [dot] to their list of things to search for, this trick doesn't really work. Nonetheless, using the Behaviour library to add usability/functionality to your UI is a cool trick that comes in handy quite often.
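To see how little protection the scheme offers, here is a rough sketch of what a scraper could run over a page's visible text. The function name `findObfuscatedEmails` and the exact pattern are assumptions made up for illustration, not taken from any real bot; the point is only that one regular expression is enough to undo both the [at]/[dot] and (at)/(dot) variants.

```javascript
// Hypothetical scraper-side helper: find "[at]"/"[dot]" style addresses in plain text.
function findObfuscatedEmails(text) {
  // local part, then [at] or (at), then a domain whose labels are joined by [dot] or (dot)
  var pattern = /([\w.+-]+)\s*[\[(]at[\])]\s*([\w-]+(?:\s*[\[(]dot[\])]\s*[\w-]+)+)/gi;
  var found = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    // Rebuild the domain by turning each obfuscated dot back into a real one
    var domain = match[2].replace(/\s*[\[(]dot[\])]\s*/gi, ".");
    found.push(match[1] + "@" + domain);
  }
  return found;
}
```

Running it on the example span's text gives back the original address, which is exactly what the obfuscation was supposed to prevent.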
Comments
Unfortunately spam bots know how to read emails when they are written as name [at] domain [dot] com so this will not protect your email address very well.
Besides the possibility of a JavaScript-aware crawler, do you really think crawlers don't already recognize the [at] [dot] scheme?
Looking at it from the point of view of a crawler programmer, what I would do is grep the visible text portion of each element for the [at] [dot] scheme.
This would definitely yield some faulty addresses, but who cares when we are talking about millions of e-mail addresses?
You guys are right. It doesn't make sense that bots wouldn't search for both the @ and the word "at" and "dot". I wonder why some sites still do the whole at and dot thing.
Anyway, even if bots can read it I still think using the Behaviour library to add functionality/usability to the UI is a useful trick.