unbound imagination - a blog about programming

Hiding e-mails from bots but not users

May 04, 2008

As most of you have probably seen, many web sites now display e-mail addresses with the at-sign and period replaced by "[at]" and "[dot]", "(at)" and "(dot)", or some variation of that.

Anyway, I recently came up with a compromise that protects e-mail addresses from bot scraping while still showing them normally to the user. Using the Behaviour JavaScript library and PrototypeJS, I simply replace any span marked with the email class with the appropriate human-usable mailto link.

Imagine we had this in our source:

    <span class="email">joesmith <em>[at]</em> gmail <em>[dot]</em> com</span>

and we used this JavaScript to change it to something more user-friendly:

    var email_rules = {
      // Behaviour calls this function for every element matching the selector
      'span.email': function(el) {
        // stripTags() and strip() are Prototype String methods
        var email = el.innerHTML.stripTags().strip();
        // remember a sentence-ending period so it can be put back after the link
        var hasPeriod = email.endsWith(".");
        // change these expressions to match your e-mail hiding scheme;
        // gsub handles domains that contain more than one [dot]
        email = email.sub(/ \[at\] /, "@").gsub(/ \[dot\] /, ".").sub(/\.$/, "");
        el.innerHTML = '<a href="mailto:' + email + '">' + email + "</a>" + (hasPeriod ? "." : "");
      }
    };

    Behaviour.register(email_rules);

This changes what the user sees to:

    <a href="mailto:joesmith@gmail.com">joesmith@gmail.com</a>

And that's it! Bots see the original markup, and users with JavaScript enabled see the real e-mail address as a link.
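If you are not using Prototype or Behaviour, the string transformation itself can be sketched in plain JavaScript. This is only an illustration that assumes the same [at]/[dot] scheme; the function name deobfuscateEmail is hypothetical and not part of either library.

```javascript
// Hypothetical helper (not part of Prototype or Behaviour): converts the
// inner HTML of an obfuscated span into a plain address.
function deobfuscateEmail(innerHTML) {
  var text = innerHTML
    .replace(/<[^>]*>/g, "")      // crude tag stripping, enough for <em> wrappers
    .replace(/^\s+|\s+$/g, "");   // trim surrounding whitespace

  // A trailing period usually belongs to the sentence, not the address.
  var hasPeriod = /\.$/.test(text);

  var email = text
    .replace(/\s*\[at\]\s*/, "@")    // first [at] becomes the at-sign
    .replace(/\s*\[dot\]\s*/g, ".")  // every [dot] becomes a period
    .replace(/\.$/, "");             // drop the sentence-ending period

  return { email: email, hasPeriod: hasPeriod };
}
```

The returned email can then be placed into a mailto link for each matching element, with the period appended after the link when hasPeriod is true.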

Update: Since bots have added things like [at] and [dot] to the patterns they search for, this trick doesn't really work. Nonetheless, using the Behaviour library to add usability/functionality to your UI is a cool trick that comes in handy quite often.
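To see why the obfuscation offers so little protection, here is a rough sketch of how little code it takes to undo it in a page's visible text. The function name findObfuscatedEmails and the patterns are illustrative assumptions, not taken from any real crawler.

```javascript
// Hypothetical sketch: normalize "[at]"/"(at)" and "[dot]"/"(dot)" markers,
// then pull out anything that looks like an e-mail address.
function findObfuscatedEmails(text) {
  var normalized = text
    .replace(/\s*[\[(]\s*at\s*[\])]\s*/gi, "@")
    .replace(/\s*[\[(]\s*dot\s*[\])]\s*/gi, ".");

  // Deliberately loose address pattern; a few false positives are acceptable
  // for the purpose of this illustration.
  var matches = normalized.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g);
  return matches || [];
}
```

Two regular-expression replacements are enough to recover the address, so the bracketed scheme mostly inconveniences human readers rather than bots.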

Comments

posted by JamieD on 05/05/08 10:16 PM PDT

Unfortunately spam bots know how to read emails when they are written as name [at] domain [dot] com so this will not protect your email address very well.

posted by onurgu on 05/06/08 06:39 AM PDT

besides the possibility of a javascript aware crawler, do you think that crawlers didn't recognize [at] [dot] scheme already?

looking from the point of a crawler programmer, what I would do is to grep the visible text portion of each element for [at] [dot] scheme.

this would definitely yield some faulty addresses but anyway who cares when we talk about millions of email addresses?

posted by Arya Asemanfar on 05/06/08 09:14 AM PDT

You guys are right. It doesn't make sense that bots wouldn't search for both the @ and the word "at" and "dot". I wonder why some sites still do the whole at and dot thing.

Anyway, even if bots can read it I still think using the Behaviour library to add functionality/usability to the UI is a useful trick.

