Defending Against Spambots - CAPTCHAs

CAPTCHA is a backronym for “Completely Automated Public Turing test to tell Computers and Humans Apart” and is generally the bane of any user trying to submit a public form. The concept involves displaying an image containing characters and having the human retype those characters into a text box. Computers are supposed to be unable to read the characters in the image, while humans can read them easily.

This worked well in 1997, when the concept was developed, but advances in image processing have required the images to become more and more distorted from simple plain text. Adding colors and lines, as well as warping the shapes of the letters, makes it harder for image processing applications to detect the text. This distortion also makes it challenging for anyone with a visual impairment to read the text and get the CAPTCHA response correct.

The user experience issues make CAPTCHAs an undesirable solution to spambots, but one that can be implemented when the other solutions are inadequate. UX-focused sites often use a CAPTCHA only in situations where other protections have returned multiple failures, yet the system does not want to lock out a potentially legitimate user. These are situations like password resets, login screens, account creation and search pages.
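
As a rough illustration of that "CAPTCHA only after repeated failures" approach, here is a minimal Python sketch. The class name `CaptchaGate`, the threshold of three failures, and the idea of keying on a client identifier (an IP address or session ID, say) are all assumptions made for the example; a real site would also want the counters to expire and to live somewhere more durable than process memory.

```python
FAILURE_THRESHOLD = 3  # assumed value; tune to your own traffic


class CaptchaGate:
    """Tracks failed checks per client and decides when to show a CAPTCHA."""

    def __init__(self, threshold=FAILURE_THRESHOLD):
        self.threshold = threshold
        self._failures = {}  # client_id -> number of consecutive failures

    def record_failure(self, client_id):
        # Called when another protection (or a bad login) rejects a request.
        self._failures[client_id] = self._failures.get(client_id, 0) + 1

    def record_success(self, client_id):
        # A clean request resets the counter for that client.
        self._failures.pop(client_id, None)

    def needs_captcha(self, client_id):
        # Only escalate to a CAPTCHA once the threshold is reached.
        return self._failures.get(client_id, 0) >= self.threshold
```

A form handler would call `needs_captcha()` before rendering the form and only include the CAPTCHA widget when it returns `True`, so most visitors never see one.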

Integration of a CAPTCHA solution involves either integrating a third-party source into your form or generating the images yourself. Generating the images locally via an image manipulation library sounds like a good, cheap method for implementing CAPTCHA. However, significant effort has gone into defeating the protection, and everything you can think of doing to the text to prevent analysis while keeping it readable by a human has already been reverse-engineered. Good CAPTCHA solutions test their image database against the best tools on a regular basis, eliminating those images defeated by the analysis tools. Consequently, homebrew CAPTCHAs are often little better than having no protection, while causing a noticeable degradation in the user experience.

Integrating a third-party solution generally involves embedding a JavaScript snippet in your form, which fetches the image and a unique ID code from the provider’s servers. The user then provides you with the plain text version, and you send this, along with the image ID code that was submitted as a hidden form field, to the provider to get a pass or fail response. All of the good CAPTCHA providers have nice, clear documentation about this process and attempt to make it as easy as possible to integrate their solution.
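
The server-side half of that exchange might look something like the Python sketch below. Everything provider-specific here is hypothetical: the `VERIFY_URL`, the field names (`secret`, `id`, `answer`) and the `{"success": true}` JSON reply are stand-ins, so check your provider's documentation for the real endpoint and parameters. The `post` argument exists only so the network call can be swapped out when testing.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the one your provider documents.
VERIFY_URL = "https://captcha-provider.example/verify"


def http_post(url, fields):
    """POST form-encoded fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(url, data=data, timeout=5) as resp:
        return resp.read().decode("utf-8")


def verify_captcha(challenge_id, user_answer, secret_key, post=http_post):
    """Ask the provider whether the user's answer matches the challenge.

    challenge_id is the ID submitted in the hidden form field and
    user_answer is the text the user typed. The field names below are
    assumptions, not any real provider's API.
    """
    fields = {"secret": secret_key, "id": challenge_id, "answer": user_answer}
    try:
        result = json.loads(post(VERIFY_URL, fields))
    except (OSError, ValueError):
        # Network trouble or an unparseable reply: treat as a failure.
        return False
    return isinstance(result, dict) and result.get("success") is True
```

Treating a provider outage as a failed check is a design choice. It keeps spambots out while the provider is down, at the cost of blocking real users too, so some sites may prefer to fall back to their other protections instead.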

I have avoided CAPTCHAs primarily because of the poor user experience. Different combinations of the other methods, especially hashcash and Bayesian analysis, have provided good protection so far.