Google will crawl HTML forms

Posted on April 14th 2008 in Google

Just read this on the Webmaster Central Blog: Google is testing a new approach to content discovery, crawling HTML forms on a select group of sites deemed particularly useful. When Googlebot comes across an HTML form, it first checks whether the form's method is GET or POST. It proceeds only with GET forms, because Google wants to avoid forms that ask for user information (such as usernames and passwords), which generally use POST. The bot then “fills in” text fields with words found on the site, picks options from radio buttons and select menus, crawls the resulting pages, and indexes any content it judges useful that hasn't already been indexed.
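To make the process above a bit more concrete, here is a minimal Python sketch of the steps as described: skip POST forms, fill text fields with words taken from the site, and enumerate the radio/select choices to build GET URLs a crawler could then fetch. This is not Google's implementation. The names `FormParser` and `candidate_urls`, the `limit` cap, and the rule of only using `<option>` elements with an explicit `value` are hypothetical simplifications.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Hypothetical helper: collects each <form>'s method, action, text fields and choices."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._form = None    # form currently being parsed
        self._select = None  # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = {
                "method": (a.get("method") or "get").lower(),  # HTML default is GET
                "action": a.get("action") or "",
                "text": [],     # free-text field names
                "choices": {},  # field name -> list of allowed values
            }
            self.forms.append(self._form)
            return
        if self._form is None:
            return  # ignore controls outside any form
        name = a.get("name")
        if tag == "input" and name:
            kind = (a.get("type") or "text").lower()
            if kind in ("text", "search"):
                self._form["text"].append(name)
            elif kind in ("radio", "hidden"):
                self._form["choices"].setdefault(name, []).append(a.get("value") or "")
        elif tag == "select" and name:
            self._select = name
            self._form["choices"].setdefault(name, [])
        elif tag == "option" and self._select and a.get("value") is not None:
            # Simplification (assumption): only options with an explicit value are used.
            self._form["choices"][self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
        elif tag == "select":
            self._select = None


def candidate_urls(page_url, html, site_words, limit=20):
    """Hypothetical sketch: build GET URLs by filling each GET form's fields."""
    parser = FormParser()
    parser.feed(html)
    parser.close()
    urls = []
    for form in parser.forms:
        if form["method"] != "get":
            continue  # skip POST forms (e.g. logins), as the post describes
        names = form["text"] + list(form["choices"])
        # Text fields get words from the site; radios/selects get their listed values.
        pools = [site_words] * len(form["text"]) + list(form["choices"].values())
        base = urljoin(page_url, form["action"])
        for combo in product(*pools):
            urls.append(base + "?" + urlencode(list(zip(names, combo))))
            if len(urls) >= limit:  # hypothetical cap on generated URLs
                return urls
    return urls


if __name__ == "__main__":
    page = ('<form action="/search"><input name="q"><select name="cat">'
            '<option value="a">A</option><option value="b">B</option></select></form>')
    for url in candidate_urls("http://example.com/", page, ["guitar", "piano"]):
        print(url)  # e.g. http://example.com/search?q=guitar&cat=a
```

Trying every combination with `itertools.product` grows quickly as fields are added, which is why the sketch caps the output with `limit`; a real crawler would presumably be far more selective about which words and options it tries.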

Anyway, the experiment is an effort to crawl and index what has for years been called “invisible content” or “the invisible internet”: a rich, previously untapped source of information that has stayed hidden behind processes requiring human interaction.
