Google plans realtime index

At the recent Search Marketing Expo in Santa Clara, California, Google announced plans for a new system of realtime indexing. With the exception of direct feeds from social media websites such as Facebook and Twitter, the Google index is currently updated by web-crawling ‘spiders’. How often a site is crawled depends on its popularity, but for smaller publishers crawling usually means a delay of several days between the publication of content and its eventual appearance in the Google index.

What is potentially so significant about Google’s recent announcement is that any content, from any website, regardless of size, could in theory be submitted to the Google index within seconds of publication. Google will use the PubSubHubbub protocol (PuSH), a syndication system based on the Atom format. This system will allow publishers to update a network of hubs with their content updates, which are then fed to subscribers (such as Google) in realtime. Because it is an open protocol, PuSH will also yield benefits for the wider search community, including Google’s principal competitors, Bing and Yahoo.
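To make the mechanism concrete, below is a minimal sketch of the publisher side of PuSH, assuming a hypothetical hub endpoint and Atom feed URL. When new content goes live, the publisher sends a lightweight ‘ping’ to the hub; the hub then fetches the updated feed and pushes it out to subscribers such as a search engine.

```python
# Minimal sketch of a PubSubHubbub (PuSH) publisher ping.
# The hub and feed URLs below are illustrative placeholders, not real endpoints.
import urllib.parse
import urllib.request

HUB_URL = "https://hub.example.com/"            # hub endpoint (hypothetical)
FEED_URL = "https://blog.example.com/atom.xml"  # the publisher's Atom feed (hypothetical)


def notify_hub(hub_url: str, feed_url: str) -> int:
    """Tell the hub that the feed at feed_url has fresh content."""
    data = urllib.parse.urlencode({
        "hub.mode": "publish",
        "hub.url": feed_url,
    }).encode("utf-8")
    request = urllib.request.Request(hub_url, data=data, method="POST")
    with urllib.request.urlopen(request) as response:
        # The hub typically replies 204 No Content if it accepts the ping;
        # it will then fetch the feed and deliver updates to subscribers.
        return response.status


if __name__ == "__main__":
    print(notify_hub(HUB_URL, FEED_URL))
```

The subscriber side works in a similar request-driven way: a subscriber (such as a search engine) registers a callback URL with the hub for a given feed, and the hub posts new entries to that callback as they arrive, rather than the subscriber polling the feed itself.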

PuSH will also have important ramifications for the future of social media: for example, updates on corporate blogs could be ingested into search indexes far more quickly than at present. It will effectively reduce the lag between a news event and its coverage on the web to a matter of seconds. For communicators, the inexorable shift towards a more efficient, realtime web presents a unique set of opportunities and threats.

On the one hand, it will be possible for communicators to play a more active and responsive role in online news and comment. This will have benefits for the continuity and breadth of business news coverage. On the other hand, a torrent of realtime updates from across the world will also make it harder for communicators to manage a company’s profile. Indeed, an unintended consequence of PuSH is that mistakes, as well as deliberate misinformation, will become even more visible within search engines.

The challenge for Google and others will be to find a balance between the efficiency of realtime indexing and the danger of an open-door system, which risks polluting indexes with irrelevant results, as well as spam or even links to malware. Given these risks, it is likely that the realtime approach announced by Google will have to live alongside more traditional forms of crawling and indexing for some time yet.