Google index gets a Caffeine boost

Search engines are racing to incorporate social media. Last week, Bing announced that it would start adding realtime updates from Twitter and Facebook to its search results. Google, for its part, announced changes to its infrastructure that are designed to improve the freshness and visibility of social media in search results. Google’s realtime upgrade is arguably more fundamental, as it rests on a two-year redesign of the underlying index.

Previously, Google separated its index into layers, which were updated at different speeds. The main layer was usually updated every two weeks. Known as Caffeine, the new system separates the index into billions of ‘batches’, each of which refers to specific websites. Rather than analysing the web in larger layers, Caffeine now trawls content in smaller batches. This enables Google to update its index on a continuous and global basis.
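
To make the contrast with layered updates concrete, here is a minimal sketch of an index that is updated one small batch at a time. It is an illustration only and not Google's implementation: the class name IncrementalIndex, its methods and the example URLs are all hypothetical, and whitespace tokenisation stands in for real text analysis.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index updated in small batches (hypothetical, not Caffeine itself)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of URLs containing it
        self.doc_terms = {}               # URL -> terms currently indexed for it

    def update_batch(self, batch):
        """Apply a small batch of (url, text) pairs to the live index."""
        for url, text in batch:
            new_terms = set(text.lower().split())
            old_terms = self.doc_terms.get(url, set())
            # Drop terms that no longer appear on the page.
            for term in old_terms - new_terms:
                self.postings[term].discard(url)
            # Add terms that are new to the page.
            for term in new_terms - old_terms:
                self.postings[term].add(url)
            self.doc_terms[url] = new_terms

    def search(self, term):
        """Return the URLs currently indexed for a term, in sorted order."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.update_batch([("example.com/a", "caffeine index launch")])
index.update_batch([("example.com/b", "realtime search index")])
# A page changes: only its own entries are touched, nothing else is rebuilt.
index.update_batch([("example.com/a", "caffeine realtime update")])
```

The point of the design is that a changed page becomes searchable as soon as its batch is applied, rather than waiting for a whole layer to be rebuilt.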

As a result, the underlying index is constantly in flux; it is estimated that Caffeine now analyses hundreds of thousands of web pages every second. According to the official announcement on the Google Blog: “If the Google index were a pile of paper, it would grow three miles taller every second”. All of this has required a significant investment in new hardware and software. For example, Caffeine is built on the new Google File System (dubbed GFS2), which processes and stores content from the web with reduced latency. In turn, GFS2 runs on more powerful custom-built server hardware, which is believed to include innovations such as solid-state drives.

In sum, the race to analyse and index the growing torrent of social media is a challenging and expensive process – but one that will increasingly differentiate the web services of the future. The pressure to do so will escalate as more of our digital attention shifts to mobile devices, where recommendations and social search are taking over from straightforward search. Compared to its relatively static predecessor, the realtime web will require companies to nurture a very different set of technologies and skills if they are to remain visible.
