Facebook has started rolling out the ability to search for posts in Graph Search and, according to its engineering blog, now indexes over one trillion posts.
To put that in perspective, it amounts to around 700 terabytes of data.
Engineer Ashoat Tevosyan shared some of the technical challenges with adding posts to Graph Search.
Currently, Facebook indexes 70 different kinds of data, and the data is collected into MySQL databases.
Now, for the really geeky part:
We store our harvested data in an HBase cluster, from which we execute Hadoop map-reduce jobs to build our index in a highly parallel process. This index-building process converts the raw data into a search index that works with Unicorn, our search infrastructure. We separate the data into two parts – the document data and the inverted index. The document data for each post contains information that will later be used for ranking results. The inverted index contains what is traditionally considered to be a search index, and building the inverted index requires going through each post and determining which hypothetical search filters match.
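To make the split between document data and inverted index more concrete, here is a minimal Python sketch of the idea. It is not Facebook's code: the sample posts, the field names, the `text:` and `author:` filter prefixes, and the whitespace tokenizer are all assumptions made for illustration, and a plain loop stands in for the Hadoop map-reduce jobs.

```python
from collections import defaultdict

# Hypothetical sample posts; the field names are assumptions for illustration.
POSTS = [
    {"id": 1, "author": "alice", "text": "Hiking in Yosemite", "likes": 12},
    {"id": 2, "author": "bob", "text": "Yosemite photos from last week", "likes": 3},
    {"id": 3, "author": "alice", "text": "New job next week", "likes": 40},
]

def map_post(post):
    """Map step: emit (search filter, post id) pairs for a single post."""
    for word in post["text"].lower().split():
        yield ("text:" + word, post["id"])
    yield ("author:" + post["author"], post["id"])

def build_index(posts):
    """Reduce step: group post ids by filter into sorted posting lists."""
    inverted = defaultdict(set)
    document_data = {}
    for post in posts:
        for term, post_id in map_post(post):
            inverted[term].add(post_id)
        # Document data keeps only what ranking would need later (assumed: likes).
        document_data[post["id"]] = {"likes": post["likes"]}
    inverted_index = {term: sorted(ids) for term, ids in inverted.items()}
    return inverted_index, document_data

inverted_index, document_data = build_index(POSTS)
print(inverted_index["text:yosemite"])  # [1, 2]
print(inverted_index["author:alice"])   # [1, 3]
```

Each key in `inverted_index` plays the role of one "hypothetical search filter" and its value lists the posts that match it, while `document_data` holds the per-post information set aside for ranking.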
Instead of serving the indexes entirely from RAM, which would require an enormous amount of memory, Facebook stores most of them on solid-state drives.
However, the most frequently accessed data is kept in RAM, ensuring it's delivered as quickly as possible.
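The RAM-plus-SSD arrangement can be pictured as a small cache sitting in front of a larger, slower store. The sketch below is only an illustration under stated assumptions: the `TieredIndex` class is hypothetical, a Python dict stands in for the SSD, and least-recently-used eviction is used as a simple stand-in for "most frequently accessed", since the article does not say which policy Facebook applies.

```python
from collections import OrderedDict

class TieredIndex:
    """Hypothetical two-tier lookup: a small RAM cache in front of SSD storage."""

    def __init__(self, ssd_store, ram_capacity):
        self.ssd_store = ssd_store        # dict standing in for index data on SSD
        self.ram_capacity = ram_capacity  # maximum number of entries kept in RAM
        self.ram_cache = OrderedDict()    # least recently used entry comes first
        self.ssd_reads = 0                # counts the slow lookups

    def get(self, term):
        if term in self.ram_cache:
            # Fast path: served from RAM, and marked as recently used.
            self.ram_cache.move_to_end(term)
            return self.ram_cache[term]
        # Slow path: read from the SSD stand-in, then keep a copy in RAM.
        self.ssd_reads += 1
        postings = self.ssd_store.get(term, [])
        self.ram_cache[term] = postings
        if len(self.ram_cache) > self.ram_capacity:
            self.ram_cache.popitem(last=False)  # evict the least recently used
        return postings

index = TieredIndex({"a": [1], "b": [2], "c": [3]}, ram_capacity=2)
for term in ["a", "a", "b", "c"]:
    index.get(term)
print(index.ssd_reads)        # 3
print(list(index.ram_cache))  # ['b', 'c']
```

The second lookup of `"a"` never touches the slow store, which is the whole point of keeping hot entries in memory.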
Here’s how Facebook ranks or sorts posts:
To surface content that is valuable and relevant to the user, we use two primary techniques: query rewriting and dynamic result scoring. Query rewriting happens before the execution of the query, and involves tacking on optional clauses to search queries that bias the posts we retrieve towards results that we think will be more valuable to the user. Result scoring involves sorting and selecting documents based on a number of ranking “features,” each of which is based on the information available in the document data. In total, we currently calculate well over a hundred distinct ranking features that are combined with a ranking model to find the best results.
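A toy version of those two techniques might look like the following. Everything here is assumed for illustration: the sample posts, the three features, the weights, and the weighted-sum "ranking model" are hypothetical, and the optional clauses are folded into scoring as a feature rather than into retrieval, which is a simplification of what the quote describes.

```python
# Hypothetical posts: "terms" stands in for the inverted index,
# the remaining fields stand in for the document data.
POSTS = {
    1: {"terms": {"text:yosemite", "author:alice"}, "likes": 12, "age_days": 2},
    2: {"terms": {"text:yosemite", "author:bob"}, "likes": 3, "age_days": 30},
    3: {"terms": {"text:yosemite", "author:carol"}, "likes": 50, "age_days": 400},
}

# Assumed weights for a simple linear ranking model.
WEIGHTS = {"optional_matches": 5.0, "likes": 0.1, "recency": 2.0}

def rewrite_query(required_terms, friends):
    """Query rewriting: tack on optional clauses favouring the user's friends."""
    optional = {"author:" + name for name in friends}
    return {"required": set(required_terms), "optional": optional}

def features(post, query):
    """Compute ranking features from the document data of one post."""
    return {
        "optional_matches": len(post["terms"] & query["optional"]),
        "likes": post["likes"],
        "recency": 1.0 / (1 + post["age_days"]),
    }

def search(query, posts, limit=2):
    """Result scoring: keep posts matching all required terms, sort by score."""
    matches = [pid for pid, post in posts.items()
               if query["required"] <= post["terms"]]

    def score(pid):
        values = features(posts[pid], query)
        return sum(WEIGHTS[name] * value for name, value in values.items())

    return sorted(matches, key=score, reverse=True)[:limit]

query = rewrite_query(["text:yosemite"], friends=["alice"])
results = search(query, POSTS)
print(results)  # [1, 3]
```

Without the rewritten clause (an empty `friends` list), the heavily liked post 3 ranks first. With it, the friend's post 1 moves to the top, which is the kind of bias towards more valuable results that the quote describes.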
While we all expect things to just work, it's nothing short of impressive how much engineering goes on behind a platform like Facebook.