Facebook Cassandra

Introduction

Cassandra is an open-source distributed database system designed for storing and managing large amounts of data across commodity servers.

It provides scalability and high availability without compromising performance.
Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it well suited to mission-critical data.
It supports replication across multiple datacenters, providing low latency for users.
It offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.

Nodes communicate with one another through the gossip protocol, which exchanges state information across the cluster every second.
Each node uses a commit log to capture write activity, which ensures data durability.
Data is also written to an in-memory structure (a memtable) and then flushed to disk as an SSTable once the memtable is full.
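
The write path described above can be sketched as a toy single-node store. This is a minimal illustration, not Cassandra's implementation: the class name TinyNode, the JSON file formats, and the entry-count flush threshold are all assumptions made for the example.

```python
import json
import os
import tempfile


class TinyNode:
    """Toy write path: commit log -> memtable -> SSTable (hypothetical names)."""

    def __init__(self, data_dir, memtable_limit=3):
        self.data_dir = data_dir
        self.memtable_limit = memtable_limit  # assumed threshold, counted in entries
        self.memtable = {}                    # in-memory structure
        self.sstables = []                    # flushed file paths, oldest first
        self.commit_log_path = os.path.join(data_dir, "commit.log")

    def write(self, key, value):
        # 1. Append to the commit log first so the write survives a crash.
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush to disk once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable to a new file, sorted by key.
        path = os.path.join(self.data_dir, f"sstable-{len(self.sstables)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)
        self.sstables.append(path)
        self.memtable = {}
        # The flushed writes now live in the SSTable, so the log can be truncated.
        open(self.commit_log_path, "w").close()

    def read(self, key):
        # Check the memtable first, then the SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):
            with open(path, encoding="utf-8") as f:
                for k, v in json.load(f):
                    if k == key:
                        return v
        return None


if __name__ == "__main__":
    node = TinyNode(tempfile.mkdtemp(), memtable_limit=2)
    node.write("a", 1)
    node.write("b", 2)  # second write fills the memtable and triggers a flush
    print(node.read("a"), len(node.sstables))
```

Files written by flush are never modified afterwards in this sketch; a newer value for the same key lands in the memtable or a later file, which is why reads check the newest data first.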

Because it uses a peer-to-peer architecture, it is easy to add nodes and grow capacity from terabytes to petabytes.
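
One reason adding nodes is cheap in a peer-to-peer design is that keys can be placed on a hash ring, so a new node takes over only a slice of the key space. The sketch below illustrates that idea under simplifying assumptions: one token per node, MD5 as the hash, and hypothetical names (Ring, token, node-a and so on). It leaves out replication and data streaming.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring with one token per node (hypothetical helper)."""

    def __init__(self):
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> node name

    def add_node(self, name):
        t = token(name)
        bisect.insort(self._tokens, t)
        self._owners[t] = name

    def owner(self, key):
        if not self._tokens:
            raise ValueError("ring has no nodes")
        # The first node token at or after the key's token owns the key;
        # the modulo wraps around to the start of the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[i % len(self._tokens)]]


if __name__ == "__main__":
    ring = Ring()
    for name in ("node-a", "node-b", "node-c"):
        ring.add_node(name)
    keys = [f"user{i}" for i in range(1000)]
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Every key that changes owner moves to the newly added node; the existing nodes never exchange keys among themselves.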

Because it is column-oriented, it has a dynamic schema design that allows for much more flexible data storage.
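
The dynamic schema idea can be pictured as rows that each carry their own set of columns. The following is a rough in-memory sketch, assuming a hypothetical ColumnFamily class; it does not use the Cassandra client API or its storage format.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: each row key maps to its own set of columns."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def insert(self, row_key, column, value):
        # No schema is declared up front; a column exists once it is written.
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Return the row's columns sorted by column name (empty if no such row).
        return dict(sorted(self._rows.get(row_key, {}).items()))


if __name__ == "__main__":
    users = ColumnFamily("users")
    users.insert("alice", "email", "alice@example.com")
    users.insert("alice", "city", "Paris")
    users.insert("bob", "phone", "555-0100")  # bob has a column alice lacks
    print(users.get_row("alice"))
    print(users.get_row("bob"))
```

Rows in the same column family need not share columns, which is the flexibility a fixed relational schema does not offer.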

Facebook originally used Cassandra for inbox search. It now uses HBase for the same purpose.

Reference