Facebook Distributed System Optimization via Asynchronization — Big Data Queries

Posted on December 20, 2015 by admin

Asynchronization Query

Every query is asynchronous: each function returns a “Future Object” instead of blocking for the result.
Every DB query is divided into two parts:

Send request
Receive response
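
The two-part split can be sketched with Python's asyncio. This is only an illustration, not Facebook's implementation: `fake_db_query` and `send_request` are hypothetical names, and the sleep stands in for a real database round trip.

```python
import asyncio

async def fake_db_query(sql: str) -> str:
    # Hypothetical stand-in for a real database round trip.
    await asyncio.sleep(0.01)
    return f"rows for: {sql}"

def send_request(sql: str) -> asyncio.Future:
    # Part 1: issue the request and return a future immediately.
    return asyncio.ensure_future(fake_db_query(sql))

async def main() -> list:
    # Both requests are in flight before either response is received.
    f1 = send_request("SELECT friends")
    f2 = send_request("SELECT photos")
    pending_before = (f1.done(), f2.done())
    # Part 2: receive the responses.
    r1 = await f1
    r2 = await f2
    return [pending_before, r1, r2]

result = asyncio.run(main())
```

Because sending and receiving are separate steps, the caller can issue several requests before waiting on any of them.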

Future Object tree

Each future object has two states

Waiting for execution
Finished execution

Once the tree is constructed, execution proceeds from the leaves at the bottom up to the root: a node can only finish after all of its children have finished.

When the root finishes executing, the page has finished loading.
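
A minimal sketch of such a tree, assuming a simple synchronous post-order traversal. The `FutureNode` class, the `State` enum, and the sample page data are all hypothetical and only illustrate the two states and the bottom-up order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List

class State(Enum):
    WAITING = "waiting for execution"
    FINISHED = "finished execution"

@dataclass
class FutureNode:
    name: str
    compute: Callable[[list], object]  # combines the children's results
    children: List["FutureNode"] = field(default_factory=list)
    state: State = State.WAITING
    result: object = None

def execute(node: FutureNode, order: list) -> object:
    # Post-order traversal: children (bottom) finish before their parent.
    inputs = [execute(child, order) for child in node.children]
    node.result = node.compute(inputs)
    node.state = State.FINISHED
    order.append(node.name)
    return node.result

friends = FutureNode("friends", lambda _: [1, 2, 3])
photos = FutureNode("photos", lambda _: ["a.jpg"])
page = FutureNode(
    "page",
    lambda parts: {"friends": parts[0], "photos": parts[1]},
    [friends, photos],
)

order: list = []
rendered = execute(page, order)
```

Here `order` records the sequence in which nodes finish, with the root `page` last.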

Lazy manner

Execution is lazy: the tree is constructed first and only then executed. This is similar to Spark, where the functions form a DAG and a node's predecessors are executed only when that node's result is needed.
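
The lazy behaviour can be sketched as follows. `LazyNode` is a hypothetical class written for this illustration; it is not Spark's API or Facebook's, just a small demand-driven DAG.

```python
class LazyNode:
    def __init__(self, name, fn, *deps):
        self.name = name
        self.fn = fn
        self.deps = deps
        self._done = False
        self._value = None

    def get(self, log):
        # A node runs only when its value is demanded, and at most once.
        if not self._done:
            args = [d.get(log) for d in self.deps]
            self._value = self.fn(*args)
            self._done = True
            log.append(self.name)
        return self._value

log = []
a = LazyNode("a", lambda: 2)
b = LazyNode("b", lambda x: x * 10, a)
unused = LazyNode("unused", lambda x: x + 1, a)

built_log = list(log)  # snapshot: building the graph has executed nothing
value = b.get(log)     # demanding b pulls in a, but never touches "unused"
```

Constructing the nodes runs nothing; only the call to `b.get` triggers `a` and then `b`.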

Memcache

Which query executes first should not be dictated by the order in which the calls happen to be written in the code.

Instead, a separate scheduling phase should determine the execution order.

Why the execution order matters

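“Suppose we have two queries. One finds your friends who have bought something on Taobao; the other finds your friends who have bought a Porsche on Taobao. Intuitively, we would first fetch your friends on Taobao and then apply the other condition, like this:”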
```
// Pseudocode: fetch the full friend list first, then filter it by purchase.
IdList friends = waitFor(getFriends(myId));
yield return getTaoBaoBuyers(friends);
```
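For the Porsche query, however, this order is wrong: very few people buy a Porsche on Taobao, perhaps only one or two, whereas you may have hundreds of friends on Taobao. The Porsche query is therefore better executed in this order: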
```
// Pseudocode: start from the much smaller set of Porsche buyers.
IdList buyers = waitFor(getPorscheBuyer());
yield return getFriends(buyers);
```
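
One way a scheduling phase could make this choice is to compare estimated result sizes and start from the smaller set. The sketch below is an assumption for illustration, not the scheduler described in [1]: the in-memory data, the size estimates, and every function name are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory data standing in for remote services.
FRIENDS: Dict[int, Set[int]] = {1: set(range(100, 300))}  # user 1 has 200 friends
PORSCHE_BUYERS: Set[int] = {150, 999}                     # only two buyers overall

def get_friends(user_id: int) -> Set[int]:
    return FRIENDS[user_id]

def get_porsche_buyers() -> Set[int]:
    return PORSCHE_BUYERS

def is_friend(user_id: int, other_id: int) -> bool:
    return other_id in FRIENDS[user_id]

def bought_porsche(user_id: int) -> bool:
    return user_id in PORSCHE_BUYERS

def friends_who_bought_porsche(
    user_id: int, est_friends: int, est_buyers: int
) -> Tuple[str, int, Set[int]]:
    # Scheduling step: start from whichever set is estimated to be smaller,
    # so that fewer candidates need to be checked against the other condition.
    if est_buyers <= est_friends:
        candidates = get_porsche_buyers()
        matches = {c for c in candidates if is_friend(user_id, c)}
        plan = "buyers-first"
    else:
        candidates = get_friends(user_id)
        matches = {c for c in candidates if bought_porsche(c)}
        plan = "friends-first"
    return plan, len(candidates), matches

good = friends_who_bought_porsche(1, est_friends=200, est_buyers=2)
bad = friends_who_bought_porsche(1, est_friends=2, est_buyers=200)  # misleading estimates
```

Both plans return the same friends, but the buyers-first plan checks 2 candidates while the friends-first plan checks 200. The calling code stays the same either way, which is the point of deciding the order in a separate phase.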

Reference
[1] http://www.infoq.com/cn/news/2015/04/async-distributed-haiping
