Presco原理解析

整体架构

image.png | left | 747x543

大致分为几个部分:

  • Client 客户端,使用presto-client 打包的jar可执行文件
    • 常见用法: ./presto-cli.jar --server localhost:8080 --catalog mongodb --schema user --debug
  • Coordinator节点,做一些调度工作;
  • Discovery service节点,协助Coordinator节点和Worker节点互相发现的服务;
  • Worker节点,干活的节点。比如表扫描等工作;
  • Connector plugin。插件化的组件,可以通过为Postgresql、Mysql等写对应的Connector实现支持presto从这些数据库中读写数据,官方已经维护了一些主流的Connector;

更具体的Connector连接图:

image.png | left | 747x557

几种典型的部署架构图

单机版–Coordinator、Worker、Discovery Service于一身

image.png | left | 747x555

多个Worker版

image.png | left | 747x552

将Discovery server独立出去的版本

image.png | left | 747x539

高可用增强版

image.png | left | 747x553

执行流程

各个节点互相连接

image.png | left | 747x507

客户端发送请求给Coordinator

image.png | left | 747x520

Coordinator根据请求去找对应的Connector plugin拿metadata

image.png | left | 747x497

Coordinator生成执行计划(可以通过expain查看),调度worker

image.png | left | 747x505

Worker拿到任务了,找Connector plugin去读数据源

image.png | left | 747x510

Worker所有任务都在内存中执行(内存计算型)

image.png | left | 747x482

client从worker拿到结果

image.png | left | 747x476

支持多种Connector带来的好处

image.png | left | 747x537

image.png | left | 747x531

目前测试跨库数据迁移,也就是 create table DB_A.foo as select * from DB_B.bar 只在MongoDB中成功了,其他的都会触发异常,具体异常代码在 presto/presto-main/src/main/java/com/facebook/presto/split/PageSinkManager.java ,也就是Connector必须要实现 ConnectorPageSinkProvider 这个接口。

查询计划相关

执行模型-和MapReduce相比较

image.png | left | 747x440

image.png | left | 747x506

查询计划示例

image.png | left | 747x517

Presto会分为多个Stage

image.png | left | 747x508

并分到多个worker中去

image.png | left | 747x521

每个Stage又分为多个Tasks

image.png | left | 747x520

每个Task又分为多个Split

image.png | left | 747x525

总结

image.png | left | 747x417

再来一个例子

1
select count(*) from foo;

在两个Worker的例子中:

看一下执行计划:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
presto:default> EXPLAIN ANALYZE VERBOSE select count(*) from mongodb.foo.bar;

Query 20180518_083543_00010_mj5ce, RUNNING, 2 nodes, 35 splits
http://localhost:8080/ui/query.html?20180518_083543_00010_mj5ce
Splits: 0 queued, 35 running, 0 done
CPU Time: 1.4s total, 189K rows/s, 735B/s, 12% active
Per Node: 0.0 parallelism, 8.72K rows/s, 33B/s
Parallelism: 0.1
0:15 [ 263K rows, 1KB] [17.4K rows/s, 67B/s] [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 0%

STAGES ROWS ROWS/s BYTES BYTES/s QUEUED RUN DONE
0.........R 0 0 0B 0B 0 17 0
1.......R 0 0 0B 0B 0 17 0
2.....R 263K 17.4K 1K 67B 0 1 0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
presto:default> EXPLAIN ANALYZE VERBOSE select count(*) from mongodb.foo.bar;
Query Plan
-------------------------------------------------------------------------------------------------------
Fragment 1 [SINGLE]
CPU: 3.24ms, Input: 1 row (9B); per task: avg.: 1.00 std.dev.: 0.00, Output: 1 row (9B)
Output layout: [count]
Output partitioning: SINGLE []
Execution Flow: UNGROUPED_EXECUTION
- Aggregate(FINAL) => [count:bigint]
CPU fraction: 100.00%, Output: 1 row (9B)
Input avg.: 1.00 rows, Input std.dev.: 0.00%
count := "count"("count_3")
- LocalExchange[SINGLE] () => count_3:bigint
CPU fraction: 0.00%, Output: 1 row (9B)
Input avg.: 0.06 rows, Input std.dev.: 387.30%
- RemoteSource[2] => [count_3:bigint]
CPU fraction: 0.00%, Output: 1 row (9B)
Input avg.: 0.06 rows, Input std.dev.: 387.30%

Fragment 2 [SOURCE]
CPU: 5.46s, Input: 955758 rows (0B); per task: avg.: 955758.00 std.dev.: 0.00, Output: 1 row (9B)
Output layout: [count_3]
Output partitioning: SINGLE []
Execution Flow: UNGROUPED_EXECUTION
- Aggregate(PARTIAL) => [count_3:bigint]
Cost: {rows: ? (?), cpu: ?, memory: ?, network: 0.00}
CPU fraction: 0.08%, Output: 1 row (9B)
Input avg.: 955758.00 rows, Input std.dev.: 0.00%
count_3 := "count"(*)
- TableScan[mongodb:foo.bar, originalConstraint = true] => []
Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
CPU fraction: 99.92%, Output: 955758 rows (0B)
Input avg.: 955758.00 rows, Input std.dev.: 0.00%
LAYOUT: com.facebook.presto.mongodb.MongoTableLayoutHandle@30d07b2d


(1 row)

整体流程: Stage0 + Stage1

image.png | left | 422x1510

分步流程-Stage-0

image.png | left | 747x662

分步流程-Stage-1

image.png | left | 747x651