Network Performance Testing of Alibaba Cloud Container Service

    xiaoxiao, 2026-02-26

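    Background

    We are gradually migrating our test environment from the Mesos framework to Alibaba Cloud Container Service. During the migration we tested the network performance of four different modes by which services access one another. This article describes the test method, the data, and the conclusions.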

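    Test Objectives

    Service containers can reach one another in four different ways:

    docker link

    The link mechanism provided by Docker, which lets one container access another. The test host is app-link.

    hostname

    When orchestrating services, each service is assigned a hostname, and other services can use that hostname to reach it. The test host is app.

    Service discovery

    A mechanism for HTTP access and load balancing between services, implemented by Container Service on top of HAProxy. The test host is app.local.

    SLB

    The traditional load balancing service, with two listener modes: HTTP and TCP. In the tests the HTTP SLB host is 192.168.1.10 and the TCP SLB host is 192.168.1.40.

    Through each of these access modes we call the HTTP endpoint /ok on the target machine and measure latency and throughput.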

    Test Tools

    httping: measures latency
    ab (Apache benchmarking tool): measures throughput

    Install on Ubuntu:

        apt-get install httping
        apt-get install apache2-utils

    Latency test:

    link

        httping -c100 -i 0.01 -g 'http://app-link/ok'
        ...
        connected to 172.18.1.3:80 (121 bytes), seq=96 time=0.45 ms
        connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.46 ms
        connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.42 ms
        connected to 172.18.1.3:80 (121 bytes), seq=99 time=1.63 ms
        --- http://app-link/ok ping statistics ---
        100 connects, 100 ok, 0.00% failed, time 1050ms
        round-trip min/avg/max = 0.4/0.5/1.6 ms

    hostname

        httping -c100 -i 0.01 -g 'http://app/ok'
        ...
        connected to 172.18.1.3:80 (121 bytes), seq=96 time=10.80 ms
        connected to 172.18.1.3:80 (121 bytes), seq=97 time=0.43 ms
        connected to 172.18.1.3:80 (121 bytes), seq=98 time=0.44 ms
        connected to 172.18.1.3:80 (121 bytes), seq=99 time=0.46 ms
        --- http://app/ok ping statistics ---
        100 connects, 100 ok, 0.00% failed, time 1073ms
        round-trip min/avg/max = 0.4/0.7/10.8 ms

    Service discovery

        httping -c100 -i 0.01 -g 'http://app.local/ok'
        ...
        connected to 172.18.1.2:80 (219 bytes), seq=96 time=0.69 ms
        connected to 172.18.1.2:80 (219 bytes), seq=97 time=0.67 ms
        connected to 172.18.1.2:80 (219 bytes), seq=98 time=0.74 ms
        connected to 172.18.1.2:80 (219 bytes), seq=99 time=0.65 ms
        --- http://app.local/ok ping statistics ---
        100 connects, 100 ok, 0.00% failed, time 1090ms
        round-trip min/avg/max = 0.6/0.9/6.0 ms

    HTTP SLB

        httping -c100 -i 0.01 -g 'http://192.168.1.10/ok'
        ...
        connected to 192.168.1.10:80 (140 bytes), seq=96 time=1.19 ms
        connected to 192.168.1.10:80 (140 bytes), seq=97 time=1.08 ms
        connected to 192.168.1.10:80 (140 bytes), seq=98 time=1.15 ms
        connected to 192.168.1.10:80 (140 bytes), seq=99 time=1.30 ms
        --- http://192.168.1.10/ok ping statistics ---
        100 connects, 100 ok, 0.00% failed, time 1123ms
        round-trip min/avg/max = 1.0/1.2/2.9 ms

    TCP SLB

        httping -c100 -i 0.01 -g 'http://192.168.1.40/ok'
        ...
        connected to 192.168.1.40:80 (121 bytes), seq=96 time=1.18 ms
        connected to 192.168.1.40:80 (121 bytes), seq=97 time=1.25 ms
        connected to 192.168.1.40:80 (121 bytes), seq=98 time=1.06 ms
        connected to 192.168.1.40:80 (121 bytes), seq=99 time=1.34 ms
        --- http://192.168.1.40/ok ping statistics ---
        100 connects, 100 ok, 0.00% failed, time 1137ms
        round-trip min/avg/max = 1.0/1.3/2.7 ms
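
    The five httping runs differ only in the target host, so they could be driven from one small script. The following is a minimal sketch, not the procedure the author actually used: the helper names httping_command and run_all and the TARGETS dictionary are hypothetical, and it assumes httping is installed and that the hosts from this article resolve in your environment.

```python
import shutil
import subprocess

# Hosts exactly as named in the article; the labels are only dictionary keys.
TARGETS = {
    "docker link": "app-link",
    "hostname": "app",
    "service discovery": "app.local",
    "HTTP SLB": "192.168.1.10",
    "TCP SLB": "192.168.1.40",
}


def httping_command(host, count=100, interval=0.01):
    """Build the httping invocation shown in the article for one host."""
    return ["httping", f"-c{count}", "-i", str(interval), "-g", f"http://{host}/ok"]


def run_all():
    """Run httping against every target and return the raw output per mode."""
    if shutil.which("httping") is None:
        raise RuntimeError("httping is not installed")
    results = {}
    for label, host in TARGETS.items():
        proc = subprocess.run(httping_command(host), capture_output=True, text=True)
        results[label] = proc.stdout
    return results
```

    Calling run_all() returns a dictionary that maps each access mode to the raw httping output, which can then be compared side by side.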

    Test results

    Each mode was tested with 100 HEAD requests. The average latencies are:

    Access mode          Latency (ms)
    docker link          0.5
    hostname             0.7
    Service discovery    0.9
    HTTP SLB             1.2
    TCP SLB              1.3
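
    The averages in this table come from the "round-trip min/avg/max" summary line that httping prints at the end of each run. As a rough sketch, that line could be extracted programmatically as follows. The function name parse_httping_summary is hypothetical, and the regular expression assumes the exact summary format shown in this article; other httping versions may print additional fields.

```python
import re

# Matches the summary line as it appears in the article's httping output.
SUMMARY_RE = re.compile(r"round-trip min/avg/max = ([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_httping_summary(output):
    """Return min, avg and max latency (ms) from httping's summary line."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no round-trip summary found")
    lowest, average, highest = (float(value) for value in match.groups())
    return {"min": lowest, "avg": average, "max": highest}


# Summary lines copied from the docker link run in the article.
sample = (
    "100 connects, 100 ok, 0.00% failed, time 1050ms\n"
    "round-trip min/avg/max = 0.4/0.5/1.6 ms"
)
stats = parse_httping_summary(sample)
print(stats["avg"])
```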

    Throughput test:

    link

        ab -lkc 10000 -n 10000 'http://app-link/ok'
        Concurrency Level:      10000
        Time taken for tests:   0.864 seconds
        Complete requests:      10000
        Failed requests:        0
        Keep-Alive requests:    10000
        Total transferred:      2020000 bytes
        HTML transferred:       610000 bytes
        Requests per second:    11571.74 [#/sec] (mean)
        Time per request:       864.174 [ms] (mean)
        Time per request:       0.086 [ms] (mean, across all concurrent requests)
        Transfer rate:          2282.71 [Kbytes/sec] received

    hostname

        ab -lkc 10000 -n 10000 'http://app/ok'
        Concurrency Level:      10000
        Time taken for tests:   1.055 seconds
        Complete requests:      10000
        Failed requests:        0
        Keep-Alive requests:    10000
        Total transferred:      2020000 bytes
        HTML transferred:       610000 bytes
        Requests per second:    9476.49 [#/sec] (mean)
        Time per request:       1055.243 [ms] (mean)
        Time per request:       0.106 [ms] (mean, across all concurrent requests)
        Transfer rate:          1869.39 [Kbytes/sec] received

    Service discovery

        ab -lkc 10000 -n 10000 'http://app.local/ok'
        Concurrency Level:      10000
        Time taken for tests:   4.276 seconds
        Complete requests:      10000
        Failed requests:        0
        Keep-Alive requests:    10000
        Total transferred:      3000000 bytes
        HTML transferred:       610000 bytes
        Requests per second:    2338.60 [#/sec] (mean)
        Time per request:       4276.066 [ms] (mean)
        Time per request:       0.428 [ms] (mean, across all concurrent requests)
        Transfer rate:          685.14 [Kbytes/sec] received

    HTTP SLB

        ab -lkc 10000 -n 10000 'http://192.168.1.10/ok'
        Concurrency Level:      10000
        Time taken for tests:   6.308 seconds
        Complete requests:      10000
        Failed requests:        0
        Non-2xx responses:      580
        Keep-Alive requests:    10000
        Total transferred:      2141800 bytes
        HTML transferred:       732380 bytes
        Requests per second:    1585.41 [#/sec] (mean)
        Time per request:       6307.517 [ms] (mean)
        Time per request:       0.631 [ms] (mean, across all concurrent requests)
        Transfer rate:          331.60 [Kbytes/sec] received

    With 10000 concurrent requests, the SLB's nginx returned a 504 error for 580 of them. The response was:

        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <html>
        <head><title>504 Gateway Time-out</title></head>
        <body bgcolor="white">
        <h1>504 Gateway Time-out</h1>
        <p>The gateway did not receive a timely response from the upstream server or application.</body>
        </html>
        WARNING: Response code not 2xx (504)

    TCP SLB

        ab -lkc 10000 -n 10000 'http://192.168.1.40/ok'
        Concurrency Level:      10000
        Time taken for tests:   1.891 seconds
        Complete requests:      10000
        Failed requests:        0
        Keep-Alive requests:    10000
        Total transferred:      2020000 bytes
        HTML transferred:       610000 bytes
        Requests per second:    5287.14 [#/sec] (mean)
        Time per request:       1891.383 [ms] (mean)
        Time per request:       0.189 [ms] (mean, across all concurrent requests)
        Transfer rate:          1042.97 [Kbytes/sec] received

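    Test results

    With 10000 concurrent requests, the number of requests handled per second was:

    Access mode          Throughput (RPS)
    docker link          11571.74
    hostname             9476.49
    Service discovery    2338.60
    HTTP SLB             1585.41
    TCP SLB              5287.14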
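
    The three rate figures in each ab report are all derived from the request count, the concurrency level, and the elapsed time. The sketch below reproduces that arithmetic for the docker link run. The function name ab_metrics is hypothetical, and the elapsed time of 0.864174 s is inferred from the reported 864.174 ms mean time per request, since ab prints "Time taken for tests" rounded to three decimals.

```python
def ab_metrics(requests, concurrency, seconds):
    """Derive ab's summary rates from request count, concurrency and elapsed time."""
    # Requests per second (mean).
    rps = requests / seconds
    # Time per request (mean), in ms: scaled by the concurrency level.
    per_request_ms = concurrency * seconds * 1000 / requests
    # Time per request across all concurrent requests, in ms.
    across_all_ms = seconds * 1000 / requests
    return rps, per_request_ms, across_all_ms


# docker link run: 10000 requests at concurrency 10000.
rps, per_request_ms, across_all_ms = ab_metrics(10000, 10000, 0.864174)
print(f"{rps:.2f} req/s, {per_request_ms:.3f} ms, {across_all_ms:.3f} ms")
```

    Because the concurrency level equals the request count in these runs, the mean time per request is simply the whole test duration in milliseconds, which is why it looks so large next to the per-request figure across all concurrent requests.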

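    Conclusions:

    According to the data, accessing a service through docker link gives the best latency and the best throughput, with the hostname mode second.
    The two HTTP-mode options, service discovery and HTTP SLB, have the lowest throughput.
    The concurrency performance of HTTP SLB appears unsatisfactory: 5.8% of the 10000 requests returned a 504 Gateway Time-out error.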
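
    To make the comparison more concrete, the throughput of each mode can be expressed as a percentage of the best result, and the HTTP SLB error rate can be recomputed from the ab counts. This is only an illustrative calculation on the numbers reported above; the names RPS and relative_to_best are hypothetical.

```python
# Requests per second as reported by ab in this article.
RPS = {
    "docker link": 11571.74,
    "hostname": 9476.49,
    "service discovery": 2338.60,
    "HTTP SLB": 1585.41,
    "TCP SLB": 5287.14,
}


def relative_to_best(rps):
    """Express each mode's throughput as a percentage of the highest one."""
    best = max(rps.values())
    return {mode: round(value / best * 100, 1) for mode, value in rps.items()}


relative = relative_to_best(RPS)

# HTTP SLB: 580 non-2xx responses out of 10000 requests.
error_rate = 580 / 10000 * 100

for mode, percent in relative.items():
    print(f"{mode}: {percent}% of best")
print(f"HTTP SLB error rate: {error_rate:.1f}%")
```

    On these figures hostname reaches roughly 82% of the docker link throughput and TCP SLB roughly 46%, while service discovery and HTTP SLB stay at about 20% or below.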

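    Open questions:

    Although docker link and hostname deliver the best network performance, their load-balancing capability is unclear. In our tests we found that the hostname mode does have load-balancing capability. However, the official help documentation lists hostname access under "access methods without load-balancing capability" and describes it only as "able to provide a certain degree of load balancing". Evidently Alibaba Cloud does not emphasize its load-balancing capability. For production use, load-balancing capability is also a very important criterion.

    Finally, these two questions still need to be confirmed with Alibaba Cloud:

    What exactly is the load-balancing capability of the link and hostname modes?
    Given three requirements (load-balancing capability, HTTP mode, and intranet use), is there a recommended network mode?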