# tensorflow-serving-benchmark

I couldn't find any benchmarks of TensorFlow Serving and wanted to compare the throughput of the native gRPC client with that of a REST-based client that forwards requests to a gRPC client (which is what most people are likely to use for external-facing services).

Updated to run with TF 2.3.

## Usage and Results

### gRPC (async)

```sh
docker-compose run grpc-benchmark
```

```
Creating tensorflowservingbenchmark_server_1 ... done

10000 requests (10 max concurrent)
2487.13826442827 requests/second
```
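The client code itself isn't reproduced here, but the "10000 requests (10 max concurrent)" pattern from the gRPC run can be sketched with an `asyncio.Semaphore`. The `send_request` coroutine below is a hypothetical stand-in for the real async gRPC `Predict` call:

```python
import asyncio

async def bounded_benchmark(total: int = 10_000, max_concurrent: int = 10) -> int:
    """Issue `total` requests with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def send_request(i: int) -> int:
        # Stand-in for the real call, e.g. `await stub.Predict(request)`.
        async with sem:
            await asyncio.sleep(0)
            return i

    results = await asyncio.gather(*(send_request(i) for i in range(total)))
    return len(results)

# Smaller run for illustration; the benchmark above uses total=10_000.
print(asyncio.run(bounded_benchmark(total=100)))  # 100
```

The semaphore caps in-flight requests at ten while `gather` keeps the event loop saturated, which is the shape of load the result above reflects.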
### FastAPI (uvicorn, four workers, async)

```sh
docker-compose run fastapi-benchmark
```

```
Server Software:        uvicorn
Server Hostname:        fastapi-client
Server Port:            8002

Document Path:          /prediction
Document Length:        31 bytes

Concurrency Level:      10
Time taken for tests:   2.417 seconds
Complete requests:      10000
Failed requests:        0
Non-2xx responses:      10000
Total transferred:      1910000 bytes
HTML transferred:       310000 bytes
Requests per second:    4137.41 [#/sec] (mean)
Time per request:       2.417 [ms] (mean)
Time per request:       0.242 [ms] (mean, across all concurrent requests)
Transfer rate:          771.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    2   1.1      2      11
Waiting:        0    2   0.9      2      11
Total:          1    2   1.1      2      12

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      3
  75%      3
  80%      3
  90%      4
  95%      4
  98%      5
  99%      6
 100%     12 (longest request)
```
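As a sanity check, ApacheBench's summary figures follow directly from the raw counts in the report. A quick sketch using the FastAPI run's numbers (the small gap on requests/second comes from ab printing a rounded test duration):

```python
# Derive ab's summary metrics from the raw FastAPI run numbers above.
complete_requests = 10_000
time_taken_s = 2.417       # "Time taken for tests" (rounded in the report)
concurrency = 10

requests_per_second = complete_requests / time_taken_s
time_per_request_ms = concurrency * 1000 * time_taken_s / complete_requests
time_per_request_all_ms = 1000 * time_taken_s / complete_requests

print(f"{requests_per_second:.2f}")      # 4137.36 (ab prints 4137.41)
print(f"{time_per_request_ms:.3f}")      # 2.417
print(f"{time_per_request_all_ms:.3f}")  # 0.242
```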
 
### WSGI (gunicorn/falcon, four workers, sync)

```sh
docker-compose run wsgi-benchmark
```

```
Starting tensorflowservingbenchmark_server_1 ... done
Creating tensorflowservingbenchmark_wsgi-client_1 ... done
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wsgi-client (be patient)
...

Server Software:        gunicorn/19.7.1
Server Hostname:        wsgi-client
Server Port:            8000

Document Path:          /prediction
Document Length:        12 bytes

Concurrency Level:      10
Time taken for tests:   7.116 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1790000 bytes
HTML transferred:       120000 bytes
Requests per second:    1405.31 [#/sec] (mean)
Time per request:       7.116 [ms] (mean)
Time per request:       0.712 [ms] (mean, across all concurrent requests)
Transfer rate:          245.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     2    7  25.4      6     953
Waiting:        2    7  25.4      6     953
Total:          3    7  25.4      6     954

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      7
  80%      7
  90%      8
  95%      8
  98%     10
  99%     11
 100%    954 (longest request)
```
 
### Tornado (four workers, async)

```sh
docker-compose run tornado-benchmark
```

```
Starting tensorflowservingbenchmark_server_1 ... done
Starting tensorflowservingbenchmark_tornado-client_1 ... done
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking tornado-client (be patient)
...

Server Software:        TornadoServer/4.5.3
Server Hostname:        tornado-client
Server Port:            8001

Document Path:          /prediction
Document Length:        15 bytes

Concurrency Level:      10
Time taken for tests:   19.140 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2100000 bytes
HTML transferred:       150000 bytes
Requests per second:    522.47 [#/sec] (mean)
Time per request:       19.140 [ms] (mean)
Time per request:       1.914 [ms] (mean, across all concurrent requests)
Transfer rate:          107.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     3   19  15.2     17     447
Waiting:        3   19  15.2     16     447
Total:          3   19  15.2     17     447

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     21
  75%     23
  80%     25
  90%     31
  95%     36
  98%     44
  99%     50
 100%    447 (longest request)
```
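Pulling the four requests-per-second figures together (taken directly from the runs above) makes the relative ranking easier to read at a glance:

```python
# Requests/second reported by each run above, normalized to the slowest.
results = {
    "FastAPI (async)": 4137.41,
    "gRPC (async)": 2487.14,
    "WSGI (sync)": 1405.31,
    "Tornado (async)": 522.47,
}

baseline = min(results.values())
for name, rps in results.items():
    print(f"{name:18s} {rps:8.2f} req/s  ({rps / baseline:.1f}x slowest)")
```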
 
