# tensorflow-serving-benchmark

I couldn't find any benchmarks of TensorFlow Serving and wanted to compare the throughput of the native gRPC client with that of a REST-based client that forwards requests to a gRPC client (which is what most people are likely to use for external-facing services).

Updated to run with TF 2.3.

## Usage and Results

### gRPC (async)

```sh
docker-compose run grpc-benchmark
```

```
Creating tensorflowservingbenchmark_server_1 ... done

10000 requests (10 max concurrent)
2487.13826442827 requests/second
```
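The client code itself isn't reproduced here, but the "10000 requests (10 max concurrent)" pattern from the gRPC run can be sketched with an `asyncio.Semaphore`. The `send_request` coroutine below is a hypothetical stand-in for the real async gRPC `Predict` call:

```python
import asyncio

async def bounded_benchmark(total: int = 10_000, max_concurrent: int = 10) -> int:
    """Issue `total` requests with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def send_request(i: int) -> int:
        # Stand-in for the real call, e.g. `await stub.Predict(request)`.
        async with sem:
            await asyncio.sleep(0)
            return i

    results = await asyncio.gather(*(send_request(i) for i in range(total)))
    return len(results)

# Smaller run for illustration; the benchmark above uses total=10_000.
print(asyncio.run(bounded_benchmark(total=100)))  # 100
```

The semaphore caps in-flight requests at ten while `gather` keeps the event loop saturated, which is the shape of load the result above reflects.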
### FastAPI (uvicorn, four workers, async)

```sh
docker-compose run fastapi-benchmark
```

```
Server Software:        uvicorn
Server Hostname:        fastapi-client
Server Port:            8002

Document Path:          /prediction
Document Length:        31 bytes

Concurrency Level:      10
Time taken for tests:   2.417 seconds
Complete requests:      10000
Failed requests:        0
Non-2xx responses:      10000
Total transferred:      1910000 bytes
HTML transferred:       310000 bytes
Requests per second:    4137.41 [#/sec] (mean)
Time per request:       2.417 [ms] (mean)
Time per request:       0.242 [ms] (mean, across all concurrent requests)
Transfer rate:          771.72 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    2   1.1      2      11
Waiting:        0    2   0.9      2      11
Total:          1    2   1.1      2      12

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      3
  75%      3
  80%      3
  90%      4
  95%      4
  98%      5
  99%      6
 100%     12 (longest request)
```
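As a sanity check, ApacheBench's summary figures follow directly from the raw counts in the report. A quick sketch using the FastAPI run's numbers (the small gap on requests/second comes from ab printing a rounded test duration):

```python
# Derive ab's summary metrics from the raw FastAPI run numbers above.
complete_requests = 10_000
time_taken_s = 2.417       # "Time taken for tests" (rounded in the report)
concurrency = 10

requests_per_second = complete_requests / time_taken_s
time_per_request_ms = concurrency * 1000 * time_taken_s / complete_requests
time_per_request_all_ms = 1000 * time_taken_s / complete_requests

print(f"{requests_per_second:.2f}")      # 4137.36 (ab prints 4137.41)
print(f"{time_per_request_ms:.3f}")      # 2.417
print(f"{time_per_request_all_ms:.3f}")  # 0.242
```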
 
### WSGI (gunicorn/falcon, four workers, sync)

```sh
docker-compose run wsgi-benchmark
```

```
Starting tensorflowservingbenchmark_server_1 ... done
Creating tensorflowservingbenchmark_wsgi-client_1 ... done
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wsgi-client (be patient)
...

Server Software:        gunicorn/19.7.1
Server Hostname:        wsgi-client
Server Port:            8000

Document Path:          /prediction
Document Length:        12 bytes

Concurrency Level:      10
Time taken for tests:   7.116 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1790000 bytes
HTML transferred:       120000 bytes
Requests per second:    1405.31 [#/sec] (mean)
Time per request:       7.116 [ms] (mean)
Time per request:       0.712 [ms] (mean, across all concurrent requests)
Transfer rate:          245.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     2    7  25.4      6     953
Waiting:        2    7  25.4      6     953
Total:          3    7  25.4      6     954

Percentage of the requests served within a certain time (ms)
  50%      6
  66%      7
  75%      7
  80%      7
  90%      8
  95%      8
  98%     10
  99%     11
 100%    954 (longest request)
```
 
### Tornado (four workers, async)

```sh
docker-compose run tornado-benchmark
```

```
Starting tensorflowservingbenchmark_server_1 ... done
Starting tensorflowservingbenchmark_tornado-client_1 ... done
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking tornado-client (be patient)
...

Server Software:        TornadoServer/4.5.3
Server Hostname:        tornado-client
Server Port:            8001

Document Path:          /prediction
Document Length:        15 bytes

Concurrency Level:      10
Time taken for tests:   19.140 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2100000 bytes
HTML transferred:       150000 bytes
Requests per second:    522.47 [#/sec] (mean)
Time per request:       19.140 [ms] (mean)
Time per request:       1.914 [ms] (mean, across all concurrent requests)
Transfer rate:          107.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     3   19  15.2     17     447
Waiting:        3   19  15.2     16     447
Total:          3   19  15.2     17     447

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     21
  75%     23
  80%     25
  90%     31
  95%     36
  98%     44
  99%     50
 100%    447 (longest request)
```
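Pulling the four requests-per-second figures together (taken directly from the runs above) makes the relative ranking easier to read at a glance:

```python
# Requests/second reported by each run above, normalized to the slowest.
results = {
    "FastAPI (async)": 4137.41,
    "gRPC (async)": 2487.14,
    "WSGI (sync)": 1405.31,
    "Tornado (async)": 522.47,
}

baseline = min(results.values())
for name, rps in results.items():
    print(f"{name:18s} {rps:8.2f} req/s  ({rps / baseline:.1f}x slowest)")
```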
 
