References
TensorFlow official benchmarks (May 2017, GitHub source): https://www.tensorflow.org/performance/benchmarks
IBM POWER9 benchmark results (Nov 2017, TensorFlow 1.4.0): https://developer.ibm.com/linuxonpower/perfcol/perfcol-mldl/
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (Facebook, Jun 2017): https://research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf
Benchmark Source Code
https://github-dev.cs.illinois.edu/kindrtnk/DL
Official TF Benchmark System Characteristics
- Instance type: NVIDIA® DGX-1™
- GPU: 8x NVIDIA® Tesla® P100
- OS: Ubuntu 16.04 LTS with tests run via Docker
- CUDA / cuDNN: 8.0 / 5.1
- TensorFlow GitHub hash: b1e174e
- Benchmark GitHub hash: 9165a70
- Build Command:
bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
- Disk: Local SSD
- Dataset: ImageNet
- Test Date: May 2017
Our System Characteristics (more details in the GitHub repo)
- Instance type: IBM Power System AC922 (8335-GTG)
- CPU: 2x 20-core IBM POWER9 CPU @ 2.00GHz
- Memory: 512 GB DDR4
- GPU: 4x NVIDIA® Tesla® V100 (5120 CUDA cores and 16 GB HBM2 each)
- Disk: Local SSD
- OS: Red Hat Enterprise Linux Server release 7.4
- Python distribution: Anaconda Python 3.6.2
- CUDA / cuDNN: 9.1 / 7.0.5
- TensorFlow version: 1.5.0
- Dataset: ImageNet (synthetic)
- Precision: FP32 and FP16
- Test Date: Mar 25 2018
The following table shows our results using the same configurations as the official TensorFlow benchmarks listed in the References section above:
index | base_model | FP_type | batch_size | n_gpus | variable_update | local_parameter_device | image_per_sec |
---|---|---|---|---|---|---|---|
0 | alexnet | 16 | 512 | 1 | replicated | N/A | 8092.12 |
1 | alexnet | 16 | 512 | 2 | replicated | N/A | 15289.09 |
2 | alexnet | 16 | 512 | 4 | replicated | N/A | 25295.36 |
3 | alexnet | 32 | 512 | 1 | replicated | N/A | 4692.38 |
4 | alexnet | 32 | 512 | 2 | replicated | N/A | 8993.17 |
5 | alexnet | 32 | 512 | 4 | replicated | N/A | 15868.53 |
6 | inception3 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 423.90 |
7 | inception3 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 823.06 |
8 | inception3 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 1561.19 |
9 | inception3 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 245.36 |
10 | inception3 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 472.07 |
11 | inception3 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 918.71 |
12 | resnet152 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 283.59 |
13 | resnet152 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 531.70 |
14 | resnet152 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 975.86 |
15 | resnet152 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 148.61 |
16 | resnet152 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 287.02 |
17 | resnet152 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 567.19 |
18 | resnet50 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 618.52 |
19 | resnet50 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 1174.06 |
20 | resnet50 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 2294.24 |
21 | resnet50 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 364.54 |
22 | resnet50 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 719.03 |
23 | resnet50 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 1402.16 |
24 | vgg16 | 16 | 64 | 1 | replicated | N/A | 405.58 |
25 | vgg16 | 16 | 64 | 2 | replicated | N/A | 766.43 |
26 | vgg16 | 16 | 64 | 4 | replicated | N/A | 1264.35 |
27 | vgg16 | 32 | 64 | 1 | replicated | N/A | 234.22 |
28 | vgg16 | 32 | 64 | 2 | replicated | N/A | 451.64 |
29 | vgg16 | 32 | 64 | 4 | replicated | N/A | 809.41 |
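The sketch below shows how one row of the table above might map onto a benchmark invocation. It builds a command line for the `tf_cnn_benchmarks.py` script from the tensorflow/benchmarks repository.

It rests on a few assumptions:
- The script accepts the `--model`, `--batch_size`, `--num_gpus`, `--variable_update`, `--local_parameter_device` and `--use_fp16` flags.
- The script actually used for these results lives in the benchmark repository linked above and may differ.
- The helper name `build_command` and the label-to-flag mapping are hypothetical.

```python
from typing import List, Optional

# Hypothetical mapping from the table's local_parameter_device labels to
# tf_cnn_benchmarks flag values; "N/A" and "no_parameterDevice" emit no flag.
DEVICE_FLAG = {
    "cpu_parameterDevice": "cpu",
    "gpu_parameterDevice": "gpu",
}


def build_command(model: str, fp_type: int, batch_size: int, n_gpus: int,
                  variable_update: str,
                  local_parameter_device: Optional[str] = None) -> List[str]:
    """Return an argv list for one table row (a sketch, not the authors' script)."""
    if fp_type not in (16, 32):
        raise ValueError(f"unsupported FP_type: {fp_type}")
    cmd = [
        "python", "tf_cnn_benchmarks.py",
        f"--model={model}",
        f"--batch_size={batch_size}",
        f"--num_gpus={n_gpus}",
        f"--variable_update={variable_update}",
    ]
    # Only emit a parameter-device flag when the row names a concrete device.
    device = DEVICE_FLAG.get(local_parameter_device or "")
    if device is not None:
        cmd.append(f"--local_parameter_device={device}")
    # FP16 rows are assumed to correspond to the --use_fp16 flag.
    if fp_type == 16:
        cmd.append("--use_fp16")
    return cmd


if __name__ == "__main__":
    # Row 20 above: resnet50, FP16, batch size 64, 4 GPUs, CPU parameter server.
    print(" ".join(build_command("resnet50", 16, 64, 4, "parameter_server",
                                 "cpu_parameterDevice")))
```

Returning an argv list rather than a single string keeps the arguments safe to pass directly to `subprocess.run` without shell quoting.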
This figure compares our results with the official TensorFlow results. Green bars show our FP16 results, blue bars show our FP32 results, and red bars show the official TensorFlow results.
The next figure shows the performance ratio of our FP16 and FP32 results relative to the official TensorFlow results.
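We read the performance ratio as our images/sec divided by the official images/sec for the same model, batch size and GPU count. The sketch below computes that ratio under a few assumptions:
- Both result sets are keyed by `(model, batch_size, n_gpus)`.
- The function name and key layout are illustrative.
- The official numbers must be taken from the TensorFlow benchmark page cited in the References.

```python
from typing import Dict, Tuple

# Illustrative key layout: (model, batch_size, n_gpus).
Key = Tuple[str, int, int]


def performance_ratio(ours: Dict[Key, float],
                      official: Dict[Key, float]) -> Dict[Key, float]:
    """Return ours / official images/sec for every configuration present in both."""
    ratios: Dict[Key, float] = {}
    for key, ours_ips in ours.items():
        ref = official.get(key)
        if ref:  # skip configurations with a missing or zero reference value
            ratios[key] = ours_ips / ref
    return ratios
```

A ratio above 1.0 means our run was faster than the official result for that configuration. Configurations that appear in only one of the two result sets are silently skipped.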
The following table provides more comprehensive benchmark results for our system:
index | base_model | FP_type | batch_size | n_gpus | variable_update | local_parameter_device | image_per_sec |
---|---|---|---|---|---|---|---|
0 | alexnet | 16 | 512 | 1 | parameter_server | N/A | 8176.13 |
1 | alexnet | 16 | 512 | 1 | replicated | N/A | 8092.12 |
2 | alexnet | 16 | 512 | 2 | parameter_server | N/A | 15724.07 |
3 | alexnet | 16 | 512 | 2 | replicated | N/A | 15289.09 |
4 | alexnet | 16 | 512 | 4 | parameter_server | N/A | 26709.08 |
5 | alexnet | 16 | 512 | 4 | replicated | N/A | 25295.36 |
6 | alexnet | 32 | 512 | 1 | parameter_server | N/A | 4645.89 |
7 | alexnet | 32 | 512 | 1 | replicated | N/A | 4692.38 |
8 | alexnet | 32 | 512 | 2 | parameter_server | N/A | 8994.28 |
9 | alexnet | 32 | 512 | 2 | replicated | N/A | 8993.17 |
10 | alexnet | 32 | 512 | 4 | parameter_server | N/A | 15563.43 |
11 | alexnet | 32 | 512 | 4 | replicated | N/A | 15868.53 |
12 | inception3 | 16 | 32 | 1 | parameter_server | gpu_parameterDevice | 368.16 |
13 | inception3 | 16 | 32 | 1 | parameter_server | cpu_parameterDevice | 289.93 |
14 | inception3 | 16 | 32 | 1 | replicated | cpu_parameterDevice | 345.22 |
15 | inception3 | 16 | 32 | 1 | replicated | gpu_parameterDevice | 336.22 |
16 | inception3 | 16 | 32 | 2 | parameter_server | gpu_parameterDevice | 569.06 |
17 | inception3 | 16 | 32 | 2 | parameter_server | cpu_parameterDevice | 593.43 |
18 | inception3 | 16 | 32 | 2 | replicated | cpu_parameterDevice | 610.20 |
19 | inception3 | 16 | 32 | 2 | replicated | gpu_parameterDevice | 594.11 |
20 | inception3 | 16 | 32 | 4 | parameter_server | gpu_parameterDevice | 972.40 |
21 | inception3 | 16 | 32 | 4 | parameter_server | cpu_parameterDevice | 1030.18 |
22 | inception3 | 16 | 32 | 4 | replicated | gpu_parameterDevice | 1076.02 |
23 | inception3 | 16 | 32 | 4 | replicated | cpu_parameterDevice | 1064.68 |
24 | inception3 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 423.90 |
25 | inception3 | 16 | 64 | 1 | parameter_server | gpu_parameterDevice | 440.50 |
26 | inception3 | 16 | 64 | 1 | replicated | gpu_parameterDevice | 436.09 |
27 | inception3 | 16 | 64 | 1 | replicated | cpu_parameterDevice | 436.16 |
28 | inception3 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 823.06 |
29 | inception3 | 16 | 64 | 2 | parameter_server | gpu_parameterDevice | 811.04 |
30 | inception3 | 16 | 64 | 2 | replicated | cpu_parameterDevice | 842.72 |
31 | inception3 | 16 | 64 | 2 | replicated | gpu_parameterDevice | 848.60 |
32 | inception3 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 1561.19 |
33 | inception3 | 16 | 64 | 4 | parameter_server | gpu_parameterDevice | 1502.41 |
34 | inception3 | 16 | 64 | 4 | replicated | cpu_parameterDevice | 1701.83 |
35 | inception3 | 16 | 64 | 4 | replicated | gpu_parameterDevice | 1605.17 |
36 | inception3 | 16 | 128 | 1 | replicated | no_parameterDevice | 491.12 |
37 | inception3 | 16 | 128 | 2 | replicated | no_parameterDevice | 972.62 |
38 | inception3 | 16 | 128 | 4 | replicated | no_parameterDevice | 1926.57 |
39 | inception3 | 16 | 256 | 1 | replicated | no_parameterDevice | 521.54 |
40 | inception3 | 16 | 256 | 2 | replicated | no_parameterDevice | 1032.37 |
41 | inception3 | 16 | 256 | 4 | replicated | no_parameterDevice | 2043.78 |
42 | inception3 | 32 | 32 | 1 | parameter_server | gpu_parameterDevice | 224.14 |
43 | inception3 | 32 | 32 | 1 | parameter_server | cpu_parameterDevice | 217.93 |
44 | inception3 | 32 | 32 | 1 | replicated | cpu_parameterDevice | 224.14 |
45 | inception3 | 32 | 32 | 1 | replicated | gpu_parameterDevice | 225.85 |
46 | inception3 | 32 | 32 | 2 | parameter_server | gpu_parameterDevice | 414.47 |
47 | inception3 | 32 | 32 | 2 | parameter_server | cpu_parameterDevice | 424.24 |
48 | inception3 | 32 | 32 | 2 | replicated | cpu_parameterDevice | 431.49 |
49 | inception3 | 32 | 32 | 2 | replicated | gpu_parameterDevice | 439.14 |
50 | inception3 | 32 | 32 | 4 | parameter_server | cpu_parameterDevice | 788.87 |
51 | inception3 | 32 | 32 | 4 | parameter_server | gpu_parameterDevice | 747.39 |
52 | inception3 | 32 | 32 | 4 | replicated | cpu_parameterDevice | 856.59 |
53 | inception3 | 32 | 32 | 4 | replicated | gpu_parameterDevice | 829.10 |
54 | inception3 | 32 | 64 | 1 | parameter_server | gpu_parameterDevice | 247.98 |
55 | inception3 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 245.36 |
56 | inception3 | 32 | 64 | 1 | replicated | cpu_parameterDevice | 247.98 |
57 | inception3 | 32 | 64 | 1 | replicated | gpu_parameterDevice | 247.96 |
58 | inception3 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 472.07 |
59 | inception3 | 32 | 64 | 2 | parameter_server | gpu_parameterDevice | 451.69 |
60 | inception3 | 32 | 64 | 2 | replicated | gpu_parameterDevice | 491.18 |
61 | inception3 | 32 | 64 | 2 | replicated | cpu_parameterDevice | 486.19 |
62 | inception3 | 32 | 64 | 4 | parameter_server | gpu_parameterDevice | 919.57 |
63 | inception3 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 918.71 |
64 | inception3 | 32 | 64 | 4 | replicated | gpu_parameterDevice | 963.25 |
65 | inception3 | 32 | 64 | 4 | replicated | cpu_parameterDevice | 962.94 |
66 | resnet152 | 16 | 32 | 1 | parameter_server | cpu_parameterDevice | 201.65 |
67 | resnet152 | 16 | 32 | 1 | parameter_server | gpu_parameterDevice | 229.96 |
68 | resnet152 | 16 | 32 | 1 | replicated | gpu_parameterDevice | 218.05 |
69 | resnet152 | 16 | 32 | 1 | replicated | cpu_parameterDevice | 193.48 |
70 | resnet152 | 16 | 32 | 2 | parameter_server | cpu_parameterDevice | 332.26 |
71 | resnet152 | 16 | 32 | 2 | parameter_server | gpu_parameterDevice | 341.70 |
72 | resnet152 | 16 | 32 | 2 | replicated | cpu_parameterDevice | 329.15 |
73 | resnet152 | 16 | 32 | 2 | replicated | gpu_parameterDevice | 375.69 |
74 | resnet152 | 16 | 32 | 4 | parameter_server | gpu_parameterDevice | 537.87 |
75 | resnet152 | 16 | 32 | 4 | parameter_server | cpu_parameterDevice | 593.27 |
76 | resnet152 | 16 | 32 | 4 | replicated | cpu_parameterDevice | 664.84 |
77 | resnet152 | 16 | 32 | 4 | replicated | gpu_parameterDevice | 624.26 |
78 | resnet152 | 16 | 64 | 1 | parameter_server | gpu_parameterDevice | 290.69 |
79 | resnet152 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 283.59 |
80 | resnet152 | 16 | 64 | 1 | replicated | cpu_parameterDevice | 287.45 |
81 | resnet152 | 16 | 64 | 1 | replicated | gpu_parameterDevice | 287.39 |
82 | resnet152 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 531.70 |
83 | resnet152 | 16 | 64 | 2 | parameter_server | gpu_parameterDevice | 510.79 |
84 | resnet152 | 16 | 64 | 2 | replicated | cpu_parameterDevice | 562.41 |
85 | resnet152 | 16 | 64 | 2 | replicated | gpu_parameterDevice | 574.11 |
86 | resnet152 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 975.86 |
87 | resnet152 | 16 | 64 | 4 | parameter_server | gpu_parameterDevice | 894.71 |
88 | resnet152 | 16 | 64 | 4 | replicated | cpu_parameterDevice | 1075.84 |
89 | resnet152 | 16 | 64 | 4 | replicated | gpu_parameterDevice | 945.07 |
90 | resnet152 | 16 | 128 | 1 | replicated | no_parameterDevice | 330.65 |
91 | resnet152 | 16 | 128 | 2 | replicated | no_parameterDevice | 648.43 |
92 | resnet152 | 16 | 128 | 4 | replicated | no_parameterDevice | 1288.11 |
93 | resnet152 | 32 | 32 | 1 | parameter_server | cpu_parameterDevice | 131.59 |
94 | resnet152 | 32 | 32 | 1 | parameter_server | gpu_parameterDevice | 137.46 |
95 | resnet152 | 32 | 32 | 1 | replicated | gpu_parameterDevice | 137.48 |
96 | resnet152 | 32 | 32 | 1 | replicated | cpu_parameterDevice | 137.46 |
97 | resnet152 | 32 | 32 | 2 | parameter_server | cpu_parameterDevice | 252.06 |
98 | resnet152 | 32 | 32 | 2 | parameter_server | gpu_parameterDevice | 258.02 |
99 | resnet152 | 32 | 32 | 2 | replicated | cpu_parameterDevice | 266.06 |
100 | resnet152 | 32 | 32 | 2 | replicated | gpu_parameterDevice | 269.09 |
101 | resnet152 | 32 | 32 | 4 | parameter_server | cpu_parameterDevice | 475.65 |
102 | resnet152 | 32 | 32 | 4 | parameter_server | gpu_parameterDevice | 428.43 |
103 | resnet152 | 32 | 32 | 4 | replicated | cpu_parameterDevice | 531.89 |
104 | resnet152 | 32 | 32 | 4 | replicated | gpu_parameterDevice | 510.74 |
105 | resnet152 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 148.61 |
106 | resnet152 | 32 | 64 | 1 | parameter_server | gpu_parameterDevice | 153.30 |
107 | resnet152 | 32 | 64 | 1 | replicated | cpu_parameterDevice | 152.38 |
108 | resnet152 | 32 | 64 | 1 | replicated | gpu_parameterDevice | 153.30 |
109 | resnet152 | 32 | 64 | 2 | parameter_server | gpu_parameterDevice | 297.53 |
110 | resnet152 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 287.02 |
111 | resnet152 | 32 | 64 | 2 | replicated | gpu_parameterDevice | 304.75 |
112 | resnet152 | 32 | 64 | 2 | replicated | cpu_parameterDevice | 302.79 |
113 | resnet152 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 567.19 |
114 | resnet152 | 32 | 64 | 4 | parameter_server | gpu_parameterDevice | 546.67 |
115 | resnet152 | 32 | 64 | 4 | replicated | cpu_parameterDevice | 591.06 |
116 | resnet152 | 32 | 64 | 4 | replicated | gpu_parameterDevice | 587.99 |
117 | resnet152_v2 | 16 | 32 | 1 | parameter_server | cpu_parameterDevice | 210.39 |
118 | resnet152_v2 | 16 | 32 | 1 | parameter_server | gpu_parameterDevice | 234.22 |
119 | resnet152_v2 | 16 | 32 | 1 | replicated | cpu_parameterDevice | 218.01 |
120 | resnet152_v2 | 16 | 32 | 1 | replicated | gpu_parameterDevice | 225.86 |
121 | resnet152_v2 | 16 | 32 | 2 | parameter_server | cpu_parameterDevice | 319.55 |
122 | resnet152_v2 | 16 | 32 | 2 | parameter_server | gpu_parameterDevice | 357.69 |
123 | resnet152_v2 | 16 | 32 | 2 | replicated | gpu_parameterDevice | 377.47 |
124 | resnet152_v2 | 16 | 32 | 2 | replicated | cpu_parameterDevice | 384.85 |
125 | resnet152_v2 | 16 | 32 | 4 | parameter_server | cpu_parameterDevice | 652.52 |
126 | resnet152_v2 | 16 | 32 | 4 | parameter_server | gpu_parameterDevice | 550.14 |
127 | resnet152_v2 | 16 | 32 | 4 | replicated | cpu_parameterDevice | 673.32 |
128 | resnet152_v2 | 16 | 32 | 4 | replicated | gpu_parameterDevice | 632.01 |
129 | resnet152_v2 | 16 | 64 | 1 | parameter_server | gpu_parameterDevice | 294.10 |
130 | resnet152_v2 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 290.34 |
131 | resnet152_v2 | 16 | 64 | 1 | replicated | gpu_parameterDevice | 294.56 |
132 | resnet152_v2 | 16 | 64 | 1 | replicated | cpu_parameterDevice | 294.12 |
133 | resnet152_v2 | 16 | 64 | 2 | parameter_server | gpu_parameterDevice | 532.29 |
134 | resnet152_v2 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 542.82 |
135 | resnet152_v2 | 16 | 64 | 2 | replicated | gpu_parameterDevice | 575.56 |
136 | resnet152_v2 | 16 | 64 | 2 | replicated | cpu_parameterDevice | 574.35 |
137 | resnet152_v2 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 1030.82 |
138 | resnet152_v2 | 16 | 64 | 4 | parameter_server | gpu_parameterDevice | 944.88 |
139 | resnet152_v2 | 16 | 64 | 4 | replicated | gpu_parameterDevice | 1032.10 |
140 | resnet152_v2 | 16 | 64 | 4 | replicated | cpu_parameterDevice | 1159.66 |
141 | resnet152_v2 | 16 | 128 | 1 | replicated | no_parameterDevice | 335.02 |
142 | resnet152_v2 | 16 | 128 | 2 | replicated | no_parameterDevice | 661.07 |
143 | resnet152_v2 | 16 | 128 | 4 | replicated | no_parameterDevice | 1296.60 |
144 | resnet152_v2 | 32 | 32 | 1 | parameter_server | cpu_parameterDevice | 132.99 |
145 | resnet152_v2 | 32 | 32 | 1 | parameter_server | gpu_parameterDevice | 139.09 |
146 | resnet152_v2 | 32 | 32 | 1 | replicated | gpu_parameterDevice | 140.51 |
147 | resnet152_v2 | 32 | 32 | 1 | replicated | cpu_parameterDevice | 138.95 |
148 | resnet152_v2 | 32 | 32 | 2 | parameter_server | gpu_parameterDevice | 252.95 |
149 | resnet152_v2 | 32 | 32 | 2 | parameter_server | cpu_parameterDevice | 257.58 |
150 | resnet152_v2 | 32 | 32 | 2 | replicated | gpu_parameterDevice | 271.95 |
151 | resnet152_v2 | 32 | 32 | 2 | replicated | cpu_parameterDevice | 271.85 |
152 | resnet152_v2 | 32 | 32 | 4 | parameter_server | gpu_parameterDevice | 447.54 |
153 | resnet152_v2 | 32 | 32 | 4 | parameter_server | cpu_parameterDevice | 484.61 |
154 | resnet152_v2 | 32 | 32 | 4 | replicated | gpu_parameterDevice | 532.23 |
155 | resnet152_v2 | 32 | 32 | 4 | replicated | cpu_parameterDevice | 532.04 |
156 | resnet152_v2 | 32 | 64 | 1 | parameter_server | gpu_parameterDevice | 155.19 |
157 | resnet152_v2 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 151.34 |
158 | resnet152_v2 | 32 | 64 | 1 | replicated | gpu_parameterDevice | 154.24 |
159 | resnet152_v2 | 32 | 64 | 1 | replicated | cpu_parameterDevice | 154.24 |
160 | resnet152_v2 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 298.92 |
161 | resnet152_v2 | 32 | 64 | 2 | parameter_server | gpu_parameterDevice | 295.79 |
162 | resnet152_v2 | 32 | 64 | 2 | replicated | gpu_parameterDevice | 308.46 |
163 | resnet152_v2 | 32 | 64 | 2 | replicated | cpu_parameterDevice | 306.41 |
164 | resnet152_v2 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 557.58 |
165 | resnet152_v2 | 32 | 64 | 4 | parameter_server | gpu_parameterDevice | 526.94 |
166 | resnet152_v2 | 32 | 64 | 4 | replicated | cpu_parameterDevice | 601.99 |
167 | resnet152_v2 | 32 | 64 | 4 | replicated | gpu_parameterDevice | 595.10 |
168 | resnet50 | 16 | 32 | 1 | parameter_server | cpu_parameterDevice | 486.00 |
169 | resnet50 | 16 | 32 | 1 | parameter_server | gpu_parameterDevice | 521.21 |
170 | resnet50 | 16 | 32 | 1 | replicated | cpu_parameterDevice | 471.64 |
171 | resnet50 | 16 | 32 | 1 | replicated | gpu_parameterDevice | 516.20 |
172 | resnet50 | 16 | 32 | 2 | parameter_server | gpu_parameterDevice | 824.17 |
173 | resnet50 | 16 | 32 | 2 | parameter_server | cpu_parameterDevice | 891.88 |
174 | resnet50 | 16 | 32 | 2 | replicated | cpu_parameterDevice | 981.99 |
175 | resnet50 | 16 | 32 | 2 | replicated | gpu_parameterDevice | 953.31 |
176 | resnet50 | 16 | 32 | 4 | parameter_server | cpu_parameterDevice | 1627.05 |
177 | resnet50 | 16 | 32 | 4 | parameter_server | gpu_parameterDevice | 1502.83 |
178 | resnet50 | 16 | 32 | 4 | replicated | cpu_parameterDevice | 1834.62 |
179 | resnet50 | 16 | 32 | 4 | replicated | gpu_parameterDevice | 1598.16 |
180 | resnet50 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 618.52 |
181 | resnet50 | 16 | 64 | 1 | parameter_server | gpu_parameterDevice | 641.96 |
182 | resnet50 | 16 | 64 | 1 | replicated | cpu_parameterDevice | 632.34 |
183 | resnet50 | 16 | 64 | 1 | replicated | gpu_parameterDevice | 638.60 |
184 | resnet50 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 1174.06 |
185 | resnet50 | 16 | 64 | 2 | parameter_server | gpu_parameterDevice | 1221.21 |
186 | resnet50 | 16 | 64 | 2 | replicated | cpu_parameterDevice | 1245.83 |
187 | resnet50 | 16 | 64 | 2 | replicated | gpu_parameterDevice | 1239.34 |
188 | resnet50 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 2294.24 |
189 | resnet50 | 16 | 64 | 4 | parameter_server | gpu_parameterDevice | 2199.00 |
190 | resnet50 | 16 | 64 | 4 | replicated | cpu_parameterDevice | 2489.61 |
191 | resnet50 | 16 | 64 | 4 | replicated | gpu_parameterDevice | 2376.21 |
192 | resnet50 | 16 | 128 | 1 | replicated | no_parameterDevice | 712.54 |
193 | resnet50 | 16 | 128 | 2 | replicated | no_parameterDevice | 1419.40 |
194 | resnet50 | 16 | 128 | 4 | replicated | no_parameterDevice | 2733.84 |
195 | resnet50 | 16 | 256 | 1 | replicated | no_parameterDevice | 749.41 |
196 | resnet50 | 16 | 256 | 2 | replicated | no_parameterDevice | 1477.06 |
197 | resnet50 | 16 | 256 | 4 | replicated | no_parameterDevice | 2932.09 |
198 | resnet50 | 32 | 32 | 1 | parameter_server | cpu_parameterDevice | 324.01 |
199 | resnet50 | 32 | 32 | 1 | parameter_server | gpu_parameterDevice | 336.18 |
200 | resnet50 | 32 | 32 | 1 | replicated | cpu_parameterDevice | 332.84 |
201 | resnet50 | 32 | 32 | 1 | replicated | gpu_parameterDevice | 332.86 |
202 | resnet50 | 32 | 32 | 2 | parameter_server | gpu_parameterDevice | 642.03 |
203 | resnet50 | 32 | 32 | 2 | parameter_server | cpu_parameterDevice | 621.98 |
204 | resnet50 | 32 | 32 | 2 | replicated | cpu_parameterDevice | 658.53 |
205 | resnet50 | 32 | 32 | 2 | replicated | gpu_parameterDevice | 648.60 |
206 | resnet50 | 32 | 32 | 4 | parameter_server | gpu_parameterDevice | 1053.87 |
207 | resnet50 | 32 | 32 | 4 | parameter_server | cpu_parameterDevice | 1201.09 |
208 | resnet50 | 32 | 32 | 4 | replicated | gpu_parameterDevice | 1246.13 |
209 | resnet50 | 32 | 32 | 4 | replicated | cpu_parameterDevice | 1282.85 |
210 | resnet50 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 364.54 |
211 | resnet50 | 32 | 64 | 1 | parameter_server | gpu_parameterDevice | 371.99 |
212 | resnet50 | 32 | 64 | 1 | replicated | cpu_parameterDevice | 371.97 |
213 | resnet50 | 32 | 64 | 1 | replicated | gpu_parameterDevice | 372.02 |
214 | resnet50 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 719.03 |
215 | resnet50 | 32 | 64 | 2 | parameter_server | gpu_parameterDevice | 722.70 |
216 | resnet50 | 32 | 64 | 2 | replicated | cpu_parameterDevice | 722.34 |
217 | resnet50 | 32 | 64 | 2 | replicated | gpu_parameterDevice | 736.44 |
218 | resnet50 | 32 | 64 | 4 | parameter_server | gpu_parameterDevice | 1317.22 |
219 | resnet50 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 1402.16 |
220 | resnet50 | 32 | 64 | 4 | replicated | cpu_parameterDevice | 1424.48 |
221 | resnet50 | 32 | 64 | 4 | replicated | gpu_parameterDevice | 1424.51 |
222 | resnet50_v2 | 16 | 32 | 1 | parameter_server | cpu_parameterDevice | 510.22 |
223 | resnet50_v2 | 16 | 32 | 1 | parameter_server | gpu_parameterDevice | 537.68 |
224 | resnet50_v2 | 16 | 32 | 1 | replicated | gpu_parameterDevice | 505.88 |
225 | resnet50_v2 | 16 | 32 | 1 | replicated | cpu_parameterDevice | 516.14 |
226 | resnet50_v2 | 16 | 32 | 2 | parameter_server | gpu_parameterDevice | 953.22 |
227 | resnet50_v2 | 16 | 32 | 2 | parameter_server | cpu_parameterDevice | 942.64 |
228 | resnet50_v2 | 16 | 32 | 2 | replicated | gpu_parameterDevice | 1011.77 |
229 | resnet50_v2 | 16 | 32 | 2 | replicated | cpu_parameterDevice | 1000.48 |
230 | resnet50_v2 | 16 | 32 | 4 | parameter_server | cpu_parameterDevice | 1643.04 |
231 | resnet50_v2 | 16 | 32 | 4 | parameter_server | gpu_parameterDevice | 1517.55 |
232 | resnet50_v2 | 16 | 32 | 4 | replicated | gpu_parameterDevice | 1703.01 |
233 | resnet50_v2 | 16 | 32 | 4 | replicated | cpu_parameterDevice | 1778.21 |
234 | resnet50_v2 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 647.49 |
235 | resnet50_v2 | 16 | 64 | 1 | parameter_server | gpu_parameterDevice | 655.20 |
236 | resnet50_v2 | 16 | 64 | 1 | replicated | gpu_parameterDevice | 648.63 |
237 | resnet50_v2 | 16 | 64 | 1 | replicated | cpu_parameterDevice | 655.02 |
238 | resnet50_v2 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 1232.88 |
239 | resnet50_v2 | 16 | 64 | 2 | parameter_server | gpu_parameterDevice | 1200.12 |
240 | resnet50_v2 | 16 | 64 | 2 | replicated | cpu_parameterDevice | 1283.14 |
241 | resnet50_v2 | 16 | 64 | 2 | replicated | gpu_parameterDevice | 1297.01 |
242 | resnet50_v2 | 16 | 64 | 4 | parameter_server | gpu_parameterDevice | 2176.56 |
243 | resnet50_v2 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 2347.95 |
244 | resnet50_v2 | 16 | 64 | 4 | replicated | cpu_parameterDevice | 2553.59 |
245 | resnet50_v2 | 16 | 64 | 4 | replicated | gpu_parameterDevice | 2492.30 |
246 | resnet50_v2 | 16 | 128 | 1 | replicated | no_parameterDevice | 733.20 |
247 | resnet50_v2 | 16 | 128 | 2 | replicated | no_parameterDevice | 1445.30 |
248 | resnet50_v2 | 16 | 128 | 4 | replicated | no_parameterDevice | 2771.44 |
249 | resnet50_v2 | 16 | 256 | 1 | replicated | no_parameterDevice | 766.46 |
250 | resnet50_v2 | 16 | 256 | 2 | replicated | no_parameterDevice | 1532.91 |
251 | resnet50_v2 | 16 | 256 | 4 | replicated | no_parameterDevice | 2997.32 |
252 | resnet50_v2 | 32 | 32 | 1 | parameter_server | cpu_parameterDevice | 332.66 |
253 | resnet50_v2 | 32 | 32 | 1 | parameter_server | gpu_parameterDevice | 347.81 |
254 | resnet50_v2 | 32 | 32 | 1 | replicated | cpu_parameterDevice | 344.23 |
255 | resnet50_v2 | 32 | 32 | 1 | replicated | gpu_parameterDevice | 345.27 |
256 | resnet50_v2 | 32 | 32 | 2 | parameter_server | gpu_parameterDevice | 672.42 |
257 | resnet50_v2 | 32 | 32 | 2 | parameter_server | cpu_parameterDevice | 650.97 |
258 | resnet50_v2 | 32 | 32 | 2 | replicated | cpu_parameterDevice | 665.32 |
259 | resnet50_v2 | 32 | 32 | 2 | replicated | gpu_parameterDevice | 683.57 |
260 | resnet50_v2 | 32 | 32 | 4 | parameter_server | cpu_parameterDevice | 1248.53 |
261 | resnet50_v2 | 32 | 32 | 4 | parameter_server | gpu_parameterDevice | 1101.25 |
262 | resnet50_v2 | 32 | 32 | 4 | replicated | gpu_parameterDevice | 1258.33 |
263 | resnet50_v2 | 32 | 32 | 4 | replicated | cpu_parameterDevice | 1303.18 |
264 | resnet50_v2 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 371.81 |
265 | resnet50_v2 | 32 | 64 | 1 | parameter_server | gpu_parameterDevice | 383.19 |
266 | resnet50_v2 | 32 | 64 | 1 | replicated | gpu_parameterDevice | 377.57 |
267 | resnet50_v2 | 32 | 64 | 1 | replicated | cpu_parameterDevice | 383.30 |
268 | resnet50_v2 | 32 | 64 | 2 | parameter_server | cpu_parameterDevice | 732.40 |
269 | resnet50_v2 | 32 | 64 | 2 | parameter_server | gpu_parameterDevice | 747.31 |
270 | resnet50_v2 | 32 | 64 | 2 | replicated | gpu_parameterDevice | 751.50 |
271 | resnet50_v2 | 32 | 64 | 2 | replicated | cpu_parameterDevice | 751.11 |
272 | resnet50_v2 | 32 | 64 | 4 | parameter_server | gpu_parameterDevice | 1353.14 |
273 | resnet50_v2 | 32 | 64 | 4 | parameter_server | cpu_parameterDevice | 1423.04 |
274 | resnet50_v2 | 32 | 64 | 4 | replicated | gpu_parameterDevice | 1445.11 |
275 | resnet50_v2 | 32 | 64 | 4 | replicated | cpu_parameterDevice | 1444.34 |
276 | vgg16 | 16 | 32 | 1 | parameter_server | N/A | 379.42 |
277 | vgg16 | 16 | 32 | 1 | replicated | N/A | 379.52 |
278 | vgg16 | 16 | 32 | 2 | parameter_server | N/A | 722.64 |
279 | vgg16 | 16 | 32 | 2 | replicated | N/A | 676.75 |
280 | vgg16 | 16 | 32 | 4 | parameter_server | N/A | 991.95 |
281 | vgg16 | 16 | 32 | 4 | replicated | N/A | 1001.89 |
282 | vgg16 | 16 | 64 | 1 | parameter_server | N/A | 403.87 |
283 | vgg16 | 16 | 64 | 1 | replicated | N/A | 405.58 |
284 | vgg16 | 16 | 64 | 2 | parameter_server | N/A | 778.24 |
285 | vgg16 | 16 | 64 | 2 | replicated | N/A | 766.43 |
286 | vgg16 | 16 | 64 | 4 | parameter_server | N/A | 1264.47 |
287 | vgg16 | 16 | 64 | 4 | replicated | N/A | 1264.35 |
288 | vgg16 | 16 | 128 | 1 | replicated | N/A | 425.12 |
289 | vgg16 | 16 | 128 | 2 | replicated | N/A | 822.49 |
290 | vgg16 | 16 | 128 | 4 | replicated | N/A | 1466.10 |
291 | vgg16 | 16 | 256 | 1 | replicated | N/A | 396.79 |
292 | vgg16 | 16 | 256 | 2 | replicated | N/A | 778.23 |
293 | vgg16 | 16 | 256 | 4 | replicated | N/A | 1371.82 |
294 | vgg16 | 32 | 32 | 1 | parameter_server | N/A | 225.86 |
295 | vgg16 | 32 | 32 | 1 | replicated | N/A | 225.86 |
296 | vgg16 | 32 | 32 | 2 | parameter_server | N/A | 424.41 |
297 | vgg16 | 32 | 32 | 2 | replicated | N/A | 418.82 |
298 | vgg16 | 32 | 32 | 4 | parameter_server | N/A | 692.95 |
299 | vgg16 | 32 | 32 | 4 | replicated | N/A | 683.66 |
300 | vgg16 | 32 | 64 | 1 | parameter_server | N/A | 236.39 |
301 | vgg16 | 32 | 64 | 1 | replicated | N/A | 234.22 |
302 | vgg16 | 32 | 64 | 2 | parameter_server | N/A | 455.67 |
303 | vgg16 | 32 | 64 | 2 | replicated | N/A | 451.64 |
304 | vgg16 | 32 | 64 | 4 | parameter_server | N/A | 815.81 |
305 | vgg16 | 32 | 64 | 4 | replicated | N/A | 809.41 |
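The sketch below reads multi-GPU scaling and FP16 gains out of the table above. It parses rows in the table's pipe-separated layout and computes two derived quantities:
- Scaling efficiency, defined here as T_n / (n · T_1), where T_n is images/sec on n GPUs.
- FP16-over-FP32 speedup for an otherwise identical configuration.

Both definitions and all helper names are our illustrative choices, not metrics reported by the benchmark itself. The sample rows are copied verbatim from the table.

```python
from typing import Dict, List, NamedTuple


class Run(NamedTuple):
    model: str
    fp_type: int
    batch_size: int
    n_gpus: int
    variable_update: str
    local_parameter_device: str
    images_per_sec: float


def parse_rows(text: str) -> List[Run]:
    """Parse lines like '188 | resnet50 | 16 | 64 | 4 | ... | 2294.24 |'."""
    runs = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if len(cells) != 8 or not cells[0].isdigit():
            continue  # skip header, delimiter rows and prose
        _, model, fp, bs, gpus, update, device, ips = cells
        runs.append(Run(model, int(fp), int(bs), int(gpus), update, device, float(ips)))
    return runs


def scaling_efficiency(runs: List[Run], model: str, fp_type: int, batch_size: int,
                       variable_update: str, device: str) -> Dict[int, float]:
    """Return {n_gpus: T_n / (n * T_1)} for one configuration (needs a 1-GPU row)."""
    by_gpus = {r.n_gpus: r.images_per_sec for r in runs
               if (r.model, r.fp_type, r.batch_size, r.variable_update,
                   r.local_parameter_device)
               == (model, fp_type, batch_size, variable_update, device)}
    base = by_gpus[1]
    return {n: t / (n * base) for n, t in sorted(by_gpus.items())}


def fp16_speedup(runs: List[Run], model: str, batch_size: int, n_gpus: int,
                 variable_update: str, device: str) -> float:
    """Return FP16 images/sec divided by FP32 images/sec for one configuration."""
    by_fp = {r.fp_type: r.images_per_sec for r in runs
             if (r.model, r.batch_size, r.n_gpus, r.variable_update,
                 r.local_parameter_device)
             == (model, batch_size, n_gpus, variable_update, device)}
    return by_fp[16] / by_fp[32]


# Rows 180, 184, 188 and 210 of the table above.
SAMPLE = """
180 | resnet50 | 16 | 64 | 1 | parameter_server | cpu_parameterDevice | 618.52 |
184 | resnet50 | 16 | 64 | 2 | parameter_server | cpu_parameterDevice | 1174.06 |
188 | resnet50 | 16 | 64 | 4 | parameter_server | cpu_parameterDevice | 2294.24 |
210 | resnet50 | 32 | 64 | 1 | parameter_server | cpu_parameterDevice | 364.54 |
"""
```

For these resnet50 rows (batch size 64, parameter server on CPU), the helpers give the following:
- Scaling efficiency of about 0.95 on 2 GPUs and about 0.93 on 4 GPUs.
- FP16 speedup of about 1.70 on a single GPU.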
POWER8 (p8)