Proxmox/Ceph storage performance notes

I recently found the XtremeOwnage.com blog and love the content, so I am going to write up some similar notes of my own.

My homelab has been a Proxmox cluster with Ceph for almost a year now. It takes a lot of hardware and is complex, but I love the flexibility to rearrange the storage. Across 5 nodes I have 10 SSDs acting as OSDs. Some are consumer-class drives, which I now understand can hurt Ceph performance badly. Most of the ~1 TB ones are workplace "e-waste redirect" units, so they are Intel datacenter-class models that presumably have power-loss protection (PLP) capacitors.

I started out with 1 Gbps per node, so performance was never going to be great. Now that some nodes are running 5x1 Gbps, I am getting more interested in performance.
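For a rough sense of what the network allows, link speed can be converted into the MiB/s units that the benchmark summaries use (each object is 4194304 bytes, i.e. 4 MiB). This is only a minimal sketch: it assumes ideal line rate with no protocol overhead, and the helper name `line_rate_mib_per_s` is made up for illustration. The 5-link figure is an aggregate upper bound; how much of it a single client actually sees depends on how the links are combined, which is not covered here.

```python
def line_rate_mib_per_s(gbps: float) -> float:
    """Convert a link speed in Gbps (10^9 bits/s) to MiB/s (2^20 bytes/s)."""
    bytes_per_s = gbps * 1e9 / 8          # bits -> bytes
    return bytes_per_s / (1024 * 1024)    # bytes -> MiB


if __name__ == "__main__":
    # Hypothetical link counts matching the "1 Gbps" and "5x1 Gbps" nodes above.
    for links in (1, 5):
        print(f"{links} x 1 Gbps ~ {line_rate_mib_per_s(links):.1f} MiB/s (theoretical)")
```

A single 1 Gbps link works out to a little under 120 MiB/s, which is a useful number to keep in mind when reading the results.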

Here are the OSDs:


The test pool that I created for benchmarking:


Write performance:

root@proxmox1:~# rados bench -p testpool 300 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 300 seconds or 0 objects
Object prefix: benchmark_data_proxmox1_200202
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        53        37   147.986       148    0.583851    0.320801
    2      16        90        74   147.987       148    0.147199    0.325505
    3      16       135       119   158.654       180    0.138938    0.328477
<snip>
  294      16      8702      8686   118.165       152    0.293002    0.540915
  295      16      8716      8700   117.954        56    0.203445    0.540626
  296      16      8738      8722   117.853        88    0.501285    0.541629
  297      16      8759      8743   117.739        84    0.643936    0.542475
  298      16      8784      8768   117.679       100    0.112998    0.542322
  299      16      8804      8788   117.553        80    0.232327    0.542073
2024-09-19T22:20:13.695935-0500 min lat: 0.0476777 max lat: 4.72464 avg lat: 0.543161
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  300      15      8824      8809   117.441        84    0.234312    0.543161
Total time run:         300.739
Total writes made:      8824
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     117.364
Stddev Bandwidth:       42.9952
Max bandwidth (MB/sec): 220
Min bandwidth (MB/sec): 4
Average IOPS:           29
Stddev IOPS:            10.7488
Max IOPS:               55
Min IOPS:               1
Average Latency(s):     0.545222
Stddev Latency(s):      0.463362
Max latency(s):         4.72464
Min latency(s):         0.0476777
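As a sanity check, the headline bandwidth and IOPS can be recomputed from the totals in the summary above. The sketch below uses only the numbers printed by the run; the function name `summarize` is my own. It also shows that the "MB/sec" figure lines up with MiB (2^20 bytes) per second, since each write is 4194304 bytes.

```python
def summarize(total_ops: int, total_time_s: float, object_size_bytes: int) -> dict:
    """Recompute bandwidth (MiB/s) and IOPS from rados bench totals."""
    mib_moved = total_ops * object_size_bytes / 2**20
    return {
        "bandwidth_mib_s": mib_moved / total_time_s,
        "iops": total_ops / total_time_s,
    }


if __name__ == "__main__":
    # Values copied from the write run: Total writes made, Total time run, Write size.
    write = summarize(total_ops=8824, total_time_s=300.739, object_size_bytes=4194304)
    print(f"bandwidth: {write['bandwidth_mib_s']:.3f} MiB/s")
    print(f"iops:      {write['iops']:.2f}")
```

The recomputed bandwidth agrees with the reported 117.364, and the IOPS value rounds down to the reported 29.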

Sequential read performance:

root@proxmox1:~# rados bench -p testpool 100 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        93        77   307.953       308   0.0125451    0.149726
    2      16       153       137   273.969       240   0.0473266    0.194905
    3      16       209       193   257.307       224   0.0672088     0.22432
    4      16       257       241   240.977       192   0.0402067    0.246331
    5      16       324       308   246.377       268    0.724575    0.241547
<snip>
   95      16      5348      5332   224.483       284    0.277306    0.283629
   96      16      5396      5380   224.145       192   0.0401958    0.283725
   97      16      5444      5428   223.813       192   0.0127102    0.284014
   98      16      5490      5474   223.407       184   0.0136518    0.284502
   99      16      5547      5531   223.453       228    0.151732    0.284663
2024-09-19T22:28:46.137589-0500 min lat: 0.00903257 max lat: 1.49518 avg lat: 0.2848
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      15      5602      5587   223.458       224    0.141309      0.2848
Total time run:       100.495
Total reads made:     5602
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   222.976
Average IOPS:         55
Stddev IOPS:          7.95398
Max IOPS:             78
Min IOPS:             45
Average Latency(s):   0.28573
Max latency(s):       1.49518
Min latency(s):       0.00903257
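With a fixed queue depth of 16, throughput and latency are tied together by Little's law: operations in flight = operations per second x average latency. The sketch below checks that relationship against the sequential read numbers above. It is an approximation, not something rados bench itself reports, and `expected_iops` is a name I made up.

```python
def expected_iops(concurrency: int, avg_latency_s: float) -> float:
    """Little's law rearranged: throughput = concurrency / average latency."""
    return concurrency / avg_latency_s


if __name__ == "__main__":
    # Values copied from the sequential read run.
    concurrency = 16
    avg_latency_s = 0.28573
    measured_iops = 5602 / 100.495  # Total reads made / Total time run

    estimate = expected_iops(concurrency, avg_latency_s)
    gap_pct = abs(estimate - measured_iops) / measured_iops * 100
    print(f"estimated: {estimate:.2f} ops/s")
    print(f"measured:  {measured_iops:.2f} ops/s")
    print(f"gap:       {gap_pct:.2f} %")
```

The two figures land within about one percent of each other. Put another way, at this queue depth the only ways to get more throughput are lower latency per operation or more concurrent operations.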

Random read performance:

root@proxmox1:~# rados bench -p testpool 100 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       110        94   375.941       376    0.182853    0.128227
    2      16       214       198   395.945       416   0.0210152    0.148397
    3      16       303       287   382.622       356    0.492813     0.15927
    4      16       396       380   379.959       372   0.0994178    0.160321
<snip>
   95      16      8401      8385   353.019       372   0.0999113    0.180418
   96      16      8504      8488   353.633       412    0.177019    0.180185
   97      16      8604      8588   354.111       400   0.0284281    0.179882
   98      16      8703      8687   354.535       396    0.326019    0.179665
   99      16      8801      8785   354.913       392    0.341465    0.179431
2024-09-19T22:31:27.955888-0500 min lat: 0.00162297 max lat: 1.04212 avg lat: 0.179455
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
  100      16      8880      8864   354.524       316   0.0563875    0.179455
Total time run:       100.436
Total reads made:     8880
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   353.658
Average IOPS:         88
Stddev IOPS:          9.03027
Max IOPS:             112
Min IOPS:             63
Average Latency(s):   0.179924
Max latency(s):       1.04212
Min latency(s):       0.00162297
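To compare runs without copying numbers by hand, the summary block at the end of each run can be parsed into a dictionary. This is a minimal sketch that assumes the "Key: value" layout shown in the output above; `parse_summary` is a hypothetical helper, and lines whose value is not a plain number (timestamps, the object prefix) are simply skipped.

```python
def parse_summary(text: str) -> dict:
    """Collect 'Key: number' lines from rados bench output into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, e.g. the per-second progress rows
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # value is not a single number (timestamps, prefixes)
    return stats


# Excerpt of the random read summary, copied from the run above.
RAND_READ = """\
Total time run:       100.436
Total reads made:     8880
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   353.658
Average IOPS:         88
Average Latency(s):   0.179924
"""

if __name__ == "__main__":
    rand = parse_summary(RAND_READ)
    # Bandwidth figures from the write and sequential read summaries.
    write_bw, seq_bw = 117.364, 222.976
    rand_bw = rand["Bandwidth (MB/sec)"]
    print(f"seq read  / write: {seq_bw / write_bw:.2f}x")
    print(f"rand read / write: {rand_bw / write_bw:.2f}x")
```

On these three runs, sequential reads come out at roughly 1.9x the write bandwidth and random reads at roughly 3.0x.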