Distributed software-defined storage (such as Ceph) enables scalable and flexible storage systems. Its use cases range from cold and long-term data retention, through general-purpose, cloud, and big data storage, to high-performance computing applications. Unlike legacy storage appliances, these systems expose the full set of configuration and architecture choices at every layer (server, network, software). In theory, this makes them almost infinitely customizable. In practice, their performance characteristics are difficult to predict or estimate without a large and costly pilot deployment.
Here, we discuss the relevant design choices, their trade-offs, and how they affect the performance and cost of the resulting system. We also present research into modeling and simulating real-world Ceph deployments.