A Model and Decision Procedure for Data Storage in Cloud Computing (CCGrid 2012)


Cloud computing offers many possibilities for prospective users; however, there are many different storage and compute services to choose from across cloud providers and their multiple datacenters. In this paper we focus on the problem of selecting the best storage services according to the application's requirements and the user's priorities. In previous work we described a capability-based matching process that filters out any service that does not meet the requirements specified by the user. In this paper we introduce a mathematical model that takes the resulting lists of compatible storage services and constructs an integer linear programming (ILP) problem. This ILP problem takes into account storage and compute cost as well as performance characteristics such as latency, bandwidth, and job turnaround time; a solution to the problem yields an optimal assignment of datasets to storage services and of application runs to compute services. We show that with modern ILP solvers a reasonably sized problem can be solved in one second; even with an order-of-magnitude increase in the number of cloud providers, datacenters, or storage services, the problem instances can be solved in under a minute. We finish our paper with two use cases, BLAST and MODIS. For MODIS our recommended data allocation leverages both cloud and local resources; it incurs half the cost of a pure cloud solution, and the job turnaround time is 52% faster than that of a pure local solution.
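To make the dataset-to-storage assignment idea concrete, the following is a minimal toy sketch, not the model from the paper. It assumes a hypothetical set of datasets, services, prices, latency penalties, and capacities (all names and numbers are invented for illustration), and it uses exhaustive search over the compatible-service lists rather than an ILP solver, which is only practical for very small instances.

```python
from itertools import product

# Hypothetical inputs: every name and number here is illustrative, not from the paper.
size_gb = {"genome_db": 50, "query_logs": 5}

# Services assumed to have survived a capability-matching filter, per dataset.
compatible = {
    "genome_db": ["cloud_blob", "local_disk"],
    "query_logs": ["cloud_blob", "cloud_table", "local_disk"],
}

# Assumed storage price and latency penalty, both per GB, plus a capacity limit.
price_per_gb = {"cloud_blob": 0.10, "cloud_table": 0.25, "local_disk": 0.05}
latency_penalty_per_gb = {"cloud_blob": 0.02, "cloud_table": 0.01, "local_disk": 0.20}
capacity_gb = {"cloud_blob": 1000, "cloud_table": 1000, "local_disk": 40}


def assignment_cost(assignment, latency_weight):
    """Weighted sum of storage price and latency penalty over all datasets."""
    return sum(
        size_gb[d] * (price_per_gb[s] + latency_weight * latency_penalty_per_gb[s])
        for d, s in assignment.items()
    )


def is_feasible(assignment):
    """Check that no service holds more data than its capacity."""
    used = {}
    for d, s in assignment.items():
        used[s] = used.get(s, 0) + size_gb[d]
    return all(used[s] <= capacity_gb[s] for s in used)


def best_assignment(latency_weight=1.0):
    """Enumerate every dataset-to-service choice and keep the cheapest feasible one."""
    datasets = list(compatible)
    best, best_cost = None, float("inf")
    for choice in product(*(compatible[d] for d in datasets)):
        assignment = dict(zip(datasets, choice))
        if not is_feasible(assignment):
            continue
        cost = assignment_cost(assignment, latency_weight)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    print(best_assignment(latency_weight=1.0))
    print(best_assignment(latency_weight=0.0))
```

Changing `latency_weight` stands in for the user's priorities: with a weight of zero the small dataset moves to the cheaper local disk, while the large dataset stays in the cloud because it exceeds the assumed local capacity. A real instance would hand the same binary choices and constraints to an ILP solver instead of enumerating them.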

Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid’12), May 13-16, 2012, Ottawa, Canada.