Anne-Marie Kermarrec, Nicolas Le Scouarnec and Gilles Straub
Accepted as a Regular Paper at The 2011 International Symposium on Network Coding (NetCod 2011).
An extended version is available on arXiv:1102.0204.
The conference version is available on the author's page.
This paper supersedes the INRIA research report entitled Beyond Regenerating Codes, published on HAL in September 2010.
Abstract: Erasure correcting codes are widely used to ensure data persistence in distributed storage systems. This paper addresses the repair of such codes in the presence of simultaneous failures. It is crucial to maintain the required redundancy over time to prevent permanent data losses. We go beyond existing work (i.e., regenerating codes by Dimakis et al.) and propose coordinated regenerating codes, which allow devices to coordinate during simultaneous repairs, thus reducing the costs further. We provide closed-form expressions of the communication costs of our new codes as a function of the number of live devices and the number of devices being repaired. We prove that deliberately delaying repairs does not bring additional gains in itself. This means that regenerating codes are optimal as long as each failure can be repaired before a second one occurs. Yet, when multiple failures are detected simultaneously, we prove that our coordinated regenerating codes are optimal and outperform uncoordinated repairs (with respect to communication and storage costs). Finally, we define adaptive regenerating codes that self-adapt to the system state and prove they are optimal.
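The abstract mentions closed-form communication costs but does not state them. As a rough illustration only, the sketch below evaluates the per-device repair bandwidth at the minimum-storage point, using the expression M(d + t - 1) / (k(d - k + t)) as recalled from the regenerating-codes literature; this formula is an assumption here and should be checked against the extended version on arXiv. The function name `mscr_repair_cost` and the symbols (file size M, any k devices sufficing to recover the file, d live devices contacted, t devices repaired together) are hypothetical naming choices, not taken from this page.

```python
from fractions import Fraction


def mscr_repair_cost(file_size, k, d, t):
    """Per-device repair bandwidth at the minimum-storage point (assumed formula).

    file_size: size M of the stored file (any unit)
    k: number of devices sufficient to recover the file
    d: number of live devices contacted during the repair
    t: number of devices repaired simultaneously (t = 1 means no coordination)
    """
    if k < 1 or d < k or t < 1:
        raise ValueError("expected 1 <= k <= d and t >= 1")
    # Assumed closed form: M * (d + t - 1) / (k * (d - k + t)).
    return Fraction(file_size) * Fraction(d + t - 1, k * (d - k + t))


# Normalized file size M = 1, k = 4, d = 8 live devices.
single = mscr_repair_cost(1, k=4, d=8, t=1)       # one repair at a time
coordinated = mscr_repair_cost(1, k=4, d=8, t=3)  # three coordinated repairs
print(single, coordinated)
```

With t = 1 the expression reduces to M d / (k(d - k + 1)), the minimum-storage cost of the regenerating codes of Dimakis et al., and for these sample parameters the coordinated per-device cost is lower, which is consistent with the claim in the abstract.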