Enhanced Coordinated Checkpointing in Distributed System

Title: Enhanced Coordinated Checkpointing in Distributed System
Publication Type: Journal Article
Year of Publication: 2015
Authors: Meroufel, B, Belalem, G
Journal: International Journal of Applied Mathematics and Informatics (IJAMI)
Start Page: 23
Keywords: atomicity, Checkpointing, collective I/O, consistency, coordination, data sieving, fault tolerance, I/O, initiator, overhead, rollback

Coordinated checkpointing is a well-known method for achieving fault tolerance in distributed computing systems. This type of checkpointing selects an initiator to manage the checkpointing process and ensure that it completes. Most existing work ignores the role and importance of this initiator. The work presented in this paper is divided into two parts. In the first part, we examine the impact of the choice of initiator on different types of coordinated checkpointing and show its importance in terms of performance. We also propose a simple and effective strategy for selecting the best initiator in each checkpointing round. In the second part, we focus on soft checkpointing and strengthen the role of the initiator by adding a storage manager that ensures atomic and fast storage of checkpoint files using a smart I/O strategy.