Low Latency Fault Tolerance System
Computer Journal
  • Wenbing Zhao, Cleveland State University
  • P. M. Melliar-Smith, University of California
  • L. E. Moser, University of California
Document Type
Article
Publication Date
10-2-2012
Abstract

The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications within a local-area network, using a leader-follower replication strategy. LLFT provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. Its novel system model enables LLFT to maintain a single consistent infinite computation, despite faults and asynchronous communication. The LLFT Messaging Protocol provides reliable, totally ordered message delivery by employing a group multicast, where the message ordering is determined by the primary replica in the destination group. The Leader-Determined Membership Protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership of the group is determined by the primary replica. The Virtual Determinizer Framework captures the ordering information at the primary replica and enforces the same ordering of non-deterministic operations at the backup replicas. Because LLFT does not employ a majority-based, multiple-round consensus algorithm, it can operate in the common industrial case where there is a primary replica and only one backup replica. The LLFT system achieves low-latency message delivery during normal operation and low-latency reconfiguration and recovery when a fault occurs.
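
The idea of leader-determined ordering described in the abstract can be illustrated with a minimal sketch. This is not the LLFT Messaging Protocol itself: the class names `Primary` and `Backup`, the integer sequence numbers, and the in-memory buffering are all assumptions made for illustration, and the sketch omits multicast, retransmission, and membership changes. It only shows how a single primary can fix a total order that a backup then reproduces, without any majority vote.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Primary:
    """Hypothetical primary replica: it alone decides the delivery order."""
    next_seq: int = 0
    delivered: List[str] = field(default_factory=list)

    def order(self, message: str) -> Tuple[int, str]:
        # Assign the next sequence number and deliver locally.
        seq = self.next_seq
        self.next_seq += 1
        self.delivered.append(message)
        return seq, message


@dataclass
class Backup:
    """Hypothetical backup replica: it follows the primary's order."""
    next_expected: int = 0
    pending: Dict[int, str] = field(default_factory=dict)
    delivered: List[str] = field(default_factory=list)

    def receive(self, seq: int, message: str) -> None:
        if seq < self.next_expected:
            return  # already delivered; ignore the duplicate
        self.pending[seq] = message
        # Deliver every buffered message that is now in sequence.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1


def demo() -> Tuple[List[str], List[str]]:
    primary = Primary()
    backup = Backup()
    ordered = [primary.order(m) for m in ["a", "b", "c"]]
    # Simulate out-of-order arrival at the backup.
    for seq, msg in (ordered[2], ordered[0], ordered[1]):
        backup.receive(seq, msg)
    return primary.delivered, backup.delivered
```

Calling `demo()` returns the delivery logs of both replicas. In this sketch they match even though the backup receives the messages out of order, because the backup buffers each message until every earlier sequence number has been delivered.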

Article Number
Oxford
DOI
10.1093/comjnl/bxs131
Version
Postprint
Citation Information
Wenbing Zhao, P. M. Melliar-Smith and L. E. Moser. "Low Latency Fault Tolerance System" Computer Journal (2012)
Available at: http://works.bepress.com/wenbingzhao/34/