An Autonomic Failure-Detection Algorithm

Kevin Mills, Scott Rose, Steve Quirolgico, Mackenzie Britton, and Ceryen Tan

Designs for distributed systems must anticipate failures and adopt specific failure-detection strategies. We describe and analyze a self-regulating failure-detection algorithm that bounds both resource usage and failure-detection latency, while automatically reassigning resources to improve failure-detection latency as system size decreases. We apply the algorithm to (1) Jini leasing, (2) service registration in the Service Location Protocol (SLP), and (3) SLP service polling.
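
To make the self-regulating idea concrete, the following is a minimal illustrative sketch, not the algorithm from the paper itself. It assumes a lease-style monitor with a fixed renewal budget: the renewal period grows with the number of monitored nodes so that the message rate stays within the budget, admission is refused once the period would exceed a maximum (which bounds detection latency), and the period shrinks again as nodes leave. The class name `SelfRegulatingMonitor` and the parameters `budget`, `min_period`, and `max_period` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SelfRegulatingMonitor:
    """Illustrative lease-style monitor (hypothetical; not the paper's code)."""

    budget: float      # assumed cap on renewals processed per second
    min_period: float  # assumed shortest renewal period, in seconds
    max_period: float  # assumed longest renewal period, in seconds
    members: set = field(default_factory=set)

    def capacity(self) -> int:
        # Largest system size that keeps the period at or below max_period.
        return int(self.budget * self.max_period)

    def period(self) -> float:
        # Spread the renewal budget evenly over the current members:
        # n nodes renewing every n / budget seconds produce `budget` renewals/s.
        n = max(len(self.members), 1)
        return min(self.max_period, max(self.min_period, n / self.budget))

    def join(self, node_id) -> bool:
        # Refuse new nodes once the latency bound would be violated.
        if node_id in self.members:
            return True
        if len(self.members) >= self.capacity():
            return False
        self.members.add(node_id)
        return True

    def leave(self, node_id) -> None:
        # Fewer members means a shorter period on the next renewal.
        self.members.discard(node_id)
```

Under these assumptions, a monitor with a budget of 10 renewals per second and a 30-second maximum period admits at most 300 nodes. If the system shrinks to 50 nodes, the period drops to 5 seconds, so the same resources yield faster failure detection.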
