Asymptotic Behavior of Total Times for Jobs That Must Start Over if a Failure Occurs
Søren Asmussen,
Pierre Fiorini,
Lester Lipsky,
Tomasz Rolski,
Robert Sheahan
Department of Mathematical Sciences, Aarhus University, Ny Munkegade, DK-8000 Aarhus C, Denmark
Department of Computer Science, University of Southern Maine, Portland, Maine
Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut 06269
Mathematical Institute, Wroclaw University, 50-384 Wroclaw, Poland
Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut 06269
asmus{at}imf.au.dk
pmfiorini{at}gmail.com
lester{at}engr.uconn.edu
rolski{at}math.uni.wroc.pl
roberts{at}engr.uconn.edu
Many processes must complete in the presence of failures. Different systems respond to task failure in different ways. The system may resume a failed task from the failure point (or from a checkpoint saved shortly before the failure point), it may give up on the task and select a replacement task from the ready queue, or it may restart the task from the beginning. The behavior of systems under the first two scenarios is well documented, but the third (RESTART) has resisted detailed analysis. In this paper we derive tight asymptotic relations between the distribution of task times without failures and the distribution of the total time when failures are included, for any failure distribution. In particular, we show that if the task-time distribution has unbounded support, then the total-time distribution H is always heavy-tailed. Asymptotic expressions are given for the tail of H in various scenarios. The key ingredients of the analysis are the Cramér–Lundberg asymptotics for geometric sums and integral asymptotics, which in some cases are obtained via Tauberian theorems and in other cases by bare-hand calculations.
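The RESTART policy described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method of the paper: failures are assumed to arrive after independent exponential times, task times are assumed exponential, and the names `restart_total_time`, `simulate_restart`, `task_rate`, and `failure_rate` are hypothetical, introduced here for illustration.

```python
import random


def restart_total_time(task_time, failure_rate, rng):
    """Total time to finish one task under RESTART.

    Assumption (illustrative): times to failure are i.i.d. exponential
    with rate `failure_rate`. A failure discards all work done so far,
    and the task starts over with the same `task_time`.
    """
    total = 0.0
    while True:
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= task_time:
            # The task finishes before the next failure occurs.
            return total + task_time
        # The attempt failed; the time spent on it is lost.
        total += time_to_failure


def simulate_restart(n, task_rate, failure_rate, seed=0):
    """Draw n total times, with exponential task times (an assumption)."""
    rng = random.Random(seed)
    return [
        restart_total_time(rng.expovariate(task_rate), failure_rate, rng)
        for _ in range(n)
    ]


if __name__ == "__main__":
    samples = sorted(simulate_restart(100_000, task_rate=1.0, failure_rate=1.0))
    # Compare upper quantiles with the median to get a feel for the tail.
    for q in (0.5, 0.9, 0.99, 0.999):
        print(f"quantile {q}: {samples[int(q * len(samples)) - 1]:.2f}")
```

Although both input distributions in this sketch are light-tailed, the upper quantiles of the simulated total time can be compared with the median to get an informal impression of the heavy tail of H; the simulation is an illustration only and does not replace the asymptotic results.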
Key Words: Cramér–Lundberg approximation; failure recovery; geometric sums; heavy tails; logarithmic asymptotics; mixture distribution; power tail; reliability theory; RESTART; Tauberian theorem
History: Received June 6, 2007; revision received March 25, 2008.
Copyright © 2008 by INFORMS.