Grid applications composed of multiple distributed jobs are a common target for Web-scale workflows. Workflows over grid infrastructures are inherently complicated because they must both assure that the process as a whole functions correctly and coordinate the underlying tasks. These applications are often long-running, so fault tolerance becomes a significant concern. Transparency is vital to understanding fault tolerance in these environments. © 2010 IEEE.