Next: Observability Up: The Maelstrom Previous: Testing the wind

Safety

Even in the simple examples of this paper, there are conditions in which a troubleshooting script can make things worse by its actions. This can happen if a corrective action is too extreme or depends upon external resources that are themselves down at the moment. If our scripts are convergent in the Cfengine sense and there are no hidden constraints, Maelstrom is relatively safe. If, for example, a reboot depends upon hidden constraints that have not been assured, such as a service required for reboot, Maelstrom may reboot a server even though doing so makes the network state worse and may well make future troubleshooting impossible without operator intervention.
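As a minimal sketch of what a convergent corrective action can look like, the following shell function appends a line to a file only if the line is absent, so repeated runs leave the system in the same state. The function name ensure_line and the choice of a configuration file as the target are assumptions made for illustration; they do not come from Maelstrom or Cfengine. The function also fails, rather than acts, when a precondition it can check (the target directory exists) is not met.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maelstrom): ensure_line FILE LINE
# Appends LINE to FILE only when FILE does not already contain it,
# so running it once or many times yields the same final file.
ensure_line() {
  file=$1
  line=$2
  # Refuse to act if the directory is missing: an unmet constraint
  # should stop the action instead of being worked around.
  [ -d "$(dirname "$file")" ] || return 1
  touch "$file" || return 1
  # -x matches whole lines, -F treats LINE as a fixed string.
  if grep -qxF -- "$line" "$file"; then
    return 0    # already in the desired state; do nothing
  fi
  printf '%s\n' "$line" >> "$file"
}
```

Calling ensure_line twice with the same arguments leaves exactly one copy of the line in the file, which is the property that makes blind retries of such a script harmless.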

Maelstrom is currently relatively naive about the limitations of its environment. It can be made safer by giving it more understanding of the imperfections within its scripts, and the hidden couplings between scripts and Maelstrom's environment.

Maelstrom cannot currently compensate for inhomogeneity or lack of convergence in scripts. In the future, there will be stronger precedence operators to control Maelstrom's actions in the presence of imperfect scripts. Recall that ``c2 : c1'' means ``c1 might theoretically precede c2''. We plan other precedence operations whose main purpose is to compensate for script deficiencies:

Both of these are still weaker conditions than the ``:'' in make. In Maelstrom, we could notate make's concept of strong precedence as follows:

We use the colons to limit the number of characters one must escape inside shell commands in the configuration file (currently `:', `;', `[', and `]').

All of these syntactic mechanisms are attempts to compensate for non-homogeneous or non-convergent behavior in scripts.

To understand the importance of adding these precedence operators to Maelstrom, note that with even the first one (::) we can simulate make with Maelstrom without resorting to more script intelligence. If script c1 is:

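
# c1: link foo from foo.o and bar.o; exits 1 when foo is newer than both
if [ foo -nt foo.o -a foo -nt bar.o ]; then exit 1; fi
g++ -o foo foo.o bar.o
exit 0

and script c2 is:

# c2: compile foo.c unless foo.o is already newer
if [ foo.o -nt foo.c ]; then exit 0; fi
g++ -c foo.c
exit 0

and script c3 is:

# c3: compile bar.c unless bar.o is already newer
if [ bar.o -nt bar.c ]; then exit 0; fi
g++ -c bar.c
exit 0
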
then the Maelstrom declarations:
c1 :: c2 
c1 :: c3
would accomplish the same effect as the Makefile above. Even the relatively weak double-colon precedence operator spares script c1 from having to know all the dependencies between its files, as it did in the former example. These scripts might do redundant compilations, but in the end they will accomplish exactly the same result as the Makefile.

Although we discuss the possibility of `rebooting' as a result of a script, we are not happy with the prospect of automated power-cycling of servers. We are currently developing a tool that allows that kind of dangerous action to be controlled by an electronic mail or two-way pager transaction. The script that wishes to reboot a server asks us whether it should do so, and an operator can mail back a `yes' or `no' response.
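The operator-confirmation idea can be sketched in shell as a gate in front of the dangerous action. This is only an illustration under stated assumptions: the function names approved and guarded_reboot are hypothetical, the operator's reply is passed in as a plain string rather than fetched from mail or a pager, and the reboot itself is replaced by a message. It is not the tool described above.

```shell
#!/bin/sh
# Hypothetical sketch (not the authors' tool).
# approved REPLY: succeeds only on an explicit "yes" (case-insensitive).
# Anything else, including an empty reply, counts as "no", so silence
# from the operator never triggers the dangerous action.
approved() {
  reply=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [ "$reply" = "yes" ]
}

# guarded_reboot REPLY: report the decision instead of rebooting.
guarded_reboot() {
  if approved "$1"; then
    echo "reboot approved"
    # a real script would invoke the reboot here
  else
    echo "reboot refused"
    return 1
  fi
}
```

Treating a missing or malformed reply as a refusal is a deliberate choice in this sketch: the script returns a failure status, which leaves the decision to retry or escalate with whatever invoked it.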

One weakness of Maelstrom's scheduling is its simplicity. Many colleagues have suggested that Maelstrom should allow one to declare not just precedences, but also ``costs'' as a measure of how disruptive a particular action will be. One could then try solutions in order of increasing cost. But this would require an even more complex syntax in the configuration file, and theoretical precedences have the same overall effect (through different kinds of declarations).
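To make the suggested alternative concrete, here is a minimal sketch of cost-ordered scheduling, which Maelstrom does not implement. The input format (one "COST NAME" pair per line), the function name order_by_cost, and the example action names are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: Maelstrom has no cost declarations.
# order_by_cost: read "COST NAME" lines on stdin and print the names
# with the least disruptive (lowest-cost) action first.
order_by_cost() {
  sort -n -k1,1 | awk '{ print $2 }'
}

# Example input: three made-up actions with made-up costs.
printf '50 reboot\n1 restart_daemon\n10 remount\n' | order_by_cost
```

A scheduler built this way would try the listed actions from top to bottom, reaching the most disruptive one only after the cheaper ones have failed.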


Alva L. Couch
2001-10-02