System and Network Administration
The practice
- Not about simple semantics of actions ...
- But about lifecycle.
- What will my actions mean:
- If I am no longer employed here?
- If I make changes to other things?
- other programs that this program uses.
- the version of the distribution or kernel.
- If a new user marches in and demands changes?
Definition of practice:
- Providing users the illusion of ordered changes!
- Don't pay attention to that person behind the curtain....
- "I can't go back, I don't know how it works!"
- This, in practice, is meaningless; the system administrator
doesn't have to roll back changes, just to pretend to do so.
Parts of practice:
- Defining expectations ("support")
- what does user need?
- when and how will you provide it?
- what will happen if it doesn't work?
- where do users go for help?
- "service-level agreement".
- "If you aren't supporting it, you aren't maintaining it."
- Elements:
- user help desks.
- user trouble ticketing.
- user problem tracking.
- Installation:
- user's goal: be able to do specific things.
- admin's goal: set up "preconditions" that make all
further operations easier.
- Elements:
- requirements analysis.
- package management.
- pre-testing.
- repeatability.
- Change control ("planning")
- what changes are required and when?
- what services will they potentially affect?
- what will have to be tested afterward?
- Elements:
- capacity planning.
- needs planning.
- dependency analysis.
- structured testing.
- Maintenance
- user-required changes (don't let users make them!)
- upgrades (without breaking anything).
- security patches (ditto).
- "If you aren't maintaining it, you aren't managing it."
- Elements:
- software bug ticketing.
- interfacing with vendors for the user.
- invoking vendor maintenance agreements.
- Monitoring
- of static configuration.
- of application health.
- "If you aren't monitoring it, you aren't maintaining it."
- Elements:
- logging.
- users
- commands
- service requests
- log watching.
- integrity checking.
- system auditing.
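As an illustration of the "integrity checking" element, here is a minimal sketch in Python. It assumes that integrity means "file contents match a previously recorded digest"; the function names (file_digest, take_baseline, check_integrity) are invented for this example and do not come from any particular tool.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of one file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def take_baseline(paths):
    """Record a digest for every path; this is the known-good snapshot."""
    return {str(p): file_digest(p) for p in paths}


def check_integrity(baseline):
    """Compare the current state to the baseline and report what drifted."""
    report = {"missing": [], "changed": []}
    for path, digest in sorted(baseline.items()):
        if not Path(path).is_file():
            report["missing"].append(path)
        elif file_digest(path) != digest:
            report["changed"].append(path)
    return report
```

A monitor would take the baseline once, after a known-good installation, and then run check_integrity periodically. Anything reported is a change that happened outside of change control.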
The enemies of practice
- The halting problem: limits what we can know about proper function
of a system.
- Hidden dependencies (latent preconditions): limits what we can know
about the effects of a system change.
The halting problem
- The "halting problem": can't know enough to accomplish some tasks.
- Intractability: ditto.
- Actually, for finite systems, these are the same thing!
- (though some people argue that the Internet isn't finite! :( )
- there is no comprehensive static validation of a system
(one that checks whether the system works without booting it).
Dependencies
- Between software and libraries.
- Between software and other software.
- Between software and specific (well-known) system files.
- Dependencies are a "lifecycle effect". They get worse
as the age of a system increases.
Lifecycle dependency effects
- Filesystem rot.
- Script rot.
- Repository rot.
Filesystem rot
- Over time, the filesystem fills up with dynamic libraries.
- Can't delete one without potentially breaking something.
- Can't know what was broken -- perhaps for a long time --
until a user tries to use it.
- Result: no way to clean up filesystem.
Script rot:
- Scripts are constructed to maintain one kind of system.
- Over time, the system changes, but the script remains the same.
- Result: scripts fail to function in unforeseen (and perhaps
unpredictable) ways.
Repository rot:
- Build a big repository of programs to use.
- These are built with respect to a particular system.
- System changes, repository doesn't.
- Result: things break unpredictably.
In all cases:
- "Rot" occurs when there are changes that are decoupled from one another.
- Two kinds of rot:
- Documentation: can't delete something because you don't know the effect.
- Precondition: script or program requires hidden preconditions that are
present in one system but -- as time passes -- become unavailable.
Dealing with rot:
- Reactive: journalling, rollback, temporary changes, dependency analysis.
- Proactive: declared dependencies, baselining, standards, pre-validation.
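One proactive mechanism, "declared dependencies" plus "pre-validation", might look like the sketch below. It assumes a script's preconditions can be written down as a list of commands that must be on the PATH and files that must exist; missing_preconditions is a hypothetical helper, not part of any standard tool.

```python
import os
import shutil


def missing_preconditions(commands=(), files=()):
    """Return the declared preconditions that this host does not satisfy."""
    problems = []
    for command in commands:
        # shutil.which returns None when the command is not on the PATH.
        if shutil.which(command) is None:
            problems.append(f"command not found: {command}")
    for path in files:
        if not os.path.exists(path):
            problems.append(f"file not found: {path}")
    return problems
```

A maintenance script would call this first and refuse to run if the list is non-empty. The precondition is then no longer hidden: when the system changes underneath the script, the script fails immediately and says why.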
Reactive mechanisms
- Allow one to uninstall a package.
- Do dependency analysis.
Rollback: uninstallation
- generally unavailable when installing with configure and make.
- available in package managers such as rpm.
- requirements:
- a journal of changes.
- old versions of any changed files.
- limitations:
- must undo file changes in opposite order.
- order of installation is unknown.
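The two requirements (a journal of changes, old versions of changed files) and the first limitation (undo in opposite order) can be sketched as follows. This is an illustration only, assuming every change goes through the journal; ChangeJournal and its methods are names made up for this example and are not how rpm is implemented.

```python
import os
import shutil


class ChangeJournal:
    """Records file changes so they can be undone in reverse order."""

    def __init__(self, backup_dir):
        self.backup_dir = backup_dir
        self.entries = []  # (path, backup path or None), oldest first

    def write_file(self, path, content):
        """Save the old version (if any), journal the change, then write."""
        backup = None
        if os.path.exists(path):
            backup = os.path.join(self.backup_dir, f"{len(self.entries)}.bak")
            shutil.copy2(path, backup)
        self.entries.append((path, backup))
        with open(path, "w") as handle:
            handle.write(content)

    def rollback(self):
        """Undo every journaled change, newest first."""
        while self.entries:
            path, backup = self.entries.pop()
            if backup is None:
                os.remove(path)  # file did not exist before the change
            else:
                shutil.move(backup, path)  # restore the old version
```

The journal is what makes the order of changes known. A change made outside the journal (for instance by "make install") leaves no entry, so rollback cannot account for it.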
Dependency analysis
- determine beforehand what packages depend upon others.
- dynamic libraries used.
- programs opened.
- static dependency analysis
- depends upon contents of executables.
- program ldd: describes dependencies between
programs and libraries.
- can't see what files a program opens, just the
libraries.
- dynamic dependency analysis
- monitor what each program does.
- technique: library wrapping
- when anyone calls "open", make a note of what was "opened".
- record inode number of process executable and file.
- post-run: associate inode numbers with paths.
- still trying! Microsoft has it running: see "Strider".
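The library-wrapping technique can be imitated inside a single Python process. The sketch below wraps Python's built-in open rather than the C library's open (a real tool would interpose at the C library level), and it records only the opened file's identity, not the executable's. record_opens and paths_for_inodes are hypothetical names.

```python
import builtins
import os
from contextlib import contextmanager


@contextmanager
def record_opens(log):
    """Wrap builtins.open so each successful open is noted in `log`."""
    real_open = builtins.open

    def wrapped_open(file, *args, **kwargs):
        handle = real_open(file, *args, **kwargs)
        stat = os.fstat(handle.fileno())
        log.append((stat.st_dev, stat.st_ino))  # identity, not name
        return handle

    builtins.open = wrapped_open
    try:
        yield log
    finally:
        builtins.open = real_open  # always remove the wrapper


def paths_for_inodes(log, candidate_paths):
    """Post-run step: associate recorded inode numbers with paths."""
    wanted = set(log)
    found = {}
    for path in candidate_paths:
        stat = os.stat(path)
        key = (stat.st_dev, stat.st_ino)
        if key in wanted:
            found.setdefault(key, []).append(path)
    return found
```

Recording (device, inode) pairs instead of names keeps the wrapper cheap and is immune to symbolic links and relative paths. The price is the post-run pass that maps inodes back to paths.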
Standards
- Linux Standard Base (LSB): a standard for the positions and contents
of libraries, files, etc!
- Problem: vendors can't figure out how to test their software on
Linux.
- Solution: provide a "standard" with three parts:
- A static environment validator that makes sure that:
- files are in the right places in the filesystem.
- libraries have the right contents.
- libraries with the right stuff in them load first(!)
- A static program validator that makes sure that
- library functions bind to standardized versions of the
functions during dynamic linking.
- library functions are called with the correct
types of arguments (pokes around in machine code)
- only well-known files are opened, and in places that
we expect them to live.
- A dynamic validation that the software functions correctly
on one host.
- Transitive validation claim: if
- the software P works correctly on one host A and
- the host A passes the environment validation and
- the software P passes the program validation and
- another host B passes the environment validation, then
- the software P will work on the other host B!
Subtle! Not a violation of the halting problem:
- Halting problem (Turing; often stated alongside Church's thesis): there is no
effective procedure for determining whether an arbitrary program
halts (completes) or not. In general, the only way to learn that
a program halts is to run it and watch it halt.
- Step 1: determine that program produces appropriate output
("halts") on one host.
- Step 2: determine that the environment in which the program
runs is somehow standard.
- Step 3: determine that the only couplings between the program
and the outside world are standard.
- Step 4: determine that the same standard features exist on
another host, and that those couplings are the same for the
two hosts.
- Then the function of the program on the second host is conditional:
since it couples to the same outside influences, and they are
similar for the two hosts, it will probably work.
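The shape of the transitive claim can be written down as a small sketch. It is not the LSB test suite: it assumes a host is summarized by a manifest mapping each path to a content digest, and that the standard is a manifest of the same form. All function names and the manifest format are assumptions made for illustration.

```python
def environment_problems(manifest, standard):
    """List paths where a host's manifest disagrees with the standard.

    Both arguments map a path to a content digest (hypothetical format).
    """
    return sorted(
        path for path, digest in standard.items()
        if manifest.get(path) != digest
    )


def program_problems(files_opened, standard):
    """List files the program opens that the standard does not cover."""
    return sorted(set(files_opened) - set(standard))


def expected_to_work_on_b(works_on_a, manifest_a, manifest_b,
                          files_opened, standard):
    """Apply the transitive claim: all four premises must hold."""
    return (
        works_on_a
        and not environment_problems(manifest_a, standard)
        and not program_problems(files_opened, standard)
        and not environment_problems(manifest_b, standard)
    )
```

Only works_on_a comes from actually running the program. The other three premises are static checks, which is why the claim does not contradict the halting problem. A True result means "expected to work", not "proven to work".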
What LSB doesn't account for:
- Ordering differences between files in the LSB.
- Does program work if /etc/hosts is not in sorted order by hostname?
(It would be silly to construct such a program, but it happens)
- Content issues for files in the LSB.
- Can break a program by not providing it with enough
parameters, e.g.
Structural validation
- Construct a suite of packages that "get along".
- Clearly defined dependencies (A requires B)
- no overlap between packages ("orthogonality")
- Technique:
- Obey dependencies between packages to determine
installation order.
- Only install what you need.
- Can always come back and install more later.
- Result: RedHat Package Manager and RedHat 9.0
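The technique of obeying dependencies to determine installation order is a topological sort. A minimal sketch, assuming dependencies are given as a dictionary from package name to the names it requires, and using graphlib from the Python standard library (Python 3.9 or later); installation_order and the package names in the usage note are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def installation_order(targets, requires):
    """Order only the needed packages so each follows what it requires.

    `targets` are the packages the user asked for; `requires` maps a
    package name to the names it depends upon.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    needed = set()
    pending = list(targets)
    while pending:  # collect the targets plus everything they pull in
        package = pending.pop()
        if package not in needed:
            needed.add(package)
            pending.extend(requires.get(package, ()))
    graph = {package: requires.get(package, ()) for package in needed}
    return list(TopologicalSorter(graph).static_order())
```

For example, with requires = {"app": ["libssl"], "libssl": ["libc"], "libc": [], "games": ["libc"]}, asking for ["app"] yields libc, then libssl, then app. "games" is left out because nothing requested needs it.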
RedHat Package Manager
- basically, a way of dumping files into a filesystem
in an orderly manner.
- Parts of an RPM file:
- dependencies:
- "provides X": this package provides something needed by others.
- "requires Y": this package needs something from another.
- X and Y are character strings.
- an archive of files:
- absolute location (relative to /)
- always dumps files into specific locations within a filesystem.
- pre-install and post-install scripts
- what to do before and after dumping files into the filesystem.
- RPM features
- uninstall: undo changes
- imperfect: doesn't account for all possible changes.
- handles files and directories perfectly.
- pre-install and post-install scripts can be difficult to undo.
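Because X and Y are just character strings, checking a set of packages for consistency reduces to string matching. The sketch below assumes each package is a dictionary with "name", "provides" and "requires" keys; that layout is invented for illustration and is not the RPM file format.

```python
def unsatisfied_requires(packages):
    """Return {package name: sorted list of requires nobody provides}."""
    provided = set()
    for package in packages:
        provided.update(package["provides"])
    missing = {}
    for package in packages:
        lacking = sorted(set(package["requires"]) - provided)
        if lacking:
            missing[package["name"]] = lacking
    return missing
```

An empty result means the set of packages is closed: every "requires" string is matched by some "provides" string. It says nothing about whether the provided thing actually works, only that the strings agree.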
The quandary
- there is a balance between openness and closed worlds.
- "rpm -i": closed world.
- if you use RH9.0 mechanisms, it forms a "closed world",
- installs and uninstalls work.
- strong integrity constraint on the result.
- if you do anything else, even one "make install",
you violate that closure and RPM no longer ensures integrity.
- "make install": open world.
- order matters (a lot)
- if you do one "rpm -e" (erase, i.e. uninstall) after several "make install"s,
you may break an installed package (because the dependencies created by
"make install" were never recorded).
- Open and closed worlds do not get along:
- open: order matters (a lot), no possibility of undo.
- closed: only precedences matter, undo possible (but unreliable).
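The conflict between the two worlds can be shown with a toy model. This is not how rpm works internally: the "filesystem" is a dictionary, and ToyPackageDB and its methods are hypothetical names chosen for the example.

```python
class ToyPackageDB:
    """A toy closed world: it knows only about files it installed itself."""

    def __init__(self):
        self.files = {}  # path -> content; stands in for the filesystem
        self.owned = {}  # package name -> set of paths it installed

    def rpm_install(self, package, files):
        """Closed world: write the files and record who owns them."""
        self.files.update(files)
        self.owned[package] = set(files)

    def make_install(self, files):
        """Open world: write the files but record nothing."""
        self.files.update(files)

    def rpm_erase(self, package):
        """Remove every file the database believes the package owns."""
        for path in self.owned.pop(package):
            self.files.pop(path, None)
```

Suppose rpm_install("libfoo", ...) places /usr/lib/libfoo.so, and a later make_install adds /usr/local/bin/tool, which needs that library. rpm_erase("libfoo") then removes the library without complaint, because the database has no record that tool exists or depends on it.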
/comp/150NET/notes/practice.php?include=style.txt
downloaded on Nov-23-2009 04:16:36 PM,
was last modified on Apr-26-2004 04:19:05 PM.
All lecture note content is copyright 2004 by
Alva L. Couch,
Computer Science,
Tufts University