lecture
in color
Security and Patterns
- Perl security: a model based upon transitive distrust.
- Basic idea: don't allow the program to affect the environment
based upon data that it has no reason to trust.
- Enforced in taint mode (perl -T), which is enabled automatically
whenever Perl is running as a different user or group than that of the
invoker (set-user-id or set-group-id, or both).
Tainting
- All user input is initially tainted.
- Values of environment variables (%ENV).
- Values of command-line arguments (@ARGV).
- Input from file descriptors (<STDIN>).
- Your current directory and umask.
In short, everything your program "inherited" from its execution environment.
- All other values (whose content your program absolutely controls)
start out untainted.
- Tainting is an attribute of a scalar value (atom), not an attribute
of its container(!).
- Tainting an index does not taint the result of using that index
in an array or hash.
Algebra of tainting.
- Only scalar values (atoms) can be tainted.
- A variable can't be tainted; just its value.
- A copy of a tainted value is tainted.
- Any numeric or string expression incorporating tainted values is tainted.
This includes all scalar operations, such as:
- arithmetic
- string operations.
- etc.
- Surprise: a value fetched by using a tainted index into an array or hash, for which
the array or hash value isn't tainted, is not tainted.
- A reference to a tainted value is not tainted
(though the value resulting from dereferencing it is!).
- There's no such thing as a tainted scalar variable, array, or hash, though:
- In @ARGV, all values are initially tainted.
- In %ENV, all values are initially tainted (keys are not).
- Arrays and hashes can contain a free mix of tainted and untainted values.
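The rules above can be observed with a small sketch. It assumes the core module Scalar::Util and its tainted() function, which these notes do not use; all variable names are illustrative. Run it as perl -T to see taint propagate; without -T nothing is tainted and every line reports 0.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);   # assumption: core module, not used in the notes

my $from_env = defined $ENV{PATH} ? $ENV{PATH} : '';   # tainted under -T
my $literal  = 'abc';                                  # never tainted
my $copy     = $from_env;                              # copy of a tainted value
my $derived  = $literal . substr($from_env, 0, 0);     # expression using tainted data

# A tainted index does not taint the element it selects.
my %table   = (abc => 'stored value');
my $fetched = $table{$derived};                        # $derived has the text 'abc'

# A reference is not tainted, but the value behind it still is.
my $ref = \$from_env;

for my $row (
    [ 'literal',     tainted($literal)  ],
    [ 'from_env',    tainted($from_env) ],
    [ 'copy',        tainted($copy)     ],
    [ 'derived',     tainted($derived)  ],
    [ 'fetched',     tainted($fetched)  ],
    [ 'reference',   tainted($ref)      ],
    [ 'dereference', tainted($$ref)     ],
) {
    printf "%-12s %d\n", $row->[0], $row->[1] ? 1 : 0;
}
```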
Untainting and validation
- Tainting marks and tracks values that you should not trust.
- These are potential security holes through which a hacker
might be able to attack and compromise your program.
- You must validate and untaint values that you wish to trust.
- Principal untainting mechanism: string matching.
- Can also simply replace a tainted value with an untainted one.
What you can't do with tainted data
- Tainting a value keeps you from doing anything with it that
potentially affects the environment outside the program.
- executing commands.
- writing into files.
- Any attempt to use tainted data to affect the outside environment
will result in a fatal error.
- Anything you want to do with the value inside the program is fine.
- For any operation that effects changes in the world outside the program,
Perl 'knows' the potential values that affect it
and you'll have to untaint them before the operation will work.
Examples (from Chapter 23)
$arg = shift (@ARGV); # tainted
$hid = "$arg, 'bar'"; # result is tainted.
$path = $ENV{'PATH'}; # tainted
$mine = 'abc'; # not tainted
system "echo $mine"; # unsafe, $ENV{'PATH'} and others tainted
system "echo $arg"; # unsafe, $arg tainted
system "echo", $arg; # list form bypasses the shell; older Perls accepted a tainted $arg here, newer ones reject it
system "echo $hid"; # unsafe: $hid and $ENV{'PATH'} tainted
$oldpath = $ENV{'PATH'}; # tainted
$ENV{'PATH'} = '/bin:/usr/bin'; # untainted
$newpath = $ENV{'PATH'}; # untainted
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # dump other unsafe values (hash slice)
system "echo $mine"; # OK, once $ENV{'PATH'} is set.
system "echo $hid"; # unsafe: command argument is tainted
open (OOF, "< $arg"); # read-only access to tainted filename OK
open (OOF, "> $arg"); # unsafe: write to tainted filename
open (OOF, "echo $arg|"); # unsafe: tainted $arg
$shout = `echo $arg`; # unsafe: $arg is tainted.
$shout2 = `echo $mine`; # safe, but results of `` are tainted!
$shout3 = `echo $shout2`; # unsafe: $shout2 is tainted!
Tainting operations
- How to figure out if a value is tainted:
sub is_tainted {
my $arg = shift; # possibly tainted
# force to string type, preserve tainting
my $nada = substr($arg,0,0); # ''
local $@; # preserve error string for caller
# evaluate line containing tainted value.
# if execution dies, it's tainted.
eval { eval "# $nada" };
return length($@) != 0; # error found
}
- How to untaint a value: match it as a parenthetic expression
in a regular expression, e.g.,
if ($number =~ /^([0-9]+)$/) {
$number = $1; # only now is it untainted
}
or
($key,$value) = $ENV{'SOMETHING'} =~ /([A-Z]+):([0-9]+)/s;
# now $key and $value are untainted
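A validation helper along these lines might look as follows. This is only a sketch: the name untaint_number and the 9-digit limit are assumptions made for illustration, and it anchors with \A and \z (the very start and end of the string) rather than ^ and $.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $value if it is a plain
# decimal number of 1 to 9 digits, and undef otherwise.
sub untaint_number {
    my ($value) = @_;
    return undef unless defined $value;
    # The capture $1 is what carries the "trusted" status out of the match.
    return $value =~ /\A([0-9]{1,9})\z/ ? $1 : undef;
}

for my $input ('42', '42; rm -rf *', '', "7\n") {
    my $clean = untaint_number($input);
    (my $shown = $input) =~ s/\n/\\n/g;     # make the newline visible
    printf "%-16s => %s\n", "'$shown'",
        defined $clean ? "accepted as $clean" : 'rejected';
}
```

Only the first input is accepted; the helper rejects rather than repairs anything that does not fit the pattern exactly.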
Some marvelous ideas
A not-so-marvelous idea
How tainting evolved:
- People were writing web CGI's without validating their inputs,
with "humorous" results:
use CGI;
my $cgi = CGI->new;
my $email = $cgi->param('email');
open (FOO, "| mail $email") or die "can't email $email: $!";
and then calling this with the form input
ihateyou@sucker.com; rm -rf *
This means that the open call becomes:
open (FOO, "| mail ihateyou@sucker.com; rm -rf *")
which has the nasty side-effect of deleting every file to which the
web server has access in the current directory. The solution is to
untaint the entry with a strong pattern match or die:
if ($email =~ /^([a-zA-Z][-+a-zA-Z0-9._]*@[-+a-zA-Z0-9._]*)$/) {
my $untainted_email = $1;
open (FOO, "| mail $untainted_email") ...
}
It's necessary to be very careful to be inclusive of all reasonable email addresses here; any mistake will exclude valid addresses.
- Basic rule of untainting: express what you want rather than
what you don't want. The former fails safe (a mistake only rejects
valid input); the latter leaves the intrepid hacker a possible opening. If, e.g.,
you write:
if ($email =~ /^([^;]*)$/) {
my $untainted_email = $1;
open (FOO, "| mail $untainted_email") ...
}
then Joe Hacker can't write
ihateyou@sucker.com ; rm -rf *
but can write:
ihateyou@sucker.com && rm -rf *
with the same effect!
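The difference between the two patterns can be checked without running any command. The sketch below only tests which inputs each pattern accepts; the variable names are illustrative, the @ is escaped as \@ to keep it from being read as an array, and the fourth input is an extra attack string not taken from the notes.

```perl
use strict;
use warnings;

# "What you want": the email pattern from the notes.
my $allow_list = qr/^([a-zA-Z][-+a-zA-Z0-9._]*\@[-+a-zA-Z0-9._]*)$/;
# "What you don't want": anything without a semicolon.
my $deny_list  = qr/^([^;]*)$/;

my @inputs = (
    'ihateyou@sucker.com',
    'ihateyou@sucker.com; rm -rf *',
    'ihateyou@sucker.com && rm -rf *',
    'ihateyou@sucker.com | rm -rf *',
);

for my $input (@inputs) {
    my $allow = $input =~ $allow_list ? 'accept' : 'reject';
    my $deny  = $input =~ $deny_list  ? 'accept' : 'reject';
    printf "%-34s allow-list: %-6s  deny-list: %s\n", $input, $allow, $deny;
}
```

The allow-list accepts only the first input; the deny-list stops the semicolon but lets the && and | variants through.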
Tainting and RE's
- It is somewhat humorous that the mechanism for declaring trust is one
of the least trustworthy programming paradigms in Perl.
- Simple fact: it is really easy to do something other than what you intend
in a regular expression.
- In fact, it may be more of a security problem to use RE's correctly than
to correctly identify tainted values!
Review of RE's
- So far, we have at best a rudimentary understanding of RE's:
- ^ beginning of line
- $ end of line
- . any character except \n.
- [...] one character from a set
- (...) a group of characters.
- A|B alternation: one or the other.
- ? 0 or 1 of preceding thing.
- * 0 or more of preceding thing.
- + 1 or more of preceding thing.
- And, we know the basics of pattern matching and substitution:
- m/pattern/ find a pattern.
- s/pattern/replacement/ replace a pattern with a substitute
- $var =~ /pattern/ apply pattern to a specific variable, true if matches.
- $1, $2, ... $i, ... values matching the ith parenthetic grouping in a pattern.
- All of the above was "inherited" by Perl from vi, ex, and sed.
- The truth: Perl's idea of regular expressions is much more powerful
than any of its predecessors might suspect.
- If we are to invoke appropriate regular expressions with clear intent,
and succeed in really laundering tainted data into appropriate data,
we must use the full power of RE's rather than this relatively limited
subset.
A warning
- so far, we've been using regular expressions that could have appeared in
any tool or language, e.g.,
sed, awk, vi, ex, javascript, etc.
- no longer: in the following, we study extensions that are either
unique to Perl, or in which Perl sets the standard that others follow.
Delimiters
- can use almost any punctuation character to delimit a pattern, e.g., /, |, etc. (anything other than / requires an explicit m or s).
- can use {} or <> or () as pattern delimiters.
- can use different delimiters for pattern and replacement, e.g.,
s{foo}<bar> means s/foo/bar/
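A short sketch of why alternative delimiters are worth having: patterns that contain slashes need no escaping. The path and strings are made up for illustration.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# With / as the delimiter, every slash in the pattern must be escaped.
my $slashes = $path =~ /^\/usr\/local\// ? 1 : 0;

# With another delimiter (here |, then {}), the slashes can stay as they are.
my $pipes  = $path =~ m|^/usr/local/| ? 1 : 0;
my $braces = $path =~ m{^/usr/local/} ? 1 : 0;

# Bracketing delimiters may differ between pattern and replacement.
(my $mixed = 'foo fighters') =~ s{foo}<bar>;
(my $plain = 'foo fighters') =~ s/foo/bar/;

print "$slashes $pipes $braces\n";   # 1 1 1
print "$mixed | $plain\n";           # bar fighters | bar fighters
```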
Documenting patterns
Match Modifiers
- Each pattern match or substitution command can have trailing modifiers
that change its behavior.
- i case insensitive
- x ignore whitespace and comments in pattern.
- s let . match newline as well as any other character.
- o compile once only (to save processing time).
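The effect of i, s, and x can be seen in a few lines. This is an illustrative sketch; the strings and the date format are assumptions, and the /x pattern uses \A and \z to anchor at the very start and end of the string.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld";

my $plain   = $text =~ /hello.world/   ? 1 : 0;   # 0: wrong case, and . stops at \n
my $with_i  = $text =~ /hello.world/i  ? 1 : 0;   # 0: . still does not match \n
my $with_is = $text =~ /hello.world/is ? 1 : 0;   # 1

# /x lets a pattern carry its own documentation.
my $date = '2009-11-23';
my ($year, $month, $day) = $date =~ /
    \A
    (\d{4}) -    # four-digit year
    (\d{2}) -    # two-digit month
    (\d{2})      # two-digit day
    \z
/x;

print "$plain $with_i $with_is\n";   # 0 0 1
print "$year $month $day\n";         # 2009 11 23
```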
Replacement Modifiers
- g globally, i.e., for every instance.
- c (used together with g) don't reset match pointer after /g failure.
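A minimal sketch of what g and c change, using the built-in pos() function to inspect the match pointer (pos() is not mentioned in the notes; the strings are illustrative).

```perl
use strict;
use warnings;

my $text = 'aaa bbb';

(my $first = $text) =~ s/a/x/;     # replaces only the first match
(my $every = $text) =~ s/a/x/g;    # replaces every match
my $count = () = $text =~ /a/g;    # number of matches found in list context

my $found = $text =~ /a+/g;        # scalar /g: remembers where it stopped
my $after_match = pos($text);      # 3

$found = $text =~ /z/gc;           # fails, but /c keeps the position
my $after_gc = pos($text);         # still 3

$found = $text =~ /z/g;            # fails without /c: the position is reset
my $after_g = pos($text);          # undef

print "$first | $every | $count\n";
print "pos: $after_match, $after_gc, ",
    defined $after_g ? $after_g : 'undef', "\n";
```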
Patterns starting with \
- first, must escape backslash itself: \\ is a literal backslash.
- second, there are several shorthands for common control characters.
- \r return
- \n linefeed
- \t tab
- \b backspace (only inside a character class [...]; elsewhere \b is a word boundary)
- third, the literal meanings of all magic characters:
- \. literal .
- \? literal ?
- \* literal *
- \+ literal +
- \{ literal open brace
- \} literal close brace
- \( literal open paren
- \) literal close paren
Special patterns corresponding to character classes:
- \w a word character ([a-zA-Z0-9_]).
- \W a non-word character ([^a-zA-Z0-9_]).
- \d a digit ([0-9]).
- \D a non-digit ([^0-9]).
- \s whitespace character (space, tab, newline, return, or form feed)
- \S non-whitespace
- \p{unicode-designation} matches one Unicode character with the named property (there are hundreds of designations).
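Because untainting depends on knowing exactly what a class accepts, it is worth checking the edges. The sketch below probes a few of them; the test characters are chosen for illustration (\x{100} is LATIN CAPITAL LETTER A WITH MACRON, an uppercase letter outside ASCII).

```perl
use strict;
use warnings;

my $w_letter     = ('a'       =~ /^\w$/)    ? 1 : 0;   # 1
my $w_digit      = ('7'       =~ /^\w$/)    ? 1 : 0;   # 1: \w includes digits
my $w_underscore = ('_'       =~ /^\w$/)    ? 1 : 0;   # 1: and underscore
my $w_hyphen     = ('-'       =~ /^\w$/)    ? 1 : 0;   # 0
my $s_newline    = ("\n"      =~ /\s/)      ? 1 : 0;   # 1: \s is more than space and tab
my $d_letter     = ('x'       =~ /^\D$/)    ? 1 : 0;   # 1
my $p_upper      = ("\x{100}" =~ /\p{Lu}/)  ? 1 : 0;   # 1: uppercase per Unicode
my $p_lower      = ('a'       =~ /\p{Lu}/)  ? 1 : 0;   # 0

print join(' ', $w_letter, $w_digit, $w_underscore, $w_hyphen,
                $s_newline, $d_letter, $p_upper, $p_lower), "\n";
```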
The pattern "engine"
- unlike other pattern matchers, Perl exposes its engine to
your control.
- It is possible to match "massless" patterns that assert a particular
state in the engine without matching anything.
- It is also possible to match patterns that cause, e.g., external
expressions to be evaluated!
Basics of engine matching
- Perl's regular expression matching engine parses a pattern into
four kinds of things
- literals that should appear at a specific place in the string.
- assertions that are either true or false. These don't actually match
anything, but instead make a go/no-go decision on whether a match
has been made.
- alternations that define multiple alternatives for a position.
- quantifications that describe the number of times some pattern
should repeat.
- Alternations and quantifications can contain literals, assertions,
and each other.
- Result of parsing is a pattern syntax tree where
- children are contained in parents.
- alternatives are children of an alternation parent.
- the single child of a quantification contains the thing to be
quantified.
Matching and backtracking
- Matching is a Prolog-like execution in which the engine backtracks
whenever an alternative doesn't pan out.
- Each variant element of the pattern has a current state during matching.
- The state of an alternation is which alternative is being tried, along
with where the last match was made.
- The state of a quantification is "how many instances" are being tried,
along with where the match begins.
- The matcher itself maintains a "current position": the point up to which
it has matched so far, which always lies between two characters of the target string.
Basic matching algorithm:
- Start marching through both the string to be matched and the
match pattern tree (in inorder, in case that matters).
- Every quantification or alternation is a potential point to which the
matcher will backtrack if a match doesn't pan out.
- Whenever you find an alternation, match the first reasonable option
and remember the next alternation option to try.
- Whenever you find a quantifier, try to repeat the match pattern inside the
quantifier as many (or as few) times as possible.
- Whenever you fail to match, or perhaps encounter a false
assertion, backtrack to try the next alternative for the nearest quantifier or alternation.
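The visible effect of backtracking can be sketched with captures, which record what each part of the pattern ended up holding. The strings are illustrative, and the comments describe the algorithm as outlined above rather than any optimization the engine may apply internally.

```perl
use strict;
use warnings;

# Quantifier: a+ first grabs "aaa", then gives one "a" back so "ab" can match.
my ($run, $tail) = 'aaab' =~ /^(a+)(ab)$/;

# Alternation: alternatives are tried left to right, so "foo" wins when
# nothing forces the engine to reconsider.
my ($first_alt) = 'foobar' =~ /(foo|foobar)/;

# With a trailing $, choosing "foo" leads to a failure, and the engine
# backtracks into the alternation to try "foobar".
my ($second_alt) = 'foobar' =~ /^(foo|foobar)$/;

print "quantifier: '$run' + '$tail'\n";   # 'aa' + 'ab'
print "unanchored: '$first_alt'\n";       # 'foo'
print "anchored:   '$second_alt'\n";      # 'foobar'
```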
"Massless" (non-atomic) assertions
- This is where the similarity with other pattern matchers ends.
- In Perl, we also have non-atomic patterns that
do not match physical characters, but instead assert things
about the string at a particular position.
- \A the current position is at the beginning of the string (like ^).
- \Z the current position is at the end of the string, or just before a final newline (like $).
- \b the current position is a word boundary (between \w and \W or between \W and \w).
- \B the current position is not at a word boundary.
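A sketch of how these assertions behave at the edges. It also uses \z (lowercase), which is not listed above: it asserts the very end of the string, with no allowance for a trailing newline. The test strings are illustrative.

```perl
use strict;
use warnings;

my $with_newline = "123\n";

my $dollar  = $with_newline =~ /^\d+$/   ? 1 : 0;   # 1: $ tolerates one trailing newline
my $big_z   = $with_newline =~ /\A\d+\Z/ ? 1 : 0;   # 1: so does \Z
my $small_z = $with_newline =~ /\A\d+\z/ ? 1 : 0;   # 0: \z is the very end

my $text = 'cat concat cats';
my @whole_words = $text =~ /\b(cat)\b/g;   # only the standalone word
my @word_ends   = $text =~ /\B(cat)\b/g;   # "cat" ending a longer word (concat)

print "$dollar $big_z $small_z\n";                              # 1 1 0
print scalar(@whole_words), ' ', scalar(@word_ends), "\n";      # 1 1
```

When validating input for untainting, the difference between $ and \z decides whether a value with a trailing newline passes the check.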
Massless patterns that control pattern reading
- \Q begin a quoted string in which meta characters have no meta meanings.
- \E end a quoted string.
- \L make lowercase till \E.
- \U make uppercase till \E.
- These sequences are handled when the pattern is read (interpolated), not by the pattern matching engine.
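\Q matters most when a pattern is built from text the program did not write. A minimal sketch, with $user_text standing in for such input:

```perl
use strict;
use warnings;

my $user_text = 'a.b*';    # imagine this arrived as user input

my $raw_hit    = 'aXbbb' =~ /^$user_text$/     ? 1 : 0;   # 1: . and * act as metacharacters
my $quoted_hit = 'aXbbb' =~ /^\Q$user_text\E$/ ? 1 : 0;   # 0: now they are literal
my $literal    = 'a.b*'  =~ /^\Q$user_text\E$/ ? 1 : 0;   # 1

# \U and \L work the same way in ordinary double-quoted strings.
my $shout   = "\Uhello\E world";   # HELLO world
my $whisper = "\LHELLO\E World";   # hello World

print "$raw_hit $quoted_hit $literal\n";
print "$shout | $whisper\n";
```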
Massless assertions for lookahead and lookbehind
- Often, we want to assert that something exists in a pattern without
matching it.
- (?= ... ) locate something forward of current match point without matching it.
- (?! ... ) don't locate something forward of current position. This is an assertion that the contained pattern doesn't match.
- (?<= ... ) locate something rearward of current match point.
- (?<! ... ) don't locate something rearward of current match point.
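One small example for each of the four assertions; the rule and the strings are made up for illustration.

```perl
use strict;
use warnings;

# Lookahead: at least 6 word characters, one of which must be a digit.
my $rule = qr/^(?=.*\d)\w{6,}$/;
my $has_digit = 'abc123' =~ $rule ? 1 : 0;   # 1
my $no_digit  = 'abcdef' =~ $rule ? 1 : 0;   # 0

# Negative lookahead: "foo" not followed by "bar".
my $foobar = 'foobar' =~ /foo(?!bar)/ ? 1 : 0;   # 0
my $foobaz = 'foobaz' =~ /foo(?!bar)/ ? 1 : 0;   # 1

my $price = 'costs $42 or 17 euros';

# Lookbehind: digits preceded by a dollar sign; the sign is not part of the match.
my ($dollars) = $price =~ /(?<=\$)(\d+)/;        # 42

# Negative lookbehind: digits preceded by neither a dollar sign nor a digit.
my ($other) = $price =~ /(?<![\$\d])(\d+)/;      # 17

print "$has_digit $no_digit $foobar $foobaz $dollars $other\n";
```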
Quantifiers
- {MIN,MAX} between MIN and MAX occurrences.
- {COUNT} exactly COUNT times.
- ? 0 or 1 times.
- * 0 or more times.
- + 1 or more times.
- All of these try for the maximal match (i.e., the biggest part of the
string that matches).
- Append a ? to get the minimal match (as few characters as possible,
while retaining a match).
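Maximal versus minimal matching is easiest to see by capturing what the quantifier consumed. A short sketch with illustrative strings:

```perl
use strict;
use warnings;

my $html = '<b>bold</b>';

my ($greedy) = $html =~ /<(.+)>/;     # runs to the last >
my ($lazy)   = $html =~ /<(.+?)>/;    # stops at the first >

my ($most)  = 'aaaa' =~ /^(a{2,3})/;  # takes 3, the maximum allowed
my ($least) = 'aaaa' =~ /^(a{2,3}?)/; # takes 2, the minimum allowed

my $exact = 'aaa' =~ /^a{3}$/ ? 1 : 0;

print "greedy: $greedy\n";            # b>bold</b
print "lazy:   $lazy\n";              # b
print "$most $least $exact\n";        # aaa aa 1
```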
Stupid pattern tricks.
- In a list (array) context, the value of a pattern match is a list of the strings it matches.
- If the g modifier is not set and there are no ()'s in the pattern, the value is just
the list (1) on success; the matched text itself is stored in $&.
- If the g modifier is set, and there are no ()'s in the pattern,
the value is an array of all matches (that may consist of the same characters). Thus the value of
"this is a perl of wisdom from Perl's perLier" =~ /perl/gi
in an array context is
('perl','Perl', 'perL')
- If there are ()'s in the expression, then the strings captured by the
successive groups are returned as an array.
"this is a perl of wisdom" =~ /(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/
has the value
('this', 'is', 'a', 'perl')
- Further, all values returned this way are untainted (unless use re 'taint' is in effect).
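The list-context cases can be checked directly. The sketch reuses the two example strings from above and adds the cases with neither g nor ()'s, where a successful match yields the list (1) and a failed one yields the empty list.

```perl
use strict;
use warnings;

my $text = "this is a perl of wisdom from Perl's perLier";

my @all   = $text =~ /perl/gi;                           # every match
my @words = $text =~ /(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/;    # one entry per group
my @plain = $text =~ /perl/;                             # no g, no groups: (1)
my @none  = $text =~ /python/;                           # failure: empty list

print join(',', @all), "\n";      # perl,Perl,perL
print join(',', @words), "\n";    # this,is,a,perl
print scalar(@plain), " element with value $plain[0]\n";
print scalar(@none), " elements\n";
```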
Pattern homomorphisms
Two things to remember
- backtracking always backtracks to the rightmost remaining quantifier
or alternation.
- repeated matches (/g) always start after a prior successful match.
They do not start over at the beginning of the string.
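The second point explains why /g never reports overlapping matches. The sketch below counts matches of aa in aaaa, then uses a lookahead (which consumes no characters) to find the overlapping positions as well. The @- array, which holds the start offset of the last match, and pos() are built-ins not covered in these notes.

```perl
use strict;
use warnings;

my $text = 'aaaa';

# Each search resumes after the previous match, so only 2 of the 3
# possible "aa" substrings are found.
my @pairs = $text =~ /aa/g;

my @starts;
while ($text =~ /aa/g) {
    push @starts, $-[0];          # offset where this match began
}

# A lookahead matches without consuming anything, so every position
# where "aa" begins is reported.
my @overlapping;
while ($text =~ /(?=aa)/g) {
    push @overlapping, pos($text);
}

print scalar(@pairs), " matches at @starts\n";    # 2 matches at 0 2
print "overlapping starts: @overlapping\n";       # 0 1 2
```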
downloaded on Nov-23-2009 04:49:19 PM,
was last modified on Dec-31-1969 07:00:00 PM.
All lecture note content is copyright 2003 by
Alva L. Couch,
Computer Science,
Tufts University
(couch at cs dot tufts dot edu)