
6.2 String Manipulation

This library provides generic functions for string manipulation, such as finding and extracting substrings and pattern matching. When indexing a string, the first character has position 1. See the Patterns description at the end of this section for an explanation of patterns, and Section 8.3 for some examples of string manipulation in Lua.

strfind (str, pattern [, init [, plain]])

This function looks for the first match of pattern in str. If it finds one, it returns the indices in str where this occurrence starts and ends; otherwise, it returns nil. If the pattern specifies captures, the captured strings are returned as extra results. A third optional numerical argument specifies where to start the search; its default value is 1. A value of 1 as a fourth optional argument turns off the pattern matching facilities, so the function does a plain "find substring" operation.
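The following is a minimal sketch of the three calling forms described above, assuming an interpreter of the Lua version this manual describes. The variable names and sample strings are illustrative only.

```lua
-- Pattern search: the first two results are the start and end indices.
s = "hello world"
i, j = strfind(s, "wor")                       -- i = 7, j = 9

-- With a capture, the captured text comes back as an extra result.
a, b, digits = strfind("id = 42;", "(%d%d*)")  -- a = 6, b = 7, digits = "42"

-- Plain search from position 1: the "." is taken literally.
p, q = strfind("a.b", ".", 1, 1)               -- p = 2, q = 2

-- No match: the function returns nil.
none = strfind(s, "xyz")                       -- none = nil
```

Without the fourth argument, the pattern "." would match any character, so the third call would report position 1 instead of 2.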

strlen (s)

Receives a string and returns its length.

strsub (s, i [, j])

Returns another string, which is a substring of s, starting at i and running until j. If j is absent, it is assumed to be equal to the length of s. In particular, the call strsub(s,1,j) returns a prefix of s with length j, whereas the call strsub(s,i) returns a suffix of s, starting at i.
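As a short illustration of the prefix and suffix forms, under the assumption that the functions behave exactly as described above (the sample string is arbitrary):

```lua
s = "hello world"
prefix = strsub(s, 1, 5)          -- "hello"
suffix = strsub(s, 7)             -- "world"
middle = strsub(s, 4, 8)          -- "lo wo"

-- An absent j defaults to the length of s, so this call is
-- equivalent to strsub(s, 7).
same = strsub(s, 7, strlen(s))    -- "world"
```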

strlower (s)

Receives a string and returns a copy of that string with all upper case letters changed to lower case. All other characters are left unchanged.

strupper (s)

Receives a string and returns a copy of that string with all lower case letters changed to upper case. All other characters are left unchanged.

strrep (s, n)

Returns a string which is the concatenation of n copies of the string s.

ascii (s [, i])

Returns the ASCII code of the character s[i]. If i is absent, then it is assumed to be 1.
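The four functions above can be tried with a sketch like the following. The sample strings are illustrative, and the numeric codes assume a platform that uses the ASCII character set.

```lua
lower = strlower("Hello, World 1")   -- "hello, world 1"
upper = strupper("Hello, World 1")   -- "HELLO, WORLD 1"
line  = strrep("-=", 3)              -- "-=-=-="
code1 = ascii("ABC")                 -- 65, the code of "A" (i defaults to 1)
code2 = ascii("ABC", 3)              -- 67, the code of "C"
```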

format (formatstring, e1, e2, ...)

This function returns a formatted version of its variable number of arguments following the description given in its first argument (which must be a string). The format string follows the same rules as the printf family of standard C functions. The only differences are that the options/modifiers *, l, L, n, p, and h are not supported, and there is an extra option, q. This option formats a string in a form suitable to be safely read back by the Lua interpreter; that is, the string is written between double quotes, and all double quotes, newlines, and backslashes in the string are correctly escaped when written. For instance, the call
format('%q', 'a string with "quotes" and \n new line')
will produce the string:
"a string with \"quotes\" and \
 new line"

The options c, d, E, e, f, g, i, o, u, X, and x all expect a number as argument, whereas q and s expect a string.
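A few further calls, sketched under the assumption that width and precision modifiers behave as in C printf, as the description above states. The values are illustrative.

```lua
-- Numeric options take numbers; %5.2f pads the result to width 5.
s1 = format("%d items at %5.2f each", 3, 1.5)   -- "3 items at  1.50 each"

-- %s takes a string, %x writes a number in hexadecimal.
s2 = format("%s = %x", "mask", 255)             -- "mask = ff"

-- %q adds the surrounding quotes and escapes the inner ones.
s3 = format("%q", 'say "hi"')                   -- "say \"hi\"" (quotes included)
```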

gsub (s, pat, repl [, n])

Returns a copy of s, where all occurrences of the pattern pat have been replaced by a replacement string specified by repl. This function also returns, as a second value, the total number of substitutions made.

If repl is a string, its value is used for replacement. Any sequence in repl of the form %n with n between 1 and 9 stands for the value of the n-th captured substring.

If repl is a function, this function is called every time a match occurs, with all captured substrings as parameters. If the value returned by this function is a string, it is used as the replacement string; otherwise, the replacement string is the empty string.

An optional parameter n limits the maximum number of substitutions to occur. For instance, when n is 1 only the first occurrence of pat is replaced.

As an example, in the following expression each occurrence of the form $name$ calls the function getenv, passing name as argument (because only this part of the pattern is captured). The closing $ is written as %$, because an unescaped $ at the end of a pattern anchors the match at the end of the subject string. The value returned by getenv replaces the match. Therefore, the whole expression:

  gsub("home = $HOME$, user = $USER$", "$(%w%w*)%$", getenv)
may return the string:
home = /home/roberto, user = roberto
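The other forms of gsub described above can be sketched as follows. The helper function double is hypothetical and exists only for this illustration; it relies on Lua converting the captured string to a number in arithmetic.

```lua
-- String replacement; the second result counts the substitutions.
r1, n1 = gsub("hello world", "o", "0")                  -- "hell0 w0rld", 2

-- %1 and %2 stand for the captures; here they swap two words.
r2 = gsub("hello world", "(%w%w*) (%w%w*)", "%2 %1")    -- "world hello"

-- The optional fourth argument limits the number of substitutions.
r3 = gsub("a,b,c", ",", ";", 1)                         -- "a;b,c"

-- Function replacement: double receives the captured digits
-- and returns the text that replaces the match.
function double (digits)
  return format("%d", 2 * digits)
end
r4 = gsub("3 apples, 10 pears", "(%d%d*)", double)      -- "6 apples, 20 pears"
```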

Patterns

Character Class:

a character class is used to represent a set of characters. The following combinations are allowed in describing a character class:

x
(where x is any character not in the list ()%.[*?) -- represents the character x itself.
.
-- represents all characters.
%a
-- represents all letters.
%A
-- represents all non letter characters.
%d
-- represents all digits.
%D
-- represents all non digits.
%l
-- represents all lower case letters.
%L
-- represents all non lower case letter characters.
%s
-- represents all space characters.
%S
-- represents all non space characters.
%u
-- represents all upper case letters.
%U
-- represents all non upper case letter characters.
%w
-- represents all alphanumeric characters.
%W
-- represents all non alphanumeric characters.
%x
(where x is any non alphanumeric character) -- represents the character x.
[char-set]
-- represents the class which is the union of all characters in char-set. To include a ] in char-set, it must be the first character. A range of characters may be specified by separating the end characters of the range with a -; e.g., A-Z specifies the upper case characters. If - appears as the first or last character of char-set, then it represents itself. All classes %x described above can also be used as components in a char-set. All other characters in char-set represent themselves.
[^char-set]
-- represents the complement of char-set, where char-set is interpreted as above.
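To make the class notation concrete, here is a small sketch that uses strfind with single-class patterns. The subject strings are illustrative, and the results assume the matching rules given in this section.

```lua
-- %d matches one digit.
i1, j1 = strfind("room 7", "%d")             -- 6, 6

-- A char-set with three ranges: one hexadecimal digit.
i2, j2 = strfind("x = 3F", "[0-9a-fA-F]")    -- 5, 5

-- [^%s] is the complement of the space class, so this
-- finds the first non-space character.
i3, j3 = strfind("   indent", "[^%s]")       -- 4, 4

-- %. is a literal period, while . alone matches any character.
i4, j4 = strfind("v1.2", "%.")               -- 3, 3
i5, j5 = strfind("v1.2", ".")                -- 1, 1
```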

Pattern Item:

a pattern item may be a single character class, or a character class followed by * or by ?. A single character class matches any single character in the class. A character class followed by * matches 0 or more repetitions of characters in the class. A character class followed by ? matches 0 or 1 occurrence of a character in the class. A pattern item may also have the form %n, for n between 1 and 9; such an item matches a substring equal to the n-th captured string.
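The three kinds of pattern item can be illustrated with the sketch below, again using strfind on arbitrary sample strings.

```lua
-- * allows zero or more repetitions, so %d%d* means
-- "one digit followed by any further digits".
i1, j1 = strfind("abc 2024 def", "%d%d*")      -- 5, 8

-- ? makes the preceding class optional.
i2, j2 = strfind("color colour", "colou?r")    -- 1, 5
i3, j3 = strfind("colour", "colou?r")          -- 1, 6

-- %1 matches a copy of the first capture: a doubled letter.
i4, j4, letter = strfind("hello", "(%a)%1")    -- 3, 4, "l"
```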

Pattern:

a pattern is a sequence of pattern items. Any repetition item (*) inside a pattern will always match the longest possible sequence. A ^ at the beginning of a pattern anchors the match at the beginning of the subject string. A $ at the end of a pattern anchors the match at the end of the subject string.

A pattern may contain sub-patterns enclosed in parentheses that describe captures. When a match succeeds, the substrings of the subject string that match captures are captured for future use. Captures are numbered according to their left parentheses.
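Anchors, longest-match repetition, and capture numbering can be seen together in a sketch such as this one. The subject strings and variable names are illustrative.

```lua
-- ^ anchors the match at the beginning of the subject.
i1 = strfind("say hello", "^hello")        -- nil
i2 = strfind("hello there", "^hello")      -- 1

-- $ anchors the match at the end of the subject.
i3, j3 = strfind("file.lua", "lua$")       -- 6, 8

-- Repetition takes the longest possible sequence, so .* runs
-- through to the last ">" rather than stopping at the first.
i4, j4 = strfind("<a><b>", "<.*>")         -- 1, 6

-- Captures are numbered by their left parentheses:
-- the outer pair is capture 1, the inner pair is capture 2.
a, b, outer, inner = strfind("key=val", "((%a*)=%a*)")
-- a = 1, b = 7, outer = "key=val", inner = "key"
```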

