The most straightforward way to compose a Citrus grammar is to use Citrus' own custom grammar syntax. This syntax borrows heavily from Ruby, so it should already be familiar to Ruby programmers.
Terminals may be represented by a string or a regular expression. Both follow the same rules as Ruby string and regular expression literals.
'abc' # match "abc"
"abc\n" # match "abc\n"
/abc/i # match "abc" in any case
/\xFF/ # match "\xFF"
Character classes and the dot (match anything) symbol are supported as well for compatibility with other parsing expression implementations.
[a-z0-9] # match any lowercase letter or digit
[\x00-\xFF] # match any octet
. # match any single character, including new lines
Also, strings may use backticks instead of quotes to indicate that they should match in a case-insensitive manner.
`abc` # match "abc" in any case
Besides case sensitivity, case-insensitive strings have the same behavior as double quoted strings.
See Terminal and StringTerminal for more information.
Quantifiers may be used after any expression to specify the number of times it must match. The universal form of a quantifier is N*M, where N is the minimum and M is the maximum number of times the expression may match.
'abc'1*2 # match "abc" a minimum of one, maximum of two times
'abc'1* # match "abc" at least once
'abc'*2 # match "abc" a maximum of twice
Additionally, the minimum and maximum may be omitted entirely to specify that an expression may match zero or more times.
'abc'* # match "abc" zero or more times
The + and ? operators are supported as well for the common cases of 1* and *1, respectively.
'abc'+ # match "abc" one or more times
'abc'? # match "abc" zero or one time
See Repeat for more information.
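For readers who think in regular expressions, the quantifier forms above line up with the interval syntax of Ruby's own Regexp. The sketch below is only an analogy written in plain Ruby; it does not load or call Citrus, and the constant name QUANTIFIER_ANALOGUES is made up for illustration.

```ruby
# Plain-Ruby analogy, not Citrus code: each Citrus quantifier form is paired
# with a Regexp interval that accepts the same number of repetitions.
# QUANTIFIER_ANALOGUES is a hypothetical name used only in this sketch.
QUANTIFIER_ANALOGUES = {
  "'abc'1*2" => /\A(?:abc){1,2}\z/, # one or two times
  "'abc'1*"  => /\A(?:abc){1,}\z/,  # at least once, same as 'abc'+
  "'abc'*2"  => /\A(?:abc){0,2}\z/, # at most twice
  "'abc'*"   => /\A(?:abc)*\z/,     # zero or more times
  "'abc'?"   => /\A(?:abc)?\z/      # zero or one time, same as 'abc'*1
}

# Try each pattern against "abc" repeated 0, 1, 2, and 3 times.
QUANTIFIER_ANALOGUES.each do |citrus_form, regexp|
  accepted = (0..3).select { |n| regexp.match?("abc" * n) }
  puts "#{citrus_form.ljust(10)} accepts #{accepted.inspect} repetitions"
end
```

The analogy covers repetition counts only. Repetition in a parsing expression grammar is greedy and does not backtrack, so a quantified expression inside a longer rule can behave differently from a backtracking regular expression.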
Both positive and negative lookahead are supported in Citrus. Use the & and ! operators to indicate that an expression either should or should not match. In neither case is any input consumed.
'a' &'b' # match an "a" that is followed by a "b"
'a' !'b' # match an "a" that is not followed by a "b"
!'a' . # match any character except for "a"
A special form of lookahead is also supported which matches every character up to the point where a given expression would match.
~'a' # match all characters until an "a"
~/xyz/ # match all characters until /xyz/ matches
When using this operator (the tilde), at least one character must be consumed for the rule to succeed.
See AndPredicate, NotPredicate, and ButPredicate for more information.
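The three predicates have close counterparts in Ruby's Regexp lookahead syntax, which may help when reading the examples above. This is a plain-Ruby analogy under that assumption, not Citrus code; the variable names are chosen for this sketch only.

```ruby
# Plain-Ruby analogy, not Citrus code. Each Regexp is anchored with \A so it
# is tried at the start of the input, as a parsing expression would be.
and_predicate = /\Aa(?=b)/        # 'a' &'b'
not_predicate = /\Aa(?!b)/        # 'a' !'b'
any_but_a     = /\A(?!a)./m       # !'a' .
but_predicate = /\A(?:(?!a).)+/m  # ~'a'

p "ab"[and_predicate]      # "a"; the "b" is checked but not consumed
p "ac"[and_predicate]      # nil
p "ac"[not_predicate]      # "a"
p "ab"[not_predicate]      # nil
p "xyz"[any_but_a]         # "x"
p "xyzabc"[but_predicate]  # "xyz"
p "abc"[but_predicate]     # nil; nothing can be consumed before the "a"
```

The last line shows the rule stated above for the tilde: when the input starts with the excluded expression, no character is consumed and the match fails.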
Sequences of expressions may be separated by a space to indicate that the rules should match in that order.
'a' 'b' 'c' # match "a", then "b", then "c"
'a' [0-9] # match "a", then a numeric digit
See Sequence for more information.
Ordered choice is indicated by a vertical bar that separates two expressions. When using choice, each expression is tried in order. When one matches, the rule returns the match immediately without trying the remaining rules.
'a' | 'b' # match "a" or "b"
'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
It is important to note when using ordered choice that any operator binds more tightly than the vertical bar. A full chart of operators and their respective levels of precedence is below.
See Choice for more information.
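Because the first matching alternative wins, the order of alternatives changes what a choice can match. The toy model below illustrates this in plain Ruby. It is a sketch of ordered-choice semantics in general, not of how Citrus is implemented, and the helper names literal, sequence, and choice are hypothetical.

```ruby
# Toy model of ordered choice, not Citrus code. A "parser" here is a lambda
# that takes (input, position) and returns the position after the match,
# or nil on failure.
def literal(string)
  ->(input, pos) { input[pos, string.length] == string ? pos + string.length : nil }
end

# Run each parser in turn; a nil position short-circuits the rest.
def sequence(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |at, parser| at && parser.call(input, at) } }
end

# Try each parser in order and stop at the first one that succeeds.
def choice(*parsers)
  lambda do |input, pos|
    result = nil
    parsers.find { |parser| result = parser.call(input, pos) }
    result
  end
end

a_or_ab = choice(literal("a"), literal("ab"))  # 'a' | 'ab'
ab_or_a = choice(literal("ab"), literal("a"))  # 'ab' | 'a'

p a_or_ab.call("ab", 0)  # 1; "a" wins and "ab" is never tried
p ab_or_a.call("ab", 0)  # 2; the longer alternative is listed first

# 'a' 'b' | 'c' groups as ('a' 'b') | 'c' because sequence binds tighter.
ab_or_c = choice(sequence(literal("a"), literal("b")), literal("c"))
p ab_or_c.call("ab", 0)  # 2
p ab_or_c.call("c", 0)   # 1
p ab_or_c.call("ac", 0)  # nil
```

A practical consequence is that when one alternative is a prefix of another, the longer one usually needs to come first.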
Match objects may be referred to by a different name than the rule that originally generated them. Labels are added by placing the label and a colon immediately preceding any expression.
chars:/[a-z]+/ # the characters matched by the regular expression
# may be referred to as "chars" in an extension
# method
Extensions may be specified using either “module” or “block” syntax. When using module syntax, specify the name of a module that is used to extend match objects in between less than and greater than symbols.
[a-z0-9]5*9 <CouponCode> # match a string that consists of any lower
# cased letter or digit between 5 and 9
# times and extend the match with the
# CouponCode module
Additionally, extensions may be specified inline using curly braces. When using this method, the code inside the curly braces may be invoked by calling the value method on the match object.
[0-9] { to_i } # match any digit and return its integer value when
# calling the #value method on the match object
Note that when using the inline block method you may also specify arguments in between vertical bars immediately following the opening curly brace, just like in Ruby blocks.
When including a grammar inside another, all rules in the child that have the same name as a rule in the parent also have access to the super keyword to invoke the parent rule.
grammar Number
  rule number
    [0-9]+
  end
end

grammar FloatingPoint
  include Number

  rule number
    super ('.' super)?
  end
end
In the example above, the FloatingPoint grammar includes Number. Both have a rule named number, so FloatingPoint#number has access to Number#number by means of using super.
See Super for more information.
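The include and super keywords read much like their counterparts in ordinary Ruby. The sketch below mirrors the Number and FloatingPoint example with a plain Ruby module and class that build regular expressions. It is an analogy only: it does not use Citrus, and modeling each rule as a method that returns a Regexp is an assumption made for illustration.

```ruby
# Plain-Ruby analogy, not Citrus code: each "rule" is modeled as a method
# that returns a Regexp.
module Number
  def number
    /[0-9]+/
  end
end

class FloatingPoint
  include Number

  def number
    integer = super                # the result of Number#number
    /#{integer}(?:\.#{integer})?/  # super ('.' super)?
  end
end

pattern = /\A#{FloatingPoint.new.number}\z/
p pattern.match?("42")    # true
p pattern.match?("3.14")  # true
p pattern.match?("3.")    # false; the optional part needs digits after "."
```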
The following table contains a list of all Citrus symbols and operators and their precedence. A higher precedence indicates tighter binding.
Operator | Name | Precedence |
---|---|---|
'' | String (single quoted) | 7 |
"" | String (double quoted) | 7 |
`` | String (case insensitive) | 7 |
[] | Character class | 7 |
. | Dot (any character) | 7 |
// | Regular expression | 7 |
() | Grouping | 7 |
* | Repetition (arbitrary) | 6 |
+ | Repetition (one or more) | 6 |
? | Repetition (zero or one) | 6 |
& | And predicate | 5 |
! | Not predicate | 5 |
~ | But predicate | 5 |
<> | Extension (module name) | 4 |
{} | Extension (literal) | 4 |
: | Label | 3 |
e1 e2 | Sequence | 2 |
e1 \| e2 | Ordered choice | 1 |
As is common in many programming languages, parentheses may be used to override the normal binding order of operators. In the following example parentheses are used to make the vertical bar between 'b' and 'c' bind tighter than the space between 'a' and 'b'.
'a' ('b' | 'c') # match "a", then "b" or "c"
Copyright © 2015 Michael Jackson