Example

Below is a simple grammar that parses strings of integers separated by a + symbol and any amount of white space.

grammar Addition
  rule additive
    number plus (additive | number)
  end

  rule number
    [0-9]+ space
  end

  rule plus
    '+' space
  end

  rule space
    [ \t]*
  end
end

Several things to note about the above example:

  • Grammar and rule declarations end with the end keyword
  • A sequence of rules is created by separating expressions with a space
  • Likewise, ordered choice is represented with a vertical bar
  • Parentheses may be used to override the natural binding order
  • Rules may refer to other rules in their own definitions simply by using the other rule’s name
  • Any expression may be followed by a quantifier
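
To make the sequence and ordered choice semantics concrete, here is a minimal hand-written recognizer that mirrors the rules of the Addition grammar in plain Ruby. It is only a sketch: the AdditionRecognizer class and its method names are hypothetical, it uses StringScanner from Ruby's standard library, and it is not how Citrus itself builds or runs a parser.

```ruby
require 'strscan'

# Hypothetical hand-written recognizer mirroring the Addition grammar.
# Illustration only; Citrus derives its parser from the grammar file.
class AdditionRecognizer
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # True when the entire input matches the additive rule.
  def match?
    additive && @scanner.eos?
  end

  private

  # additive: number plus (additive | number)
  # A sequence succeeds only if every part succeeds in order.
  # The ordered choice tries additive first and falls back to number.
  def additive
    start = @scanner.pos
    return true if number && plus && (additive || number)

    @scanner.pos = start # rewind so the caller can try another alternative
    false
  end

  # number: [0-9]+ space
  def number
    !@scanner.scan(/[0-9]+[ \t]*/).nil?
  end

  # plus: '+' space
  def plus
    !@scanner.scan(/\+[ \t]*/).nil?
  end
end

puts AdditionRecognizer.new('1 + 2+3').match? # true
puts AdditionRecognizer.new('1 +').match?     # false
```

Note that a lone number such as "1" is rejected, because the additive rule requires at least one + symbol.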

Interpretation

The grammar above is able to parse simple mathematical expressions such as “1+2” and “1 + 2+3”, but it does not have enough semantic information to be able to actually interpret these expressions.

At this point, when the grammar parses a string it generates a tree of Match objects. Each match is created by a rule and may itself be composed of any number of submatches.

Submatches are created whenever a rule contains another rule. For example, in the grammar above number matches a string of digits followed by white space. Thus, a match generated by this rule will contain two submatches.

We can define a method inside a set of curly braces that will be used to extend a particular rule’s matches. This works in similar fashion to using Ruby’s blocks. Let’s extend the Addition grammar using this technique.

grammar Addition
  rule additive
    (number plus term:(additive | number)) {
      number.value + term.value
    }
  end

  rule number
    ([0-9]+ space) {
      to_i
    }
  end

  rule plus
    '+' space
  end

  rule space
    [ \t]*
  end
end

In this version of the grammar we have added two semantic blocks, one each for the additive and number rules. These blocks contain code that we can execute by calling value on match objects that result from those rules. It’s easiest to explain what is going on here by starting with the lowest level block, which is defined within number.

Inside this block we see a call to another method, namely to_i. When called in the context of a match object, methods that are not defined may be called on a match’s internal string object via method_missing. Thus, the call to to_i should return the integer value of the match.
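
The delegation described here can be sketched in plain Ruby. The StringBackedMatch class below is a hypothetical stand-in written for illustration, not the actual Citrus::Match implementation; it only shows how method_missing can forward an undefined method such as to_i to a wrapped string.

```ruby
# Hypothetical wrapper, for illustration only (not Citrus::Match).
class StringBackedMatch
  def initialize(text)
    @text = text
  end

  # Forward any method this object lacks to the wrapped string.
  def method_missing(name, *args, &block)
    if @text.respond_to?(name)
      @text.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @text.respond_to?(name, include_private) || super
  end
end

match = StringBackedMatch.new('42 ')
puts match.to_i   # 42, forwarded to String#to_i
puts match.length # 3, forwarded to String#length
```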

Similarly, matches created by additive will also have a value method. Notice the use of the term label within the rule definition. This label allows the match that is created by the choice between additive and number to be retrieved using the term method. The value of an additive match is the sum of the values of its number and term matches.

Since additive is the first rule defined in the grammar, any match that results from parsing a string with this grammar will have a value method that can be used to recursively calculate the collective value of the entire match tree.
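
The recursion can be pictured with plain Ruby objects. In this sketch, NumberNode and AdditiveNode are hypothetical stand-ins for the matches produced by the number and additive rules; they are not Citrus classes, and the tree is built by hand rather than by parsing.

```ruby
# Hypothetical stand-ins for match objects, for illustration only.
NumberNode = Struct.new(:text) do
  # Mirrors the number rule's block: convert the matched text to an integer.
  def value
    text.to_i
  end
end

AdditiveNode = Struct.new(:number, :term) do
  # Mirrors the additive rule's block: add the two sub-values.
  def value
    number.value + term.value
  end
end

# Hand-built tree for "1 + 2 + 3". Because the additive rule recurses on
# its right-hand side, the shape is additive(1, additive(2, 3)).
tree = AdditiveNode.new(
  NumberNode.new('1 '),
  AdditiveNode.new(NumberNode.new('2 '), NumberNode.new('3'))
)

puts tree.value # 6
```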

To give it a try, save the code for the Addition grammar in a file called addition.citrus. Next, assuming you have the Citrus gem installed, try the following sequence of commands in a terminal.

$ irb
> require 'citrus'
 => true
> Citrus.load 'addition'
 => [Addition]
> m = Addition.parse '1 + 2 + 3'
 => #<Citrus::Match ...
> m.value
 => 6

Congratulations! You just ran your first piece of Citrus code.

One interesting thing to notice about the above sequence of commands is the return value of Citrus.load. When you use Citrus.load to load a grammar file (and likewise Citrus.eval to evaluate a raw string of grammar code), the return value is an array of all the grammars present in that file.

Take a look at examples/calc.citrus for an example of a calculator that is able to parse and evaluate more complex mathematical expressions.

Additional Methods

If you need more than just a value method on your match object, you can attach additional methods as well. There are two ways to do this. The first lets you define additional methods inline in your semantic block. This block will be used to create a new Module using Module.new. Using the Addition example above, we might refactor the additive rule to look like this:

rule additive
  (number plus term:(additive | number)) {
    def lhs
      number.value
    end

    def rhs
      term.value
    end

    def value
      lhs + rhs
    end
  }
end

Now, in addition to having a value method, matches that result from the additive rule will have lhs and rhs methods as well. Although not particularly useful in this example, this technique can be useful when unit testing more complex rules. For example, using this technique you might make the following assertions in a unit test:

match = Addition.parse('1 + 4')
assert_equal(1, match.lhs)
assert_equal(4, match.rhs)
assert_equal(5, match.value)
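
The inline form relies on Module.new accepting a block of method definitions. The following plain Ruby sketch shows that mechanism in isolation. The fake_match hash is an assumption made for illustration and merely stands in for a match object; how Citrus applies the resulting module to real matches is not shown here.

```ruby
# Build an anonymous module from a block of method definitions.
extension = Module.new do
  def lhs
    self[:number]
  end

  def rhs
    self[:term]
  end

  def value
    lhs + rhs
  end
end

# A plain Hash stands in for a match object (illustration only).
fake_match = { number: 1, term: 4 }
fake_match.extend(extension)

puts fake_match.lhs   # 1
puts fake_match.rhs   # 4
puts fake_match.value # 5
```

Only the extended object gains the new methods; other hashes are unaffected.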

If you would like to abstract away the code in a semantic block, simply create a separate Ruby module (in another file) that contains the extension methods you want and use the angle bracket notation to indicate that a rule should use that module when extending matches.

To demonstrate this approach with the above example, you would define the following module in a Ruby file.

module Additive
  def lhs
    number.value
  end

  def rhs
    term.value
  end

  def value
    lhs + rhs
  end
end

Then, in your Citrus grammar file the rule definition would look like this:

  rule additive
    (number plus term:(additive | number)) <Additive>
  end

This method of defining extensions can help keep your grammar files cleaner. However, you do need to make sure that your extension modules are already loaded before using Citrus.load to load your grammar file.