boost::spirit Library Usage Example

Author / Автор: Sergey Satskiy
Publication date / Опубликовано: 23.09.2005
Version / Версия текста: 1.3

Introduction

Tasks of lexical, syntax and semantic analysis come up at the every day programmers' practice quite often. Most often a plain symbols stream plays a role of an input stream. A typical example of a similar task is reading a textual configuration file of some application. That kind of tasks is usually solved with the help of tools like flex and yacc. However there is more elegant solution for C++. The boost::spirit library simplifies the mentioned analysis stages considerably. The result C++ code gets compact and easy to make further changes.

The article describes an example of the library usage. The selected example implies all three mentioned analysis stages and their results are passed to a virtual machine for the execution. The final solution looks simple and suitable for future reuse.

The Task

Let's suppose that there is a system which collects information from many sensors. The collected data should be displayed on a screen. However a value from a speed sensor should be displayed in meters per second for some users but in miles per hour for others. Moreover it is necessary to support various speed sensor models and each of them may provide data measured in different units. In order to solve this task it is possible to put into a textual configuration file a formula which describes a way to calculate a value to be displayed by the collected value. A C++ code will have to have an analogue of a function below:

double Calculate( double  CollectedValue )
{
    double    RequiredValue( 0.0 );
    . . .
        // Calculation of RequiredValue basing on CollectedValue 
        // by a formula from a configuration file
    . . .
    return RequiredValue;
}

The speed calculations must be done every time when the collected value is changed. An interpreter could be built for a formula from the configuration file however its calculation speed would be much slower than if the calculations were made by a virtual machine and a code for the virtual machine was built once - at the time of reading the formula.

The article describes design and implementation of a translator of converting formulas and the corresponding virtual machine which performs calculations.

The Task Formalisation

Let's make the task a bit simpler by omitting reading a converting formula from a file and implement a scheme which is shown on figure 1.


Figure 1. Block scheme of the solution.

Let's develop two executable modules: a translator and a virtual machine.

The translator takes a recalculation formula from the standard input. The result of the translator work is an intermediate code which is printed on the standard output. Recalculation formulas will support major arithmetic operations, brackets, floating point constants, a couple of trigonometric functions and the value keyword. The value keyword means a value of the recalculation function argument. For example, the following formula will be correctly processed by the translator:

value / 1.6 * sin( value + 0.4 )

The virtual machine module takes an input value via a command line argument while a code to be executed is taken from the standard input stream. The result of the virtual machine work is a calculated value which is printed on the standard output.

Real life applications may have both a translator and a virtual machine combined in a single executable module. Here we divide them to simplify the further discussion and to make a potential future reuse easier.

The Translator

It is obvious that a translator and a virtual machine are tightly related to each other. A translator must take into consideration a model which a virtual machine employs as well as the supported set of commands.

A stack oriented virtual machine is one of the easiests. It suits well for executing math calculations. Any math expression could be relatively easy transformed into an equivalent parenthesis-free notation. That notation is very close to the sequence of commands that would perform the required calculation.

It is however not a very good idea to make people use the parenthesis-free notation for formulas in a configuration file. Vast majority of people prefer to use brackets to specify operations priorities and to put arguments of two-place operations at the left and right of the operation symbol. If users were required to use parenthesis-free notation that would lead to many mistakes. So it is a good idea to analyse formulas written in a user friendly form in the translator.

Before we start to implement the translator it is necessary to discuss a set of commands supported by the virtual machine and the method of storing the commands. The commands could be encoded as integers however to make debugging easier we will use a string representation of the commands. As for the set of commands we will need a command to put a double constant on the top of the stack, the commands for two-place math operations, the unary minus command and the commands for trigonometric functions. Operations which require arguments will take them from the top of the stack and the result of an operation will be put back on the top of the stack.

The table below describes some of the required commands. All the rest behave similar.

Command Description
PUSH <double constant> Puts the given double constant at the top of the stack.
VALUE Puts at the top of the stack a value which is passed as an argument of the calculation function.
ADD Takes two arguments from the top of the stack, performs adding and puts the result at the top of the stack.
NEGATE Takes one argument from the top of the stack, flips its sign and puts the result back at the top of the stack.
SIN Takes one argument from the top of the stack, performs calculation of the sinus function with the taken argument and puts the result back at the top of the stack.

For example, the following sequence of the commands:

PUSH    154.0
PUSH    16.3
ADD

will perform adding 154 + 16.3 and the result - 170.3 - will be put at the top of the stack. So the result of a virtual machine work will always be a value at the top of the stack and the stack deep at the end of calculations must be equal to one.

Now it is necessary to discuss in details the allowed syntax to write recalculation formulas. To define the syntax it is convenient to use context-free grammar (Backus-Naur form). The boost::spirit library allows writing the Backus-Naur form in C++ almost directly. For example, the syntax graphs which describe allowed recalculation formulas will look as follows:






Figure 2. Syntax graphs which describe allowed recalculation formulas.

The equivalent C++ code which uses the boost::spirit library will look as follows (the scope resolution operator boost::spirit:: is omitted):

    . . .
    Function = 
        ( str_p( "sin" ) >> '(' >> Expression >> ')' )  |
        ( str_p( "cos" ) >> '(' >> Expression >> ')' );

    Double =
        lexeme_d[ (+ureal_p) ]       |
        str_p( "value" );

    Factor =
        Double | Function | '(' >> Expression >> ')'    |   
        ('-' >> Factor)                                 |
        ('+' >> Factor);

    Term =
        Factor
        >> *(   ('*' >> Factor)
            |   ('/' >> Factor)
            );

    Expression =
        Term
        >> *(   ('+' >> Term)
            |   ('-' >> Term)
            );
    . . .

Here str_p( . . . ) and ureal_p are parsers and lexeme_d is a directive provided by the library (see [1]). Operators *, | and >> are overloaded by the library in a way to make the final syntax as close as possible to the Backus - Naur form.

The next step is to bind semantic actions to the recognized syntax elements. boost::spirit supports operator[] for those actions. For example, the PushOperation functor binding for trigonometric recognized functions could be written as follows:

    . . .
    Function = 
        ( str_p( "sin" ) >> '(' >> Expression >> ')' )
            [ PushOperation( "SIN", Self.Code ) ]      |
        ( str_p( "cos" ) >> '(' >> Expression >> ')' )
            [ PushOperation( "COS", Self.Code ) ];
    . . .

The PushOperation functor is implemented as follows:

struct PushOperation
{
    PushOperation( const string &  cop, vector< string > &  code ) :
        COP( cop ), Code( code )
    {}
    
    void operator() ( char const *  Unused, char const *  End ) const
    {
        Code.push_back( COP );
    }
    
    string              COP;    //!< Code of OPeration
    vector< string > &  Code;   //!< Reference to the code version
};

The functor adds a virtual machine command to the code vector.

The whole code vector content will be printed at the standard output at the end of the translator work. A complete grammar for all the required syntax with all the semantic actions could be written similar.

The Virtual Machine

Implementation of the virtual machine is very simple. As soon as the code is read from the standard input stream all that should be done is to execute all the commands.

To make a potential code reuse easy it is convenient to develop a class which stores a code vector and provides an interface to perform calculations when it is required. The class also stores a data structure to support a stack.

class SCalculator
{
    public:
        SCalculator( const vector< string > &  code ) :
            Code( code )
        {
                // The stack size cannot be larger than the code size
            Stack.reserve( Code.size() );
        }
        
        double Calculate( const double &  Value ) 
        {
            . . .
        }

    private:
        vector< string >    Code;       // Code storage
        vector< double >    Stack;      // Working space - stack of doubles
};

The Calculate() method must clean the stack first to make multiple calls safe and then execute all the commands from the code vector:

double SCalculator::Calculate( const double &  Value ) 
{
        // Clear the stack for reuse
    Stack.clear();
            
        // Walk the whole code
    for ( vector< string >::const_iterator  k( Code.begin() );
          k != Code.end(); ++k )
    {
        if ( *k == "ADD" )
        {
            double      Temp( Stack.back() );

            Stack.pop_back();
            Stack[ Stack.size() - 1 ] = Stack.back() + Temp;

            continue;
        }

        if ( *k == "SUB" )

        . . .

    }

    return Stack.back();
}

The other commands are implemented similar.

Source Code

The full source codes of the translator and the virtual machine which are developed in the article could be found at the author's web site (http://satsky.spb.ru). A generated doxygen documentation is also available at the web site.

Conclusion

The boost::spirit library allows working with any kind of input streams. It could be used for parsing arbitrary binary streams as well.

Self Study

A reader who is interested in playing with the virtual machine can make it available via a network. It is easy to do using standard UNIX environment because the developed virtual machine works with the standard input and output streams. It is necessary to create a new user specifying the virtual machine executable as a shell. Now it is possible to connect to the computer via telnet or ssh and pass authentication and authorization as a newly created user. The console will be a standard input and output of the virtual machine. As soon as the command input is finished the machine will print the calculation result.

References

  1. boost::spirit library documentation (http://www.boost.org)
  2. Aho A., Sethi R., Compilers. Principles, Techniques, and Tools. Addison Wesley, 1985
  3. Sebesta R. Concepts of Programming Languages. Addison Wesley, 2002

Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

Разрешается копирование и распространение этой статьи любым способом без внесения изменений, при условии, что это разрешение сохраняется.
Last Updated: December 8, 2005