You can type in any calculator:

1000 * -1

and get

-1000 as result.

But what if you type:

** **
**one thousand times minus one
**
You can type any expression as this in google:

I got this programming challenge one year ago and I got immediately trapped to figure out how to tackle this problem. I'll explain in this blog the approach I followed to create the program to calculate with words instead numbers.

By the way, I developed this one year ago, I was asked not to publish any code, but I think that after one year is fair enough as I provided full source code and I think as well it is good to share this code in the internet.

At the end the program is text based and you can use it as the following console shows it:

Compare against the google site:

The result is equivalent, with the exception that google translate the numeric amount -30900 to the equivalent English sentence. By the way to translate from a number to an English sentence it's much more more easy and the required programming techniques are totally different.

In fact the reason this programming challenge was interesting to me is that I realize I could solve this problem using Compiler techniques. It was easy to know that if you want to program a typical calculator program you must use parsing techniques. But what if the calculator is using words instead of numbers? Well, it happens to be the same approach. This was not clear for me at first until I could get a valid grammar to start to solve this puzzle.

To make this more interesant, I couldn't find a solution in the internet. I did find some links with different approaches to solve this challenge, however, no implementation was at all right. For example this article was very useful:

http://blog.cordiner.net/2010/01/02/parsing-english-numbers-with-perl/
I remember I found several flaws one year ago, today I don't remember exactly all of them, but playng with this implementation fails miserably when I try to parse the 80 number. By the way, the perl program explained there only converts isolated amounts:

I reviewed more papers and all of them had flaws, so I started to create my own calculator. As I forgot everything from my compiler classes, I had to review Compilers Theory, specifically the parsing aspect of a language.
Parsing and evaluating a language are the basis for compiler construction, but this is a complex subject, and I only wanted to touch basic principles.
The dragon book is the reference to understand the theory and practice of compiler construction, but for this endeavor I only studied parsing techniques.

My first contact with a practical example was in the book The Unix Operating System by Kernigan, here I could check how to create a calculator in the C Programming Language. Please don't be foolish, as at first instance it is possible assume in a naive way that it is possible to try to read the expression and do the math in some way, however this approach won't work as the same parser
must also evaluate more complex expressions like:
( ( ( ( 3 * 2 ) + 4 ) * - ( 2 * 2) ) * - 1).
Just check the Unix bc calculator program to evaluate the last expression.

To feel more comfortable with this problem I started to learn by using the flex program, which is a tool for generating of "scanners". (Check info flex to get an essential intro to the tool).
Repeating text from info flex, this match a set of specific patterns and execute the corresponding action.
Our input file for lex is something like this:

This is a basic file which recognize a specific pattern and returns an integer identifier. For example reading the input 2 + 2 will return the identifiers NUMBER PLUS NUMBER.
The include header file calc.tab.h in this sample was generated by bison, but for the sample given now you can create that file containing only C language #define statements.

Now the complete lex file (calc.l) for the calculator: (Please, note that the first approach was to understand how to create this calculator using lex/yacc tools, now replaced by flex/bison in this tutorial)

%{
#include "calc.tab.h"
%}
%%
"+" { return ADD; }
"plus" { return ADD; }
"-" { return SUB; }
"minus" { return SUB; }
"*" { return MUL; }
"times" { return MUL; }
"/" { return DIV; }
"div" { return DIV; }
"(" { return OP; }
")" { return CP; }
"|" { return ABS; }
"abs" { return ABS; }
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
"zero" { yylval = 0; return ZERO; }
"one" { yylval = 1; return ONE; }
"two" { yylval = 2; return TWO; }
"three" { yylval = 3; return THREE; }
"four" { yylval = 4; return FOUR; }
"five" { yylval = 5; return FIVE; }
"six" { yylval = 6; return SIX; }
"seven" { yylval = 7; return SEVEN; }
"eight" { yylval = 8; return EIGHT; }
"nine" { yylval = 9; return NINE; }
"ten" { yylval = 10; return TEN; }
"eleven" { yylval = 11; return ELEVEN; }
"twelve" { yylval = 12; return TWELVE; }
"thirteen" { yylval = 13; return THIRTEEN; }
"fourteen" { yylval = 14; return FOURTEEN; }
"fifteen" { yylval = 15; return FIFTEEN; }
"sixteen" { yylval = 16; return SIXTEEN; }
"seventeen" { yylval = 17; return SEVENTEEN; }
"eighteen" { yylval = 18; return EIGHTEEN; }
"nineteen" { yylval = 19; return NINETEEN; }
"twenty" { yylval = 20; return TWENTY; }
"thirty" { yylval = 30; return THIRTY; }
"forty" { yylval = 40; return FORTY; }
"fifty" { yylval = 50; return FIFTY; }
"sixty" { yylval = 60; return SIXTY; }
"seventy" { yylval = 70; return SEVENTY; }
"eighty" { yylval = 80; return EIGHTY; }
"ninety" { yylval = 90; return NINETY; }
"hundred" { yylval = 100; return HUNDRED; }
"thousand" { yylval = 1000; return THOUSAND; }
"million" { yylval = 1000000; return MILLION; }
\n { return EOL; }
[ \t] { /* ignore whitespace */ }
. { printf("Mystery character %c\n", *yytext); }
%%

You have to understand the scope of flex and the previous input file. I am not mentioning the structure of the flex input file, for that, you have to read info flex. Flex will generate c code with yylex() function, create a program invoking yylex() and link with libfl.so library and the magic starts. I recommend the excellent book:

http://www.amazon.com/flex-bison-ebook/dp/B0043D2DYQ/ref=sr_1_1?s=books&ie=UTF8&qid=1335844993&sr=1-1
The challenging part was to create bison input file (calc.y). I won't go into the details but my inspiration sources were the Brian Kernighan/D. Ritchie book and the John Levine's Book.

%{
#include
#include
int total=0;
int value=0;
%}
/* BISON Declarations */
%token NUMBER
%token OP
%token CP
%token ADD
%token SUB
%token MUL
%token DIV
%token ABS
%token EOL
%token ZERO
%token HUNDRED
%token THOUSAND
%token MILLION
%token ONE
%token TWO
%token THREE
%token FOUR
%token FIVE
%token SIX
%token SEVEN
%token EIGHT
%token NINE
%token TEN
%token ELEVEN
%token TWELVE
%token THIRTEEN
%token FOURTEEN
%token FIFTEEN
%token SIXTEEN
%token SEVENTEEN
%token EIGHTEEN
%token NINETEEN
%token TWENTY
%token THIRTY
%token FORTY
%token FIFTY
%token SIXTY
%token SEVENTY
%token EIGHTY
%token NINETY
/* Grammar follows */
%%
input: /* empty string */
| input line
;
line: expr EOL { printf("\t%d\n",$1); }
;
expr: expr ADD term { $$ = $1 + $3; }
| expr SUB term { $$ = $1 - $3; }
| term
;
term: term MUL factor { $$ = $1 * $3; }
| term DIV factor { $$ = $1 / $3; }
| factor
;
factor: NUMBER
| numeral
| OP expr CP { $$ = $2; }
;
numeral: n3million
| n3thousand
| n3
| ZERO
;
n3million: n3 MILLION { $$ = getValue2($1,$2); }
| n3 MILLION n3thousand { $$ = getValue3($1,$2,$3); }
| n3 MILLION n3 { $$ = getValue3($1,$2,$3); }
;
n3thousand: n3 THOUSAND { $$ = getValue2($1,$2); }
| n3 THOUSAND n3 { $$ = getValue3($1,$2,$3); }
;
n3: n1hundred
| unit { $$ = getValue1($1); }
| ten { $$ = getValue1($1); }
| mten { $$ = getValue1($1); }
| mten unit { $$ = getValue2($1,$2); }
;
n1hundred: unit HUNDRED { $$ = getValue2($1,$2); }
| unit HUNDRED unit { $$ = getValue3($1,$2,$3); }
| unit HUNDRED ten { $$ = getValue3($1,$2,$3); }
| unit HUNDRED mten { $$ = getValue3($1,$2,$3); }
| unit HUNDRED mten unit { $$ = getValue4($1,$2,$3,$4); }
;
unit: ONE
| TWO
| THREE
| FOUR
| FIVE
| SIX
| SEVEN
| EIGHT
| NINE
;
ten: TEN
| ELEVEN
| TWELVE
| THIRTEEN
| FOURTEEN
| FIFTEEN
| SIXTEEN
| SEVENTEEN
| EIGHTEEN
| NINETEEN
;
mten: TWENTY
| THIRTY
| FORTY
| FIFTY
| SIXTY
| SEVENTY
| EIGHTY
| NINETY;
%%
yydebug=1;
/* XXX: TODO
use va_listA
int getValue(int n, ...) {
int value=0;
int i=n;
va_list lparam;
va_start(lparam, n);
}
*/
int getValue1(int n1){
int value=0;
value=setValue(0,n1);
return value;
}
int getValue2(int n1, int n2){
int value=0;
value=setValue(0,n1);
value=setValue(value,n2);
return value;
}
int getValue3(int n1, int n2, int n3){
int value=0;
value=setValue(0,n1);
value=setValue(value,n2);
value=setValue(value,n3);
return value;
}
int getValue4(int n1, int n2, int n3,int n4){
int value=0;
value=setValue(0,n1);
value=setValue(value,n2);
value=setValue(value,n3);
value=setValue(value,n4);
return value;
}
int setValue(int value, int n){
if (value==0)
value=n;
else if (value>n)
value+=n;
else
value*=n;
total += value;
return value;
}
main(int argc, char **argv)
{
yyparse();
}
yyerror(char *s)
{
fprintf(stderr, "error: %s\n", s);
}

This is difficult to explain in a blog, but the recommendation is to study from the internet. You can try the code provided here using this Makefile:

calc: calc.l calc.y
bison -d calc.y
flex calc.l
gcc -o $@ $@.tab.c lex.yy.c -lfl; mv calc calculator
clean:
rm calculator calc.tab.h calc.tab.c lex.yy.c

The previous program has some flaws and its not 100% correct. But it did give an idea about how to build the calculator using now a LL approach. I hope to have time to post the approach to build this program in C++. By the way, I didn't mention any formal aspect of this challenge. Starting with BNF Grammars, LALR or LL parsing, Ambiguos Grammars and ASTs. I promise in the next article to go into more internal details.