D. Szafron and R. Ng, LexAGen: An Interactive Incremental Scanner Generator, Software Practice and Experience, Vol. 20, No. 5, May 1990, pp. 459-483.
Abstract:

This paper describes LexAGen, an interactive scanner generator that is the first component of an interactive compiler-generation environment. LexAGen can generate fast scanners for languages whose tokens can be specified by regular grammars. It also supports several context-sensitive programming-language constructs, such as nested comments and the interaction between floating-point numbers and the range operator in Modula-2. In addition, LexAGen includes a fast new algorithm for keyword identification. However, the most important and novel aspects of LexAGen are that it constructs scanners incrementally and that specifications can be executed at any time for validation testing. LexAGen specifications are expressed and entered interactively in a restricted BNF format (no left recursion). All syntactic errors and token conflicts are detected and reported immediately as LexAGen incrementally constructs a deterministic finite automaton to represent the scanner. At any time, the user can test the scanner fragment entered so far by supplying text to be scanned. Alternatively, the user can generate a C-code scanner from the automaton. The generated scanner uses a direct-execution approach and is quite fast. LexAGen is implemented in Smalltalk-80, and its extensive use of interactive graphics makes it very easy to use. The object-oriented paradigm of Smalltalk-80 is also the basis for the incremental analysis, the error-detection scheme, and an intermediate representation that can easily be modified to generate scanners in other target languages such as Pascal, Modula-2 and Ada.
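To make two of the context-sensitive cases concrete, here is a minimal hand-written C sketch of a direct-execution scanner for a tiny Modula-2-like subset. It is not LexAGen output and does not reproduce the paper's keyword-identification algorithm, which the abstract does not describe. The token kinds and the functions `skip_blanks` and `scan_token` are hypothetical names chosen for illustration. In a direct-execution scanner, each automaton state is a stretch of code rather than a row in a transition table. The sketch shows two cases. Nested `(* ... *)` comments need a depth counter, which a plain DFA cannot provide. In `1..10`, the scanner must look one character past the `.` before deciding between a real number and an integer followed by the range operator. Real-number scale factors (e.g. `1.0E5`) are omitted for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for a tiny Modula-2-like subset. */
typedef enum { T_EOF, T_INT, T_REAL, T_RANGE, T_DOT, T_IDENT, T_OTHER, T_ERROR } Kind;

typedef struct { Kind kind; const char *start; size_t len; } Token;

/* Skip whitespace and (possibly nested) comments "(* ... *)".
   Returns 0 on success, -1 if a comment is never closed. */
static int skip_blanks(const char **pp) {
    const char *p = *pp;
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (p[0] == '(' && p[1] == '*') {
            int depth = 1;              /* nesting depth: beyond a finite automaton */
            p += 2;
            while (depth > 0) {
                if (*p == '\0') { *pp = p; return -1; }
                if (p[0] == '(' && p[1] == '*') { depth++; p += 2; }
                else if (p[0] == '*' && p[1] == ')') { depth--; p += 2; }
                else p++;
            }
            continue;                   /* more blanks or comments may follow */
        }
        *pp = p;
        return 0;
    }
}

/* Scan one token, advancing *pp past it. Each branch plays the role of a DFA state. */
Token scan_token(const char **pp) {
    assert(pp != NULL && *pp != NULL);
    Token t = { T_ERROR, *pp, 0 };
    if (skip_blanks(pp) != 0) { t.start = *pp; return t; }
    const char *p = *pp;
    t.start = p;
    if (*p == '\0') { t.kind = T_EOF; return t; }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        t.kind = T_INT;
        /* "3.14" and "1." are reals, but in "1..10" the '.' begins the
           range operator, so peek past the '.' before consuming it. */
        if (p[0] == '.' && p[1] != '.') {
            p++;
            while (isdigit((unsigned char)*p)) p++;
            t.kind = T_REAL;
        }
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p)) p++;
        t.kind = T_IDENT;
    } else if (p[0] == '.' && p[1] == '.') {
        p += 2;
        t.kind = T_RANGE;
    } else if (*p == '.') {
        p++;
        t.kind = T_DOT;
    } else {
        p++;                            /* any other single character */
        t.kind = T_OTHER;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

Scanning `ARRAY [1..10]` with this sketch yields an identifier, `[`, the integer `1`, the range operator `..`, the integer `10` and `]`. Scanning `1.5` instead yields a single real token. The one-character peek is what lets an otherwise regular scanner handle the Modula-2 case.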