The reparser is a program that rapidly reparses a given text after small changes have been made in it. The idea is that most of the text has not changed, and that as a result it can simply keep most of the old parse around without modification. It is intended as a component of an editor that provided continuous feedback to the user as to syntactic problems with the text being edited.
As such, the reparser has to be quite tolerant of syntax problems; it has to degrade gracefully in the face of sinvalid syntax without far-fetched or confusing error recoveries. To accomodate this, it has a policy of retaining the previous syntactic structure when in doubt.
In fact, it accepts editing actions such as insertion or deoetion of a character by simply modifying the parse tree. Usually this leads to an invalid parse tree.
For example, if the program uses a production that parser "ab" as an identifier. Inserting a plus sign between the a and the b will immediately reult in a parse where "a+b" is parsed as an identifier. This is an incorrect parse, of course. When the plus sign is inserted, all the parse tree branches that now include the plus sign are marked as having been changed. The reparser walks the parse tree, retaining all branches that cannot have been affected by the changes. Other branches get incrementally unparsed, and reparsed, using the same criteria. If the reparse fails because the text is syntactically incorrect, the original parse tree is retained, and marked as erroneous. Even though erroneous, it will be reparsed only if a subsequent change can possibly affect the parse.
An editor that uses this reparser can have its own policy as to how to report errors to the user. That's not the conern of the reparser.
The actual algorithm is a few steps removed from the usual LR parsing technique.
First, it retains nonterminal transitions in the parsing tables. The remaining input is treated as a stack, with the next input character at the top. When a reduction is performed, the nonterminal is pushed on the input stack, and a number off items is popped on the parsing stack corresponding to the size of the production. THen the parser has a new state (that on the parse stack) and it proceeds to read the nonterminal from the input stack as a normal read transition.
Second, the parser usually processes an existing parse tree, not a sequence of terminal symbols. Initially, the old parse tree is placed onto the input stack. Then, wheneever the next piece of input is a branch that cannot be affected by the edits, and the parser is in a state that can read it, the entire branch is just read. If it may be affected, or cannot be read from the current state, it is unparsed and ther resulting twigs pushed back into the input stack. If it's a terminal that can't be read, it's a syntactically incorrect, and recovery action is taken. Recovery consists of backing through the parse stack to find a place where what was the next input symbol at that point in the parse was readable. If so, the input stack is returned to the state it had when that point in the parse stack had been reached originally, and read transitions are followed untli that parser is past the point where the error occurred. In this way it retains parts of the existing (probably invalid) parse tree.
This requires the parser to make copies of the input stack while parsing. We place links to it on tha parse stack, so that the apppropriate input is right there when backing up to a state during error recovery.
By using a functional programming style of data structure (rarely modifying anything) it is possible to store all of this in very little space: most of these data will be shared.
To determine whether a parse branch might possibly be affected by a change in input, each branch contains an integer -- the distance (counted in terminal characters) from the start of the source text it summarizes to the last character that was consulted (even in lookahead) when making the decision to perform the reduction. We also remember the state that was on the parse stack when the nonterminal was accepted onto the parse stack. If there was an edit anywhere in that span of terminal symbols, and further the reparser is in the same state as it was before when it previously decided to read the nonterminal from the input stack, the old parse branch will still be valid.
What hasn't been done yet is suppressing so-called useless renaming productions from the parse tree, and integrating an attribute evaluator with the parse tree maintenance. I'd love to experiment someday with that someday and see if it can handle type errors and the like in ordinary programs, and in something like constructive type theory.