Commentary by Chris Pressey
This work is distributed under a CC-BY-ND-4.0 license, with the following explicit exception: the ratings may be freely used for any purpose with no limitations.
Parsing
Parsing as Deduction
- rating: 2
See also: [Definite clause grammar][], [Chart parser][], [Earley Deduction][].
They say something interesting about the "proof procedure" in Prolog:
it has the same theoretical problems of the top-down backtrack parsing algorithms after which it is modeled.
Are they really saying Horn clause resolution was modelled after parsing procedures?
They note that Earley Deduction is a natural generalization of Earley parsing but that it's challenging to make it as efficient as Prolog's "proof procedure". In particular,
It is very costly to implement subsumption in its full generality.
When they say "definite clauses are a simple subset of first-order logic", what do they mean by "subset"? Definite clauses (Horn clauses) may be syntactically a subset of what we usually think of as the language of first-order logic, but they have the same expressive power. So they must mean syntax.
A parser based on definite clauses is a subset in terms of expressive power because one of the arguments is monotonic -- the input is always consumed -- thus the deduction always terminates.
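A minimal sketch of that termination argument, in Python (my own example, not from the paper): encode a grammar rule as a relation over string positions, where every clause consumes at least one token before recursing, so the position argument strictly increases and the search is bounded by the input's length.

```python
def parse_s(words, i):
    """S -> 'a' 'b' | 'a' S 'b', encoded as a relation over positions.
    Returns the set of positions j such that an S spans words[i:j]."""
    ends = set()
    if i < len(words) and words[i] == "a":
        # Clause 1: S -> 'a' 'b'
        if i + 1 < len(words) and words[i + 1] == "b":
            ends.add(i + 2)
        # Clause 2: S -> 'a' S 'b'; the recursive call starts at i + 1,
        # strictly deeper into the input, so the recursion is bounded.
        for mid in parse_s(words, i + 1):
            if mid < len(words) and words[mid] == "b":
                ends.add(mid + 1)
    return ends

assert parse_s("aabb", 0) == {4}   # "aabb" is an S
```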
There is a good short description of chart parsing on page 3.
Definite clause grammar is to CFG as Earley deduction is to Earley parsing. (Is it? The point is that the parsing-function is a sub-function of the deduction-function.)
On complexity, they use the term "offline-parseable", which applies to a grammar that is not "infinitely ambiguous". I'm not sure exactly what that means. But if a DCG is offline-parseable then it is decidable. This surely corresponds to (a) there being only a finite amount of ambiguity and (b) the input being finite and always consumed.
They also note that unrestricted DCGs are Turing-complete and LFGs (lexical-functional grammars, a unification-based formalism) can encode "exponentially hard" languages (probably they mean EXPTIME-complete, or something similar.)
Principles and Implementation of Deductive Parsing
- rating: 1
The authors present natural-deduction-based logical systems that represent various parsing algorithms for context-free grammars (CYK, recursive descent, LR parsing, Earley's algorithm) and then some more sophisticated grammar formalisms -- Augmented Phrase-Structure Grammar (which is unification-based), Combinatory Categorial Grammar (which is Lambek-ish), and Tree Adjoining Grammar (which is just weird).
They present these to buttress their claim that parsing algorithms can be represented directly as deduction systems.
These presentations could be used to form simple (but inefficient) implementations of these parsing algorithms in Prolog. At any rate, they elucidate the algorithms to some degree.
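To make the flavour concrete, here is a sketch of that bottom-up style in Python (my own code, not the paper's): items are facts, an inference rule derives new items from existing ones, and an agenda-driven loop computes the closure of the axioms under the rule. The single rule here is CYK's: from (B, i, k), (C, k, j), and a production A -> B C, derive (A, i, j).

```python
def cyk_deduce(words, unary, binary, start):
    """Agenda-driven deduction for a grammar in Chomsky normal form.
    unary:  {terminal: set of nonterminals A with A -> terminal}
    binary: {(B, C): set of nonterminals A with A -> B C}
    An item (A, i, j) asserts that A derives words[i:j]."""
    # Axioms: one item per lexical production.
    agenda = [(a, i, i + 1) for i, w in enumerate(words)
                            for a in unary.get(w, ())]
    chart = set()
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        a, i, j = item
        # Apply the CYK rule against every adjacent item in the chart.
        for (b, k, l) in list(chart):
            if l == i:   # (b, k, i) + (a, i, j) => (c, k, j)
                for c in binary.get((b, a), ()):
                    agenda.append((c, k, j))
            if k == j:   # (a, i, j) + (b, j, l) => (c, i, l)
                for c in binary.get((a, b), ()):
                    agenda.append((c, i, l))
    return (start, 0, len(words)) in chart

# S -> A B, A -> 'a', B -> 'b':
assert cyk_deduce(["a", "b"], {"a": {"A"}, "b": {"B"}},
                  {("A", "B"): {"S"}}, "S")
```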
Section 4.4 I also found interesting, but for different reasons. It's about the sequent calculus. They say their preceding presentations based on natural deduction can be implemented by bottom-up execution. Then they note Lambek calculus is presented as a sequent calculus system. Then they note that it is difficult to apply their techniques to sequent calculus systems, because "computationally they are designed to be used in a top-down direction".
I would have thought both ND and SC were relational systems, and that "executing" them in one direction or the other would always necessitate search of some sort to deal with the nondeterminism. It's surprising to see them hold what looks like a markedly different view. Not sure whether it has something to do with the connection to parsing.
Taming Context-Sensitive Languages with Principled Stateful Parsing
- rating: 1
It's a slightly annoying paper in many ways, but some of the ideas are good.
[Context-sensitive features] are traditionally handled with ad-hoc code (e.g., custom lexers), outside of the scope of parsing theory.
True dat. I too would love to bring them into parsing theory (or rather, parsing-and-generating theory.)
Stateful parsing is a crude but effective way to implement a parser for a context-sensitive language. (Implementing a CSG straight-up is far too inefficient.) "Crude" is on the level of global variables though. The "principled" part is to make it less crude.
They don't like the idea of explicitly threading state through the grammar (and/or parser code), as this increases the dependencies, so they prefer this more implicit, global-variable-like setup.
This seems similar to how people like "implicit arguments" and monads in functional programming.
The "principled" part is largely about managing the state using transactional operations.
each parser invocation either succeeds or leaves the state untouched
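As I read it, that amounts to something like the following (my reconstruction in Python, not the paper's actual API): the parse state is mutable, and every rule is invoked through a wrapper that snapshots the state and rolls it back on failure.

```python
class Parser:
    """Mutable parse state with transactional rule invocation.
    The indent stack stands in for whatever context-sensitive
    state the grammar needs (a made-up example)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.indent_stack = [0]

    def snapshot(self):
        return (self.pos, list(self.indent_stack))

    def restore(self, snap):
        self.pos, self.indent_stack = snap[0], list(snap[1])

    def attempt(self, rule):
        """Run `rule`; on failure, roll back, so the invocation
        either succeeds or leaves the state untouched."""
        snap = self.snapshot()
        if rule(self):
            return True
        self.restore(snap)
        return False
```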
But... isn't the transactional approach only required because the state is treated as updatable? Wouldn't a purely functional approach (using immutable data) simply discard the altered state and revert to the previous state? Or rather, it would be, "each parser invocation either succeeds and returns a new state, or fails and returns nothing", where it's understood the state is untouched.
So they don't appear to be thinking of functional programming (or "persistent" data structures) here, but charging ahead with mutable state.
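For contrast, the purely functional version I have in mind would look more like this (again my own sketch): the state is an immutable value, each parser maps a state to either a new state or nothing, and "reverting" is just continuing to hold the old state.

```python
from typing import Callable, NamedTuple, Optional

class State(NamedTuple):
    pos: int
    indent_stack: tuple   # immutable counterpart of the stack above

ParserFn = Callable[[str, State], Optional[State]]

def choice(p: ParserFn, q: ParserFn) -> ParserFn:
    """Ordered choice with no rollback machinery: if p fails, the
    original state is still in hand, and q simply starts from it."""
    def run(text: str, st: State) -> Optional[State]:
        result = p(text, st)
        return result if result is not None else q(text, st)
    return run
```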
Also: in my ideas for applications of this, at least, wouldn't it sometimes be valuable to send information back to the choice point on failure?
(Need to double-check this paper/impl: does it use ordered choice like PEG?)
research!rsc: Yacc is Not Dead
- rating: 1
.
Yacc is dead: An update
- rating: 2
.