LCF-style Natural Deduction

See also: PhilomathDestructorizersNested Modal Transducers

This article presents a (perhaps unorthodox) development of a simple LCF-style theorem prover for propositional logic in a Natural Deduction system.

It starts with some brief notes and observations about LCF-style theorem provers and Natural Deduction. It then proceeds to tie them together with some Python-like pseudo-code, and concludes with a handful of further observations.

This article contains only pseudo-code; for implementations of these ideas in real programming languages, have a look at projects such as Philomath (in ANSI C).

Table of Contents

LCF-style Theorem Proving

A brief introduction to LCF-style theorem proving can be found in The LCF Approach to Theorem Proving (PDF), slides for a presentation by John Harrison of Intel Corp. in 2001.

An LCF-style theorem prover, as far as I'm concerned, means this:

You have a programming language, and you will write your proofs directly in this language, as executable programs.

To do this, you have a data type representing theorems (that is, statements for which there is a valid proof).

You have some operations that produce trivial theorems (ones which need no proof to speak of).

You have some other operations that take theorems and transform them in some way to produce new theorems. These operations represent valid proof steps.

You don't provide any other way to produce or mutate such a data object. In this way, these data objects are guaranteed to represent theorems with valid proofs, and a program that runs and produces such a data object as its output, has proved a theorem.

Some Observations

Now, some observations about LCF-style theorem provers:

(1) This is fundamentally an application of "Parse, don't Validate".

(2) This doesn't rely on a static type system. Rather, it relies on encapsulation a.k.a. information hiding. You could write it in Scheme (see, e.g., "Information Hiding in Scheme"), even though a type theorist would call Scheme an "untyped" language. (The rest of the world would probably call it "dynamically typed" or "latently typed".)

(3) Incidentally, taking 1 and 2 together, one may conclude that "Parse, don't Validate" in general doesn't rely on a static type system, either.

(4) Actually preventing the mutation of such a data object on a real computer is probably impossible. Can you not always run the program under gdb and alter a few bytes here and there? It's probably more productive to view it as a proviso: if the data object was constructed only by these operations and no others, then it represents a theorem. Under such a proviso, one could even have an LCF-style theorem prover in a language with only "voluntary" encapsulation, such as Python.

Natural Deduction

A good tutorial on Natural Deduction is the first chapter of the book Logic in Computer Science (, 1st ed., borrowable) by Michael Huth and Mark Ryan.

The IEP article on Natural Deduction by Andrzej Indrzejczak is also a good overview, and I'll follow parts of it closely here.

In particular, it proposes some criteria for distinguishing Natural Deduction proof systems from other proof systems. In ND systems,

(1) Assumptions can be freely introduced, and subsequently discharged under certain conditions.

(2) Reasoning proceeds by successive application of inference rules.

(3) Rich forms of proof construction are possible.

The first corresponds to LCF-style operations which produce trivial theorems. The second corresponds to LCF-style operations which transform theorems into other theorems via a valid proof step. The third corresponds to the fact that the data objects representing theorems are processed by a general-purpose programming language, in which we can build arbitrary contrivances to help us work with them.

Some Observations

An ND proof may be presented in a tree format or in a list format. An LCF-style proof in an applicative language is a set of function applications, which has the shape of a tree. If we bind each application to an identifier (using let or a similar language construct), we obtain a flattened list, where each proof step (application) may refer to earlier proof steps.

In the tree format, assumptions are labelled when introduced, so that when an assumption is discharged, we know which one it is. In the list format, nested boxes can be used to serve the same purpose. This is akin to labelled assumptions having dynamic scope versus nested assumptions having lexical scope, which will be demonstrated in one of the sections below. To simplify the exposition, our theorem prover will use labelled assumptions.

Pseudo-Code for the Theorem Prover

It would seem that the ideal language in which to demonstrate the things I'm trying to demonstrate here would be a dynamically-typed language with good support for encapsulation.

Unfortunately, such a combination is rarely found in a programming language (which could be a topic unto itself). This leaves me a few options:

None of these options are fantastic. I'll pick the first option, because Python is pretty accessible and it's pretty easy to pretend it has encapsulation. (See also the proviso I mentioned earlier.)

So, this code will deal with Theorem objects with some attributes whose names begin with single underscore. Just pretend that the functions we define here are the only functions that can access these attributes. All other code is just, you know, magically prevented from accessing them.

These attributes are:

Propositional formulas are represented with an abstract syntax tree structure, whose nodes are objects from classes such as Conj, Disj, Impl, Var and so forth. When these are binary operators we assume they have attributes called lhs and rhs containing their child nodes.

Unlike our theorem objects, the representation of propositional formulas will not be hidden from client code. In fact it will be useful for client code to manipulate propositional formulas as much as it likes.

In particular, in order to avoid writing clumsy formula-tree-building code like Impl(Var("p"), Var("q"))), we'll assume we have a helper function wff which parses strings containing propositional formulas, like so: wff("p → q").

Now, on to the actual operations.

The basic operation representing the most basic valid proof step takes a propositional formula and a label, and produces a theorem with the given formula as an assumption with the given label. The code would be something like:

def suppose(formula, label):
    return Theorem(
        _assumptions={label: formula},

We will also have a set of operations that represent inference rules. Each inference rule is a function: the inputs are the premises and the output is the conclusion.

Inference rules such as conjunction-introduction seem easy to formulate, and they are, but there is one aspect where we must take care, which is this: any assumptions already made in the premises must be carried over and preserved in the conclusion.

Additionally, if we ever discover that two copies of the same label refer to two different propositional formulas, this indicates some kind of mistake in the proof construction, and we should take care to flag it as such.

To achieve these ends we would benefit from defining a helper function, something like:

def merge_assumptions(a, b):
    c = b.copy()
    for (key, value) in a.items():
        if key not in c:
            c[key] = value
        elif c[key] != value:
            raise Error("Inconsistent labelling")
    return c

Now we can write the operation corresponding to the inference rule of conjunction-introduction:

def conj_intro(x, y):
    """If x is proved, and y is proved, then x & y is proved."""
    assert isinstance(x, Theorem)
    assert isinstance(y, Theorem)
    return Theorem(
        _conclusion=Conj(x._conclusion, y._conclusion),
            x._assumptions, y._assumptions

(Note that traditionally, Greek letters are used in the premises and conclusion of an inference rule to denote variables representing theorems. We will avoid Greek letters here, because they don't seem to work as well in pseudo-code as they do in typeset mathematics. Writing out their names seems awkward too (phi and psi are too similar, as names go,) so we will use Latin letters instead. But! When using Latin letters it might make sense to use the letter p for a variable for "proof", it would be too easy to confuse this with p used as a propositional variable — which it commonly is. So in this exposition we will use x, y, z for variables that represent theorems. Also, when we have a propositional formula rather than a theorem, we may add an f in the name.)

Conjunction-elimination is also reasonably simple. We assert that the conclusion has a certain structure, and then we take it apart. side is a parameter which tells which side we want to keep: 'L' for left, 'R' for right. (The other option would be to write two rather similar functions for two rather similar rules of inference.)

def conj_elim(z, side):
    """If z (of the form x & y) is proved, then x (alternately y) is proved."""
    assert isinstance(z, Theorem)
    assert isinstance(z._conclusion, Conj)
    return Theorem(
        _conclusion={'L': z._conclusion.lhs, 'R': z._conclusion.rhs}[side],

We did the rules for conjunctions first because they are very simple, but correspondingly not very exciting, and showing a proof using only them would probably not be very illuminating. So, let's work towards having a selection of inference rules that will let us write a small, but meaningful, proof. Here is implication-elimination, also known as modus ponens:

def impl_elim(x, y):
    """If x is proved, and y (of the form x → z) is proved, then z is proved."""
    assert isinstance(x, Theorem)
    assert isinstance(y, Theorem)
    assert isinstance(y._conclusion, Impl)
    assert y._conclusion.lhs == x._conclusion
    return Theorem(
            x._assumptions, y._assumptions

Implication-introduction is slightly more complex, as it will discharge one of the assumptions it's given. To identify this assumption, we will pass in its label. An assumption with that label must be present on the theorem that we also pass in to the function.

def impl_intro(x_label, y):
    """If y is proved under the assumption x, then x → y is proved."""
    assert isinstance(y, Theorem)
    assert x_label in y._assumptions
    a = y._assumptions.copy()
    fx = a[x_label]
    delete a[x_label]
    return Theorem(
        _conclusion=Impl(fx, y._conclusion),

We now have enough operations to demonstrate a simple proof. We'll follow the proof of the statement given in section 5a of the IEP article:

(p → q) → ((q → r) → (p → r))

Our function application that corresponds to that proof is

def proof1a():
    return (
                            suppose(wff("p"), 1),
                            suppose(wff("p → q"), 3)
                        suppose(wff("q → r"), 2)

(Note well that proof1a is not an operation and may not directly access or manipulate the _conclusion and _assumptions fields of the object it creates and returns.)

The formatting of this function application is intentionally very tree-like, to try to more clearly show how it corresponds with the tree presentation of the proof in the IEP article.

The glaring difference between this and a tree-structured proof is, of course, that the nodes of the tree-structured proof show the conclusion at each step, while in the function application, the theorems are merely being passed in as parameters to the enclosing function, and the content of these theorems are not shown in the source code.

Even if we find it acceptable to omit explicit intermediate conclusions, we probably still do want to show the final conclusion that we set out to prove, for clarity. So, we can define another helper function like

def shows(x, conclusion):
    assert isinstance(x, Theorem)
    assert len(x._assumptions) == 0
    assert x._conclusion == conclusion
    return x

Then we can say

def proof1b():
    return shows(
            (... all the rest of the above proof ...)
        wff("(p → q) → ((q → r) → (p → r))")

and if we call this function we should not get an error; it should just return, indicating that the proof is indeed valid and that it is indeed a proof of this statement, with no outstanding assumptions.

Note that the shows function could also be used on intermediate steps, if desired.

But even if you used it on every intermediate step, this layout seems a bit awkward. Compared to the proof tree displayed in the article, it's oriented sideways.

Natural deduction proofs can also be written out linearly, as a list of steps. We can do that with our big nest of function applications, by binding each application to a local name. Like so:

def proof1c1():
    s1 = suppose(wff("p"), 1)
    s2 = suppose(wff("p → q"), 3)
    s3 = impl_elim(s1, s2)
    s4 = suppose(wff("q → r"), 2)
    s5 = impl_elim(s3, s4)
    s6 = impl_intro(1, s5)
    s7 = impl_intro(2, s6)
    s8 = impl_intro(3, s7)
    return shows(s8, wff("(p → q) → ((q → r) → (p → r))"))

This layout is immediately, in some ways, more readable. It comes at the cost of having to introduce and manage all those intermediate names, though. And we still need to manage the bookkeeping of the assumption labels.

In a Fitch-style proof, the assumption labels are replaced by nested boxes. The proof steps outside the boxes cannot make reference directly to the proof steps inside the boxes. This suggests the proof steps in the boxes are in an inner lexical scope.

First let's re-arrange the proof steps to show that they can indeed nest.

def proof1c2():
    s1 = suppose(wff("p → q"), 3)
    s2 = suppose(wff("q → r"), 2)
    s3 = suppose(wff("p"), 1)
    s4 = impl_elim(s3, s1)
    s5 = impl_elim(s4, s2)
    s6 = impl_intro(1, s5)
    s7 = impl_intro(2, s6)
    s8 = impl_intro(3, s7)
    return shows(s8, wff("(p → q) → ((q → r) → (p → r))"))

Now let's pretend we've rewritten suppose to be a Python-style context manager. Instead of taking a label as an argument, suppose now generates its own unique label and pushes it onto a stack. And now, also instead of taking a label as an argument, impl_intro pops the label off that stack. This lets us write:

def proof1c3():
    with suppose(wff("p → q")) as s1:
        with suppose(wff("q → r")) as s2:
            with suppose(wff("p")) as s3:
                s4 = impl_elim(s3, s1)
                s5 = impl_elim(s4, s2)
                s6 = impl_intro(s5)
            s7 = impl_intro(s6)
        s8 = impl_intro(s7)
    return shows(s8, wff("(p → q) → ((q → r) → (p → r))"))

...which matches pretty closely with a Fitch-style exposition. It's still lacking a few things; mainly, the lines after a with block shouldn't be able to see the local variables defined within it; they should only be able to refer to the final line. We'll have to pretend they are magically prevented from referring to any of the other lines.

A more thorough way to do this would be to use local function definitions, although it does begin to look a bit less Fitch-like:

def proof1c4():
    def inner1((s1, lab1)):
        def inner2((s2, lab2):
            def inner3((s3, lab3)):
                s4 = impl_elim(s3, s1)
                s5 = impl_elim(s4, s2)
                return impl_intro(s5, lab3)
            s6 = inner3(suppose(wff("p")))
            return impl_intro(s6, lab2)
        s7 = inner2(suppose(wff("q → r")))
        return impl_intro(s7, lab1)
    s8 = inner1(suppose(wff("p → q")))
    return shows(s8, wff("(p → q) → ((q → r) → (p → r))"))

You can, however, probably imagine an alternate syntax for this that is more properly Fitch-like. It would be simply an exercise in syntactic sugar, so I won't dwell on it.

Back to the rules of inference. We haven't given a full set, but I think we've given enough to get the idea across. I could say the rest are left as an exercise for the reader. But, one in particular is perhaps still worth doing, and that's disjunction-elimination, and the reason it's perhaps worth doing, is because it's complex. If you can translate it into a function, you can probably translate any inference rule into a function.

Pictorially, disjunction-elimination is usually shown as something like

                    φ   ψ
                    :   :
                    :   :
            φ ∨ ψ   χ   χ
        -------------------- ∨e

In prose, we can read this as "If φ ∨ ψ is proved, and if we can prove χ under the assumption φ, and if we can prove χ under the assumption ψ, then χ is proved."

From this, we know we need to take a theorem (call it z) and assert that its conclusion has a certain form (Disj); and we will need to take another theorem, and a label (call them s and x_label); and yet another theorem and a label (call them t and y_label).

At the labels we will locate x and y and assert that those are what z breaks up into.

We will assert that s and t also represent the same conclusion, and that is what we will return. We will discharge the assumptions x and y, but we will also need to carry forward any undischarged assumptions that may still be in z, s, and t.

Which means the function must look like this:

def disj_elim(z, s, x_label, t, y_label):
    assert isinstance(z, Theorem)
    assert isinstance(z._conclusion, Disj)

    assert isinstance(s, Theorem)
    assert x_label in s._assumptions
    a_s = s._assumptions.copy()
    fx = a_s[x_label]
    delete a_s[x_label]

    assert isinstance(t, Theorem)
    assert y_label in t._assumptions
    a_t = t._assumptions.copy()
    fy = a_t[y_label]
    delete a_t[y_label]

    assert s._conclusion == t._conclusion
    assert z._conclusion.lhs == fx
    assert z._conclusion.rhs == fy

    return Theorem(
            merge_assumptions(a_s, a_t)



We'll conclude with an assortment of additional observations, ones which suggest avenues for future work.

The first is that, while you can certainly build a proof with these functions, you have to actually run the constructed program to check that the proof is valid. And confirming that the proof is valid is all the program does. And as computational tasks go, that's really not a very complex one — there aren't any loops, or even any conditionals, in these inference rules we've written.

So you might (very reasonably) wonder if the proof can be checked without going all the way to executing the program. You might (very reasonably) think that you could apply concepts from static analysis, such as constant folding and abstract interpretation, to statically analyze the steps, and obtain the confirmation of validity of the proof at compile-time rather than runtime.

Similarly, you might think of writing it as a macro — another compile-time thing.

Having not yet tried it, I cannot say exactly how it does end up, but I would be surprised if the result isn't parallel in some way to "propositions as types", since the compile-time-computable component can arguably be viewed as a type system.

I would guess the resulting system would be fairly trivial for propositional logic, but would start to resemble dependent types if one were to upgrade to first-order predicate logic. Dependent types are what are more conventionally used for theorem provers anyway, so this makes a certain amount of sense. It has a certain logic to it, if you will.

The other observation is that our program which expresses and checks a proof has a simple structure: a tree of function applications. There is a straightforward way of "refactoring" such a program into a data structure: defunctionalization (Wikipedia). (And actually, since none of our theorem-producing functions are higher-order functions, defunctionalizing them is trivial.)

In this way, the entire proof could be expressed as "plain old data" (think: JSON) and "interpreted" by the program. In this case, the programming language needn't support information hiding; the program is prevented from producing an incorrect theorem object by virtue of the fact that the "interpreter" is closed, and does not allow any external code (i.e. code that could potentially modify the proof object in process in invalid ways) to be executed before the proof checking process terminates.

It would admittedly be a stretch to call the resulting theorem prover "LCF-style" though, as it would no longer be possible to write arbitrary proof-constructing code (e.g. tactics) in the host language.