Language version 1.1
While discussing Cyclone,
Gregor Richards stated that in order for a language
to support explicit malloc()
ing and free()
ing of
allocated memory, while also being safe (in the sense of not being able to execute
or dereference incorrectly-populated memory) would require that language to
either support garbage collection, or to not implement free()
.
In his words:
A C-like language which provides a true explicit free() cannot be safe. (By "true" I mean that you can get that memory back in a later malloc().) To be safe a language must either never free (which is bad) or be GC'd. [C-like languages being] imperative languages with pointers at arbitrary data, where safety is defined as not seeing that data as a different type.
Eightebed was designed as a counterexample to that claim.
Eightebed is a small, C-like language with explicit malloc()
and free()
. Memory is actually freed by free()
and might be re-allocated by a future malloc()
. Yet Eightebed
is a safe language, requiring only a modicum of static analysis and runtime support,
and in particular, it neither specifies nor requires garbage collection:
We place some restrictions on Eightebed in order that our implementation of a compiler and analyzer for it may be simplified. These restrictions do not, we assert, prevent the language from being "C-like", as it would be possible to extend the language to include them; the only thing we would be adding if we were to do so would be additional complexity in implementation. These restrictions are:
while
loops.Note that where this grammar is a little weird, it is only to support
being fully LL(1) to ease parser construction. Notably, the syntax to access
a member of a structure uses both square brackets around the structure
and a dot between structure and member. Unlike C, there is no syntax like
->
to dereference and access a member in one go; you need
to dereference with @
, then access the member with [].
.
Eightebed ::= {TypeDecl} {VarDecl} Block. Block ::= "{" {Stmt} "}". TypeDecl ::= "type" NameType Type ";" Type ::= "int" | "struct" "{" {Decl} "}" | "ptr" "to" Type | NameType. Decl ::= Type Name ";". VarDecl ::= "var" Decl. Stmt ::= "while" Expr Block | "if" Expr Block ["else" Block] | "free" Ref ";" | "print" Expr ";" | Ref "=" Expr ";". Ref ::= "[" Ref "]" "." Name | "@" Ref | Name. Expr ::= "(" Expr ("+"|"-"|"*"|"/"|"="|">"|"&"|"|") Expr ")" | "malloc" NameType | "valid" Expr | IntLit | Ref.
type node struct { int value; ptr to node next; }; var ptr to node jim; var ptr to node george; { jim = malloc node; if valid jim { [@jim].value = (1 + 4); george = jim; } if valid george { print [@george].value; } free george; free jim; }
Dereferencing a pointer x must only occur at the
safe start of the "then" part of an if
statement whose
test condition consists only of the expression valid x
.
The safe start of a block is the set of statements preceding and
including the first assignment statement or free
. (This is on the
[admittedly somewhat pessimistic]
assumption that any assignment could invalidate x.)
(New in 1.1: the safe start must precede the first free
statement, to prevent creation of dangling aliased pointers. Thanks Gregor!)
To simplify implementation, we limit x to a simple variable name
rather than a full expression. (This too is without loss of generality, as it is a
simple matter to use a temporary variable to store the result of a pointer expression.)
Any attempt to dereference a pointer which
does not follow these rules is caught by the static checker and disallowed.
Every pointer in the Eightebed language is implemented internally
as a structure of a machine pointer (obtained, for instance, by C's malloc()
)
coupled with a boolean flag called valid
.
When a chunk of memory is initially successfully allocated, valid
is
set to true. Freeing a pointer first checks this flag; freeing the
machine pointer is only attempted if valid
is true.
In addition, just before freeing the machine pointer, we invalidate
all aliases to that pointer. (Starting with the "root set" of the
program's global variables, we traverse all memory blocks reachable
by following valid pointers from them, looking for pointers which
match the pointer about to be freed; any we find, we set their valid
flags to false.) After freeing a pointer, we set its valid
to false.
Because of the static analysis, it is not possible to dereference a pointer at a point in the program where we do not know for certain that it is valid (i.e., it is not possible to dereference an invalid pointer.) Because of the runtime support, as soon as a pointer becomes invalid, all aliases of it become invalid as well. (All reachable aliases, that is – but if an alias isn't reachable, it can't be dereferenced anyway.) Add both of these together, and you get memory that can leak without any risk of being reused.
And no, this isn't garbage collection, because (as stated already) we don't care about garbage and we don't collect anything. Yes, the runtime support looks a bit like the mark phase of a mark-and-sweep garbage collector, but even it has a different job: not marking everything that is reachable, rather invalidating all aliases of a given pointer.
And finally, yes, I realize how little this proves. Long live loopholes.
16:19:38 <Gregor> We implement this without a GC by stuffing most of a GC into the free function, thereby making it just as slow as a GC'd language with none of the advantages! 16:25:29 <Gregor> So yes, although you have managed to fit my requirements, I am wildly underwhelmed :P
Cat's Eye Technologies provides a cockamamie reference implementation of
Eightebed called 8ebed2c.py
. Written in Python 2.6,
it compiles Eightebed code to C, and for convenience will optionally
compile that C with the C compiler of your choice and run the resulting
executable.
8ebed2c.py
ships with a fairly extensive (for a language
like this!) suite of test programs, which can of course double as example
sources; these can be found in the eightebed.tests
module.
For an appreciation of just how cockamamie 8ebed2c.py
is,
run 8ebed2c.py --help
and read through the command-line options
it provides.
The name Eightebed started life as a typo for the word "enlightened" made on an iPhone by a mysterious individual known only as Alise. (Well, perhaps not only.) Alise has aggressively asserted her intellectual property rights by copyrighting [sic] the name Eightebed. Cat's Eye Technologies has pursued permission to use the name for this language, only to be told that the procedure for obtaining such permission "involves five yaks, a Golden toad that hasn't eaten for five days, five boxes of antique confetti (not stripped of uranium), dye number 90 (blood green), a very confused weasel, and three pieces of A4.15 paper."
Cat's Eye Technologies' legal-and-yak-husbandry team is currently investigating the feasibility of this arrangement, and as of this writing, official permission is still pending. If complications persist, another, less contentious name (such as "Microsoft Windows 7") may need to be chosen for this language.
17:52:08 <alise> cpressey: I request that all harm is done to animals in the making of this production.
In which we reveal the outline of a grand plan for a blockbuster sequel to Eightebed which will never materialize
if valid
,
pointers to unnamed types, structures which contain other structures, and all that other boring stuff that
we just said doesn't matter.Happy leaking!
Chris Pressey
September 1, 2010
Evanston, IL