Copyright © 2001 S.Cabasino, G.M. Todesco, P.S. Paolucci
Table of Contents
The Zz language was designed within the APE100 team of INFN as the basic tool for developing compilers for the APE100 parallel computers. One of the aims of the APE100 group is to build a supercomputer designed specifically for the numerical problems that arise in the QCD community, but also suitable for other applications. The strong need for custom compilers and interpreters led us to design the tool described in this document.
Hence two major reasons to develop Zz were:
Given its flexibility, Zz has also been used for a variety of system software, including the symbolic debugger DBQ and the machine description compiler.
This document should be read with a little tolerance. It certainly contains some errors, although we have done our best to ensure its quality. We assume that the reader has a general familiarity with computer terminology.
The first part is intended to lead the novice user to an understanding of the fundamental Zz concepts [1]. To learn these concepts we suggest you try all the examples, because the results can sometimes be quite surprising. This tutorial guide should be read sequentially because the terminology is introduced progressively.
The second part is a fairly detailed reference guide, useful when developing a real application. It is more than a list of statements, but it does not explain the concepts.
The third part is a guide to tailoring Zz to your needs.
This document concludes with some examples of advanced Zz use and a glossary.
When a character plays a special role, it is highlighted, like this dollar: $. When something is optional we use the notation [ ... ].
Usually keywords are bold.
Zz is a dynamic language: you will expand its vocabulary as you use it. However, the kernel is also evolving, so we hope to keep producing new releases of Zz and of this document. Stay up to date with the latest developments by monitoring the website listed in the resources section of this document.
[1] The theory of dynamic parsing is described in: S. Cabasino, P.S. Paolucci, G.M. Todesco, "Dynamic Parsers and Evolving Grammars", ACM SIGPLAN Notices, November 1992.
The Zz language is a general-purpose incremental language. It can handle operator overloading and any kind of structured data. By our definition, an incremental language is one that can easily grow according to the user's needs, and that is suitable for developing both complex compilers and simple command interpreters (we use a calculator as an example). The user of Zz starts with a simple interface that allows new statements to be introduced.
The user can specify the semantics of these statements using other Zz statements or routines written in a conventional programming language (such as C). We call these routines, usable from Zz, "C-Procedures".
Zz can be instructed to recognize very general grammars; upon matching a grammar rule it can execute an action as stated above. Thus one of the aims of Zz is to interface a set of C-Procedures with a command language.
Developing a Zz application is quite easy using its native environment. The user is encouraged to take advantage of the inherent flexibility of Zz to improve the interfaces of their applications. Today, within the APE group, Zz is used in all the applications that require some user interface or command language. Zz has many enticing aspects: for example, it can add new words to its syntax as FORTH does, handle its own code as LISP does, and handle syntaxes as YACC does.
For compiler writers Zz can be quite helpful. It does the job of a compiler compiler, but it also handles the declarations of variables and other objects while maintaining purely syntactic strong type checking. A compiler developed using Zz could be as general as those for ADA and C, but our intention is that Zz be used to develop innovative Very High Level Language (VHLL) compilers with dynamic syntax capability.
A compiler is designed to translate source code written in a given language (e.g. FORTRAN) either into executable machine code or into an intermediate form (e.g. an assembler program or binary object code). In the APE software we chose the second alternative, since the main code optimization step is performed by a subsequent program.
Programs written in a high-level language such as FORTRAN can be looked at in two different ways. The first is to consider the program as a statement, in the given language, of a computational task to be executed by the computer. The second is to look at it as a series of statements that instruct the compiler to produce either the machine code or the assembler code needed to execute the task. In the second view, the program represents directives to the compiler rather than a description of the computational task at hand. Common examples of this kind are declarations of variables and of special data types and structures, or directives for compiler options (e.g. LIST, NoLIST, etc.). For the following discussion of Zz it is useful to adopt the second point of view, in which all lines of the program are seen as directives to the compiler.
The basic idea underlying Zz is that of a language which can be extended not only by the definition of new entities (e.g. structures, subroutines or task declarations) and by the redefinition of existing ones (e.g. overloading), but in which the programmer can modify the syntax itself. From the point of view we have adopted, all compiled languages are extensible to a limited extent. Fortran 77 compilers, for example, allow the definition of new identifiers: the names of variables, arrays, functions, and subroutines. Other compilers, e.g. those for ADA and C++, go one step further and allow operator overloading and the definition of new data types. Languages such as FORTH and LISP are even more versatile, as they allow the definition of new operators, at the price of a rather unfriendly grammar. Zz has the versatility of FORTH and LISP, with the added feature of making possible the definition of new syntactical forms. This makes Zz a universal compiler language. It is possible, through suitably designed syntax extensions, to use Zz to write compilers for most existing languages including, among others, FORTRAN, C and C++.
The extension of Zz proceeds through the definition of "production rules", which specify new syntactical forms accepted by the language, and thus effectively add to the grammar of the language. For each new production rule the user can specify an action that will be executed when the interpreter recognizes the corresponding syntactical form. The action can be specified either in the Zz language, or as a call to a user defined C procedure.
The Zz language basic package, Zz L0, has been written in C to ensure easy portability between different platforms.
Zz can interface a set of C-Procedures with a command language. The basic set of C-Procedures available in the Zz kernel is limited; however the user may link in their own C-Procedures with the Zz kernel.
Zz will be able to call all the C-Procedures in an appropriate sequence, providing the needed parameters according to the defined grammar rules.
It is didactically interesting, although of little practical use, to run the unconfigured version of Zz, which only allows output to standard output with basic format handling. When you plan to use Zz in some area of your project, producing a Zz application, you have to write, in the Zz language, the syntax extension files that define the syntax to be used in your project and the actions to be executed. You should also configure Zz by linking your set of C-Procedures to it. We give three examples here to clarify the field of application of Zz.
Suppose that you need a command interpreter to give commands to your data acquisition equipment. You need some C-Procedures such as filters, data analysis procedures, device drivers, etc. Zz will call them when you write statements like:
STORE ON MY_FILE EVENTS FROM DEVICE 3 (FILTER: B=23)
FIT EVENTS FROM MY_FILE WITH MY_FUNCT; DISPLAY CHI^2.
Suppose that your new parallel supercomputer needs a special language extension and you plan to develop a new compiler. You will write, and link to Zz, many C-Procedures to write assembler code, to optimize the programs, to write program listings, and so on. For example, you may need statements like:
where (convergence > 0.0001) there
.....
endwhere
ifall (check ==ok ) {
......
}
You need the Fortran language, but you want to enrich the Fortran syntax with special-purpose statements to handle a computer network or a special machine. The standard compiler is adequate, but the new statements will let the programmer produce shell procedures that configure the network appropriately or allocate the machines while the program is running.
A "Fortran extended" instruction could be:
TEST COMMUNICATIONS
In these examples you have to configure three different "Zz applications", which differ in the user action library and in the syntax. Once you have linked the C-Procedures library, and so configured Zz, you describe the syntax.
The first application will link standard analysis algorithms and user procedures. The second will generate optimized code for your computer and the third one will produce Fortran source and shell commands.
The first chapters of this manual will not describe a specific Zz application and the examples will use mainly the set of C-Procedures available in the basic kit. The third part will describe in detail the way to link the Zz kernel with user written C-Procedures (i.e. how to configure Zz).
We call "The Zz language" (or simply "Zz") the language accepted by Zz when it starts. This language allows the definitions of new grammar rules, i.e. the language itself may change and grow. In this chapter we introduce Zz as it is when it is started without any language extension.
The first program to write is the emerging software standard "Hello World!". Zz has to be installed and you need to know how to call it: for details on your installation, please ask your system manager.
If you want Zz to process a file instead of starting an interactive session, you should type:
$ zz filename
If you omit the file name, Zz starts an interactive session, an environment useful for doing exercises:
$ zz
....... ZZ initialization message ...
zz> /print "Hello, world"
Hello, world
zz> ctrl-z
$
To exit politely type ctrl-Z (ctrl-D for UNIX users).
The statement /print (read "slash print") is used to print something on your screen.
Of course the user of a dynamic language would try to write a dynamic example from the beginning. Therefore let's define a new statement: "Hello" that is used to print "Hello, world".
$ zz
....... ZZ initialization message ...
zz> /stat -> "Hello" {
.. /print "Hello, world"
.. }
zz> Hello !! now Hello is a new recognized stat.
Hello, world
zz>
Arguments of the /print statement may also be numbers or expressions:
zz> /print 12.7 * 2
25.4
zz> /print "The result is ", 20+4.0/3.0
The result is 21.333334
There are also variables, and statements to assign expressions to them:
zz> /r=12
zz> /pi=3.141593
zz> /header= "circle = "
zz> /print header,2*r*pi
circle = 75.398232
zz>
zz> /x = 12
zz> /y = goofie
zz> /print y
goofie
zz> /y = x
zz> /print y
12
zz> /y = "x"
zz> /print y
x
Zz is a very sparse language: few operations are intrinsically supported. The key feature of Zz is the syntax extension statement. In the current release the following are available as predefined statements: assignment, print, evaluation of simple expressions, and a limited number of other basic instructions. In principle Zz needs no instructions other than the syntax extension capability. The intrinsic Zz statements are nevertheless useful for exercises and in the early stages of application development.
The intrinsic Zz statements are prefixed with a slash (/), introduced to clearly distinguish the statements of the Zz starting language from those of the user application language.
It is possible to fit more than one statement on one line by terminating each statement with a semicolon (;). If there is a single statement the semicolon is optional.
Example:
zz> /print "Hello, world"; /print "I am happy!"
Hello, world
I am happy!
If a statement is too long to fit on one line, it can be split and continued on the next line by placing the continuation marker ... at the end of the line to be truncated:
Example:
zz> /a= ...
"not a very long line"
zz>
zz> /print a
not a very long line
The statement:
/include "file_name"
makes it possible to include a stream of statements written in another file (file_name must be the name of a text file containing Zz statements).
Zz uses a lexical analyzer to get tokens from the source stream. The lexical analyzer is able to categorize the following lexical elements:
The user can introduce new lexical categories.
The statement /print can print tokens of all these categories.
Examples:
zz> /print " first row \n second row"
 first row
 second row
zz> /print robert,34,3.5
robert 34 3.5
zz> /print "&"
&
zz> /print "****"
****
Note that the control sequence "\n" causes a carriage return.
The double quotes "" are used mainly in the following cases:
Zz supports variables and simple expressions; the intrinsic types are mainly numeric, string and list.
The Zz variables are dynamic; they are created when assigning values to them. A Zz variable has a value and a tag. The tag of the Zz variables is the type of the expression assigned to it. There is a correspondence between lexical tokens and tags.
The assignment statement has the following formats:
/variable := expression [ as type ]
or
/variable = expression [ as type ]
The optional type is a kind of tag used by a Zz expert to change the tag of the expression. It can be any syntagma, as we'll explain in the following.
The assignment form ":=" creates GLOBAL VARIABLES which remain alive until the EOF is reached, while the "=" one creates LOCAL VARIABLES, which remain alive until the EOF (if declared at level 0) or local block's closing brace "}" (if declared within a block) is reached. How to use these variables will be explained later.
Zz language offers some facilities to manage the lists. A sequence of tokens within braces {} is interpreted as a list. It is possible to explicitly assign a list to a variable using the following format:
/var = { tokens.... }
Any token is allowed within the braces, with the exception of "}" and an unmatched double quote ". List tokens are delimited by spaces.
As an example of assignment to a variable:
zz> /my_list = { alfa b c , "anymore" 23.4 }
It is possible to refer to any item of a list using the notation variable.item_number, where item_number is the 1 based index number of the item we want to refer to. It is also possible to print the length of the list (i.e. the number of the elements in the list) using the notation variable.length.
As an example, using the list defined above:
zz> /print my_list.1 , my_list.4
alfa ,
zz> /print my_list.length
6
zz>
The four usual arithmetic operations (*, /, +, -) are supported for integer and floating point data types, following the usual rules of precedence. The type of the result is chosen depending on the type of the operands with the usual rules of floating type conversion for mixed floating/integer calculations.
A concatenation operator "&" is defined. The "&" symbol can be used to join identifiers, strings, or lists, and it can also operate if one of the operands is a numeric variable. In this case it takes the corresponding literal value of the number.
Examples:
zz> /id = "blabla"
zz> /golf = id & 12*(4+5)
zz> /print golf
blabla108
zz>
zz> /v1=15
zz> /v2=16
zz> /id = ciccio &_& v1 &_& v2
zz> /print id
ciccio_15_16
zz> /my_list = { 123 "mouse" 2.4 }
zz> /print my_list
{ 123 mouse 2.4 }
zz> /print my_list.2
mouse
zz> /new_list = my_list & { 123 }
zz> /print new_list
{ 123 mouse 2.4 123}
When Zz doesn't recognize a statement it prints a diagnostic message.
Example:
zz> /alfa=12*(13 # 40)
+ **** SYNTAX ERROR ****
| got: '#'
| expected one of: '*' '/' ')' '+' '-'
| /alfa=12*(13 # 40)
|              ^
| line 1 of stdin
zz>
The unexpected token is underlined by a "^" sign. Zz also prints the pertinent rules, underlining the place where the mismatch occurred.
In the previous example the following characters are acceptable: *, /, ), +, -, while # is meaningless.
The key power of Zz is its capability of expanding the recognized language. To add a syntax extension to Zz L0 it is necessary to specify something on which to match, and an action to execute when this match occurs.
Now we introduce a new statement (shortly: stat) to display the Zz version. This new statement will be: "show version":
zz> /stat -> show version {
.. /print "Zz Version 2.0 31, October 1991\n"
.. }
zz>
zz> show version
Zz Version 2.0 31, October 1991
The usual prompt "zz>" changes to a couple of dots to show that the action specification has to be completed. It is possible to overload a part of the above statement to display something else:
zz> /stat -> show authors {
.. /print "Zz's authors are:"
.. /print " Simone Cabasino"
.. /print " Pier Stanislao Paolucci"
.. /print " Gian Marco Todesco"
.. }
zz> show authors
Zz's authors are:
 Simone Cabasino
 Pier Stanislao Paolucci
 Gian Marco Todesco
zz> show version
Zz Version 2.0 31, October 1991
In the examples just shown we added new syntaxes to the grammar of the statements (stat), writing:
zz> /stat -> thread { actions }
We call these kinds of statements "syntax extensions". More generally the form of the syntax extension statement is:
/ syntagma -> thread [ {action } ]
"stat" is a good syntagma. Actually stat is the only syntagma that we have seen up to now: we will describe general syntagmas later.
We call "thread" the pattern (or rule) we are adding to the syntax (more exactly to the syntagma) and that Zz will be able to recognize when met. We call "action" the list of Zz statements, within braces {}, to be executed when the thread will be matched. The action is an optional field.
A thread is a list of beads. There are terminal beads like show, author, authors or Hello and nonterminal beads. Nonterminal beads will be introduced later.
Let's try with an error:
zz> show author ***** SYNTAX ERROR etc....
We can foresee this error and give a friendlier message:
zz> /stat -> show author {
.. /print "There are several authors of Zz."
.. /print "The correct statement is 'show authors'"
.. /print "anyway:"
.. show authors
.. }
zz>
zz> show author
There are several authors of Zz.
The correct statement is 'show authors'
anyway:
Zz's authors are:
 Simone Cabasino
 Pier Stanislao Paolucci
 Gian Marco Todesco
Please note that you can use the stat "show authors" within the action of "show author".
The statement /rules shows all the syntax rules added to Zz:
zz> /rules
RULES
Scope kernel
stat -> show author
stat -> show authors
stat -> show version
stat -> say Hello
Here follows a set of examples to summarize:
zz> /stat -> "?" {
.. /print "Commands today are:"
.. /print " say Hello"
.. /print " show version"
.. /print " show authors"
.. }
zz>
zz> /stat -> 12 {
.. /print "you typed the integer number 12"
.. }
zz>
zz> /stat -> 12.0 {
.. /print "you typed the fp number 12.0"
.. }
zz>
zz> 12
you typed the integer number 12
zz>
zz> 000012
you typed the integer number 12
zz>
zz> 12.000000
you typed the fp number 12.0
zz>
zz> 12.
you typed the fp number 12.0
zz>
zz> 1.2e1
you typed the fp number 12.0
Let's introduce another syntax extension with an example (which we strongly suggest you try) of a nonterminal bead in the thread:
zz> /stat -> "I am " ident^name {
.. /print "Hello ",name, "!"
.. }
zz>
zz> I am freddy
Hello freddy!
Of course an integer number is not a legal identifier and Zz will warn us about it:
zz> I am 13 ***** SYNTAX ERROR etc....
In the example above the nonterminal bead is ident^name. Here, "name" acts like a variable and identifies the bead inside the thread. "ident" is predefined; it matches any legal identifier.
The general form of a nonterminal bead is:
syntagma ^ parameter
A nonterminal bead is made up of the syntagma, the character ^ (caret), and an identifier that plays the role of a formal parameter and can be used like a variable within the action. A nonterminal bead matches a set of syntactical objects (e.g. identifiers and integers, but also expressions or programs, as we'll show in the following). We use "syntagma" for the name of those sets.
ident, stat and int are good examples of predefined syntagmas built in the kernel of Zz, and hence always available. We'll see in the following that when the action is executed (because the thread has matched something) all the formal parameters will have the actual value just matched.
We can create a new syntagma simply by using it in a nonterminal bead or by assigning a thread to it with the syntax extension statement. This means that it is possible to assign one or more threads to a syntagma that is only used in a later statement, or to refer (within a nonterminal bead) to a syntagma that does not yet have any thread assigned to it. Informally, a syntagma is a collection of threads, and a syntax extension is the way to assign a new thread (with its corresponding action) to a syntagma. When the parser has to match a nonterminal bead, it tries all the threads of the syntagma referenced in that bead.
A new syntagma: color is defined in the following example:
zz> /stat -> use the ink color^c {
.. /print " I'm using the color n.",c
.. }
zz>
zz> /color -> red { /return 1 }
zz> /color -> violet { /return 2 }
zz> /color -> pink { /return 3 }
zz>
zz>
zz> use the ink red
 I'm using the color n.1
We have seen above the practical usage of the statement /return. The statement /return makes sense only within actions because it is used to give a value to the formal parameter of a nonterminal bead. It is possible to return something changing its type in a way like the assignment does. The general form of the return statements is:
/return expression [ as type ]
Using a syntagma with no thread associated to it generates a syntax error. Try this kind of error with the undefined color yellow:
zz> use the ink yellow ***** SYNTAX ERROR etc....
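The idea that a syntagma is a collection of threads, each with an action that returns a value, can be modelled outside Zz. The following Python sketch is only an analogy and does not reproduce the real Zz parser: the names `SYNTAGMAS`, `add_thread` and `match` are hypothetical, and threads are restricted to sequences of literal tokens.

```python
# Toy model: a syntagma is a named collection of (thread, action) pairs.
# Illustration only; all names here are hypothetical, not part of Zz.

SYNTAGMAS = {}  # syntagma name -> list of (thread, action)


def add_thread(syntagma, thread, action):
    """Attach a thread (sequence of literal tokens) and its action to a syntagma."""
    SYNTAGMAS.setdefault(syntagma, []).append((tuple(thread), action))


def match(syntagma, tokens):
    """Try every thread of the syntagma; return the action's value on a match."""
    for thread, action in SYNTAGMAS.get(syntagma, []):
        if tuple(tokens) == thread:
            return action()
    raise SyntaxError(f"no thread of '{syntagma}' matches {list(tokens)}")


# Similar in spirit to: /color -> red { /return 1 }, and so on.
add_thread("color", ["red"], lambda: 1)
add_thread("color", ["violet"], lambda: 2)
add_thread("color", ["pink"], lambda: 3)

print(match("color", ["red"]))
try:
    match("color", ["yellow"])  # no thread was ever attached for yellow
except SyntaxError as err:
    print("SYNTAX ERROR:", err)
```

In this model, as in the session above, an undefined color fails only because no thread for it has been attached to the syntagma; adding one later would make the same input acceptable.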
The following example, that we again suggest to try, shows an interesting concept:
zz> /color -> gray int^a "%" {/return 100+a}
zz> use the ink gray 20%
 I'm using the color n. 120
As you can see, the color just defined is more complex than a simple token. Sometimes the action has no interest in the actual values of the parameters, as in the following example:
zz> /stat -> "I'm" ident^name {/print "Hello!"}
In that case we can, as a convention, use the name "$" for the formal parameter:
zz> /stat -> "I'm" ident^$ {/print "Hello!"}
Using the $ sign as a formal parameter signals that the parameter is a dummy, but this is a mere convention: Zz treats $ like any other identifier.
In a rule like this:
zz> /stat -> "I'm" ident^$ "from" ident^$ { /print "Hello!" }
The value of $ is replaced twice during the parsing, i.e.
zz> I'm Laura from Rome Hello!
When the thread is parsed the identifier "Laura" and then "Rome" are associated to the parameter $. When the action is executed the $ parameter contains the last value "Rome", in fact:
zz> /stat -> "I'm" ident^$ "from" ident^$ {
.. /print "Hello!"
.. /print $
.. }
zz> I'm Laura from Rome
Hello!
Rome
And the same behavior occurs if $ is substituted by another identifier.
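The overwriting of a repeated parameter name can be pictured as successive assignments to the same key. The Python sketch below is an analogy under our own assumptions, not Zz internals: `bind_parameters` is a hypothetical helper, and a bead is represented as a pair (syntagma or literal, parameter name), with `None` as the parameter of a terminal bead.

```python
def bind_parameters(beads, tokens):
    """Bind matched tokens to formal parameters, left to right.

    A later bead that reuses a parameter name overwrites the earlier value.
    """
    params = {}
    for (_, name), token in zip(beads, tokens):
        if name is not None:  # terminal beads carry no parameter
            params[name] = token
    return params


# Models the thread: "I'm" ident^$ "from" ident^$
beads = [("I'm", None), ("ident", "$"), ("from", None), ("ident", "$")]
tokens = ["I'm", "Laura", "from", "Rome"]

print(bind_parameters(beads, tokens))  # only the last value bound to $ survives
```

Giving the two beads distinct names, for instance `who` and `city`, would keep both values available to the action.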
Table 2.1. Some useful syntagmas available within Zz are:
BEAD | Description |
---|---|
ident^xxx | Matches a string of alphanumeric characters, dollars, and underscore that do not begin with a digit (the lexical token identifier). |
int^xxx | Matches a string of integer digits (the lexical integer). |
float^xxx | Matches a string of digits with a decimal point and/or exponential notation (the lexical float). |
qstring^xxx | Matches a string delimited by quotes. Special characters are allowed if escaped with a backslash (the lexical qstring). |
stat^xxx | Matches a Zz statement |
statlist^xxx | Matches a list of stat^ separated with ";" or newline |
num_e^xxx | Matches a Zz integer expression and returns the int result |
string_e^xxx | Matches a Zz string expression and returns the qstring result |
list_e^xxx | Matches a Zz list expression and returns the list result |
any^xxx | Matches any token |
Sometimes it is useful to control the parsing flow. It is possible to iterate the parsing (something like a loop) and to parse some sentences conditionally (something like a conditional branch).
In the current version of Zz, the following are implemented: /for, /foreach, /do, /while, /if.
/for index_var = start_val to stop_val ... [step step_val] {action}
The action is executed (stop_val - start_val + step_val)/step_val times.
Examples:
zz> /for i = 1 to 6 { /print i }
1
2
3
4
5
6
zz> /for i = 1 to 6 step 2{ /print i }
1
3
5
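The iteration count of /for can be checked against the two sessions above. The Python sketch below assumes integer values, a positive step, stop_val not smaller than start_val, and that the division in (stop_val - start_val + step_val)/step_val is an integer division; these are our reading of the text rather than documented behavior, and both function names are hypothetical.

```python
def for_iterations(start_val, stop_val, step_val=1):
    """Iteration count (stop - start + step) / step, with integer division."""
    return (stop_val - start_val + step_val) // step_val


def for_values(start_val, stop_val, step_val=1):
    """Values taken by the index variable, with an inclusive upper bound."""
    return list(range(start_val, stop_val + 1, step_val))


for start, stop, step in [(1, 6, 1), (1, 6, 2)]:
    print(for_values(start, stop, step), for_iterations(start, stop, step))
```

For both loops shown in the examples, the number of values produced agrees with the count given by the formula.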
/foreach variable in list { action }
The action is executed once for each item in list. The variable takes the value of each item.
Example:
zz> /my_list = { a bb ccc }
zz> /foreach k in my_list { /print k }
a
bb
ccc
/do { action } while ( logical_condition )
Perform the action while the logical_condition is true. The loop is always executed at least once.
zz> /control = 1
zz> /do { /print control; /control = control + 1; } while (control <=3)
1
2
3
zz>
/while ( logical_condition ) { action }
The action is executed as long as the logical_condition is true. Unlike the "do" loop, this structure may never execute its action.
zz> /control = 1
zz> /while (control <= 3) { /print control; /control = control + 1; }
1
2
3
zz>
/if logical_condition { action }
The action is executed if the condition is true.
Example:
zz> /a = 2
zz> /b = 0
zz> /if a > b {
.. /c = a - b
.. /print c
.. }
2
There are some utilities to handle syntax extensions. The statements:
/krules [syntagma ]
/rules [syntagma ]
These are used to print both kernel and user threads (/krules) or only user rules (/rules). The optional syntagma is used to print only the rules attached to a specific syntagma.
There is a statement to show all the variables active at a certain level:
/param
This statement can be used within an action to know the parameter's values.
We introduce with this example the concept of overloading:
zz> /stat -> show int^x {
.. /print "Integer ",x
.. }
zz>
zz> /stat -> show float^x {
.. /print "Floating Point ",x
.. }
zz>
zz> show 12
Integer 12
zz>
zz> show 12.0
Floating Point 12.0000
In the example above the word show manifests two different behaviors depending only on the type of the number (12 or 12.0). In other words the statement show is overloaded. The parser is able to resolve the overloading ambiguity choosing the right thread according to the type of the nonterminal beads: int^x or float^x. There are other languages allowing some kind of overloading: ADA and C++ for instance allow the operator overloading, but not the definition of new operators.
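Choosing a thread according to the lexical category of a token can be mimicked with a small dispatcher. The Python sketch below is an analogy only: the regular expressions are our approximation of the int and float lexical categories, the names `lexical_category`, `SHOW_RULES` and `show` are hypothetical, and the printed number format is Python's, not that of Zz.

```python
import re

# Approximate lexical categories (an assumption, not the Zz lexer).
INT_RE = r"\d+"
FLOAT_RE = r"\d+\.\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+"


def lexical_category(token):
    """Classify a token as 'int', 'float' or 'ident'."""
    if re.fullmatch(INT_RE, token):
        return "int"
    if re.fullmatch(FLOAT_RE, token):
        return "float"
    return "ident"


# One "thread" of show per category, like show int^x and show float^x.
SHOW_RULES = {
    "int": lambda x: f"Integer {int(x)}",
    "float": lambda x: f"Floating Point {float(x)}",
}


def show(token):
    """Pick the rule whose nonterminal bead matches the token's category."""
    rule = SHOW_RULES.get(lexical_category(token))
    if rule is None:
        raise SyntaxError(f"no 'show' thread accepts {token!r}")
    return rule(token)


print(show("12"))
print(show("12.0"))
```

The word `show` stays the same; only the category of its argument decides which action runs, which is the sense in which the statement is overloaded.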
In the following example we show how ZzL0 variables dynamically change their type:
zz> /my_value = 12 !! my_value is integer
zz> show my_value
Integer 12
zz> /my_value = 12.0 !! my_value now is float
zz> show my_value
Floating Point 12.000000
We prefer the typographic style described below.
When the action is very short or omitted, the whole syntax extension (SE) is written on a single line:
zz> /stat -> one_hello { /print "Hello, World!" }
zz> /stat -> this is an unuseful statement and... does nothing
Otherwise we prefer to begin the action on a new line:
zz> /stat -> four_hello {
.. one_hello
.. one_hello
.. one_hello
.. one_hello
.. }
zz>
It is forbidden to insert a new line before the open brace.
Examples
zz> /color -> green { /return 10 }
zz> /color -> blue { /return 20 }
zz> /stat -> the ink is color^c {/print "ink = ",c}
zz>
zz> /feeling -> glad { /return 1000 }
zz> /feeling -> blue { /return 1001 }
zz>
zz> /stat -> I feel feeling^f {/print "You feel ",f}
zz>
zz> I feel blue
You feel 1001
zz>
zz> the ink is blue
ink = 20
zz>
zz> /arg3 -> int^a "," int^b "," int^c {
.. /print "push ",a
.. /print "push ",b
.. /print "push ",c
.. }
zz> /stat -> goofie arg3^$ {
.. /print "call goofie"
.. }
zz> goofie 1,2,3
push 1
push 2
push 3
call goofie
The infix operator notation is user friendly but potentially ambiguous. For instance, there are two ways to compute the expression 2 + 3 + 4: as (2 + 3) + 4 or as 2 + (3 + 4).
This ambiguity is of course often negligible, but can be dangerous if the operator isn't associative: (2/3)/4 != 2/(3/4).
Let's imagine a translator which converts infix (ambiguous) operators into RPN notation (that is unambiguous). We define explicitly an unambiguous grammar (left associative):
zz> /stat -> expr^e
zz> /expr -> fact^$
zz> /expr -> expr^$ "/" fact^$ {/print "divide"}
zz> /fact -> int^n {/print "push ",n}
This is to test the example:
zz> 20/10/5
push 20
push 10
divide
push 5
divide
Of course it is possible to change one line to change the associativity:
zz> /stat -> expr^e
zz> /expr -> fact^$
zz> /expr -> fact^$ "/" expr^$ {/print "divide"}
zz> /fact -> int^n {/print "push ",n}
and now:
zz> 20/10/5
push 20
push 10
push 5
divide
divide
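The two instruction sequences printed for 20/10/5 really do compute different values. The Python sketch below reproduces the sequences by hand and evaluates them on a stack; it models only the output of the two grammars, not the Zz parser, and the function names are hypothetical.

```python
def to_rpn_left(numbers):
    """Left-associative chain ((a / b) / c): divide after each new operand."""
    out = [f"push {numbers[0]}"]
    for n in numbers[1:]:
        out += [f"push {n}", "divide"]
    return out


def to_rpn_right(numbers):
    """Right-associative chain (a / (b / c)): all pushes first, then divides."""
    out = [f"push {n}" for n in numbers]
    out += ["divide"] * (len(numbers) - 1)
    return out


def eval_rpn(ops):
    """Evaluate a list of 'push N' / 'divide' instructions on a stack."""
    stack = []
    for op in ops:
        if op == "divide":
            right = stack.pop()
            left = stack.pop()
            stack.append(left / right)
        else:
            stack.append(float(op.split()[1]))
    return stack[0]


left = to_rpn_left([20, 10, 5])
right = to_rpn_right([20, 10, 5])
print(left, eval_rpn(left))    # (20 / 10) / 5
print(right, eval_rpn(right))  # 20 / (10 / 5)
```

The left-associative sequence evaluates (20/10)/5 and the right-associative one evaluates 20/(10/5), so changing a single grammar rule changes the computed result.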
When the action is defined all the parameters (associated to the nonterminal beads) and variables within the braces {} are evaluated and the name is replaced with the corresponding value. Local variables (assigned with =) are replaced immediately (when the action is declared) while the other kind (assigned with :=) is replaced only when the action is executed (see also Using Zz variables).
We have seen up to this point only Zz action within braces {}, but there are two other kinds of actions, thus the syntax extension statement has three different formats:
The first format is well known. The second one is used to call a user C procedure (UCP) linked with the Zz kernel, optionally passing to it its parameters (see Part III). The third one is used to return a constant value; this format is very similar to:
/syntagma -> thread { /return expression }
The third format is the fastest because Zz does not have to interpret the action; however, no variable replacement will occur.
The kernel makes available a simple C-Procedure: pass that is used to return all the parameters of nonterminal beads in the thread. Thus the following examples (a) and (b) are equivalent but the second one is faster:
The following form:
zz> /sss -> ... xxx^yyy ... :return yyy
is wrong because yyy is not a constant expression: in this case Zz will always return the literal "yyy" and not its actual value.
The statement to extend the syntax is usable as any other statement within the braces { } of a Zz action. This is the way to handle symbol tables using Zz. Let's suppose that we want Zz to handle our phone directory. We would need a symbol table for this. We'll create one called "names":
zz> /stat -> show names^x {/print " phone: ", x }
zz> /stat -> show any^${/print "phone not available" }
zz>
zz> /names -> paola { /return "0034345678" }
zz> /names -> tony { /return "002143545" }
zz> /names -> albert{ /return "home:123456 office:3445" }
zz>
zz> show albert
 phone: home:123456 office:3445
zz> show carin
phone not available
Now we can introduce a friendlier statement to insert a new name:
zz> /stat -> add ident^n qstring^p ...
{/names -> n { /return p } }
zz> add luisa "off. 35682"
zz> show luisa
 phone: off. 35682
It is also possible to change the action associated with a thread simply by assigning a new action to it:
zz> add luisa "off. 3935682"
zz> show luisa
 phone: off. 3935682
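For readers more used to conventional languages, the phone directory behaves like a mutable lookup table. The Python sketch below uses a dictionary as a stand-in, which is only an analogy: in Zz the table lives in the grammar itself as threads of the syntagma names, and the function names `add` and `show_phone` are hypothetical.

```python
# A dictionary standing in for the threads attached to the syntagma "names".
names = {
    "paola": "0034345678",
    "tony": "002143545",
    "albert": "home:123456 office:3445",
}


def add(name, phone):
    """Like the 'add' statement: attach a new entry or replace an existing one."""
    names[name] = phone


def show_phone(name):
    """Like the 'show' statement, with a fallback for unknown names."""
    if name in names:
        return f"phone: {names[name]}"
    return "phone not available"


print(show_phone("albert"))
print(show_phone("carin"))
add("luisa", "off. 35682")
add("luisa", "off. 3935682")  # assigning again replaces the old entry
print(show_phone("luisa"))
```

The fallback branch plays the role of the `show any^$` rule, which catches every name for which no thread exists.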
It is possible to return a list:
zz> /int_decl -> ident^name "[" int^size "]" {
.. /return { name size } !! unidim. array
.. }
zz>
zz> /int_decl -> ident^name {
.. /return { name 1 } !! scalar var
.. }
Any thread that uses int_decl^xxx will be able, in its action, to refer to the fields of xxx by writing xxx.1 and xxx.2 (list items are numbered from 1).
When a syntactical rule is matched the parser does one of the following:
We have already seen that the := format creates GLOBAL VARIABLES, which remain alive until the EOF is reached, while the = format creates LOCAL VARIABLES, which remain alive until the EOF (if declared at level 0) or until the matching brace "}" (if declared within a block) is reached.
Variables declared within a block become alive when the block is parsed (executed). These variables can be used in the definition of other blocks inside the one which is currently parsed: these blocks, that are not executed now, will be called "inner blocks".
There is a major difference between the use of a variable within the block in which it is declared and its use in an inner block.
In the block in which variables are declared
Global and local variables can be used as usual in common languages in expressions or assignments within the block in which they are declared, as shown in the following example:
zz> /a = 3
zz> /b := 5
zz> /a = a + b
zz> /b := b + 2
zz> /print a,b,(a*b + a)
8 7 64
zz> /stat -> test {
.. /c = 10
.. /d := 25
.. /d := d + c
.. /c = c + 1
.. /print c , d
.. }
zz> test
11 35
But their behavior is different, depending on the way they were declared, if used in inner blocks.
In inner blocks
About LOCAL variables
LOCAL variables stop existing when the block in which they are declared does. For this reason, when a new block is defined inside the current block, those variables, if present, are immediately substituted by their values; that is, they are fixed once and for all. Suppose that, within a block, we define an object that will remain alive after the end of the block (for example a global variable or a rule), and that the definition of this object uses local variables already defined in the block. In this case what matters is the value of those variables: the value stays alive within the object that we are defining, while the variable itself is lost at the end of the execution of the current block.
For this reason, in the inner object that we are defining, the names of these variables are immediately substituted by their values, so that they are no longer variables but fixed strings or constant numbers (depending on their tag):
zz> /cc = 7
zz> /stat -> test_1 {
.. /dd = cc + 3 !!here cc is immediately replaced by 7: that is
                !!/dd = 7 + 3
.. /print dd
.. /stat -> dd { !!here dd is immediately replaced by 10
.. /ee := dd+1
.. /print ee
..}
..}
zz> test_1
10 !!comes from /print dd
zz> /rules
RULES Scope Kernel
/stat -> 10 !!the inner object we
            !!created during the execution of the test_1
/stat -> test_1
zz> 10
11 !!comes from /print ee
zz> /cc = 9
zz> test_1
10 !!here cc is not replaced by 9;
   !!in fact it was replaced by 7 during the
   !!definition of test_1
Identifier and other expressions
Remembering that local variables, when entering a new block, are immediately substituted by their values, let us see an important difference about the use in an inner block of local variables (declared in an outer one) whose value is an identifier (strings of alphanumeric characters, underscores and dollars not beginning with a number) and those whose value is any other expression.
case variable = identifier:
Identifiers are legal names for variables, so in an inner block we can use the local variables that have an identifier as value in the left part of an assignment, creating a new variable whose name is the value of the old one:
zz> /colour = red
zz> /stat -> test_2 {
.. /print colour
.. /colour=green !!this is red = green
.. /d=blue !!local d
.. /print colour,d !!this is /print red,d
.. /param
.. }
zz> test_2
red
green blue
0L colour == red
1L d == blue
1L red == green
zz> /print colour !! the old value
red
zz> /print d
d !!here d, defined in test_2, is no more alive!
zz> /var = mickey
zz> /stat -> link {
.. /var = var&_mouse
.. /print var
.. /param
.. }
zz> link
mickey_mouse
0L var == mickey
0L colour == red
1L mickey == mickey_mouse
case variable = any other expression:
Other expressions (different from identifiers) are not legal names for variables, so it does not make sense to use the name of local variables that have such values in the left part of an assignment. An attempt to use them in this manner would cause a syntax error, as we'll see in the following example:
zz> /ff = 13
zz> /stat -> test_3 {
.. /ff = ff + 1 !! this is /13 = 13 + 1 that does
                !! not make sense!
.. /print ff
.. }
zz> test_3
**** SYNTAX ERROR ****
About GLOBAL variables
In inner blocks we can refer to GLOBAL variables, already declared in an outer block, by their names. Since global variables remain alive until the EOF, their names are NOT substituted once and for all by their values when a new block is entered: if the variable is part of an expression, its value is substituted only when that expression is evaluated; if the variable is within an action, its value is substituted only when the action is executed. Thus, whether their value is an identifier or any other expression, they can be used on the left of an assignment of the type /var := expression.
Vice versa a global variable, if declared in a block, can be referenced later in an outer block, as it is global:
zz> /aa := 4
zz> /stat -> test_4 {
.. /cc := aa + 1
.. /aa := aa*5
.. /print aa
.. /stat -> test_5 {
.. /aa := aa + 5
.. /print aa
.. }
.. }
zz> test_4 !! here aa is replaced by 4
20
zz> /param
0G cc == 5 !! cc is defined as global in
           !! test_4
0G aa == 20
zz> test_5
25
zz> /aa := 7
zz> test_4 !! here aa is replaced by 7
35
zz> test_5
40
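The difference between the two kinds of variables in inner blocks can be pictured with a small C model. This is only an analogy and not how Zz is implemented: the names inner_block, define_inner_block and run_inner_block are hypothetical. A local variable is copied into the inner block when the block is defined, while a global variable is looked up every time the block is executed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Zz source code. */
int global_aa = 4;                   /* plays the role of /aa := 4 */

struct inner_block {
    int        captured_local;       /* value fixed at definition time */
    const int *global_ref;           /* looked up at execution time    */
};

/* Defining the block substitutes the local variable by its current value. */
struct inner_block define_inner_block(int local_value)
{
    struct inner_block b;
    b.captured_local = local_value;
    b.global_ref = &global_aa;
    return b;
}

/* Running the block reads the global variable as it is now. */
int run_inner_block(const struct inner_block *b)
{
    assert(b != NULL);
    return b->captured_local + *b->global_ref;
}
```

In this model, changing the local variable after define_inner_block() has no effect on later runs, whereas changing global_aa does, which mirrors the behaviour of test_1 and test_4 above.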
Scope changing
It is possible to change at anytime the assignment mode of a variable from local (=) to global (:=) and vice versa in the same block in which the variable is declared.
On the other hand it is not possible to change a variable scope from an inner block.
The three different situations are analyzed in the following.
in inner blocks
About local to global
case variable = identifier:
If the variable has an identifier as value, trying to change from a local assignment in a block to a global assignment in an inner block will create a new global variable whose name is the value of the local one:
zz> /gg=cat
zz> /stat -> change {
.. /gg:=mouse
.. /print gg
.. }
zz> change
mouse
zz> /param
0G cat == mouse
0L gg == cat
(thus this is not a scope change!)
case variable = any other expression:
If the variable has any other expression (different from an identifier) as value, the new assignment will cause an error: as said before, when the inner block is entered, the variable's name is replaced by its value, which in this case is not a legal name (because it is not an identifier).
zz> /aa=5
zz> /stat -> change_bis {
.. /aa:=5 !!this is /5 := 5 that does not make
          !! sense !
.. /print aa
.. }
zz> change_bis
**** SYNTAX ERROR ****
About global to local
Vice versa, changing from a global assignment in a block to a local one in an inner block will not cause an error: as expected, a local variable with the same name as the global one is created in the inner block, but this new variable stops existing when the matching brace } is reached:
zz> /bb:=6
zz> /cc:= 5
zz> /stat -> change {
.. /bb=6
.. /cc=9*bb
.. /print bb,cc
.. /param
.. }
zz> /print bb !! the global one
6
zz> change
6 54 !! the local ones
0G cc == 5
0G bb == 6
1L cc == 54
1L bb == 6
zz> /param
0G cc == 5
0G bb == 6
Again this is not a change of scope.
There are three kinds of variables: Zz variables, Zz parameters, and thread variables. Of course if you are using Zz to develop a compiler you have to consider also the variables of your language, but for now let's ignore them.
We have already talked about Zz variables.
The Zz parameters are implicitly declared using a nonterminal bead within a thread:
syntagma ^ param
A parameter hides any identically named variable and its scope is the action attached to the thread. Pay attention: if param is a variable, the value of that variable replaces the parameter itself in the thread:
zz> /c = alfa
zz> /stat -> say ident^c { !!we are entering a new block
.. /print alfa !!the param c is replaced by
.. } !!alfa in the thread
zz> say hello
hello
zz> /rules
RULES Scope kernel
stat -> echo
stat -> say ident^alfa
stat -> echo^s
zz> /c=12
zz> /stat -> say ident^c {/print c}
*** SYNTAX ERROR ***...
The third kind of variable (thread variable) is made up in the following way:
zz> /$arg -> alfa : return 154
zz> /print alfa
154
$arg is a predefined syntagma used in all expressions to match their arguments. If a new thread (say, alfa) is assigned to it, then whenever that thread is matched (say, alfa is met) the value of $arg is the value returned by the action (here: 154). Variables of this kind are global. Of course it is possible to introduce a friendlier interface to declare them:
zz> /stat -> let ident^name "=" int^val {
.. /$arg -> name {/return val }
.. }
zz> let goofie = 3 !!goofie is now a global $arg
zz> /print goofie
3
Syntax extensions are organized in levels. All the levels have a name and they are organized in a stack. New rules are inserted by default in the current level, which is the top of the stack. At startup the default scope (level) is the "kernel" scope.
A new scope is created by typing:
/push scope scope_name
At this point scope_name is the current scope at the top of the stack, and all the new rules inserted from now on will be assigned by default to this scope.
The current scope can be removed from the stack by typing:
/pop scope
The scope is not lost. It is only inactive and it can be restored by typing again
/push scope scope_name
To delete a scope it is necessary to type:
/delete scope scope_name
All the rules that belong to that scope are lost. To insert a rule in a scope which is not the current top of stack, the following syntax should be used:
/(scope_name)stat > myrule {...}
The stack implies a hierarchy among the scopes. The parser attempts to reduce a rule in the topmost level and, failing that, in the deeper active levels (inactive levels are not considered). If a rule is found at a certain level, the parser ignores deeper levels. Within the same level Zz is not able to resolve an ambiguity. Newly created rules can hide rules in deeper levels: among rules with the same thread but different actions, Zz will reduce the rule in the shallowest level.
If there are rules declared within scope_name with the clause /when delete scope, the specified actions are executed (see below).
It is also possible to empty a scope using the following syntax:
/delpush scope scope_name
That will delete and repush the scope scope_name.
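The lookup order described above can be sketched in C as follows. This is a minimal model under stated assumptions, not the Zz data structures: the types scope and scope_stack and the functions push_scope, pop_scope, add_rule and find_action are hypothetical names, and a rule is reduced here to a pair of strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES  8
#define MAX_SCOPES 8

/* Hypothetical model of the scope stack, not the Zz internals. */
struct rule  { const char *thread; const char *action; };
struct scope { const char *name; struct rule rules[MAX_RULES]; int n_rules; };
struct scope_stack { struct scope *active[MAX_SCOPES]; int depth; };

/* /push scope: the scope becomes the current top of the stack. */
void push_scope(struct scope_stack *st, struct scope *sc)
{
    assert(st->depth < MAX_SCOPES);
    st->active[st->depth++] = sc;
}

/* /pop scope: the scope becomes inactive but keeps its rules. */
struct scope *pop_scope(struct scope_stack *st)
{
    assert(st->depth > 0);
    return st->active[--st->depth];
}

void add_rule(struct scope *sc, const char *thread, const char *action)
{
    assert(sc->n_rules < MAX_RULES);
    sc->rules[sc->n_rules].thread = thread;
    sc->rules[sc->n_rules].action = action;
    sc->n_rules++;
}

/* Search from the top of the stack downwards; the first scope holding
   the thread wins, so deeper rules with the same thread are hidden. */
const char *find_action(const struct scope_stack *st, const char *thread)
{
    int level, i;
    for (level = st->depth - 1; level >= 0; level--) {
        const struct scope *sc = st->active[level];
        for (i = 0; i < sc->n_rules; i++)
            if (strcmp(sc->rules[i].thread, thread) == 0)
                return sc->rules[i].action;
    }
    return NULL;   /* no active scope knows this thread */
}
```

Popping a scope in this model only removes it from the array of active scopes, so pushing it again restores its rules, as with /pop scope followed by /push scope.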
It is possible to specify an action to be executed when the action associated to a thread is modified. The syntax is the following:
/when change action {action_a }
Please note that the simplest statement to change a syntax is:
/syntagma -> thread {action_b }
Usually, however, the user introduces some statement that modifies the syntax automatically; of course, at the deepest level the statement used is still the simplest one.
The action action_a is executed if the action_b associated to the rule /syntagma -> thread is changed.
Of course Zz is a program, but it is also available as a C library (libzz.a). If you want to use the Zz library you must provide the main program, some related routines, and your C-Procedures. In this environment you can define your 'hard coded' syntax and moreover you can attach your C routines directly to the syntactical rules.
E.g., suppose you have a valuable routine able to print an important sentence like:
#include <stdio.h>

void hello()
{
    printf("Hello World!\n");
}
And now you want to create a program that calls the routine when the user types 'say hello'.
The main program is the following:
main()
{
    extern void hello();

    kernel();
    zkernel();
    usrkernel();
    zz_parse_tt();
}
Where 'usrkernel' is a routine the user provided which describes the syntax attached to the C-Procedure.
A possible form for the usrkernel() routine of our example is the following:
usrkernel()
{
    zOpen("stat");
    zKeyword("say hello");
    zCall(hello);
    zClose();
}
Of course, a tool is available that produces this file automatically from the C prototypes of the C-Procedures.
You have to compile the main program and the subroutine and link them with the libzz.a library. Now you can try:
zz> say hello
Hello World!
And you can also use the Zz features:
zz> /for i = 0 to 5 {
.. say hello
.. }
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Here follows a list of the routines you can use in your C-program to build your application.
kernel();                       load the Zz base syntax.
zkernel();                      load the Zz metasyntax.
zz_set_output(filename);        write output to the file filename.
zz_set_output(0);               write output to stdout.
zz_set_prompt(prompt);          set the prompt for interactive sessions.
zz_set_default_extension();     set the default extension for Zz files (default: .zz).
ret=zz_parse_tt();              parse stdin.
ret=zz_parse_file(filename);    parse a file.
ret=zz_parse_string(string);    parse a string.
print_error_count();            print a report about the errors that occurred during the parsing phase.
N.B. It is possible to parse more than one source in the same program. e.g.,
main()
{
    kernel();
    zkernel();
    usrkernel();
    zz_parse_file("configuration");
    zz_parse_tt();
}
This program reads syntax definitions from configuration.zz and then uses them while parsing stdin.
You define a rule using the following routine calls:
zOpen(sintname);
zKeyword(terminalbead);
zMatch(nonterminalbead);
zCall(procedure); or zCallFun(procedure,returnedtype)
zClose();

sintname: string. The name of the syntagma.
procedure: address of the C-Procedure.
returnedtype: string. Name of the tag associated with the returned value.
terminalbead: string. Terminal bead.
nonterminalbead: string. Name of the nonterminal to be matched (e.g.: "int").

Examples:

dump_ident(name)
char *name;
{
    printf("dump: %s\n",name);
}

usrkernel()
{
    zOpen("stat");
    zKeyword("dump");
    zMatch("ident");
    zCall(dump_ident);
    zClose();
}

main()
{
    kernel();
    zkernel();
    usrkernel();
    zz_parse_tt();
}
The parameter passing between Zz and C-Procedures is quite simple. The syntactical rule linked with the C-Procedure consists of terminal beads and nonterminal beads. When the rule is reduced (and before the C-Procedure is invoked) each nonterminal bead has an associated value. Those values (in their order) build the argument list of the C-Procedure. In the C code the arguments of the procedure have to be declared according to the expected types (e.g. int for the nonterminal int, char* for the nonterminals ident and qstring, and so forth).
The C-Procedure may be invoked as a 'Zz procedure' (i.e. linked to a rule of the form: /stat -> .... ) or as a 'Zz function' (i.e. linked to a rule of the form: /something_else -> ....). In the latter case you have to specify the 'Zz type' of the value returned.
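As an illustration of how the argument list is built, here is a minimal C sketch. It assumes a simplified picture of a matched thread: the type matched_bead and the function build_arg_list are hypothetical and do not belong to the Zz library, and every value is reduced to a long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a matched thread; not the Zz internals. */
struct matched_bead {
    int  is_nonterminal;   /* 0 for a terminal bead such as a keyword */
    long value;            /* value carried by a nonterminal bead     */
};

/* Copy the values of the nonterminal beads, in thread order, into argv.
   Returns the number of arguments, or -1 if argv is too small. */
int build_arg_list(const struct matched_bead *thread, int n_beads,
                   long *argv, int max_args)
{
    int i, argc = 0;

    assert(thread != NULL && argv != NULL);
    for (i = 0; i < n_beads; i++) {
        if (!thread[i].is_nonterminal)
            continue;              /* terminal beads pass no value */
        if (argc == max_args)
            return -1;
        argv[argc++] = thread[i].value;
    }
    return argc;
}
```

For a thread such as dump int^a "," int^b, only the two nonterminal beads would contribute an argument, in the order in which they appear.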
In other words when the C-Procedure has returned a value as 12345 Zz should be able to interpret the number as an integer value or as the address of a string or something else.
This is accomplished by the tag associated to the function.
E.g. let us define a C-Procedure implementing a Zz function:
char *test()
{
    return "goofie";
}

usrkernel()
{
    zOpen("$arg");
    zKeyword("test()");
    zCallFun(test,"qstring");
    zClose();
}
We compile, link and run the 'usrZz '. Now, to check the result of our test, we define a Zz type discriminator:
zz> /stat -> which ident^name { /print "ident=",name }
zz> /stat -> which qstring^string...
      { /print "qstring=",string }
zz> /stat -> which int^num {/print "int=",num}
Now you can try:
zz> /x = test()
zz> which x
qstring=goofie
If you change the tag (inside usrkernel(): "qstring") you will obtain different behavior.
Note: You can use as 'tag' the identifier you prefer. For instance you can associate to the function 'test' the tag "myobject". If you do this you also have to provide specific procedures able to handle "myobject"s. Only those procedures will be able to handle the value returned by 'test'.
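The role of the tag can be pictured with a small C sketch. It is a model under stated assumptions and not the actual Zz value representation: the type tagged_value and the function describe are hypothetical names. The same machine word is read as an integer or as the address of a string depending on the tag stored next to it, and an unknown tag such as "myobject" is rejected because this sketch provides no procedure for it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: a returned value plus the tag that tells the
   caller how to read it. */
struct tagged_value {
    const char *tag;     /* e.g. "int" or "qstring" */
    union {
        long        as_int;
        const char *as_string;
    } u;
};

/* Format the value according to its tag. Returns the number of
   characters written, or -1 for a tag this sketch does not handle. */
int describe(const struct tagged_value *v, char *out, size_t out_size)
{
    assert(v != NULL && out != NULL);
    if (strcmp(v->tag, "int") == 0)
        return snprintf(out, out_size, "int=%ld", v->u.as_int);
    if (strcmp(v->tag, "qstring") == 0)
        return snprintf(out, out_size, "qstring=%s", v->u.as_string);
    return -1;
}
```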
Let us suppose that you have a SCP function: fopen and a SCP procedure (let us ignore the return value) fputs having the same parameters of the C-language:
zz> /file_pointer = fopen ("my_file", "w")
zz> fputs ("hello world!", file_pointer)
Formally a SCP procedure not returning a value is used like a Zz statement with the same format of a C routine call.
A SCP procedure returning a value is called within an expression with the same format as above.
This version of Zz uses 4 bytes to represent both integers and floats. This creates a small problem when passing floats, because C compilers promote float arguments to double. So if you write a C-Procedure with a float argument, it will receive a wrong value. A similar problem arises for the returned value.
The solution up to now is to declare the C-Procedure as returning and/or accepting long integer and converting the values into/from float inside the procedure using the following trick:
gasp(ix)
long ix;
{
    float x,y;
    long iy;

    x = *(float*)&ix;
    ....
    iy = *(long int*)&y;
    return iy;
}
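The same trick can be written portably with memcpy, which avoids reading an object through a pointer of a different type. The sketch below is not part of Zz: it assumes a 4-byte float, uses int32_t so that the integer is 4 bytes wide on every platform (a long is 8 bytes on many 64-bit systems), and the names bits_to_float, float_to_bits and double_it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 4 bytes of a 32-bit integer as a float. */
float bits_to_float(int32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);   /* assumes a 4-byte float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret the 4 bytes of a float as a 32-bit integer. */
int32_t float_to_bits(float f)
{
    int32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical C-Procedure in the style of gasp(): it receives and
   returns a float packed in an integer; here it doubles the value. */
int32_t double_it(int32_t ix)
{
    float x = bits_to_float(ix);
    float y = 2.0f * x;
    return float_to_bits(y);
}
```

The caller never sees a float argument or return value, so no promotion to double takes place at the call boundary.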
Get the name of the current file (for interactive session return 'stdin').
name = get_source_name();
Get the name of the current source (for an interactive session it returns 'stdin').
get_source_line();
Get the current line number.
fprintf_source_position(chan,flag);
Write the current line with an arrow marking the current position, together with the current line number, the current file and so forth.
In this paragraph we describe the environment to develop a Zz application. You need to customize mainly three files. The names of these files are free, let's call them: ua.c, sua.zz and main.c.
You need access to the Zz kernel object library, to the Zz include files and to decl.hz.
You need the files containing the C-Procedures needed for your application.
All the procedures attached through the UCP mechanism are described in ua.c, all the soft C-Procedures (SCP) are described in sua.zz, and main.c is the main program of your application; within main.c you invoke Zz (Zz is a C-callable routine).
Before invoking Zz, your application has to call kernel(); this routine initializes the environment that Zz needs. The Zz routine has the following prototype:
void zz(char* file_in,char* file_ext,char* file_out);
It is possible to use default values (if the parameter is zero): default file_in is stdin, default file_out is stdout, default file_ext is ".zz".
Example:
main()
{
    kernel();
    zz(0,0,0); /* uses all defaults */
}
The SCP mechanism is quite naive: the user has to write a file (say sua.zz) with the ANSI standard prototype of his functions and subroutines, like in the following example:
/include "hzlib:decl.hz" begin int fopen( char * file_spec, char *a_mode); void fputs ( char *buffer, int file_ptr); end
Zz is able to read this file and produces a C source file. The command can be the following:
$ zz +C sua.zz sua.c
The file "sua.c" will contain the description of the syntax to invoke the user procedures and the appropriate calls. This file has to be linked with the Zz kernel and with the file[s] containing the user C procedures (the example doesn't require anything more because fopen and fputs are in the standard C libraries). The resulting executable program will be a configured version of Zz including the Soft user C-Procedures.
There are two ways to extend Zz by adding in external procedures written in C: recompiling the Zz library or interactive component(zzi.c), or by building external libraries and then loading them dynamically. The first method is described in the section titled "Semantic Interface".
We will explore the second method in this section in a series of examples.
To begin, we will start with a very basic example that only serves to demonstrate the dynamic loading/linking process. First we need a C program that we will compile into a shared object library:
Example 4.1. Basic Test Program
void init()
{
    printf("Inside lib init().\n");
}
After saving that in a file called "test.c", we can compile it using the following command (on Linux in our case):
$ gcc -shared test.c -o test.so
The "-shared" flag to the compiler indicates that the output is a library and that the internal references do not need to resolve at compile time - they will link during the dynamic loading process.
We should now have a shared object file ready for loading:
/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r-- 1 brooks apedevel 56 Jan 22 13:50 test.c
-rwxr-xr-x 1 brooks apedevel 5893 Jan 22 13:50 test.so
/apona/home1/homedirs/brooks/openzz/src>
Now we can launch Zz and load the library:
/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
Inside lib init().
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz>
So we can see that we have confirmation that our library was loaded and moreover that the init() function was called. init() is a special function in that it will be executed when the library is loaded if it is present - init() is optional.
A fairly simple example but it gets us started.
In this example we will build a C library which we will then load into Zz dynamically in order to extend its grammar. The library will provide a simple new command "echo" which will just echo its argument back to the console in a detailed format. This program demonstrates two useful items: how to extend the Zz grammar, and how to pass parameters to C-Procedures.
First, the program:
Example 4.2. Extending the Grammar
#include <stdlib.h> #include "zlex.h" #include "kernel.h" s_echo_args(argc,argv,ret) int argc; struct s_content argv[], *ret; { int i; printf("'echo' syntagma called with %d arguments.\n", argc); printf("Arg 0 type: %s\n", argv[0].tag->name); printf("Arg 0 value: %s\n", s_content_svalue(argv[0])); } void init() { OPEN(stat) M("/echo") GSB(string_e) PROC(s_echo_args) END }
Let's examine this program in a little detail:
Let's compile this example using the same compilation command as in the first example:
/apona/home1/homedirs/brooks/openzz/src> gcc -shared test.c -o test.so
/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r-- 1 brooks apedevel 403 Jan 22 15:51 test.c
-rwxr-xr-x 1 brooks apedevel 6995 Jan 22 15:58 test.so
/apona/home1/homedirs/brooks/openzz/src>
... and then we can execute our test in Zz:
/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz> /echo "an arg"
'echo' syntagma called with 1 arguments.
Arg 0 type: qstring
Arg 0 value: an arg
zz>
Here we see that we have added the new command "echo" to Zz, and that using it prints some information about the parameter it was called with.
We have seen how to extend Zz by adding new commands from dynamically loaded libraries, and we looked at an example of how to access the parameters that were passed to such commands. Now let's consider how data can be passed back to the Zz program environment from the execution of a command.
We will start by talking a little more about the syntax of the command declaration (macro) statement. For variety let's look at another command defined in kernel.c:
OPEN(float) M("cast_to_float") M("(") GSB(double) M(")") PROC(zz_doubletofloat) END
We talked about each of these parts in a previous example but here we'll focus in on some details:
Before we set you free to dissect Zz, let's finish with our example. We would like to demonstrate returning a value from a C-Procedure, and to do this we'll create a library function lcase() that converts its string argument to lowercase.
Example 4.3. lcase() - Convert Arguments to Lowercase.
#include <stdlib.h> #include "zlex.h" #include "kernel.h" #include "err.h" s_lcase(argc,argv,ret) int argc; struct s_content argv[], *ret; { int i, len; char *s_tmp, *src; // Set a reasonable default for the return value ret->tag = tag_qstring; s_content_svalue(*ret) = NULL; // Test that command arguments are valid if (argc != 1) { zz_error(ERROR, \ "s_lcase() called with incorrect # of params(%d), expecting 1.", \ argc); return 0; } if (argv[0].tag != tag_qstring) { zz_error(ERROR, \ "s_lcase() called with param type(%s), expected 'tag_qstring'.", \ argv[0].tag->name); return 0; } // Make an alias for the input string - keeps things clean src = s_content_svalue(argv[0]); len = strlen(src); // Allocate a temp buffer to create new string in s_tmp = malloc(len + 1); // Ensure malloc succeeded if (!s_tmp) { zz_error(ERROR, \ "s_lcase() system error while executing 'malloc'."); return 0; } // Copy and convert the contents of our source string to buffer for (i=0; i<len; i++) s_tmp[i] = tolower(src[i]); // Bring over the string terminator symbol s_tmp[len] = '\0'; // Use the internal 'zlex_strsave' function to make // a canonical copy - important! s_content_svalue(*ret) = zlex_strsave(s_tmp); // Free up the temporary buffer storage free (s_tmp); return 1; } void init() { OPEN(qstring) M("lcase") M("(") GSB(qstring) M(")") PROC(s_lcase) END }
Being that this is a more realistic example it has grown somewhat. Since the code is commented we'll just talk about the new additions since the last example:
Let's now compile and run our test:
/apona/home1/homedirs/brooks/openzz/src> gcc -shared test.c -o test.so
/apona/home1/homedirs/brooks/openzz/src> ls -l test.*
-rw-r--r-- 1 brooks apedevel 1499 Jan 23 16:09 test.c
-rwxr-xr-x 1 brooks apedevel 7757 Jan 23 17:17 test.so
/apona/home1/homedirs/brooks/openzz/src> ./zz
Zz 32-bit Version 7.0 with Dynamic Lexical Analyzer
APE Group INFN (March 1998), modified at DESY (April 2000)
interactive session
zz> /load_lib "/apona/home1/homedirs/brooks/openzz/src/test.so"
Library '/apona/home1/homedirs/brooks/openzz/src/test.so' Loaded.
'init()' executed for library '/apona/home1/homedirs/brooks/openzz/src/test.so'.
zz> /lcase ("teST")
+ **** SYNTAX ERROR ****
| got: '('
| expected one of: '=' '-' ':' int
| /lcase ("teST")
|        ^
| line 13 of stdin
OK! First thing to notice here is that when you specify that a function is of a certain syntagma type other than stat, Zz is expecting you to use it as an "R-Value", or in other words you need to assign or use the result of this function somewhere. Let's continue:
zz> /s = lcase ("teST") zz> /print "s=" & s s=test zz> /print lcase("This Was An InitCapped String.") this was an initcapped string. zz>
Ahh... much better!
Having demonstrated passing of data to and from C-Procedures we'll conclude our library examples here. There's certainly quite a lot more to learn: take a look at kernel.c for the thread syntax definitions and also look in sys.c to see how their handlers are implemented.
To invoke the basic (unconfigured) version of Zz the command is:
$ zz [ filein [ fileout ] ]
If you omit filein, Zz reads its input from the standard input; if you omit fileout, the output is written to the standard output.
There is a lexical analyzer that reads the text to be parsed and converts everything to tokens. The internal representation of a token is a couple: (tag, value). The lexical analyzer may return the following tags:
IDENT, FLOAT, INT, QSTRING, CHAR, EOL, EOF
The parser gets these tokens (tag, value) from the lexical analyzer. When the lexical analyzer finds a special character outside quotes, it passes the token to the parser with the tag CHAR.
The double exclamation mark, !!, is interpreted by the lexical analyzer like an EOL. All the characters following this symbol until the true EOL are ignored.
Three contiguous dots: ... are interpreted like a "line continues" marker. It means that the line has to be completed with the following line. All the characters to the EOL are ignored.
The lexical analyzer ignores all redundant spacing. Space and/or tabs are significant only to separate identifiers and numbers. It should be noted that special characters are always to be considered different tokens.
Examples:
Input stream      tokens [tag, value]
"+hiA"A++         [qstring,+hiA] [ident,A] [char,+] [char,+]
"+h i A" A+       [qstring,+h i A] [ident,A] [char,+]
+hi B 3           [char,+] [ident,hi] [ident,B] [int,3]
+hiB3             [char,+] [ident,hiB3]
The parser gets tokens from the current source. The current source can be the standard input (an input file or the TTY) as well as a list of tokens within a Zz variable (e.g. an action attached to a successfully matched thread). When the current source is an input stream, the tokens are created by the lexical analyzer.
The parser accepts a sequence of statements according to the syntactical rules attached to the syntagma stat. The user can introduce his own statements by specifying new syntactical rules to be added to the syntagma stat. More generally, the whole Zz syntax can be extended and modified.
There are some implicit precedence rules that the user cannot control. The parser uses the following order of precedence to accept a token:
When beads are in competition to match a token, the parser chooses immediately, based on the precedence of the tag. E.g., if a token is both a legal identifier and a keyword, the parser will match it as a keyword.
Zz starts and recognizes a basic language called Zz language level 0 or simply Zz L0. By means of syntax extensions this language can evolve.
The character: ";" (semi colon) at the end of the statement is necessary to put two or more statements on the same line.
All the Zz L0 statements are prefixed with the character "/", this symbol can be useful to distinguish added statements from the original ones. The user can however introduce statements prefixed with the same slash: "/".
Probably the most important statements of Zz L0 are the ones to increment or modify the syntax. They are described in a separate chapter.
A syntax extension is completely defined giving a production rule and (optionally) the action to be executed when the parser reduces it.
A production rule consists of a nonterminal, called the left side of the production ("target syntagma"), an arrow, and a sequence of terminals and/or nonterminals, called the right side of the production (called a "thread of beads").
The Zz statement that allows the syntax extension has the following format:
/target_syntagma -> thread [action ]
This means that wherever a target_syntagma is acceptable, the parser will also accept the pattern specified in the thread.
target_syntagma is any legal identifier. The user can create a new syntagma simply using it. A very common syntagma is stat (for "statement") because the parser tries to interpret the source as a sequence of stats.
Thread is a sequence of beads (separated with spaces or tabs):
bead_1 bead_2 ..... bead_n
There are two types of beads: simple (terminal) beads and nonterminal beads. A simple bead is either an identifier, a number (float or int), or a quoted string to be matched exactly. The nonterminal beads have the following format:
syntagma_y ^ parameter
The parser will use syntagma_y to match the input source and, if available, a result will be returned giving a value to the parameter. This value is available in the attached action only.
The action is an optional field; if omitted a default action is performed. The action is a list of tokens. A well formed (usable) action is made up of a list of statements; it has the following format:
{ zz statements [/return expression [as tag]] }
The /return statement is mentioned explicitly because it is meaningful only within actions. Zz statements are a sequence of user-defined and predefined statements, separated by new lines or by semicolons.
As written above, there are two kinds of beads: simple (or terminal) beads and non terminal beads.
The behavior of the parser is the following:
A simple bead can be an identifier, an integer number, a floating point number or a quoted string.
Examples:
HALLO 666 3.1415 "ABC > 22+C"
The first bead matches only the identifier HALLO. The second bead matches the integer number 666 (as well as 00666). The third bead matches the floating point number 3.1415 (as well as 03.14150 or 31415e-4). The fourth bead matches the sequence of tokens ABC > 22 + C, regardless of spacing; indeed the bead "ABC > 22+C" is totally equivalent to the sequence of beads: ABC ">" 22 "+" C.
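The matching behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the Zz lexer: the tokenizer regular expression and the names tokenize, normalize and thread_matches are assumptions made for illustration. Numbers are compared by value and a quoted bead is expanded into its tokens.

```python
import re

# Assumed token classes: numbers, identifiers, any other single character.
TOKEN = re.compile(r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?|[A-Za-z_$][A-Za-z0-9_$]*|\S")


def tokenize(text):
    """Split text into tokens, ignoring whitespace."""
    return TOKEN.findall(text)


def normalize(token):
    """Map a token to a comparable (kind, value) pair."""
    if token[0].isdigit():
        if any(c in token for c in ".eE"):
            return ("float", float(token))
        return ("int", int(token))
    return ("text", token)


def thread_matches(thread_beads, source):
    """True if the source tokens match the simple beads one by one."""
    expected = []
    for bead in thread_beads:
        if len(bead) >= 2 and bead.startswith('"') and bead.endswith('"'):
            # A quoted bead is equivalent to the sequence of its tokens.
            expected.extend(tokenize(bead[1:-1]))
        else:
            expected.append(bead)
    actual = tokenize(source)
    return [normalize(t) for t in expected] == [normalize(t) for t in actual]
```

With this model, thread_matches(['"ABC > 22+C"'], "ABC>22 + C") and thread_matches(["666"], "00666") both hold, while a misspelled identifier does not match.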
A non terminal bead is used to match a syntactical construct (syntagma). The format (within a thread) to insert a non terminal bead is:
syntagma ^ parameter
There are two kinds of syntagmas (and correspondingly two types of beads): lexical syntagmas and derived syntagmas.
The lexical beads are:
These beads match the tokens with the corresponding tag and, by convention, return in the parameter the corresponding value (as returned by the lexical analyzer). The first bead matches well-formed identifiers, the second one a string within double quotes (" "), and likewise for float and int. There are special situations in which it is useful for the parser to accept any token; the special bead for this case is any^. It is possible to attach new rules to a lexical syntagma.
It is important to underline that, in order to handle variables and parameters, Zz has to identify as soon as possible the identifiers that are defined as Zz variables. Thus the syntagma param matches only identifiers having a value, and it returns the name of the variable.
The derived beads are a directive for the parser to match the rules corresponding to the related syntagma.
Example:
syntagma_x ^ parameter_x
To be effective, some rules of this kind have to be defined to give a meaning to syntagma_x:
Example:
/syntagma_x -> thread_a {action_a; /return xxxx}
/syntagma_x -> thread_b {action_b; /return yyyy}
/syntagma_x -> thread_c {action_c; /return yyyy}
If the successfully matched thread is thread_b, then the value of parameter_x will be yyyy.
A bead always matches a variable with a tag having the same name as the bead's syntagma; e.g. the bead colour^value will match a variable with tag colour.
The lexical syntagmas are defined in a previous chapter. This is a summary of them and a short description of the derived syntagmas available within the kernel:
stat^$ matches a Zz statement
statlist^$ matches one or more Zz statements separated by ; or EOL
param^ret matches a Zz parameter or variable and returns its name
list_e^ret matches a list expression and returns the list
num_e^ret matches a numeric (int or float) expression and returns its value
string_e^ret matches a character expression and returns its value
int^ret (lexical) matches an unsigned integer number and returns its value
float^ret (lexical) matches an unsigned float number and returns its value
ident^ret (lexical) matches an identifier and returns its name
qstring^ret (lexical) matches a quoted string and returns the string
It is possible to specify an action to be executed when the action associated to a thread is modified. The syntax is the following:
/when change action { action_a }
Please note that the simplest statement to change a syntax is:
/syntagma -> thread { action_b }
The user usually introduces statements that modify the syntax automatically, but at the deepest level the statement executed is always this simplest one.
The action action_a is executed if the action_b associated with the rule /syntagma -> thread is changed.
For example:
zz> /stat -> changing { /print alfa }
zz> /when change action { /print "action changed" }
zz> /stat -> changing { /print beta }
action changed
zz>
Zz variables have a name, a value and a tag. Usually the following tags are used: ident, int, float, qstring, list. New tags can be introduced (a tag can be any identifier).
To create a Zz variable you have to assign a value to it. The simplest statement is the assignment:
/ variable = expression [ as tag ]
or
/ variable := expression [ as tag ]
Variable is the name of a variable (any identifier is allowed), e.g.:
goofie, Hello, a_b
An expression may be an integer, a float, a quoted string, a single identifier, a list, or an allowed combination of these. The four arithmetic operations and parentheses are allowed on integer and floating point numbers, with the conventional precedence rules. The resulting type of the expression is float if any of the operands is float.
There is a list and string concatenation operator: "&". This can also operate on numeric values or identifiers taking the literal representation of the numbers and the ASCII representation of the identifier. e.g.:
"Rose thou"&" are "&sick, {1,2}&{3,4}
Note that variables are allowed in the expressions.
In the assignment the resulting type of the expression fixes the tag of the target variable. It is possible to explicitly force the tag type with the clause "as". In the clause 'as tag', tag may be any identifier (e.g. int, qstring, list, color, town).
The format := creates GLOBAL VARIABLES which remain alive until the EOF is reached, while the = one creates LOCAL VARIABLES which remain alive until the EOF (if declared at level 0) or the matching brace } (if declared within a block) are reached.
LOCAL variables stop existing when the block in which they are declared ends. For this reason, when a new block is defined, any local variables appearing in it are immediately replaced by their values.
There is an important difference about the use in an inner block of local variables (declared in an outer one) whose value is an identifier (strings of alphanumeric characters, underscores and dollars not beginning with a number) and those whose value is any other expression:
case variable = identifier:
In this case the names of local variables can be used in the left part of an assignment thus creating a new variable whose name is the value of the old one (identifiers are legal names for variables)
case variable = any other expression:
Other expressions (different from identifiers) are not legal names for variables, so in this case it does not make sense to use the name of local variables in the left part of an assignment. An attempt to use them in this manner would cause a syntax error.
In inner blocks we can refer to GLOBAL variables, already declared in an outer block, by their names. Since global variables remain alive until the EOF, their names are NOT replaced once and for all by their values when a new block is entered.
So they can always be used on the left side of an assignment. Conversely, a global variable declared in a block can be referenced later in an outer block, because it is global.
It is possible to change the scope of a variable from local to global and vice versa only in the block where the variable is defined (for local to global) or in the outer block at level 0 (for global to local).
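The difference between local and global variables inside a block can be illustrated with a small Python model. It is a sketch under the assumptions stated in this section (locals are replaced by their values when the block is defined, globals are looked up when the block is executed); the function and variable names are hypothetical and do not belong to Zz.

```python
def define_block(tokens, local_vars):
    """Model of block definition: local names are replaced by their values at once."""
    return [local_vars.get(tok, tok) for tok in tokens]


def execute_block(block, global_vars):
    """Model of block execution: global names are resolved only at this point."""
    return [global_vars.get(tok, tok) for tok in block]


local_vars = {"loc": 10}     # plays the role of a variable created with =
global_vars = {"glob": 10}   # plays the role of a variable created with :=

block = define_block(["print", "loc", "glob"], local_vars)

# Both variables change after the block has been defined.
local_vars["loc"] = 11
global_vars["glob"] = 11

result = execute_block(block, global_vars)
```

In this model result still carries the old value of the local variable, frozen at definition time, and the current value of the global one.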
This is the format to create a list:
{ token_1 token_2 ..... token_n }
A list expression is built with the list concatenation operator: "&". It is possible to refer to an item of a variable containing a list using the following format:
variable . item_number
Here variable is a variable containing a list and item_number is an integer number; lists are indexed starting from 1.
The lists are used to introduce blocks of statements (like the actions connected with a rule).
The tokens in a list can be any characters except a right brace (}) or an unmatched double quote (").
An item in a list can be a variable but, regardless of the scope of the variable (LOCAL or GLOBAL), its value is inserted once and for all in the list when the list is defined.
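As a rough illustration of the list operations described above (1-based item access, length, and concatenation with "&"), here is a toy Python model. The class ZzList and its methods are hypothetical names; raising an error for an out-of-range item number is an assumption, since this section does not say what Zz does in that case.

```python
class ZzList:
    """Toy model of a Zz list: an ordered sequence of tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def item(self, item_number):
        # Lists are indexed with the first item as 1.
        if not 1 <= item_number <= len(self.tokens):
            raise IndexError("item number out of range")  # assumed behaviour
        return self.tokens[item_number - 1]

    @property
    def length(self):
        return len(self.tokens)

    def __and__(self, other):
        # Models the "&" concatenation operator.
        return ZzList(self.tokens + other.tokens)


colours = ZzList(["red", "green"]) & ZzList(["blue"])
first = colours.item(1)
count = colours.length
```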
The following utilities are available within Zz L0:
/dumpnet syntagma
Shows the whole syntactical network attached to a syntagma.
/memory
Shows the memory usage and its variation.
/include filename[.hz]
/include filename.type
/include "filename"
To include a Zz source file.
/print argument_list
To print something on the screen. Arguments of any basic type can be printed. The arguments have to be separated with commas. Available arguments are: qstrings, integer and float expressions, lists, the length of a qstring (i.e. the number of characters in the string) or of a list (i.e. the number of elements in the list) and any item of a list.
Examples:
zz> /my_list = { alfa b c , "anymore" 23.4 }
zz> /print my_list.1 , my_list.4
alfa anymore
zz> /print my_list.length
6
zz> /aa = "test"
zz> /print aa.length
4
/beep [ message ]
This statement prints a sequential number, the cpu time, the name of the input file, the line number and optionally a message.
/execute list_of_stat
Executes a block of statements contained in a list.
/rules [ syntagma ]
Prints all the user rules or the rules attached to a specified syntagma.
/krules [ syntagma ]
Prints all the (user and kernel) rules or the rules attached to a specified syntagma.
/error [ message ]
Like /print, but outputs the result as an error message.
/param
Shows all the variables with their values, types, etc.
tag_of(param)
Returns the tag (type) of a variable. Note that this resolves to a value of type qstring, so it must be used as part of another statement, e.g. /print tag_of(my_var).
/trace option
To trace the parser actions. Allowed values for option are:
The action is executed (stop_val - start_val + step_val)/step_val times; start_val, stop_val and step_val must be integer expressions (floats are not recognized).
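The iteration count can be checked with a short Python function. This is a sketch that assumes the formula reads (stop_val - start_val + step_val)/step_val, that the division is an integer division, that step_val is positive and that stop_val is not smaller than start_val; the name iteration_count is hypothetical.

```python
def iteration_count(start_val, stop_val, step_val):
    """Number of times the loop body runs, for a positive step and stop >= start."""
    return (stop_val - start_val + step_val) // step_val


# Under these assumptions the result agrees with an inclusive upper bound:
# from 0 to 10 in steps of 3 visits 0, 3, 6 and 9.
visited = list(range(0, 10 + 1, 3))
```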
/foreach variable in list { action }
The action is executed once for each element in the list. At each iteration, variable takes the value of one item of the list.
/if logical_condition { action }
The action is executed if the condition is true.
The following relational operators are provided:
== equal
!= not equal
< less than
> greater than
<= less than or equal to
>= greater than or equal to
They can all be applied to integer expressions while only == and != can be applied to strings.
/push scope scope_name
/pop scope
/delete scope scope_name
/delpush scope scope_name
Scopes are identified by a name. The default scope is kernel. A new scope is created with /push scope scope_name; the rules created in the new scope can hide those of the old scope.
To exit a scope use /pop scope. This command does not delete the syntactical rules of the scope at the top of the stack; it only saves and hides them. To delete the rules of the scope scope_name use the statement /delete scope scope_name.
If there are rules declared within scope_name with the clause /when delete scope the specified actions are executed.
To empty a scope use the syntax /delpush scope scope_name that will delete and re-push the scope scope_name.
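The scope statements behave like operations on a stack of named rule sets. The Python sketch below models that reading; it is not the kernel's implementation. The class ScopeStack and its methods are hypothetical, and it assumes that pushing a previously popped scope makes its saved rules visible again.

```python
class ScopeStack:
    """Toy model of /push scope, /pop scope, /delete scope and /delpush scope."""

    def __init__(self):
        self.saved = {"kernel": {}}   # rules of every existing scope, by scope name
        self.active = ["kernel"]      # names of the scopes currently on the stack

    def push(self, name):
        self.saved.setdefault(name, {})
        self.active.append(name)

    def pop(self):
        # Hides the top scope; its rules stay saved. The kernel scope is kept.
        if len(self.active) > 1:
            self.active.pop()

    def delete(self, name):
        self.saved.pop(name, None)
        self.active = [n for n in self.active if n != name]

    def delpush(self, name):
        self.delete(name)
        self.push(name)

    def add_rule(self, thread, action):
        self.saved[self.active[-1]][thread] = action

    def lookup(self, thread):
        # Rules in scopes nearer the top of the stack hide the older ones.
        for name in reversed(self.active):
            if thread in self.saved[name]:
                return self.saved[name][thread]
        return None


scopes = ScopeStack()
scopes.add_rule("hello", "kernel action")
scopes.push("user")
scopes.add_rule("hello", "user action")
```

In this model, lookup("hello") returns the user action until the user scope is popped or deleted, after which the kernel rule becomes visible again.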
The action is a list of tokens. Usually it is associated with a syntax extension. It is executed when the grammar rule is reduced (Zz has matched the rule) or when the statement /execute is issued.
See Zz application.
The basic element of a thread. There are terminal (simple) beads and non terminal beads. The terminal beads are tokens to be matched exactly (explicit constant numbers, keywords, etc.); non terminal beads are made up of a syntagma and a recipient for the actual value the bead will match. The form of a non terminal bead is:
syntagma^var
A program written in C and linked with the Zz kernel, which knows the C-Procedure entry point. Zz will invoke the C-Procedure as specified by the user. There are user written C-Procedures (used to configure Zz to exploit a certain set of user functions within an application) and kernel or system C-Procedures furnished within the Zz kernel.
A syntagma made up of assigned threads created by using the syntax extension statement.
A grammar that may grow during the parsing phase itself.
Syntactical rules are organized in levels. Thus a level is a set of syntactical rules. Levels are ordered, named, and can be active or inactive. The rules in the higher levels hide those in lower levels.
A syntagma returning one of the lexical tags; all these syntagmas are built into the Zz kernel.
See bead.
Also called a "syntax rule" or "production rule"; it is the right side of the syntax extension statement.
A special statement used within an action. It is used to give a value to the variable associated with a non terminal bead.
The Zz language statement(s) used to extend the syntax recognized by Zz.
The syntagma is a basic structure in the syntax. A syntagma has a name and zero or more rules (threads) defining what the syntagma will match. A syntagma can be extended (adding more threads to it) using the statement: /syntagma -> thread [action]. The common way to refer to a syntagma is to use it in a non terminal bead, within a thread: .... syntagma^var ...
A thread is something that the parser will match with the input tokens. A thread is a list of beads. All the threads are organized within syntagmas. The only way to define a thread is to add it to a syntagma. It is possible to specify an action to be executed when a thread matches something.
A Zz variable has a name, a value and a tag. A variable is defined with the assignment statements = or :=, or by appearing in a non terminal bead after the caret symbol (^).
This is the result of the union of the User C-Procedures and the Zz Kernel (the result we obtain configuring Zz). Usually a Zz application is characterized by a very rich and pleasant syntax too.
The Zz kernel is the unconfigured version of Zz. The Zz Kernel recognizes the Zz language and is able to call the Kernel C-Procedures.
The basic language that Zz recognizes before any language extension is done.
It is possible to use Zz in many different contexts, although it is usually used to define command language interpreters and compilers. Some of us are using Zz to design innovative graphical user interfaces or Protocol Adaptive Networks.
All Zz applications benefit from the dynamic feature, which enables the user to redefine and extend dynamically the language or the protocol recognized by the application.
In this document we avoid giving too much formal specification in the tutorial guide. Within this informal context it is possible to say that Zz "recognizes" a wider class of grammars than, say, a classic LR parser.
It is impossible, at a purely syntactic level, to introduce in a classic static parser the concept of a declared variable or a declared routine. In a classic compiler this dirty job (or part of it) is delegated to the semantic phase.
Of course this is a major problem for new languages like Ada or C++ that try to introduce something like a limited degree of growth in the syntax (new objects, strong type checking, etc.).
This chapter shows, using some examples, how to envision new Zz-based compilers.
This appendix examines some examples that are in themselves part of the work of a compiler writer. The problems we explicitly solve in a few lines are: variable, record and type declaration, and the implementation of subroutines and cycles. We leave to the user's imagination the assembly language format, the user routines that write the object code, and any kind of optimization.
Almost all languages have the concept of a variable. A variable is basically a name with some information attached to it (e.g. the address); a standard compiler could also attach the type to the variable name. Zz only has to remember the address, because of its ability to insert a name dynamically in the proper grammar rule.
To define a variable means to modify (in the variable scope) the grammar accepted by the compiler. All humans (at least the compiler writers!) know this: when I define the real variable ``goofie'', the terminal goofie will be accepted (apart from the scope rules) wherever the compiler accepts a real variable and, for instance, the compiler could use there the variable address (e.g. 0x1234).
Zz is able to understand this concept; the proper way to express it is:
/real_var -> "goofie" {/return 0x1234}
The syntactic rules of the language that handle variables still have to be inserted. For now, the simplest way to use a real_var is to introduce a very simple I/O operation like a "write" (of course, write means generating the proper assembler code):
/stat -> write real_var^v { /print "GOSUB write_real_var #",v }
After this, Zz is ready to accept the statement:
write goofie
And emits on the standard output device:
GOSUB write_real_var #0x1234
Of course a lot of operations have to be defined to handle properly a "real_var"; we will show something in the following.
Let us imagine that a real_var needs 4 bytes and that we use a Zz internal register (a Zz variable) to manage the memory allocation. The Zz variable we use (say "cur_address") has to be initialized to the proper value, and the declaration of goofie has to follow this schema:
/cur_address := 0xA0000
/addr = cur_address
/cur_address := cur_address+1
/real_var -> goofie {/return addr}
NOTE. addr (which acts as a local variable) is defined using = and is immediately replaced by its value, while cur_address is defined using :=. As a result, the first time the above syntactical rule is defined it has the meaning of:
/real_var -> goofie { /return 0xA0000 }
To declare a new variable the right sequence could be:
/addr = cur_address
/cur_address := cur_address+1
/real_var -> tommy { /return addr }
And tommy is allocated at 0xA0001. Of course, this way of declaring a variable is quite unfriendly.
To allow variable declaration with a more conventional statement, we have to introduce a statement capable of defining a new syntax rule.
As an example it is possible to introduce this code:
/stat -> real ident^var_name {
  /addr = cur_address
  /cur_address := cur_address+1
  /real_var -> var_name { /return addr }
}
/cur_address := 0x1000
Creating the statement:
real var_name
and the programmer can write as an example:
real alfred
real barbara
Zz then inserts the rules:
/real_var -> alfred {/return 0x1000}
/real_var -> barbara {/return 0x1001}
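The effect of the real statement can be mimicked with a small Python model, in which a dictionary stands in for the rules attached to real_var. This is a sketch of the bookkeeping only; the class Allocator and its methods are hypothetical names, not part of Zz.

```python
class Allocator:
    """Toy model of the 'real' declaration and of the 'write' statement."""

    def __init__(self, start_address):
        self.cur_address = start_address   # plays the role of the global cur_address
        self.real_var = {}                 # name -> address, one entry per inserted rule

    def declare_real(self, var_name):
        addr = self.cur_address            # value frozen at declaration, like the local addr
        self.cur_address += 1
        self.real_var[var_name] = addr     # models /real_var -> var_name { /return addr }

    def write(self, var_name):
        return "GOSUB write_real_var #" + hex(self.real_var[var_name])


alloc = Allocator(0x1000)
alloc.declare_real("alfred")
alloc.declare_real("barbara")
line = alloc.write("barbara")
```

After the two declarations the model holds alfred at 0x1000 and barbara at 0x1001, matching the rules listed above.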
Using another level of indirect declaration of syntax rules we can insert new variable types; this is the code:
/stat -> type ident^type_name {
  /var = type_name&_var
  /stat -> type_name ident^var_name {
    /addr = cur_address
    /cur_address := cur_address+1
    /var -> var_name { /return addr }
  }
}
/cur_address := 0x1000
This introduces the new statement with the general format:
type custom_type
NOTE. In the schema listed above a new syntagma name is created using the string concatenation operator "&". As an example, the newly created type angle uses the new syntagma angle_var. This trick will be used often in the following.
We can try the new statement defining the type angle:
type angle
angle teta
Zz does this work for you:
/stat -> angle ident^var_name {
  /addr = cur_address
  /cur_address := cur_address + 1
  /angle_var -> var_name { /return addr }
}
angle teta
This will create the rule:
/angle_var -> teta {/return 0x1000}
The statement defined above to declare a variable is quite simplified. A first improvement is to allow a list of variables in a declaration, in order to accept:
real a,b,c,d
It would also be useful to specify the memory occupation of each type.
/stat -> type ident^typename num_e^typesize {
  /var = typename&_var
  /stat -> typename identlist^varnamelst {
    /foreach varname in varnamelst {
      /addr = cur_address
      /cur_address:=... cur_address+typesize
      /var -> varname { /return addr }
    }
  }
}
/cur_address := 0x1000
NOTE. identlist is a predefined syntagma matching a comma separated list of identifiers.
The allowed statements are:
type custom_type custom_type_size
custom_type list_of_variable
Examples:
type complex 2
type real 1
real x,y,z
complex a,b,c
Zz automatically inserts the rules:
/real_var -> x { /return 1000 }
/real_var -> y { /return 1001 }
/real_var -> z { /return 1002 }
/complex_var -> a { /return 1003 }
/complex_var -> b { /return 1005 }
/complex_var -> c { /return 1007 }
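The address arithmetic behind the type statement can be traced with a short Python model. It follows the rules exactly as listed (each variable takes the current address and advances it by the size of its type, with no alignment), so the first complex variable lands immediately after z. The class TypeTable and its methods are hypothetical names used only for this sketch.

```python
class TypeTable:
    """Toy model of 'type name size' followed by 'name var, var, ...'."""

    def __init__(self, start_address):
        self.cur_address = start_address
        self.sizes = {}    # type name -> size
        self.rules = {}    # "<type>_var" -> {variable name: address}

    def define_type(self, type_name, type_size):
        self.sizes[type_name] = type_size
        self.rules[type_name + "_var"] = {}   # models the syntagma built with "&"

    def declare(self, type_name, var_names):
        for var_name in var_names:
            self.rules[type_name + "_var"][var_name] = self.cur_address
            self.cur_address += self.sizes[type_name]


table = TypeTable(0x1000)
table.define_type("complex", 2)
table.define_type("real", 1)
table.declare("real", ["x", "y", "z"])
table.declare("complex", ["a", "b", "c"])
```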
Using the mechanism of type we can create objects with arbitrary name and size. To define structures we need something to extract a part of a variable.
As an example:
type bad_struct 3
bad_struct a,b
This declares two variables of three items each, but it is impossible to access an item within a variable.
We need rules of this kind:
/real_var -> structured_var^v .x { /return v+0 }
/real_var -> structured_var^v .y { /return v+1 }
/real_var -> structured_var^v .z { /return v+2 }
In this way an item of something structured is usable as a real variable.
In the following paragraph we'll show how to automatically create the above mentioned rules. The user syntax could be the following:
record record_type
custom_type item_list
endrecord
As an example:
record point
real x,y,z
endrecord
and to declare a struct and to access an item...
point position
write position.x
We want the syntax used to declare the record fields to be the same as the one used to declare a variable; of course we need an offset to access items within a record. We can say that we have an address within the record (say cur_offset)...
/stat -> type ident^typename num_e^typesize {
  /var = typename&_var
  /stat -> typename identlist^varnamelist {
    /foreach varname in varnamelist {
      /addr = cur_address
      /cur_address := cur_address + typesize
      /var -> varname { /return addr }
    }
  }
  /record_stat -> typename identlist^fieldnamelist {
    /foreach fieldname in fieldnamelist {
      /addr = cur_offset
      /cur_offset := cur_offset + typesize
      /var -> cur_record_var^v "." fieldname {/return v+addr}
    }
  }
}
In the above example a variable declaration is an ordinary statement (stat), while a record_stat allows the declaration of record fields. In the example we suppose that cur_offset is initialized to 0 and that cur_record_var is initialized to the name (the syntagma) of the record we are declaring (e.g. if we declare a record, say our_record, then cur_record_var has the value our_record_var).
Now we create the syntax of the statement record. We have to initialize cur_offset, to accept record_stat and to invoke type with record name and length.
/record_head -> ident^record_name {
  /cur_offset := 0; /cur_record_var := record_name&_var
  /return record_name
}
/record_body -> record_stat^$ "\n"
/record_body -> record_body^$ record_stat^$ "\n"
/stat -> record record_head^record_name "\n" record_body^$ endrecord {
  type record_name cur_offset
}
Now we can try our masterpiece:
record point
real x,y,z
endrecord
This does automatically something like:
/real_var -> point_var^v ".x" { /return v+0 }
/real_var -> point_var^v ".y" { /return v+1 }
/real_var -> point_var^v ".z" { /return v+2 }
type point 3
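The offsets that the record statement computes can be reproduced with a few lines of Python. The sketch below only models the arithmetic (a running cur_offset advanced by the size of each field's type); the function define_record and the shape of its arguments are assumptions made for illustration.

```python
def define_record(field_declarations, type_sizes):
    """Return (field offsets, record size) for a list of (type name, field names)."""
    offsets = {}
    cur_offset = 0                       # initialized to 0 at the record head
    for type_name, field_names in field_declarations:
        for field_name in field_names:
            offsets[field_name] = cur_offset
            cur_offset += type_sizes[type_name]
    return offsets, cur_offset


type_sizes = {"real": 1}
offsets, point_size = define_record([("real", ["x", "y", "z"])], type_sizes)

# A field address is the address of the record variable plus the field offset.
position_address = 1000
position_y = position_address + offsets["y"]
```

The final cur_offset is the size passed to the type statement, which is why the record above ends with type point 3.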
Of course we can use our new described record and declare a variable of that kind:
point position,speed
We can access the whole record as well as a single item:
write position.y
with this line Zz reduces the following rules:
/point_var -> position { /return 1000 }          !! value: 1000
/real_var -> point_var^v ".y" { /return v+1 }    !! value: 1001
/stat -> write real_var^v { /print ... }
Let's imagine that our target language is able to understand a stack oriented assembler. Our assembler accepts instructions operating on variable addresses: PUSH, ADD, MOVE, etc. We would like to introduce conventional expressions:
zz> /$arg -> real_var^$ : pass
zz> /stat -> real_var^ris "=" expr^a { /print "move to", ris }
zz> /expr -> term^t
zz> /expr -> expr^e "+" term^t { /print "add" }
zz> /expr -> expr^e "-" term^t { /print "sub" }
zz> /term -> fact^f
zz> /term -> term^t "*" fact^f { /print "mul" }
zz> /term -> term^t "/" fact^f { /print "div" }
zz> /fact -> real_var^num { /print "push ", num }
zz> /fact -> "(" expr^e ")"
zz> /fact -> "-" fact^f { /print "change sign" }
Now we can try ( using the declaration defined above ):
zz> type real 1
zz> real a,b,c
zz> a=b+c
push real_var:1001
push real_var:1002
add
move to real_var:1000
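For comparison, the same translation scheme can be written as a conventional recursive descent routine in Python. This is a sketch, not a description of how Zz reduces the rules: the left-recursive rules for expr and term are replaced by loops, "*" is assumed to emit mul and the unary and binary minus are assumed to be written "-", and the function compile_assignment together with its address table is hypothetical.

```python
import re


def compile_assignment(source, addresses):
    """Translate 'target = expression' into a list of stack machine instructions."""
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*/()=]", source)
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def fact():
        if peek() == "(":
            take("(")
            expr()
            take(")")
        elif peek() == "-":
            take("-")
            fact()
            out.append("change sign")
        else:
            out.append(f"push real_var:{addresses[take()]}")

    def term():
        fact()
        while peek() in ("*", "/"):
            op = take()
            fact()
            out.append("mul" if op == "*" else "div")

    def expr():
        term()
        while peek() in ("+", "-"):
            op = take()
            term()
            out.append("add" if op == "+" else "sub")

    target = take()
    take("=")
    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    out.append(f"move to real_var:{addresses[target]}")
    return out


code = compile_assignment("a=b+c", {"a": 1000, "b": 1001, "c": 1002})
```

The Zz version needs no separate symbol table: each declaration adds a rule to real_var, so the lookup done here through the addresses dictionary happens in the grammar itself.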