SIMPLE documentation

Introduction

About this document

This document describes the SIMPLE language. It serves at once as an introduction to the SIMPLE language, as its specification, and as the reference manual of the simple program.

The version of simple described in this document is version 1.0.2. This document is quite alpha right now, but I have good hopes that it will get beyond that stage some day.

About SIMPLE

Why ``SIMPLE''?

SIMPLE stands for ``Simple Is a Macro Processing Language Element''. Well, I admit that last part of the name was just the best I could find which begins with an E.

I once read a fortune joke about ``the lesser-known programming languages'' which described the language SIMPLE as being composed of only two instructions, ``BEGIN'' and ``STOP'', neither of which did anything: in that way the same things can be achieved as with other programming languages but without any need for frustration and tedious debugging. As the version 1.0.0 of simple had exactly two instructions: @id@ and @void@, I thought the name SIMPLE was quite adequate.

More seriously, when I first started writing simple, I intended to write something very simple, as I had very modest means. It turns out that I produced a language far more complicated and powerful than I had first expected. In fact, the syntax is so unbelievably strange (albeit completely logical) that SIMPLE programs are the most complicated thing in the world to understand. So the name now remains as a piece of irony.

What does SIMPLE do?

simple is a preprocessor; in other words, it reads an input file, interprets the few things that are meant for it, and leaves the rest (generally most of the file) alone. The idea is that the file is really meant for another program or another kind of use (TeX, C, HTML or perhaps just plain ASCII), and that the preprocessor makes writing such a file simpler, for example by avoiding tedious repetitions.

simple is of the ``macro processing'' kind, as are for example cpp and m4. In other words, its essential action consists of evaluating and expanding macros which are defined by the user (generally). In fact, SIMPLE is similar to m4 in its functioning (I was strongly inspired by the sources of m4 when I wrote simple); but it is very different in its syntax.

Initial motivation

The initial idea behind SIMPLE was this: I got fed up with TeX existing in a thousand and one dialects (essentially, plain TeX, LaTeX and my personal macro set, GroTeX, but each one having many internal variations), so I thought, wouldn't it be nice to have some kind of universal TeX language, that is, a document description language (possibly entirely unlike TeX - but completely standardized) which can be easily converted to any kind of TeX dialect by loading an appropriate macro package? For example, a single command like @fraction(2,3) would produce {2\over 3} in plain TeX mode and \frac{2}{3} in LaTeX mode, and so on. Besides, the output wouldn't even have to be some kind of TeX; for example, the same input could, with an appropriate macro package, produce HTML output (with a very approximate rendering of formulae) or just plain ASCII or *roff or whatever. In fact, *roff is already something of the kind, and it could probably be (and indeed, already is) used to achieve these goals, but I don't like *roff very much and I don't believe it's completely up to the task (but then, SIMPLE might not be either).

It then appeared that I needed a macro processor to convert whatever meant "fraction with numerator 2 and denominator 3" into whatever output I needed (like {2\over 3}). I first thought of using m4 (cpp is out of the question, of course). Unfortunately, m4 is made to handle mainly programs and not text files, so I encountered all sorts of difficulties. First of all, I would have had to use the ``m4_ prefix on all builtins'' option because m4 interprets macros wherever they are found (there is no special macro invocation character) and that can be a pain. But most annoying was the problem with the backtick (`) character: apparently the only way in m4 to write a macro which will produce a backtick (without permanently changing the quote characters, because otherwise the same problem would occur for whatever happens to be the open quote character) is to write (with m4_ prefixes):

m4_changequote(`[',`]')m4_define(__lq,`)m4_changequote([`],['])m4_dnl
m4_define(_lq,`m4_changequote(`[',`]')__lq[]m4_changequote([`],['])')m4_dnl
which makes the macro _lq produce a single left quote (exercise for those who know m4 a bit: why did I need the __lq macro, and why can't I just use that?). Another thing I do not like about m4 is that it does not gobble comments (one wonders why they're called comments, then), so either one has to use dnl to produce comments or one has to change the comment character to that of TeX, which involves making an assumption as to what it is, precisely the sort of thing I was trying to avoid. Anyhow, m4 did not suit my needs, so I just had to put my hands in the dirt and write my own macro preprocessor, which is what I did. Et dixi ``fiat SIMPLE''. Et SIMPLE fit.

As to my ``universal TeX'' project, it is not even started yet. But my current idea is to have the files processed by SIMPLE, and even before that by a tiny program which will change all ISO8859-1 characters (which I use a lot because I occasionally write in French) to SIMPLE macros (because SIMPLE does not permit invocation of macros by a single character - and on the other hand it's a pain to have to write a SIMPLE macro invocation for every accented character). SIMPLE, of course, might change these macros back to the ISO8859-1 character in question, if ISO8859-1 input is recognized by whatever form of TeX (or other) is sought.

About macro processing

There was a time when I attempted to classify programming languages - it proved fruitless: each programming language seems to occupy its very own class. This applies to macro processing languages. They seem to be closer to functional languages (such as caml or Miranda) than to imperative languages (such as C or Pascal), but the issue is not altogether clear.

One thing that can help distinguish programming languages is the kind of calling mechanism which they use. Macro processors use ``call-by-value'', which means that the arguments to a function (macro) are evaluated first, before the macro is itself expanded, and even if the macro does not need these arguments. So, essentially, if you write ignorearg(screwupall()), everything does get screwed up, contrary to what would happen if call-by-name or call-by-need were used (this obviously illustrates the infinite superiority of macro processing languages :-). Still, macro processing languages provide ways to inhibit evaluation: that is called ``quoting'', and we will have much more to say on the subject.
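
The difference between evaluating arguments first and passing them unevaluated can be sketched outside any macro processor. The following Python fragment is only an illustration: all the names in it are hypothetical, and wrapping the argument in a lambda plays roughly the role that quoting plays in a macro language.

```python
# Illustration only: every name here is hypothetical, none belongs to simple.
log = []

def screw_up_all():
    """Stand-in for a macro whose evaluation has a visible side effect."""
    log.append("screwed up")
    return "whatever"

def ignore_arg_eager(arg):
    """Ignores its argument, which the caller has already evaluated."""
    return ""

def ignore_arg_delayed(thunk):
    """Ignores its argument; the thunk is never called, so nothing happens."""
    return ""

# Arguments evaluated first: the side effect happens although it is not needed.
ignore_arg_eager(screw_up_all())
eager_effects = list(log)

# Argument passed unevaluated (the lambda acts like a quote): no side effect.
log.clear()
ignore_arg_delayed(lambda: screw_up_all())
delayed_effects = list(log)

print(eager_effects)
print(delayed_effects)
```

The first list records the side effect and the second stays empty, which is the whole point of inhibiting evaluation.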

In an ideal functional language, functions cannot have global effects, so that calling the same function twice with the same arguments should produce the same result. That restriction does not apply to macro processing languages: a macro may modify a variable (that is, redefine a macro), so that applying it twice may yield completely different results.

Macro processors resemble functional languages in that there is, really, no such thing as an ``instruction'', at least no difference between ``expressions'' and ``instructions''. A functional language (say, pure lambda-calculus) may be completely untyped, everything being of the ``function'' type. As far as macro processors go, everything is of the ``list'' type, where ``list'' means ``list of tokens'' or ``character string'' as the case may be.

The central idea behind a macro processor is that of ``re-evaluation'': when a macro has been evaluated (expanded), the expansion obtained is fed back to the input so that it will be evaluated again. Only non-macro tokens and quoted elements are not (re)evaluated. As a very simple example, suppose that the macro infiniteloop evaluates to infiniteloop; then that expansion will be re-evaluated, causing an infinite loop. Wonderful invention, the wheel.
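
Re-evaluation can be sketched with a toy expander in Python. This is not how simple is implemented: it assumes, for the sake of brevity, that tokens are whitespace-separated words and that macros take no arguments, and the step limit max_steps is an addition of this sketch so that the infinite loop can be observed safely.

```python
from collections import deque

def expand(text, macros, max_steps=1000):
    """Toy rescanning expander: words are tokens, macros take no arguments."""
    pending = deque(text.split())
    output = []
    steps = 0
    while pending:
        token = pending.popleft()
        if token in macros:
            steps += 1
            if steps > max_steps:
                raise RuntimeError("expansion did not terminate")
            # Feed the expansion back to the front of the input,
            # so that it is evaluated again before anything else.
            pending.extendleft(reversed(macros[token].split()))
        else:
            # Non-macro tokens go straight to the output.
            output.append(token)
    return " ".join(output)

macros = {"greeting": "hello name", "name": "world"}
print(expand("greeting !", macros))
```

Here greeting expands to hello name, and name is expanded in turn only because the first expansion was pushed back onto the input. With macros = {"infiniteloop": "infiniteloop"}, the same function raises RuntimeError once the step limit is reached.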

As a slightly more sophisticated example of perpetual motion, let us suppose we have a macro double which takes a parameter and evaluates to that parameter applied to itself. Then we might apply the macro double to itself, which will result in double being applied to itself, and so on, perpetually re-evaluating the same thing. Now there is one important thing to note: we should not write double(double) (if the syntax is m4-like, say) to mean ``double applied to itself'', because if we write that, then the ``inner'' double gets evaluated first (as are any arguments), resulting in either nothing at all or in an error, as it was not given any arguments. Rather, we should quote the inner double to prevent its evaluation and pass the double object itself (rather than its evaluation) to the ``outer'' double. So in m4 we would write double(`double'). In fact, the complete program in m4 is:

define(`double',`$1(`$1')')double(`double')
try it and watch your computer start spinning like mad (note that there are three pairs of quotes in the definition of double, the really interesting one being the inner one which sees to it that double(`double') does indeed evaluate to double(`double') and not simply to double(double)). The corresponding program in SIMPLE is:
@def@<@double@>|<@1@<@1@>">"@double@<@double@>"

SIMPLE user's guide

Convention: SIMPLE examples will be presented in the following way:
@def@<@greet@>|<Hello, @1@!>"%
@greet@world"
->
Hello, world!
The part before the arrow (->) is the input which is presented to simple and the second part is the output it produces.

We encourage readers to try all the examples.

Basics

Let us start with something very simple.
DON'T PANIC
->
DON'T PANIC
In other words, simple just copies to the output whatever it is fed; that is true so long as the input does not contain any of the eight special characters, which are @, ", |, #, <, >, % and `.

Escaping special characters

The backtick ` is SIMPLE's escape character. This means that any special character (including the backtick itself) loses its special meaning when preceded by the backtick (unless the backtick in question is itself... well, you get the picture). So we get the following:
Santa Claus `<santa.claus`@toys.np`>
->
Santa Claus <santa.claus@toys.np>
Note that it is not an error to escape an ordinary (i.e. not special) character: it just leaves the ordinary character in question unaltered.
Wonderful`!
->
Wonderful!
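
The escaping rule alone can be sketched in a few lines of Python. This is an assumption-laden illustration, not the code of simple: the function name unescape is hypothetical, unescaped special characters are simply copied rather than interpreted, and a lone backtick at the very end of the input is kept as-is, a case the text above does not specify.

```python
def unescape(text):
    """Drop each escaping backtick and keep the character it protects."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "`" and i + 1 < len(text):
            # The next character is copied literally, special or not.
            out.append(text[i + 1])
            i += 2
        else:
            # Ordinary character (or a trailing lone backtick: an assumption).
            out.append(ch)
            i += 1
    return "".join(out)

print(unescape("Santa Claus `<santa.claus`@toys.np`>"))
print(unescape("Wonderful`!"))
```

Scanning from left to right also settles the case of the escaped backtick: in `` the first backtick escapes the second, which is then copied as an ordinary character.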

Comments

The special character % is the comment character. Anything that follows it (if it is not escaped, that is), up to and including the end of the line, is ignored by SIMPLE. It will be gobbled, that is, it will not appear on the output (compare this with m4's behaviour). Here is an example:
This is ordinary text %and this is a comment.
and this is the continuation of it.
Note how the new line was swallowed by the comment.
10`% of 90 is 9.
`%This is not a comment %but this is.
so it should appear on the output.
->
This is ordinary text and this is the continuation of it.
Note how the new line was swallowed by the comment.
10% of 90 is 9.
%This is not a comment so it should appear on the output.
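
The interplay between comments and escaping can be sketched as follows in Python. The function strip_comments is a hypothetical name, and the sketch handles only the backtick and the % character; every other special character is copied untouched, which is not what simple itself does.

```python
def strip_comments(text):
    """Remove %-comments (through the end of line) and resolve escapes."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "`" and i + 1 < n:
            # Escaped character: copied literally, so `% is not a comment.
            out.append(text[i + 1])
            i += 2
        elif ch == "%":
            # Comment: skip everything up to and including the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

source = (
    "This is ordinary text %and this is a comment.\n"
    "and this is the continuation of it.\n"
    "10`% of 90 is 9.\n"
)
print(strip_comments(source), end="")
```

Because the newline is skipped together with the comment, the line following a comment is glued to the text that preceded the % sign.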

Macros, commands

The essential concept in SIMPLE is that of a macro, also called a command (or a function, or whatever). There are two kinds of macros: builtin macros and user-defined macros. Builtins are those macros which are hard-coded inside simple (they are programmed in C). They are always present. User-defined macros, on the other hand, are defined by the user (as their name seems to indicate).

Macro invocation

Whether a macro is builtin or user-defined makes no difference to the way it is called: the macro invocation sequence is
@macro name@arguments"
That is, the name of the macro is surrounded by at signs (@), and is followed by the arguments and the whole thing is ended by a quote sign ("). The arguments themselves are separated by vertical bars (``pipe'' signs, |). Here is an example:
The `@id`@ builtin just evaluates to its first argument:
@id@First argument|Second argument|Third argument"
Of course, if there is only one argument, it evaluates to that:
@id@(of course)"
As for the `@void`@ builtin, it is even less useful: it evaluates to nothing:
@void@SIMPLE is really stupid!"
->
The @id@ builtin just evaluates to its first argument:
First argument
Of course, if there is only one argument, it evaluates to that:
(of course)
As for the @void@ builtin, it is even less useful: it evaluates to nothing:

Perhaps you don't see it, but there's an empty line at the end of the output in the previous example. That is because the linefeed character after the last double quote character in the input was copied to the output.

Note that it is not possible to call a function with no argument. That is because the quote character is required to finish a function call. The next best thing one can do is call a macro with a single argument and let that argument be empty, as in @mymacro@".

Note that an argument to a macro may perfectly well itself contain a macro call. That constitutes a nested macro call, and it works just like you'd think:

@id@This @id@is"@void@ stupid, @id@really"" a @id@@id@nested"" macro call.|No"
->
This is a nested macro call.
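
The invocation syntax, including nested calls, can be sketched with a small recursive evaluator in Python. This is a rough model under stated assumptions, not the implementation of simple: it knows only the two builtins @id@ and @void@, it handles the backtick escape but not comments, quoting or user-defined macros, it does not re-evaluate the result of a call, and malformed input simply raises a Python exception. The names BUILTINS, _call and evaluate are hypothetical.

```python
BUILTINS = {
    "id": lambda args: args[0],   # evaluates to its first argument
    "void": lambda args: "",      # evaluates to nothing
}

def _call(text, pos):
    """Evaluate the call starting at the '@' at text[pos].

    Returns the value of the call and the position just after its closing quote.
    """
    name_end = text.index("@", pos + 1)
    name = text[pos + 1:name_end]
    args, current = [], []
    pos = name_end + 1
    while True:
        ch = text[pos]
        if ch == "`":                    # escaped character, copied literally
            current.append(text[pos + 1])
            pos += 2
        elif ch == "@":                  # nested call: evaluated first
            value, pos = _call(text, pos)
            current.append(value)
        elif ch == "|":                  # argument separator
            args.append("".join(current))
            current = []
            pos += 1
        elif ch == '"':                  # end of this call
            args.append("".join(current))
            return BUILTINS[name](args), pos + 1
        else:
            current.append(ch)
            pos += 1

def evaluate(text):
    """Copy ordinary text and replace each macro call by its value."""
    out, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch == "`" and pos + 1 < len(text):
            out.append(text[pos + 1])
            pos += 2
        elif ch == "@":
            value, pos = _call(text, pos)
            out.append(value)
        else:
            out.append(ch)
            pos += 1
    return "".join(out)

print(evaluate('@id@This @id@is"@void@ stupid, @id@really"" a @id@@id@nested"" macro call.|No"'))
```

Note that a call always has at least one argument in this model: @id@" yields the argument list containing one empty string, which matches the remark above about calling a macro with a single empty argument.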

David Madore