ICI Technical Description

Version 1.2

Tim Long

© 1992-2000 Tim Long

Regular expression portions © 1997-1999 University of Cambridge

Permission granted to reproduce provided copyright notices are preserved.

 

Introduction

ICI is a general purpose interpretive programming language that has dynamic typing and flexible data types with the basic syntax, flow control constructs and operators of C. It is designed for use in many environments, including embedded systems, as an adjunct to other programs, as a text-based interface to compiled libraries, and as a cross-platform scripting language with good string-handling capabilities.

The ICI language and source is not copyright in any way.

This document is the basic reference for the core language and functions. There is also an extensive man page that includes details of command line invocation not described here. Additional documentation is provided in ICI source releases. The ICI web site is http://www.zeta.org.au/~atrn/ici/

Basics

The ICI interpreter's execution engine calls on the parser to read and compile a statement from an input stream. The parser in turn calls on the lexical analyser to read tokens. Upon return from the parser, the execution engine executes the compiled statement. When the statement has finished execution, the execution engine repeats the sequence.

The lexical analyser

The ICI lexical analyser breaks the input stream into tokens, optionally separated by white-space (which includes comments as described below). The next token is always the longest string of following characters which could possibly be a token. The following are tokens:

/   /=   $   @   (   )   {   }   ,
~   ~~   ~~=   ~~~   [   ]   .
*   *=   %   %=   ^   ^=
+   +=   ++   -   -=   --   ->
>   >=   >>   >>=   <   <=   <=>   <<   <<=
=   ==   !   !=   !~
&   &&   &=   |   ||   |=
;   ?   :   :=   :^
 
 

The following are also tokens:

  • The character '#' followed by any sequence of characters except a newline, then another '#'. This token is a regular-expression.
  • The character ' (single quote) followed by a single character (other than a newline) or a single backslash character sequence (described below), followed by another single quote. This token is a character-code. A single quote followed by other than the above sequence will result in an error.
  • The character " (double quote) followed by any sequence of characters (other than a newline) and backslash character sequences, up to another double quote character. This token is a string.

A backslash character sequence is any of the following:

\n       newline (ASCII 0x0A)
\t       tab (ASCII 0x09)
\v       vertical tab (ASCII 0x0B)
\b       backspace (ASCII 0x08)
\r       carriage return (ASCII 0x0D)
\f       form feed (ASCII 0x0C)
\a       audible bell (ASCII 0x07)
\e       escape (ASCII 0x1B)
\\       backslash (ASCII 0x5C)
\'       single quote (ASCII 0x27)
\"       double quote (ASCII 0x22)
\?       question mark (ASCII 0x3F)
\cx      control-x
\xhh...  the character with hexadecimal code hh...
\ooo     the character with octal code ooo (1, 2 or 3 octal digits)

Consecutive string-literals, separated only by white-space, are concatenated to form a single string-literal.

  • Any upper or lower case letter, any digit, or '_' (underscore) followed by any number of the same (or other characters which may be involved in a floating point number while that is a valid interpretation). A token of this form may be one of three things:

If it can be interpreted as an integer, it is an integer-number.

Otherwise, if it can be interpreted as a floating point number, it is a floating-point-number.

Otherwise, it is an identifier.

Notice that keywords are not recognised directly by the lexical analyser. Instead, certain identifiers are recognised in context by the parser as described below.

There are two forms of comments (which are white-space). One starts with the characters /* and continues until the next */. The other starts with the characters // and continues until the next end of line. Also, lines which start with a # character are ignored. (Lines may be terminated with linefeed, carriage return, or carriage return plus linefeed.)

An introduction to variables, modules and scope

Variables are simple identifiers which have a value associated with them. They are themselves typeless; their type at any moment is that of the value currently assigned to them.

The term module in ICI refers to a collection of functions, declarations and code which share the same variables. Typically each source file is a module, but not necessarily.

In ICI, modules may be nested in a hierarchical fashion. Within a module, variables can be declared as either static or extern. When a variable is declared as static it is visible to code defined in the module of its definition, and to code defined in sub-modules of that one. This is termed the scope of the variable.

When a variable is declared as extern, it is declared static in the parent module. Thus the parent module and all sub-modules of the parent module have that variable in their scope. Variables of this type, whether originally declared extern or static, will henceforth be referred to as static variables.

Static variables are persistent variables. That is, they remain in existence even when execution completely leaves their scope, despite not being visible to any executing code. They become visible again when the flow of control re-enters their scope.

The scoping of static variables is strictly governed by the nesting of the modules, not by the flow of execution. For example, suppose two neighbouring modules (call them module A and module B) each define a variable called theVariable. When some code in module A calls a function defined in module B, and that function refers to theVariable, it is referring to the version of theVariable defined in module B, not the one defined in module A.

Variables in sub scopes hide variables of the same name defined in outer scopes.

The second type of variable in ICI is the automatic, or auto, variable. Automatic variables are not persistent. They last only as long as a module is being parsed or a function is being executed. For instance, each time a function is entered a copy is made of the auto variables which were declared in the function. This group of variables generally only persists during the execution of the function; once the function returns they are discarded.

The parser

The parser uses the lexical analyser to read a source input stream. The parser also has reference to the variable-scope within which this source is being parsed, so that it may define variables.

When encountering a variable definition, the parser will define variables within the current scope. When encountering normal executable code at the outermost level, the parser returns its compiled form to the execution engine for execution.

For some constructs the parser will in turn recursively call upon the execution engine to evaluate a sub-construct within a statement.

The following sections will work through the syntax of ICI with explanations and examples. Occasionally constructs will be used ahead of their full explanation. Their intent should be obvious.

The following notation is used in the syntax in these sections. Note that the syntax given in the text is not always exact, but rather designed to aid comprehension. The exact syntax is given in a later section.

bold            The bold text is literal ASCII text.
italic          The italic text is a construct further described elsewhere.
[ xxx ]         The xxx is optionally present.
xxx...          The xxx may be present zero or more times.
( xxx | yyy )   Either xxx or yyy may be present.

As noted previously, there are no reserved words recognised by the lexical analyser, but certain identifiers will be recognised by the parser in certain syntactic positions (as seen below). While these identifiers are not otherwise restricted, special action may need to be taken if they are used as simple variable names. They probably should be avoided. The complete list is:

NULL      auto      break     case
continue  default   do        else
extern    for       forall    if
in        onerror   return    static
switch    try       while

We now turn our attention to the syntax itself.

Firstly consider the basic statement which is the unit of operation of the parser. As stated earlier the execution engine will call on the parser to parse one top-level statement at a time. We split the syntax of a statement into two categories (purely for semantic clarity):

statement:
	executable-statement
	declaration

That is, a statement is either an executable-statement or a declaration. We will first consider the executable-statement.

These are statements that, at the top-level of parsing, can be translated into code which can be returned to the execution engine. This is by far the largest category of statements:

executable-statement:
	expression ;
	compound-statement
	if ( expression ) statement
	if ( expression ) statement else statement
	while ( expression ) statement
	do statement while ( expression ) ;
	for ( [ expression ] ; [ expression ] ; [ expression ] ) statement
	forall ( expression [ , expression ] in expression ) statement
	switch ( expression ) compound-statement
	case parser-evaluated-expression :
	default :
	break ;
	continue ;
	return [ expression ] ;
	try statement onerror statement
	;

These are the basic executable statement types. Many of these involve expressions, so before examining each statement in turn we will examine the expression.

Expressions

We will examine expressions by starting with the most primitive elements of expressions and working back up to the top level.

Factors

The lowest level building block of an expression is the factor:

factor:
	integer-number
	character-code
	floating-point-number
	string
	regular-expression
	identifier
	NULL
	( expression )
	[ array expression-list ]
	[ set expression-list ]
	[ struct [ ( : | = ) expression , ] assignment-list ]
	[ class [ ( : | = ) expression , ] assignment-list ]
	[ func function-body ]
	[ module [ ( : | = ) expression , ] statement... ]

The constructs integer-number, character-code, floating-point-number, string, and regular-expression are primitive lexical elements (described above). Each is converted to its internal form and is an object of type int, int, float, string, or regexp respectively.

A factor which is an identifier is a variable reference. But its exact meaning depends upon its context within the whole expression. Variables in expressions can either be placed so that their value is being looked up, such as in:

a + 1

Or they can be placed so that their value is being set, such as in:

a = 1

Or they can be placed so that their value is being both looked up and set, as in:

a += 1

Only certain types of expression elements can have their value set. A variable is the simplest example of these. Any expression element which can have its value set is termed an lvalue because it can appear on the left hand side of an assignment (which is the simplest expression construct which requires an lvalue). Consider the following two expressions:

1 = 2					/* WRONG */
a = 2					/* OK */

The first is illegal because an integer is not an lvalue, the second is legal because a variable reference is an lvalue. Certain expression elements, such as assignment, require an operand to be an lvalue. The parser checks this.

The next factor in the list above is NULL. The keyword NULL stands for the value NULL which is the general undefined value. It has its own type, NULL. Variables which have no explicit initialisation have an initial value of NULL. Its other uses will become obvious later in this document.

Next is the construct ( expression ). The brackets serve merely to make the expression within the bracket act as a simple factor and are used for grouping, as in ordinary mathematics.

Finally we have the constructs surrounded by square brackets. These are textual descriptions of more complex data items, typically known as literals. For example the factor:

[array 5, 6, 7]

is an array of three items, that is, the integers 5, 6 and 7. Each of these square bracketed constructs is a textual description of a data type named by the first identifier after the starting square bracket. A full explanation of these first requires an explanation of the fundamental aggregate types.

An introduction to arrays, sets and structs

There are three fundamental aggregate types in ICI: arrays, sets, and structs. Certain properties are shared by all of these (and other types as will be seen later). The most basic property is that they are each collections of other values. The next is that they may be "indexed" to reference values within them. For example, consider the code fragment:

a = [array 5, 6, 7];
i = a[0];

The first line assigns the variable a an array of three elements. The second line assigns the variable i the value currently stored at the first element of the array. The suffixing of an expression element by an expression in square brackets is the operation of "indexing", or referring to a sub-element of an aggregate, and will be explained in more detail below.

Notice that the first element of the array has index zero. This is a fundamental property of ICI arrays.

The next ICI aggregate we will examine is the set. Sets are unordered collections of values. Elements "in" the set are used as indexes when working with the set, and the values looked up and assigned are interpreted as booleans. Consider the following code fragment:

s = [set 200, 300, "a string"];
if (s[200])
	printf("200 is in the set\n");
if (s[400])
	printf("400 is in the set\n");
if (s["a string"])
	printf("\"a string\" is in the set\n");
s[200] = 0;
if (s[200])
	printf("200 is in the set\n");

When run, this will print:

200 is in the set
"a string" is in the set

Notice that there was no second printing of "200 is in the set" because it was removed from the set on the third last line by assigning zero to it.

Now consider structs. Structs are unordered collections of values indexed by any values. Other properties of structs will be discussed later. The typical indexes of structs are strings. For this reason notational shortcuts exist for indexing structures by simple strings. Also, because each element of a struct is actually an index and value pair, the syntax of a struct literal is slightly different from the arrays and sets seen above. Consider the following code fragment:

s = [struct a = 123, b = 456, xxx = "a string"];
printf("s[\"a\"] = %d\n", s["a"]);
printf("s.a = %d\n", s.a);
printf("s.xxx = \"%s\"\n", s.xxx);

Will print:

s["a"] = 123
s.a = 123
s.xxx = "a string"

Notice that on the second line the structure was indexed by the string "a", but that the assignment in the struct literal did not have quotes around the a. This is part of the notational shortcut which will be discussed further, below. Also notice the use of s.a in place of s["a"]. This is a similar shortcut, also discussed below.

Back to expression syntax

The aggregate literals, which in summary are:

	[ array expression-list ]
	[ set expression-list ]
	[ struct [ ( : | = ) expression , ] assignment-list ]
	[ class [ ( : | = ) expression , ] assignment-list ]
	[ func function-body ]
	[ module [ ( : | = ) expression , ] statement... ]

involve three further constructs, the expression-list, which is a comma separated list of expressions; the assignment-list, which is a comma separated list of assignments; and the function-body, which is the argument list and code body of a function. The syntax of the first of these is:

expression-list:
	empty
	expression [ , ]
	expression , expression-list

The expression-list is fairly simple. The construct empty is used to indicate that the whole list may be absent. Notice the optional comma after the last expression. This is designed to allow a more consistent formatting when the elements are line based, and simpler output from programmatically produced code. For example:

[array
	"This is the first element",
	"This is the second element",
	"This is the third element",
]

The assignment list has similar features:

assignment-list:
	empty
	assignment [ , ]
	assignment , assignment-list

assignment:
	struct-key
	struct-key = expression
	struct-key function-body

struct-key:
	identifier
	( expression )

Each struct-key is either a simple identifier or an expression in brackets. A key given as an identifier is merely a notational abbreviation for a string. The following two struct literals are equivalent:

[struct abc = 4]
[struct ("abc") = 4]

The syntax of a function-body is:

function-body:
	( identifier-list ) compound-statement

identifier-list:
	empty
	identifier [ , ]
	identifier , identifier-list

That is, an identifier-list is an optional comma separated list of identifiers with an optional trailing comma. Literal functions are rare in most programs; functions are normally named and defined with a special declaration form which will be seen in more detail below. The following two code fragments are equivalent; the first is the abbreviated notation:

static fred(a, b){return a + b;}

and:

static fred = [func (a, b){return a + b;}];

The meaning of functions will be discussed in more detail below.

Aggregates in general, and literal aggregates in particular, are fully nestable:

[array
	[struct a = 1, c = 2],
	[set "a", 1.2, 3],
	"a string",
]

Note that aggregate literals are entirely evaluated by the parser. That is, each expression is evaluated and reduced to a particular value; these values are then used to build an object of the required type. For example:

[struct a = sin(0.5), b = cos(0.5)]

causes the functions sin and cos to be called during the parsing process, and the results to be assigned to the keys a and b in the struct being constructed. It is possible to refer to variables which may be in existence while such a literal is being parsed.[1]

This ends our consideration of the lowest level element of an expression, the factor.

Primary operators

A simple factor may be adorned with a sequence of primary-operations to form a primary-expression. That is:

primary-expression:
	factor primary-operation...

primary-operation:
	[ expression ]
	index-operator identifier
	index-operator ( expression )
	( expression-list )

index-operator: any of
	.  ->  :  :^

The first primary-operation (above) we have already seen. It is the operation of "indexing" which can be applied to aggregate types. For example, if xxx is an array:

xxx[10]

refers to the element of xxx at index 10. The parser does not impose any type restrictions (because typing is dynamic), although numerous type restrictions apply at execution time (for instance, arrays may only be indexed by integers, and floating point numbers are not able to be indexed at all).

Of the other index operators, . identifier is a notational abbreviation of [ "identifier" ], as seen previously, and . ( expression ) is again just a notational variation of [ expression ]. Thus the following are all equivalent:

xxx["aaa"]
xxx.aaa
xxx.("aaa")

And the following are also equivalent to each other:

xxx[1 + 2]
xxx.(1 + 2)

Note that factors may be suffixed by any number of primary-operations. The only restriction is that the types must be right during execution. Thus:

xxx[123].aaa[10]

is legal.

The two constructs

	-> identifier
	-> ( expression )

are again notational variations. In general, constructs of the form:

	primary-expression -> identifier
	primary-expression -> ( expression )

are re-written as:

	( * primary-expression ) . identifier
	( * primary-expression ) . ( expression )

The unary operator * used here is the indirection operator; its meaning is discussed later.

The index operators : and :^ index the primary expression to discover a function; the result of the operation is a callable method. These operators and methods are discussed in more detail below.

The last of the primary-operations:

	( expression-list )

is the call operation. As usual, no type checking is performed by the parser, but at execution time the thing it is applied to must be callable. For example:

my_function(1, 2, "a string")

and

xxx.array_of_funcs[10]()

are both function calls. Function calls will be discussed in more detail below.

This concludes the examination of a primary-expression.

Terms

Primary-expressions are combined with prefix and postfix unary operators to make terms:

term:
	[ prefix-operator... ] primary-expression [ postfix-operator... ]

prefix-operator: any of
	*  &  -  +  !  ~  ++  --  @  $

postfix-operator: any of
	++  --

That is, a term is a primary-expression surrounded on both sides by any number of prefix and postfix operators. Postfix operators bind more tightly than prefix operators. Both types bind right-to-left when concatenated together. That is: -!x is the same as -(!x). As in all expression compilation, no type checking is performed by the parser, because types are an execution-time consideration.

Some of these operators touch on subjects not yet explained and so will be dealt with in detail in later sections. But in summary:

Prefix operators

*     Indirection; applied to a pointer, gives the target of the pointer.
&     Address of; applied to any lvalue, gives a pointer to it.
-     Negation; gives the negative of any arithmetic value.
+     Positive; no real effect.
!     Logical not; applied to 0 or NULL, gives 1, else gives 0.
~     Bit-wise complement.
++    Pre-increment; increments an lvalue and gives the new value.
--    Pre-decrement; decrements an lvalue and gives the new value.
@     Atomic form of; gives the (unique) read-only version of any value.
$     Immediate evaluation; see below.

Postfix operators

++    Post-increment; increments an lvalue and gives the old value.
--    Post-decrement; decrements an lvalue and gives the old value.

One of these operators, $, is only a pseudo-operator. It actually has its effect entirely at parse time. The $ operator causes its subject expression to be evaluated immediately by the parser and the result of that evaluation substituted in its place. This is used to speed later execution, to protect against later scope or variable changes, and to construct constant values which are better made with running code than literal constants. For example, an expression involving the square root of two could be written as:

x = y + 1.414213562373095;

Or it could be written more clearly, and with less chance of error, as:

x = y + sqrt(2.0);

But this construct will call the square root function each time the expression is evaluated. If the expression is written as:

x = y + $sqrt(2.0);

The square root function will be called just once, by the parser, and will be equivalent to the first form.

When the parser evaluates the subject of a $ operator it recursively invokes the execution engine to perform the evaluation. As a result there is no restriction on the activity which can be performed by the subject expression. It may reference variables, call functions or even read files. But it is important to remember that it is called at parse time. Any variables referenced will be immediately interrogated for their current value. Automatic variables of any expression which is contained in a function will not be available, because the function itself has not yet been invoked; in fact it is clearly not yet even fully parsed.

The $ operator as used above increases both speed and readability. Another common use is to guard against later re-definitions of a variable. For instance:

($printf)("Hello world\n");

will use the printf function which was defined at the time the statement was parsed, even if printf is later re-defined to be some other function. It is also slightly faster, but the difference is small when only a simple variable look-up is involved. Notice the brackets which have been used to bind the $ to the word printf. Function calls are primary operations, so the $ would otherwise have applied to the whole function call, as it did in the first example.

This concludes our examination of a term (remember that the full meaning of other prefix and postfix operators will be discussed in later sections).

Binary operators

We will now turn to the top level of expressions where terms are combined with binary operators:

expression:
	term
	expression infix-operator expression

infix-operator: any of
	@
	*   /   %
	+   -
	>>   <<
	<   >   <=   >=
	==   !=   ~   !~   ~~   ~~~
	&
	^
	|
	&&
	||
	:
	?
	=   +=   -=   *=   /=   %=   >>=   <<=   &=   ^=   |=   ~~=   <=>
	,

That is, an expression can be a simple term, or two expressions separated by an infix-operator. The ambiguity amongst expressions built from several binary-operator separated expressions is resolved by assigning each operator a precedence and also applying rules for order of binding amongst equal precedence levels.[2] The lines of binary operators in the syntax rules above summarise their precedence. Operators on higher lines have higher precedence than those on lower lines. Thus 1+2*3 is the same as 1+(2*3). Operators which share a line have the same precedence. All operators except those on the second last line group left-to-right. Those on the second last line (the assignment operators) group right-to-left. Thus

a * b / c

is the same as:

(a * b) / c

But:

a = b += c

is the same as:

a = (b += c)

As with unary operators, the full meaning of each will be discussed in a later section. But in summary:

Binary operator summary

@     Form pointer
*     Multiplication, Set intersection
/     Division
%     Modulus
+     Addition, Set union
-     Subtraction, Set difference
>>    Right shift (shift to lower significance)
<<    Left shift (shift to higher significance)
<     Logical test for less than, Proper subset
>     Logical test for greater than, Proper superset
<=    Logical test for less than or equal to, Subset
>=    Logical test for greater than or equal to, Superset
==    Logical test for equality
!=    Logical test for inequality
~     Logical test for regular expression match
!~    Logical test for regular expression non-match
~~    Regular expression sub-string extraction
~~~   Regular expression multiple sub-string extraction
&     Bit-wise and
^     Bit-wise exclusive or
|     Bit-wise or
&&    Logical and
||    Logical or
:     Choice separator (must be right hand subject of ? operator)
?     Choice (right hand expression must use : operator)
=     Assignment
+=    Add to
-=    Subtract from
*=    Multiply by
/=    Divide by
%=    Modulus by
>>=   Right shift by
<<=   Left shift by
&=    And by
^=    Exclusive or by
|=    Or by
~~=   Replace by regular expression extraction
<=>   Swap values
,     Multiple expression separator

This concludes our consideration of expressions.

Statements

We will now move on to each of the executable statement types in turn.

Simple expression statements

The simple expression statement:

expression ;

is just an expression followed by a semicolon. The parser translates this expression to its executable form. Upon execution the expression is evaluated and the result discarded. Typically the expression will have some side-effect such as assignment, or make a function call which has a side-effect, but there is no explicit requirement that it do so. Typical expression statements are:

printf("Hello world.\n");
x = y + z;
++i;

Note that an expression statement which could have no side-effects other than producing an error may be completely discarded and have no code generated for it.

Compound statements

The compound statement has the form:

{ statement... }

That is, a compound statement is a series of any number of statements surrounded by curly braces. Apart from causing all the sub-statements within the compound statement to be treated as a syntactic unit, it has no effect. Thus:

printf("Line 1\n");
{
	printf("Line 2\n");
	printf("Line 3\n");
}
printf("Line 4\n");

When run, will produce:

Line 1
Line 2
Line 3
Line 4

Note that the parser will not return control to the execution engine until all of a top-level compound statement has been parsed. This is true in general for all other statement types.

The if statement

The if statement has two forms:

if ( expression ) statement
if ( expression ) statement else statement

The parser converts both to an internal form. Upon execution, the expression is evaluated. If the expression evaluates to anything other than 0 (integer zero) or NULL, the following statement is executed; otherwise it is not. In the first form this is all that happens, in the second form, if the expression evaluated to 0 or NULL the statement following the else is executed; otherwise it is not.

The interpretation of both 0 and NULL as false, and anything else as true, is common to all logical operations in ICI. There is no special boolean type.

The ambiguity introduced by multiple if statements with a lesser number of else clauses is resolved by binding each else clause to the closest possible if. Thus:

if (a) if (b) dox(); else doy();

is equivalent to:

if (a)
{
	if (b)
		dox();
	else
		doy();
}

The while statement

The while statement has the form:

while ( expression ) statement

The parser converts it to an internal form. Upon execution a loop is established. Within the loop the expression is evaluated, and if it is false (0 or NULL) the loop is terminated and flow of control continues after the while statement. But if the expression evaluates to true (not 0 and not NULL) the statement is executed and then flow of control moves back to the start of the loop where the test is performed again (although other statements, as seen below, can be used to modify this natural flow of control).

The do-while statement

The do-while statement has the following form:

do statement while ( expression ) ;

The parser converts it to an internal form. Upon execution a loop is established. Within the loop the statement is executed. Then the expression is evaluated and if it evaluates to true, flow of control resumes at the start of the loop. Otherwise the loop is terminated and flow of control resumes after the do-while statement.

The for statement

The for statement has the form:

for ( [ expression ]; [ expression ]; [ expression ] ) statement

The parser converts it to an internal form. Upon execution the first expression is evaluated (if present). Then, a loop is established. Within the loop: If the second expression is present, it is evaluated and if it is false the loop is terminated. Next the statement is executed. Finally, the third expression is evaluated (if present) and flow of control resumes at the start of the loop. For example:

for (i = 0; i < 4; ++i)
	printf("Line %d\n", i);

When run will produce:

Line 0
Line 1
Line 2
Line 3

The forall statement

The forall statement has the form:

forall ( expression [ , expression ] in expression ) statement

The parser converts it to an internal form. In doing so the first and second expressions are required to be lvalues (that is, capable of being assigned to). Upon execution the first expression is evaluated and that storage location is noted. If the second expression is present the same is done for it. The third expression is then evaluated and the result noted; it must evaluate to an array, a set, a struct, a string, or NULL; we will call this the aggregate. If this is NULL, the forall statement is finished and flow of control continues after the statement; otherwise, a loop is established.

Within the loop, an element is selected from the noted aggregate. The value of that element is assigned to the location given by the first expression. If the second expression was present, it is assigned the key used to access that element. Then the statement is executed. Finally, flow of control resumes at the start of the loop.

Each arrival at the start of the loop selects a different element from the aggregate. When no unselected elements remain, the loop terminates. The order of selection is predictable for arrays and strings, namely first to last, but for structs and sets it is unpredictable. Also, changing the values of struct members during the loop is acceptable. However, adding or deleting keys, or adding or deleting set elements, has an unpredictable effect on the progress of the loop.

As an example:

forall (colour in [array "red", "green", "blue"])
	printf("%s\n", colour);

when run will produce:

red
green
blue

And:

forall (value, key in [struct a = 1, b = 2, c = 3])
	printf("%s = %d\n", key, value);

when run will produce (possibly in some other order):

c = 3
a = 1
b = 2

Note in particular the interpretation of the value and key for a set. For consistency with the access method and the behavior of structs and arrays, the values are all 1 and the elements are regarded as the keys, thus:

forall (value, key in [set "a", "b", "c"])
	printf("%s = %d\n", key, value);

when run will produce:

c = 1
a = 1
b = 1

But as a special case, when the second expression is omitted, the first is set to each "key" in turn, that is, the elements of the set. Thus:

forall (element in [set "a", "b", "c"])
	printf("%s\n", element);

when run will produce:

c
a
b

When a forall loop is applied to a string (which is not a true aggregate), the "sub-elements" will be successive one character sub-strings.
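For instance, the sketch below rebuilds a string in reverse order. It prepends each one-character sub-string in turn and uses + to concatenate strings. The variable names are illustrative:

```ici
static ch;
static reversed = "";

/* ch is assigned "a", then "b", then "c". */
forall (ch in "abc")
    reversed = ch + reversed;

printf("%s\n", reversed);
```

When run, this should print cba.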

Note that although the sequence of choice of elements from a set or struct is at first examination unpredictable, it will be the same in a second forall loop applied without the structure or set being modified in the interim.

The switch, case, and default statements

These statements have the forms:

switch ( expression ) compound-statement
case expression :
default :

The parser converts the switch statement to an internal form. As it is parsing the compound statement, it notes any case and default statements it finds at the top level of the compound statement. When a case statement is parsed the expression is evaluated immediately by the parser. As noted previously for parser evaluated expressions, it may perform arbitrary actions, but it is important to be aware that it is resolved to a particular value just once by the parser. As the case and default statements are seen their position and the associated expressions are noted in a table.

Upon execution, the switch statement's expression is evaluated and looked up in the table created by the parser. If a matching case statement is found, flow of control moves to immediately after that case statement. Otherwise, if there is a default statement, flow of control moves to just after it. If there is no matching case and no default statement, flow of control continues just after the entire switch statement.

For example:

switch ("a string")
{
case "another string":
	printf("Not this one.\n");
case 2:
	printf("Not this one either.\n");
case "a string":
	printf("This one.\n");
default:
	printf("And this one too.\n");
}

When run will produce:

This one.
And this one too.

Note that the case and default statements, apart from the part they play in the construction of the look-up table, do not influence the executable code of the compound statement. Notice that once flow of control had transferred to the third case statement above, it continued through the default statement as if it had not been present. This behavior can be modified by the break statement described below.

It should be noted that the "match" used to look-up the switch expression against the case expressions is the same as that used for structure element look-up. That is, to match, the switch expression must evaluate to the same object as the case expression. The meaning of this will be made clear in a later section.

The break and continue statements

The break and continue statements have the form:

break ;
continue ;

The parser converts these to an internal form. Upon execution of a break statement the execution engine will cause the nearest enclosing loop (a while, do, for or forall) or switch statement within the same scope to terminate. Flow of control will resume immediately after the affected statement. Note that a break statement without a surrounding loop or switch in the same function or module is illegal.
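Applied to a switch statement, break prevents the fall-through seen in the earlier switch example. The sketch also shows that two case labels may share one sequence of statements. The function classify and its result strings are illustrative only:

```ici
static
classify(x)
{
    auto kind;

    switch (x)
    {
    case 1:
    case 2:
        kind = "small";
        break;      /* without this, control would fall into the next case */
    case "a string":
        kind = "string";
        break;
    default:
        kind = "unknown";
    }
    return kind;
}

printf("%s %s %s\n", classify(2), classify("a string"), classify(99));
```

When run, this should print small string unknown.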

Upon execution of a continue statement the execution engine will cause the nearest enclosing loop to move to the next iteration. For while and do loops this means the test. For for loops it means the step, then the test. For forall loops it means the next element of the aggregate.
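The following sketch combines both statements in a for loop. The loop sums the odd numbers it sees and stops entirely when it reaches 7. The variable names are illustrative:

```ici
static i;
static odds = 0;

for (i = 0; i < 10; ++i)
{
    if (i == 7)
        break;          /* leave the loop entirely */
    if (i % 2 == 0)
        continue;       /* skip even numbers; the step ++i still runs */
    odds += i;
}

printf("sum = %d, stopped at %d\n", odds, i);
```

When run, this should print sum = 9, stopped at 7, because 1 + 3 + 5 = 9.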

The return statement

The return statement has the form:

return [ expression ] ;

The parser converts this to an internal form. Upon execution, the execution engine evaluates the expression if it is present. If it is not, the value NULL is substituted. Then the current function terminates with that value as its apparent value in any expression it is embedded in. It is an error for there to be no enclosing function.
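The sketch below shows both cases using a hypothetical function findItem. A return inside a loop leaves the function immediately, and falling off the end of the function yields NULL:

```ici
static
findItem(list, wanted)
{
    auto item;

    forall (item in list)
    {
        /* return leaves both the loop and the function at once */
        if (item == wanted)
            return item;
    }
    /* Reaching here is equivalent to "return;", giving NULL. */
}

static hit;
static miss;

hit = findItem([array "x", "y"], "y");
miss = findItem([array "x", "y"], "z");

if (miss == NULL)
    printf("found %s; z not found\n", hit);
```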

The try statement

The try statement has the form:

try statement onerror statement

The parser converts this to an internal form. Upon execution, the first statement is executed. If this statement executes normally, flow of control continues after the try statement and the second statement is ignored. But if an error occurs during the execution of the first statement, control is passed immediately to the second statement.

Note that "during the execution" applies to any depth of function calls, even to other modules or the parsing of sub-modules. When an error occurs both the parser and execution engine unwind as necessary until an error catcher (that is, a try statement) is found.

Errors can occur almost anywhere and for a variety of reasons. They can be explicitly generated with the fail function (described below), they can be generated as a side-effect of execution (such as division by zero), and they can be generated by the parser due to syntax or semantic errors in the parsed source. For whatever reason an error is generated, a message (a string) is always associated with it.

When any otherwise uncaught error occurs during the execution of the first statement, two things are done:

  • Firstly, the string associated with the failure is assigned to the variable error. The assignment is made as if by a simple assignment statement within the scope of the try statement.
  • Secondly, flow of control is passed to the statement following the onerror keyword.

Once the second statement finishes execution, flow of control continues as if the whole try statement had executed normally.

For example:

static
div(a, b)
{
	try
		return a / b;
	onerror
		return 0;
}

printf("4 / 2 = %d\n", div(4, 2));
printf("4 / 0 = %d\n", div(4, 0));

When run will print:

4 / 2 = 2
4 / 0 = 0

The handling of errors which are not caught by any try statement is implementation dependent. A typical action is to prepend the file and line number on which the error occurred to the error string, print this, and exit.
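The earlier div example relies on the error raised by division by zero. The sketch below instead raises an error explicitly with fail and inspects the variable error in the onerror clause. The function safeDiv and the variable lastError are hypothetical names. The exact message text is whatever was passed to fail:

```ici
static lastError = "";

static
safeDiv(a, b)
{
    try
    {
        if (b == 0)
            fail("attempt to divide by zero");
        return a / b;
    }
    onerror
    {
        /* error holds the string associated with the failure */
        lastError = error;
        printf("caught: %s\n", error);
        return NULL;
    }
}

printf("6 / 3 = %d\n", safeDiv(6, 3));
safeDiv(1, 0);
```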

The null statement

The null statement has the form:

;

The parser may convert this to an internal form. Upon execution it will do nothing.

Declaration statements

There are two types of declaration statements:

declaration:
	storage-class declaration-list ;
	storage-class identifier function-body

storage-class:
	extern
	static
	auto

The first is the general case, while the second is an abbreviated form for function definitions. Declaration statements are syntactically equivalent to any other statement, but their effect is made entirely at parse time. They act as null statements to the execution engine. There are no restrictions on where they may occur, but their effect is a by-product of their parsing, not of any execution.

Declaration statements must start with one of the storage-class keywords listed above (see footnote 3). Considering the general case first, the keyword is followed by a declaration-list.

declaration-list:
	identifier [ = expression ]
	declaration-list , identifier [ = expression ]

That is, a comma separated list of identifiers, each with an optional initialisation, terminated by a semicolon. For example:

static a, b = 2, c = [array 1, 2, 3];

The storage-class keyword determines the scope in which the variables in the list are established, as discussed earlier. Note that declaring the same identifier at different scope levels is permissible, and that the result is different variables.

A declaration with no initialisation first checks if the variable already exists at the given scope. If it does, it is left unmodified. In particular, any value it currently has is undisturbed. If it does not exist it is established and is given the value NULL.

A declaration with an initialisation establishes the variable in the given scope and gives it the given value even if it already exists and even if it has some other value.
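The following minimal sketch shows the two cases using a hypothetical variable counter. Declarations take effect as they are parsed. Each top-level statement is parsed and then executed before the next is read, so each printf observes the value left by the declarations before it:

```ici
static counter = 5;

/* No initialisation: counter already exists, so its value 5 is kept. */
static counter;
printf("%d\n", counter);

/* With an initialisation the value is replaced, even though counter exists. */
static counter = 0;
printf("%d\n", counter);
```

When run, this should print 5 and then 0.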

Note that initial values are parser evaluated expressions. That is, they are evaluated immediately by the parser, but may otherwise take arbitrary actions. For example:

static
fibonacci(n)
{
	if (n <= 1)
		return 1;
	return fibonacci(n - 1) + fibonacci(n - 2);
}

static fib10 = fibonacci(10);

The declaration of fib10 calls a function. This works because the function has already been defined by the time the declaration is parsed.

Note that the scope of a static variable is (normally) the entire module it is parsed in. For example:

static
func()
{
	static aStatic = "The value of a static.";
}

printf("%s\n", aStatic);

when run will print:

The value of a static.

That is, despite being declared within a function, the declaration of aStatic has the same effect as if it had been declared outside the function. Also notice that the function has not been called. The act of parsing the function caused the declaration to take effect.

The behavior of extern variables has already been discussed, that is, they are declared as static in the parent module. The behavior of auto variables, and in particular their initialisation, will be discussed in a later section.

Abbreviated function declarations

As seen above there are two forms of declaration. The second:

storage-class identifier function-body

is a shorthand for:

storage-class identifier = [ func function-body ] ;

and is the normal way to declare simple functions. Examples of this have been seen above.
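For example, the two declarations below define functions in the same way. square uses the abbreviated form, while cube spells out the function literal that the abbreviation stands for. Both names are illustrative:

```ici
/* Abbreviated form. */
static
square(x)
{
    return x * x;
}

/* The long form: a declaration initialised with a function literal. */
static cube = [func (x)
{
    return x * x * x;
}];

printf("%d %d\n", square(3), cube(2));
```

When run, this should print 9 8.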

Functions

As with most ICI constructs, there are two parts to understanding functions: how they are parsed and how they execute.

When a function is parsed four things are noted:

  • the names and positions of the formal parameters;
  • the names and initialisation of auto variables;
  • the static scope in which the function is declared;
  • the code generated by the statements in the function.

The formal parameters (that is, the identifiers in the bracket enclosed list just before the compound statement) are actually implicit auto variable declarations. Each of the identifiers is declared as an auto variable without an initialisation, but in addition, its name and position in the list is noted.

Upon execution (that is, upon a function call), the following takes place:

  • The auto variables, as noted by the parser, along with any initialisations, are copied as a group. This copy forms the auto variables of this invocation.
  • Any actual parameters (that is, expressions provided by the caller) are matched positionally with the formal parameter names, and the value of those expressions are assigned to the auto variables of those names.
  • If there were more actual parameters than formal parameters, and there is an auto variable called vargs, the remaining argument values are formed into an array which is assigned to vargs.
  • The variable scope is set such that the auto variables are the inner-most scope, the static variables noted with the function are the next outer scope etc.
  • The flow of control is diverted to the code generated by parsing the function.

A return statement executed within the function will cause the function to return to the caller and act as though its value were the expression given in the return statement. If no expression was given in the return statement, or if execution fell through the bottom of the function, the apparent return value is NULL. In any event, upon return the scope is restored to that of the caller. All internal references to the group of automatic variables are lost (although as will be seen later explicit program references may cause them to remain active).

Simple functions have been seen in earlier examples. We will now consider further issues.

It is very important to note that the parser generates a prototype set of auto variables which are copied, along with their initial values, when the function is called. The value which an auto variable is initialised with is a parser evaluated expression just like any other initialisation. It is not evaluated on function entry. But on function entry the value the parser determined is used to initialise the variable. For example:

static myVar = 100;

static
myFunc()
{
	auto anAuto = myVar;

	printf("%d\n", anAuto);
	anAuto = 500;
}

myFunc();
myVar = 200;
myFunc();

When run will print:

100
100

Notice that the initial value of anAuto was computed just once; changing myVar before the second call did not affect it. Also note that changing anAuto during the function did not affect its subsequent re-initialisation on the next invocation.

As stated above, formal parameters are actually uninitialised auto variables. Because of the behavior of variable declarations it is possible to explicitly declare an auto variable as well as include it in the formal parameter list. In addition, such an explicit declaration may have an initialisation. In this case, the explicit initialisation will be effective when there is no actual parameter to override it. For example:

static
print(msg, file)
{
	auto file = stdout; /* Default value. */

	fprintf(file, "%s\n", msg);
}

print("Hello world");
print("Hello world", stderr);

In the first call to the function print there is no second actual parameter, so the explicit initialisation of the auto variable file (which is the second formal parameter) takes effect. In the second call a second argument is given. Its value overwrites the explicit initialisation and causes the output to go to stderr.

As indicated above there is a mechanism to capture additional actual parameters which were not mentioned in the formal parameter list. Consider the following example:

static
sum()
{
	auto vargs;
	auto total = 0;
	auto arg;

	forall (arg in vargs)
		total += arg;
	return total;
}

printf("1+2+3 = %d\n", sum(1, 2, 3));
printf("1+2+3+4 = %d\n", sum(1, 2, 3, 4));

Which when run will produce:

1+2+3 = 6
1+2+3+4 = 10

In this example the unmatched actual parameters were formed into an array and assigned to the auto variable vargs, a name which is recognised specially by the function call mechanism.

Also consider the following example, in which vargs is given a default initialisation. It uses the function call to invoke a function with an array of actual parameters, the function array to form an array at run time, and addition to concatenate arrays. All these features are explained further in later sections:

static
debug(fmt)
{
	auto fmt = "Reached here.\n";
	auto vargs = [array];

	call(fprintf, array(stderr, fmt) + vargs);
}

debug();
debug("Done that.\n");
debug("Result = %d, total = %d.\n", 123, 456);

When run will print:

Reached here.
Done that.
Result = 123, total = 456.

In the first call to debug no arguments are given and both explicit initialisations take effect. In the second call the first argument is given, but the initialisation of vargs still takes effect. But in the third call there are unmatched actual parameters, so these are formed into an array and assigned to vargs, overriding its explicit initialisation.

Method Calls

In addition to the above, ICI has a simple mechanism for calling methods. Methods are functions contained within an object (typically a struct) that accept that object as their first parameter. The mechanism is enabled in two ways. First, the call operator, "()", is extended with semantics for calling a pointer object. Second, a new operator, binary-@, forms a pointer object from an object and a key. ICI pointers, described below, consist of an object and a key. To indirect through the pointer, the object is indexed by the key and the resulting object is used as the result. This is similar to the lookup performed by dynamic dispatch in languages such as Smalltalk and Objective-C.

The call operator now accepts a pointer as its first operand. We may think of the call operator as an n-ary operator that takes a function or pointer object as its first operand and the function parameters as the remaining operands. When a pointer is "called", the key is used to index the pointer's container object, and the result, which must be a function object, is called. In addition, the container object within the pointer is passed as an implicit first parameter to the function, so the actual object used to invoke the method is passed to the method. Apart from the calling semantics, the functions used to implement methods are in all respects normal ICI functions.

Struct objects are typically used as the "container" for objects used with methods. The super mechanism provides the hierarchical search needed to allow class objects to be shared by multiple instances, and provides a natural means of encapsulating information.

A typical way of using methods is:

/*
 * Define a "class" object representing our class and
 * containing the class methods.
 */
static MyClass = [struct
 
    doubleX = [func (self)
    {
        return self.x * 2;
    }]
 
];
 
...
 
static a;
a = struct(@MyClass);
a.x = 21;
printf("%d\n", a@doubleX());

 

We first define a class by using a literal struct to contain our named methods. Class variables could also be defined in this struct, since it is shared by all instances of the class. Our class has a single method, doubleX, which doubles the value of an instance variable called x.

Later in the program we create an instance of MyClass by making a new struct object whose super struct is the class struct. The super is made atomic, which ensures that all instances share the same object and makes it read-only for them. Then we create an "instance variable" within the object by assigning 21 to a.x, and finally invoke the method. We do not pass any parameters to doubleX. The call through the pointer object formed by the binary-@ operator passes a implicitly as self, so the program prints 42.
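The further sketch below shows two instances sharing one class struct while keeping their own instance variables. Any explicit arguments to a method follow the implicit object parameter. The class Counter, its method bump and the field count are hypothetical names, and the instances are created with struct(@...) exactly as in the example above:

```ici
static Counter = [struct
    bump = [func (self, n)
    {
        /* self is the instance the method was invoked through */
        self.count += n;
        return self.count;
    }]
];

static c1;
static c2;

c1 = struct(@Counter);
c1.count = 0;
c2 = struct(@Counter);
c2.count = 100;

/* Each call passes the instance implicitly, followed by the argument. */
c1@bump(5);
c2@bump(1);

printf("%d %d\n", c1.count, c2.count);
```

When run, this should print 5 101.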

Objects

Up to now, few precise statements about the nature of values and data have been made. We will now examine values in more detail. Consider the following code fragment:

static x;
static y;

x = [array 1, 2, 3, 4];
y = x;

After execution of this code the variable x refers to an array