Essence# Syntax: Self Expressions

A self expression is essentially a (possibly cascaded) message sent to self, except that the pseudo-variable “self” is omitted. The value of the “invisible self” that receives the message is established by configuring the compiler to set the pseudo-variable self to the desired value during execution of the self expression.

A self expression is allowed just one message chain having just one message receiver–and the receiver must not be specified syntactically.

Conceptually, the syntactically unspecified–and therefore implied–receiver of the message(s) in the message chain of a self expression is the pseudo-variable self. That’s because the same compiler infrastructure used to set the pseudo-variable self to the correct value during the execution of a method or block is also used to set the value of the implied, unspecified receiver of the message(s) in the message chain of a self expression, and also because a self expression is parsed by the parser simply by adding an actual parse node for the pseudo-variable self as the receiver of the message(s) in the message chain of the self expression. That’s possible because the parser implements self expressions as their own “root parse node” or “grammatical start symbol.”

Any syntactically-valid expression which sends a message to an operand can be converted into a self expression simply be removing whatever syntactical construct is the receiver of the message (a.k.a, the leftmost operand in an expression.) The following examples show two pairs of expressions, where the first member of the pair uses self expression syntax and the second one does not:

Configuring a class

"Using 'self expression' syntax:"

        superclass: Object; 
        instanceVariableNames: #(red green blue)

 

"Using 'expression' syntax:"

        Color 
                superclass: Object; 
                instanceVariableNames: #(red green blue)

 

Adding methods to a class (using method literals to define the method):

"Using 'self expression' syntax:"

        protocol: #accessing 
        method:
                [## red
                        ^red
                ]

 

"Using 'expression' syntax:"

        aClass
                protocol: #accessing 
                method:
                        [## red
                                ^red
                        ]

 

Prior Art

The inspiration for self expressions comes from Tirade, a data representation language invented by Göran Krampe. The main reason that Essence# uses self expressions instead of Tirade is simply because, once you have a full parser/compiler, it is significantly simpler to implement self expressions than to implement Tirade. Given a full compiler, implementing self expressions only involves adding a few, relatively small methods to the parser and compiler. And it also technically doesn’t require adding new syntax to the language, since self expressions only use syntactical forms that would have to exist in any case; the only innovation is to permit simpler syntax that omits an otherwise-required syntactical construct. So self expressions were “the simplest thing that could possibly work.”

That said, Tirade would be a much superior solution to the problem of programming-language neutral data interchange. But that’s not the problem that self expressions are intended to solve.

Essence# Syntax: Method Declarations

method declaration is the syntactical construct used to define a method, including its name and its logic. In Essence#, the name of a method is typically referred to as its method selector; or even just its selector.

The EBNF for a method declaration:

MethodDeclaration   = [MethodHeaderToken, [ClassSpecification]], 
                      MethodHeader, ExecutableCode;
MethodHeaderToken   = OptionalWhitespace, "##", OptionalWhitespace;
MethodHeader        = UnaryMethodHeader | 
                               BinaryMethodHeader | 
                               KeywordMethodHeader;
UnaryMethodHeader   = UnaryMessageSelector;
BinaryMethodHeader  = BinaryMessageSelector, OptionalWhiteSpace, BindableIdentifier;
KeywordMethodHeader = KeywordMethodHeaderSegment, 
                      {Whitespace, KeywordMethodHeaderSegment};
ClassSpecification  = QualifiedIdentifier,OptionalWhitespace,">>", OptionalWhitespace;

 

At the beginning of a method declaration, the parser will recognize “##” (two immediately-adjacent hash characters) as a lexical token called a method header token, and will interpret the token to mean “what follows is a method header.”

Whitespace is permitted, but not required, both preceding and following the method header token.

A method header token may optionally appear as the first token of any method declaration; but it is not required if the method declaration is the root of the parse tree.

However, if a method header token occurs as the first (non-whitespace) token following a “[” token, then the parser will have no choice but to interpret what follows as the remaining tokens of a method literal (which must be terminated by a “]” token eventually.) And in that case, the “##” (method header) token may be required:

If the initial token following a “[” (BlockBegin) token is either a binary message selector or a keyword, then the source code enclosed within the “[” and “]” tokens will be parsed as a method declaration even though there is no leading method header token, and the entire construct (including the “[” and “]” tokens) will be interpreted as a method literal, and not as a block literal. So in either of those cases, no method header token is required.

However, if the first token following a “[” token is a unary message selector (which might instead just be a variable name,) and if there is no leading method header token in between the “[” token and the unary message selector, then the parser will not interpret the construct as a method literal, but will instead interpret it as a block. So, when a method declaration occurs as part of a method literal, and said method declaration has a unary method header,  the only way to get the parser to interpret the construct as a method literal, and not as a block, is to use a method header token as a prefix to the unary method header.

Note: The method header token will also be required as a prefix to a binary method header, if the binary selector is the “|” (vertical bar) token. That constraint is required in order to avoid syntactical ambiguity, due to the fact that a vertical bar token may also be the initial token of a variable declaration list.

If and only if a method header is preceded by the “##” (method header) token, the name of the class which is to be used as the environment for binding variable references when compiling the method may be specified preceding the method header. But in that case, the token “>>” must then be used as a separator between the class name and the method header.

Whitespace is permitted but not required in between the class name and the “>>” token, and in between the “>>” token and the method header.

Following the method header there must be an executable code construct. An executable code construct defines the method’s logic.  Colloquially, an executable code construct is referred to as a method body. A method bodyhas the same exact syntactical structure as a block body.

There are three different types of method header: A unary method header, a binary method header and akeyword method header.

A method declaration with a unary method header must be invoked using a unary message.

A method declaration with a binary method header must be invoked using a binary message.

A method declaration with a keyword method header must be invoked using a keyword message.

Examples of method declarations using all three types of method header are shown below (none of which have a method header token as a prefix):

Method declaration using a unary method header:

printString
        | stream |
        stream := String new writeStream.
        self printOn: stream.
        ^stream contents

 

Method declaration using a binary method header:

@ y
        ^Point x: self y: y

 

Method declaration using a keyword method header:

displayOn: aGraphicsContext at: aPoint
        aGraphicsContext displayString: self at: aPoint

 

If a method header token (“##”) precedes it, then a class specification construct may optionally precede any of the three types of method header. If present, the class specified by the class specification will be used by the compiler as the behavioral context in which the method will be compiled. In other words, the instance variables defined by the specified class, the class variables defined by the specified class and the global variables imported by the specified class will be used to bind any variables referenced by the method that aren’t either method parameters or local variables.

Here are the same three method declarations constructed to have an optional method header token and class specification construct as a prefix to the method header:

Method declaration using a unary method header and an optional class specification:

## Object>>printString
        | stream |
        stream := String new writeStream.
        self printOn: stream.
        ^stream contents
 

 

Method declaration using a binary method header and an optional class specification:

## Number>> @ y
        ^Point x: self y: y

 

Method declaration using a keyword method header and an optional class specification:

## String>>displayOn: aGraphicsContext at: aPoint
        aGraphicsContext displayString: self at: aPoint

 

Any method declaration (whether it uses a unary method header, a binary method header or a keyword method header, and whether or not it uses an optional class specification) may optionally begin with a method header token. The reason the method header token is optional is because its purpose is either to separate one method declaration from another in a sequence of method declarations, or else to distinguish a method literal from ablock. Outside of those two cases, it has no purpose, function or meaning. Its presence or absence has no effect on the semantics of the method.

Here’s an example showing two method declarations separated by an intervening method header token:

Duration>>asDays
        ^self ticks / TicksPerDay

## 

Duration>>asHours 
        ^self ticks / TicksPerHour

Essence# Syntax: Method Literals

A method literal is a method declaration surrounded by enclosing syntax so that it can be embedded as a literal value in an Essence# expression.

The EBNF for method literals:

MethodLiteral       = "[", MethodDeclaration, "]";

 

A method literal must be enclosed between a single beginning “[” character and a single ending “]” character, making its syntax rather similar to that of a block. The key difference between the syntax of a block and the syntax of a method literal is that the construct that immediately follows the beginning “[” character must be unambiguously a method header. And that, in fact, is one of the reasons that the “##” (method header) tokenexists, and is required in some cases, but optional in others: When the “##” token is required, it’s because its absence would create syntactical ambiguity, such that it would not be possible for the parser to distinguish ablock from a method literal.

For more information on the syntax of a method declaration, please see the article on that topic.

Methods defined in Essence# class libraries declare methods as method literals, instead of as method declarations that are the root of their respective parse trees. Using method literals for that purpose obviates any need to encode method names as filenames; or alternatively, it obviates any need to define a special syntax for dealing with sequences of method declarations, or for syntactically embedding method declarations inside of class declarations. So there’s no need for a special “file in” syntax, nor any need for a special parser that can consume a special “file in format.”.

That said, the ANSI standard does require that a conforming implementation support the Smalltalk Interchange Format. Essence# does not currently support that format, but will do so before it leaves beta.

Using Class Specifications

A method declaration may optionally use a class specification construct–but only if a method header token is also used. That means a method literal may also use a class specification construct, since its syntax is defined as an embedding of a method declaration enclosed in between the tokens “[” and “]”.

The presence or absence of the class specification construct may change the behavior of the compiler:

If there is no class specification in the method header, then whether or not the compiler will attempt to bind non-local variable references depends upon how the compiler is invoked. If the compiler is not provided with a binding context for non-local variables when it’s invoked, and if there is no class specification in the method header to provide one, then the compiler won’t check whether any references to non-local variables might be undeclared (however, that check is always performed whenever a method is added to a class or trait.)

On the other hand, if the method header includes a class specification, then the compiler will always attempt to bind references to non-local variables, using whichever class is specified by the class specification construct as the binding context. In that case, any undeclared variables will be treated as compilation errors.

In other words, the compiler interprets the presence of a class specification construct in a method header as a command to verify that there would be no undeclared variables referenced by the method it’s compiling, if that method were to be added to the specified class. Conversely, it interprets the absence of a class specification as a command to defer any such checks until the method is actually added to a class or trait.

When compiling either self expressions or “do its” (initializers or scripts,) the compiler is not configured to provide any default binding context for method declarations–and therefore is also not configured to do so formethod literals. That’s because there’s no way to know a priori what the “right” binding context might be in such cases.

Since methods are checked for any references to undeclared variables when they are added to a class or to a trait (which is usually the proper time, because that’s when the right binding context is known absolutely,) there are no system integrity issues raised by this binding paradigm. And that’s why the method literals in “methods.instance” and “methods.class” files don’t use class specification constructs in their method headers. There’s no need, really.

However, there are compilation use cases other than compiling “methods.instance” and “methods.class” files. And some of those use cases do require that the compiler bind all variable references during initial compilation–which is why class specification syntax is present as on option for method headers.

Multiple Object Spaces In Essence#

What is an Object Space?

An object space is an object that encapsulates the execution context of an Essence# program. It is also responsible for initializing and hosting the Essence# run time system, including the dynamic binding subsystem that animates/reifies the meta-object protocol of Microsoft’s Dynamic Language Runtime (DLR.)

Any number of different object spaces may be active at the same time. Each one creates and encapsulates its own, independent execution context. The compiler and the library loader operate on and in a specific object space. Blocks and methods execute in the context of a specific object space. Essence# classes, traits and namespaces are bound to a specific object space. Even when a class, trait or namespace is defined in the same class library and the same containing namespace, they are independent and separate from any that might have the same qualified names that are bound to a different object space.

In spite of that, it is quite possible for an object bound to one object space to send messages to an object bound to a different object space. One way to do that would be to use the DLR’s hosting protocol. That’s because an Essence# object space is the Essence#-specific object that actually implements the bulk of the behavior required by a DLR language context, which is an architectural object of the DLR’s hosting protocol.

The C# class EssenceSharp.ClientServices.ESLanguageContext subclasses the DLR class Microsoft.Scripting.Runtime.LanguageContext, and thereby is enabled to interoperate with the DLR’s hosting protocol. But an instance of EssenceSharp.ClientServices.ESLanguageContext’s only real job is to serve as a facade over instances of the C# class EssenceSharp.Runtime.ESObjectSpace. And EssenceSharp.Runtime.ESObjectSpace is the class that reifies an Essence# object space.

So, if you are only interested in using Essence#, and have no interest in using other dynamic languages hosted on the DLR, there is no need to use a DLR language context in order to invoke the Essence# compiler and run time system from your own C#, F# or Visual Basic code. You can use instances of EssenceSharp.Runtime.ESObjectSpace directly. The only disadvantage of that would be that using other DLR-hosted languages would then require a completely different API (e.g, using an IronPython library from Essence# code requires using the DLR hosting protocol, and hence requires using a DLR language context).

The advantages of using instances of EssenceSharp.Runtime.ESObjectSpace directly would be a much richer API that is far more specific to Essence#.

You can get the object space for the current execution environment by sending the message #objectSpace to any Essence# class (even to those that represent CLR types.) And the Essence# Standard Library includes a definition for an Essence# class that represents the Essence#-specific behavior of instances of the C# class EssenceSharp.Runtime.ESObjectSpace. It’s in the namespace CLR.EssenceSharp.Runtime, and so can be found at %EssenceSharpPath%\Source\Libraries\Standard.lib\CLR\EssenceSharp\Runtime\ObjectSpace.

There are many ways that Essence# object spaces might be useful. One example would be to use one object space to host programming tools such as as browsers, inspectors and debuggers, but to have the applications on which those tools operate be in their own objects spaces. That architecture would isolate the programming tools from any misbehavior of the applications on which they operate–and vice versa.

To get additional insight into the concept of object spaces and how they might be used to good effect, the paper Virtual Smalltalk Images: Model and Applications is highly recommended.

Using Reflection On The Essence# Code Base

Just because there is as yet no Essence# GUI library, and therefore no native Essence# code browsing tools, doesn’t mean that the intrinsic reflecting capabilities of Essence# can’t be used. In fact, scripts are provided in the shared scripts folder that provide at least some of the functionality traditionally provided by code browsers:

ShowAllMethods: The ShowAllMethods.es script can be used to print out the names and declaring class or trait of all the methods of a class or trait. The subject class or trait must be passed in as an argument, as in the following example which will print out the names and declaring class or trait of all the methods of class Array to the Transcript:

es ShowAllMethods -a Array | more

ShowAllMessagesSent: The ShowAllMessagesSent.es script can be used to print out all the messages sent by each method of a class or trait. The output is cross-referenced by the sending methods, and each such method specifies the class or trait that declares it. The subject class or trait must be passed in as an argument, as in the following example which will print out the names of all the messages sent by each method of class Array to the Transcript:

es ShowAllMessagesSent -a Array | more

ShowAllSenders: The ShowAllSenders.es script can be used to print out the names and declaring class or trait of all the methods in the object space that send a specified message. The subject message selector must be passed in as an argument, as in the following example which will print out to the Transcript the names and declaring class or trait of all the methods in the object space that send the message do:

es ShowAllSenders -a #do: | more

ShowAllSendersInHierarchy: The ShowAllSendersInHierarchy.es script can be used to print out the names and declaring class or trait of all the methods of a specified class that send a specified message. The subject message selector and the subject class or trait must both be passed in as arguments, as in the following example which will print out to the Transcript the names and declaring class or trait of all the methods of OrderedCollection that send the message do:

es ShowAllSendersInHierarchy -a #do: -a OrderedCollection | more

ShowUnimplementedMessages: The ShowUnimplementedMessages.es script can be used to print out the names of all the messages sent by the methods of a specified class or trait to the pseudo-variable self for which the specified class or trait has no implementing methods. The subject class or trait must be passed in as an argument, as in the following example which will print out to the Transcript the names of any messages sent to self by the class OrderedCollection that send messages for which OrderedCollection has no implementing methods:

es ShowUnimplementedMessages -a OrderedCollection | more

Note: Classes that represent CLR types typically have virtual Essence# methods that don’t need to be formally declared, because the Essence# dynamic binding system will automatically bind to and invoke the methods of a CLR type, provided those methods have less than two parameters. Messages sent in order to invoke such methods of CLR types will unavoidably show up as “unimplemented messages” when using the ShowUnimplementedMessages.es script.

ShowTraitUsageConflicts: The ShowTraitUsageConflicts.es script can be used to print out the name and declaring trait of all methods which were excluded from a trait usage expression due to the fact that methods with the same selectors were declared by two or more of the traits combined in a trait usage expression. The subject class or trait must be passed in as an argument, as in the following example which will print out to the Transcript methods excluded from the trait usage of ReadStream because two or more of the traits used by ReadStream had the same method selector:

es ShowTraitUsageConflicts -a ReadStream | more

 

Not My Type: Dealing with CLR types in Essence#

There are three issues that you will encounter when using Essence# that involve CLR types:

  1. Obtaining a CLR type as a value that can be stored in a variable, passed as a parameter or sent messages;
  2. Converting a value from one CLR type to another; and
  3. Creating instances of CLR generic types.

Obtaining A CLR Type As An Essence# Value

The Essence# class System.Type (in the Essence# namespace CLR.System, which corresponds to the .Net namespace System) has many class messages that answer commonly-used CLR types: For example, the expression Type string evaluates to the .Net object that represents the .Net type System.String and the expression Type timeSpan evaluates to the .Net object that represents the .Net type System.TimeSpan. [Note: Those messages are unique to Essence#; the actual .Net class System.Type doesn’t support them. You can add Essence#-specific instance methods or class methods to any .Net class or struct; they just won’t be available outside of Essence#.]

Of course, if you have an instance of the type, you can just send it the message #getType (which is a message to which all .Net objects will respond.) Another way to get a CLR type is to send the message #instanceType to the Essence# class that represents CLR values having that type.

If the CLR type can’t be obtained using one of the techniques explained above, the fallback is to encode the assembly-qualified name of the type in a string, and then send the string the message #asHostSystemType, as shown in the following example:

'System.IO.ErrorEventArgs, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'
        asHostSystemType

In most cases (but not all,) the #asHostSystemType message will fail if the fully-qualified name of the assembly is not provided. Two exceptions to that would be when the type is in the mscorlib assembly or in the EssenceSharp assembly.

CLR Type Conversion

Formally–in other words, in theory–Essence# is dynamically typed: The “type” of an expression is simply the powerset of all messages that can be sent to the expression without causing a method-binding error. And that is also true in practice, in that the compiler permits any message to be sent to any expression, and in the fact that the compiler permits any expression to be assigned to any variable or passed as any argument in any message, and in the fact that typing errors are only discovered and reported at run time.

But that does not mean that any expression can be passed as any argument in any message to any receiver without causing any run-time type errors, even in cases where the only messages that will be sent to the value of an argument won’t result in a method binding error. That constraint does hold in most other Smalltalk implementations, and it does hold when the method that will be invoked by the message is an Essence# method. But it does not hold when the “method” that will be invoked is a CLR method.

In Essence#, a CLR method is just a “user primitive”–although the “primitive” does not always need to be formally defined as a method in the Essence# source code, because the dynamic binding subsystem automatically binds Essence# messages to CLR methods that require less than 2 arguments, to CLR type constructors that require no arguments, and to any non-private properties or fields of a CLR type. And it is generally the case in all Smalltalk implementations that “primitive methods” will fail if the types of their arguments are not what they require–even if the primitive could have performed its intended operation simply by sending (using Smalltalk dynamic message dispatch) the right messages to the argument. Primitives generally don’t send messages to their arguments using Smalltalk dynamic message dispatch. And the methods of CLR types certainly do not.

Fortunately, the Essence# dynamic binding system will–if it can–automatically do the required type conversions for you, such as number type conversions, String/Symbol conversions, and any conversions defined by implicit or explicit conversion operators defined by a CLR type. It will even convert Essence# blocks into CLR functions with the required parameter type signature.

But the Essence# dynamic binding system isn’t magic. It can’t handle all cases. It’s simply not possible to convert any type into any other type. And even when it is, the run time system may not have the information required to do it correctly. And in other cases, the logic to do the required conversion simply hasn’t been implemented, for one reason or another.

CLR Array Types

One common case where a conversion might be possible, but the dynamic binding system doesn’t attempt to do it, involves arrays. From the point of view of the CLR, an Essence# array is an array whose elements have the type System.Object (an instance of the Essence# class Array,) or the type System.Byte (a ByteArray,) or the type System.UInt16 (a HalfWordArray,) or the type System.UInt32 (a WordArray,) or the type System.UInt64 (a LongWordArray,) or the type System.Single (a FloatArray,) or the type System.Double (a DoubleArray,) or the type System.Decimal (a QuadArray) or the type String (a Pathname.)

Unfortunately, there are many methods of CLR types that require an array having elements of some type other than the ones supported by Essence#. If the CLR method parameter requires an array whose element type is not the same as that of the corresponding argument at run time, then the binding to the method will fail.

Fortunately, if the conversion of the array to one with the required type is possible (because all the elements can be converted to the required CLR type,) you can code that conversion yourself in Essence#. The following example shows how (this is not new functionality; it just hasn’t been documented/explained):

| clrSourceArray elementType arrayType argument|
		
"Step 1: Convert the Essence# array into a CLR array:"
clrSourceArray := #('one' 'two' 'three') asHostSystemArray.

"Step 2: Get the desired element type of the array to be passed as an argument:"
elementType := System.Type string.

"Step 3: Create a CLR array type with the necessary element type:"
arrayType := elementType makeArrayType.

"Step 4: Create the CLR array having the correct size and element type:"
argument:= arrayType new: clrSourceArray size.

"Step 5: Copy the elements of the source array into the array that will be used as an argument:"
1 to: clrSourceArray size do: [:index | argument at: index put: (clrSourceArray at: index)].
^argument

Or you could just send the message #asHostSystemArrayWithElementType: to the Essence# array, as in the following example:

#('one' 'two' 'three')
        asHostSystemArrayWithElementType: System.Type string

CLR Generic Types

The CLR has three types of “generic type”: Open generic types, partially-open generic types and closed generic types. An open generic type is also called a generic type definition. An open generic type is one where all of its generic type parameters remain “open” because none of them have been bound to a specific type argument. A partially-open generic type is one where some, but not all, of the type’s generic type parameters have been bound to specific type arguments. A closed generic type is one where all of the type’s generic type parameters have been bound to specific type arguments. For the most part, a partially-open generic type is essentially the same as an open generic type. The distinction that really matters is between closed generic types and ones that have generic type parameters that aren’t bound to a specific type.

It’s possible to create instances of closed generic types, but it is not possible to create instances of open or partially-open generic types. And that can be a problem, because .Net class libraries typically only provide generic type definitions that define open generic types. So, although you can define an Essence# class that represents a .Net type that is an open generic type definition, you won’t be able to use that Essence# class to create instances of the type by sending it the normal instance creation messages (e.g, #new or #new:). Creating instances of such a type requires providing one or more types that will be used as the type arguments to construct a closed generic type from the open generic type defined by the .Net class library.

Fortunately, you can use Essence# to construct a closed generic type from an open generic type definition, as illustrated in the following example:

| openGenericDictTypeDefinition esClass closedGenericType |
openGenericDictTypeDefinition := 
        'System.Collections.Generic.Dictionary`2' asHostSystemType.
esClass := openGenericDictTypeDefinition asClass.
closedGenericType := esClass 
                        instanceTypeWith: Type string 
                        with: Type string.
dict := closedGenericType new.
dict at: #foo 
	ifPresent: 
                [:value | 
                        System.Console 
                                write: 'The value at #foo is '; 
                                writeLine: value
                   ] 
	ifAbsent: 
                [
                        System.Console writeLine: '#foo is not present'
                ].
dict at: #foo put: #bar.
dict at: #foo 
	ifPresent: 
                [:value | 
                        System.Console 
                                write: 'The value at #foo is '; 
                                writeLine: value
                   ] 
	ifAbsent: 
                [
                        System.Console writeLine: '#foo is not present'
                ].

The Essence# class that represents a CLR type can be obtained by sending the message #asClass to the CLR type object.

If an Essence# class represents a generic type (whether the type is open, partially-open or closed makes no difference,) then you can create a closed generic type by sending one of the following messages to the Essence# class: #instanceTypeWith: aCLRType, #instanceTypeWith: aCLRType with: aCLRType, #instanceTypeWith: aCLRType with: aCLRType with: aCLRType ,  , #instanceTypeWith: aCLRType with: aCLRType with: aCLRType, #instanceTypeWith: aCLRType with: aCLRType with: aCLRType with: aCLRType with: aCLRType or #instanceTypeWithAll: anArrayOfCLRTypes.

One good way to handle the case where the .Net type you want to use is an open generic type definition would be to define a subclass of an Essence# class that represents that type, and then use one of the messages above to construct a closed generic type that will be the instance type of the subclass.

Another good way to do it is simply to use the CLR’s syntax for closed generic types when specifying the instance type of the subclass. The following examples show both approaches:

	"Class creation/configuration using the Essence# #instanceTypeWith: message"
	
	| superclass class |
        superclass := 'System.Collections.Generic.List`1, mscorlib'
                                asHostSystemType asClass.
        class := Class new.
	class 
                superclass: superclass;
                instanceType: (superclass instanceTypeWith: System.Type object).

	"Class creation/configuration using the CLR's syntax for closed generic types"
	| class |
        class := Class new.
	class 
                assemblyName: 'mscorlib';
                hostSystemNamespace: #System.Collections.Generic;
                hostSystemName: 'List`1[[System.Object]]'

Invoking .Net methods that have “out” or “ref” parameters

The latest release of Essence# introduces the ability to invoke .Net methods that have “out” or “ref” parameters. This post will explain what those are, why supporting them is a technical challenge, and how to pass such parameters to .Net methods that use them.

Conceptually, there are only two ways to pass parameters to functions: By value or by reference. Although there are (usually rather old or even ancient) programming languages that pass all parameters “by reference,” most programming languages pass parameters “by value” in the default case. And some languages, such as Essence#, pass all parameters “by value.”

The following example will be used to illustrate the difference in the semantics:

var x = 3;
var y = f(x);

If the variable x is passed to the function f as a “by value” parameter, then it will be impossible for the function f to change the value of the variable x. However, if the variable x is passed to the function f as a “by reference” parameter, then the function f will be able to change the value of the variable x (although it may nevertheless refrain from doing so.)

Although there is more than one way to implement the two different ways of passing parameters, conceptually the difference is that, when arguments are passed “by value,” only the value of the argument is passed in to the function being called, whereas when arguments are passed “by reference,” the address of the argument (typically, the address of a variable) is passed in to the function being called. In the latter case, the called function may use that address to assign a new value to the variable–although it need not do so.

In fact, many languages implement “pass by value” semantics by physically passing in the address of the arguments that are passed “by value,” but nevertheless disallowing assignment to function parameters. As Peter Deutsch famously said, “you can cheat as long as you don’t get caught.” However, neither the CLR nor the DLR’s LINQ expression compiler cheat in that way: Unless a parameter is declared to require “pass by reference” semantics, arguments are passed by copying their values into the activation frame of the function being invoked.

So the problem for languages implemented on the CLR, such as Essence#, that have no syntax for specifying that parameters will be passed “by reference,” is that there’s no straightforward way to invoke the methods of CLR types that have any “pass by reference” parameters, where the intended usage is that the called method will, as part of its normal operation, assign a new value to an argument that is passed by reference.

And there’s an additional complication: Some methods that use “pass by reference” parameters will neither need nor use the initial value of the parameter, because they only set its value, and never “read” it. Conversely, others will use the initial value to compute and set a new value for the argument. And yet others may do one or the other with the same parameter, depending on the state of the receiver and the value of other parameters.

The C# language attempts to reduce the ambiguity regarding the usage model of “by reference” parameters by requiring that the method definer use different syntax to declare parameters, depending on whether the method will or will not use the initial value of a “by reference” parameter to compute and then set the argument’s new value. In the C# syntax, a parameter that is passed by reference must have one of two keywords as a prefix: out or ref.

The C# out keyword causes the compiler to require that the method that declares the parameter must first assign a value to the parameter before the parameter can be used–and before the method can return. That makes it impossible to use or access the initial value of the parameter. In contrast, the C# ref keyword causes the compiler to require that whoever invokes the method must first assign a value to the parameter.

But C# is not the only language used on the CLR, and the CLR itself does not enforce any constraints on the usage of “by reference” parameters. A .Net language’s compiler may do so, but such is not required by the CLR. That means there may exist .Net types that have methods that use “pass by reference” parameters that do not abide by the rules of C#.

The reflection metadata provided by the CLR does reveal, for each method parameter, whether it is a “pass by value” or a “pass by reference” parameter.  It may also reveal whether the intended usage of a “pass by reference” parameter is as a C#-style out parameter. But compilers are not required to provide that particular piece of metadata. The parameter may be used as an out parameter even if the reflection metadata makes no such claim. The only information provided by the reflection metadata that must be accurate is whether or not the parameter requires that its corresponding argument be passed “by value” or “by reference.”

So for now, Essence# deals with .Net method parameters that require that arguments must be passed by reference in one of two ways:

If the actual argument at runtime is anything other than a block or delegate, the dynamic binding subsystem will create and initialize a new variable that will be passed as the argument. Unless the reflection metadata claims that the parameter is an “out” parameter, the variable will be initialized to have the same value as the argument specified by the invoking code. The method will then be invoked with that new variable as the argument corresponding to the “by reference” parameter. When the method completes, the value of the created variable that was passed in as the actual argument will be assigned to the expression that represents the originally-specified argument. That assignment may cause an exception, and it may fail to update the original variable. I’m researching improvements, perhaps using the DLR’s Meta Object Protocol.

However, if the actual argument at runtime is a block or delegate, then the dynamic binding subsystem will create and initialize a new variable that will be passed as the argument (just as it does for the other case.) The method will then be invoked with that new variable as the argument corresponding to the “by reference” parameter (still essentially the same procedure as in the other case.) But when the method completes, the procedure is different: The one-argument block or delegate that was the originally-specified argument will be invoked, with the newly-created variable as its argument. That permits the caller to access the value of the “out” parameter, because its value will be the value of the block (or delegate) argument. Note, however, that the block or delegate must have an arity of one, and if it’s a delegate, its parameter type must be assignable from the parameter type of the invoked method’s parameter.

Example Usage

Let’s assume that we want to invoke the method Get as defined in the C# class named ValueModel shown below:

class ValueModel {
        private static readonly Object notAValue = new Object();

        protected Object value = notAValue;

        public bool HasValue {
                get {return value != notAValue;}
        }

        public void Set(Object newValue) {
                value = newValue;
        }

        public bool Get(out Object value) {
                if (HasValue) {
                        value = this.value;
                        return true;
                }
                return false;
        }
}

If we have an instance of the above ValueModel class in a variable named model (in an Essence# block or method,) we can invoke the Get method as follows:

[:model |
        | value |
        (model get: [:v | value := v])
                ifTrue: [value doSomethingUseful]
]

Or more simply, we could just do:

[:model | model get: [:value | value doSomethingUseful]]

Important: If the .Net method being invoked returns a boolean value, then the block will only be invoked if the .Net method returns true. But if the .Net method has a return type of void, or has a return type other than Boolean, then the block will always be invoked.