Invoking .Net methods that have “out” or “ref” parameters

The latest release of Essence# introduces the ability to invoke .Net methods that have “out” or “ref” parameters. This post will explain what those are, why supporting them is a technical challenge, and how to pass such parameters to .Net methods that use them.

Conceptually, there are only two ways to pass parameters to functions: By value or by reference. Although there are (usually rather old or even ancient) programming languages that pass all parameters “by reference,” most programming languages pass parameters “by value” in the default case. And some languages, such as Essence#, pass all parameters “by value.”

The following example will be used to illustrate the difference in the semantics:

var x = 3;
var y = f(x);

If the variable x is passed to the function f as a “by value” parameter, then it will be impossible for the function f to change the value of the variable x. However, if the variable x is passed to the function f as a “by reference” parameter, then the function f will be able to change the value of the variable x (although it may nevertheless refrain from doing so.)

Although there is more than one way to implement the two different ways of passing parameters, conceptually the difference is that, when arguments are passed “by value,” only the value of the argument is passed in to the function being called, whereas when arguments are passed “by reference,” the address of the argument (typically, the address of a variable) is passed in to the function being called. In the latter case, the called function may use that address to assign a new value to the variable–although it need not do so.
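To make the distinction concrete, here is a minimal C# sketch of the x/f example above. The class and method names (PassingDemo, ByValue, ByReference) are invented for illustration only.

```csharp
using System;

static class PassingDemo {
    // "By value": the method receives a copy of the caller's value.
    public static int ByValue(int n) {
        n = n + 1;          // changes only the local copy
        return n;
    }

    // "By reference": the method receives the address of the caller's variable.
    public static int ByReference(ref int n) {
        n = n + 1;          // changes the caller's variable
        return n;
    }

    static void Main() {
        var x = 3;
        var y = ByValue(x);
        Console.WriteLine($"{x} {y}");   // prints "3 4": x is untouched

        var z = ByReference(ref x);
        Console.WriteLine($"{x} {z}");   // prints "4 4": x was reassigned
    }
}
```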

In fact, many languages implement “pass by value” semantics by physically passing in the address of the arguments that are passed “by value,” but nevertheless disallowing assignment to function parameters. As Peter Deutsch famously said, “you can cheat as long as you don’t get caught.” However, neither the CLR nor the DLR’s LINQ expression compiler cheats in that way: Unless a parameter is declared to require “pass by reference” semantics, arguments are passed by copying their values into the activation frame of the function being invoked.

So the problem for languages implemented on the CLR, such as Essence#, that have no syntax for specifying that parameters will be passed “by reference,” is that there’s no straightforward way to invoke the methods of CLR types that have any “pass by reference” parameters, where the intended usage is that the called method will, as part of its normal operation, assign a new value to an argument that is passed by reference.

And there’s an additional complication: Some methods that use “pass by reference” parameters will neither need nor use the initial value of the parameter, because they only set its value, and never “read” it. Conversely, others will use the initial value to compute and set a new value for the argument. And yet others may do one or the other with the same parameter, depending on the state of the receiver and the value of other parameters.

The C# language attempts to reduce the ambiguity regarding the usage model of “by reference” parameters by requiring that the method definer use different syntax to declare parameters, depending on whether the method will or will not use the initial value of a “by reference” parameter to compute and then set the argument’s new value. In the C# syntax, a parameter that is passed by reference must have one of two keywords as a prefix: out or ref.

The C# out keyword causes the compiler to require that the method that declares the parameter must first assign a value to the parameter before the parameter can be used–and before the method can return. That makes it impossible to use or access the initial value of the parameter. In contrast, the C# ref keyword causes the compiler to require that whoever invokes the method must first assign a value to the variable that is passed as the argument, so that the method may safely read that initial value.
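A short C# sketch of the two keywords may help here. The names (OutRefDemo, TryHalve, AddTo) are made up for this example; only the out and ref rules themselves come from the language.

```csharp
using System;

static class OutRefDemo {
    // out: the method must assign 'result' before returning,
    // and cannot read it before assigning it.
    public static bool TryHalve(int input, out int result) {
        result = input / 2;
        return input % 2 == 0;
    }

    // ref: the caller must have assigned 'total' before the call,
    // so the method can read its initial value.
    public static void AddTo(ref int total, int amount) {
        total = total + amount;
    }

    static void Main() {
        int half;                              // no initial value needed for out
        bool even = TryHalve(10, out half);
        Console.WriteLine($"{even} {half}");   // prints "True 5"

        int total = 1;                         // must be assigned before a ref call
        AddTo(ref total, 41);
        Console.WriteLine(total);              // prints "42"
    }
}
```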

But C# is not the only language used on the CLR, and the CLR itself does not enforce any constraints on the usage of “by reference” parameters. A .Net language’s compiler may do so, but such is not required by the CLR. That means there may exist .Net types that have methods that use “pass by reference” parameters that do not abide by the rules of C#.

The reflection metadata provided by the CLR does reveal, for each method parameter, whether it is a “pass by value” or a “pass by reference” parameter.  It may also reveal whether the intended usage of a “pass by reference” parameter is as a C#-style out parameter. But compilers are not required to provide that particular piece of metadata. The parameter may be used as an out parameter even if the reflection metadata makes no such claim. The only information provided by the reflection metadata that must be accurate is whether or not the parameter requires that its corresponding argument be passed “by value” or “by reference.”
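The metadata in question can be inspected with the standard reflection API: ParameterInfo.ParameterType.IsByRef says whether an argument must be passed by reference, and ParameterInfo.IsOut reports the optional out marker. The sketch below is only an illustration; the Sample method and the Describe helper are hypothetical.

```csharp
using System;
using System.Reflection;

static class ParameterKinds {
    // A sample method with one parameter of each kind.
    public static void Sample(int byValue, ref int byRef, out int asOut) {
        asOut = byValue + byRef;
    }

    // Describes how a parameter expects its argument to be passed.
    public static string Describe(ParameterInfo p) {
        if (!p.ParameterType.IsByRef) return "by value";
        // IsOut reflects optional metadata; a by-reference parameter without it
        // may still be used as an out parameter by the method.
        return p.IsOut ? "by reference (declared out)" : "by reference";
    }

    static void Main() {
        MethodInfo m = typeof(ParameterKinds).GetMethod(nameof(Sample));
        foreach (ParameterInfo p in m.GetParameters())
            Console.WriteLine($"{p.Name}: {Describe(p)}");
    }
}
```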

So for now, Essence# deals with .Net method parameters that require that arguments must be passed by reference in one of two ways:

If the actual argument at runtime is anything other than a block or delegate, the dynamic binding subsystem will create and initialize a new variable that will be passed as the argument. Unless the reflection metadata claims that the parameter is an “out” parameter, the variable will be initialized to have the same value as the argument specified by the invoking code. The method will then be invoked with that new variable as the argument corresponding to the “by reference” parameter. When the method completes, the value of the created variable that was passed in as the actual argument will be assigned to the expression that represents the originally-specified argument. That assignment may cause an exception, and it may fail to update the original variable. I’m researching improvements, perhaps using the DLR’s Meta Object Protocol.
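The copy-in/copy-out idea can be sketched in C# using plain reflection. This is not the Essence# implementation (which works through the DLR); it only illustrates the same sequence of steps, relying on the fact that MethodInfo.Invoke writes the final value of a by-reference parameter back into the argument array. The names InvokeByRef and assignBack are assumptions made for the example.

```csharp
using System;
using System.Reflection;

static class CopyInCopyOut {
    // A sample method with a by-reference parameter.
    public static void Increment(ref int n) { n = n + 1; }

    // Invokes a one-parameter by-reference method. 'initial' is copied into a
    // temporary slot; the slot's final value is then handed to 'assignBack'.
    public static void InvokeByRef(MethodInfo method, object target,
                                   object initial, Action<object> assignBack) {
        ParameterInfo p = method.GetParameters()[0];
        // Skip the copy-in when the metadata declares the parameter as out.
        object[] slot = { p.IsOut ? null : initial };
        method.Invoke(target, slot);   // reflection stores the new value in slot[0]
        assignBack(slot[0]);           // copy-out; this step may throw or do nothing
    }

    static void Main() {
        int x = 3;
        MethodInfo m = typeof(CopyInCopyOut).GetMethod(nameof(Increment));
        InvokeByRef(m, null, x, v => x = (int)v);
        Console.WriteLine(x);          // prints "4"
    }
}
```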

However, if the actual argument at runtime is a block or delegate, then the dynamic binding subsystem will create and initialize a new variable that will be passed as the argument (just as it does for the other case.) The method will then be invoked with that new variable as the argument corresponding to the “by reference” parameter (still essentially the same procedure as in the other case.) But when the method completes, the procedure is different: The one-argument block or delegate that was the originally-specified argument will be invoked, with the newly-created variable as its argument. That permits the caller to access the value of the “out” parameter, because its value will be the value of the block (or delegate) argument. Note, however, that the block or delegate must have an arity of one, and if it’s a delegate, its parameter type must be assignable from the parameter type of the invoked method’s parameter.

Example Usage

Let’s assume that we want to invoke the method Get as defined in the C# class named ValueModel shown below:

using System;

class ValueModel {
        private static readonly Object notAValue = new Object();

        protected Object value = notAValue;

        public bool HasValue {
                get {return value != notAValue;}
        }

        public void Set(Object newValue) {
                value = newValue;
        }

        public bool Get(out Object value) {
                if (HasValue) {
                        value = this.value;
                        return true;
                }
                // An out parameter must be assigned on every return path.
                value = null;
                return false;
        }
}

If we have an instance of the above ValueModel class in a variable named model (in an Essence# block or method,) we can invoke the Get method as follows:

[:model |
        | value |
        (model get: [:v | value := v])
                ifTrue: [value doSomethingUseful]
]

Or more simply, we could just do:

[:model | model get: [:value | value doSomethingUseful]]

Important: If the .Net method being invoked returns a boolean value, then the block will only be invoked if the .Net method returns true. But if the .Net method has a return type of void, or has a return type other than Boolean, then the block will always be invoked.
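The block-or-delegate behavior, including the Boolean rule just stated, could be modeled in C# roughly as follows. This is only a sketch built on reflection; TryGetAnswer and InvokeWithCallback are hypothetical names, and the real binder does this work inside the DLR rather than through MethodInfo.Invoke.

```csharp
using System;
using System.Reflection;

static class OutCallback {
    // Stand-in for a .Net method with an out parameter (hypothetical).
    public static bool TryGetAnswer(bool available, out object value) {
        value = available ? (object)42 : null;
        return available;
    }

    // Invokes 'method', whose last parameter is by-reference, then passes
    // that parameter's final value to 'callback'.
    public static object InvokeWithCallback(MethodInfo method, object target,
                                            object[] leadingArgs, Action<object> callback) {
        var args = new object[leadingArgs.Length + 1];   // last slot is the new variable
        Array.Copy(leadingArgs, args, leadingArgs.Length);
        object result = method.Invoke(target, args);
        // A Boolean false result suppresses the callback; any other result runs it.
        bool skip = result is bool ok && !ok;
        if (!skip) callback(args[args.Length - 1]);
        return result;
    }

    static void Main() {
        MethodInfo m = typeof(OutCallback).GetMethod(nameof(TryGetAnswer));
        InvokeWithCallback(m, null, new object[] { true }, v => Console.WriteLine(v));   // prints "42"
        InvokeWithCallback(m, null, new object[] { false }, v => Console.WriteLine(v));  // prints nothing
    }
}
```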

Apple’s New ‘Playgrounds’: Back To The Future, One More Time

So Wired thinks Apple’s Playgrounds development paradigm is revolutionary?

Not so much: Smalltalk programmers have been coding with analogous capabilities since before Steve Jobs ever saw his now famous demo at Xerox PARC.

The Smalltalk development environment now has close to 40 years of maturity and experience behind it. And its cool features from 1979 still haven’t all been added to the IDEs of other languages, to say nothing of the ones added since then.

Ref: http://www.wired.com/2014/07/apple-swift/

Essence#’s Predecessor: Iron Smalltalk

Before Essence#, there was IronSmalltalk.

When I started work on Essence#, on or around 9 Jan 2014, I had never heard of Iron Smalltalk.  That name, of course, would be a rather obvious choice, given names such as “Iron Python,” “Iron Ruby” and “Iron Scheme” for other DLR-hosted languages. And yes, I not only considered using “IronSmalltalk” as the language’s name, I actually did use it initially.

Nor do I believe that either of my two advisers/consultants on the project (Craig Latta and Peter Lount) had ever heard of an actual ‘IronSmalltalk’ either (as a real implementation of the language, and not as a concept.) If they had, they certainly didn’t mention it. And Craig Latta was in the middle of a multi-week visit here with me where I live during the time I started the Essence# project, and I was speaking by phone with Peter Lount on a daily basis about the project (and we’re still doing that.) So both of them would have had plenty of chances to tell me about the “other” IronSmalltalk while I was still using that name for what will now always be known as Essence#.

In any case, by mid February (2014,) I changed the name–not because I had discovered that there already was an IronSmalltalk,  but because I didn’t want to have the word “Smalltalk” in the language’s name (but this post isn’t about that, so I’ll explain why I came to that view some other time.)

Fast forward to late May 2014: That’s when I discovered that there was an actual Smalltalk implementation based on the DLR other than mine, and that it was named IronSmalltalk.

Even now, I don’t know all that much about it. I haven’t browsed the code other than to have looked at the folders and file names using the CodePlex source code browsing applet.

Why not? Partly because I’m just too busy implementing Essence#, partly because it’s almost certainly way too late to make any significant architectural changes to Essence# based on whatever I might learn by reading Todor Todorov’s code (he’s the author,) and partly because I just don’t want to plagiarize it (even though it’s open source.)

Most of what I know about IronSmalltalk’s design and implementation I learned just by watching the video of Todor Todorov’s ESUG 2011 presentation on IronSmalltalk. From that, I can tell that the author of IronSmalltalk a) knows what he’s talking about, b) made some of the same architectural decisions I did, but c) did some things rather differently.

For example, IronSmalltalk uses CLR Strings as the direct implementation of Smalltalk Strings, but Essence# does not. The reason is that CLR Strings are intrinsically immutable. Yes, the ANSI Smalltalk Standard requires that String literals be immutable–but that’s not a problem for Essence#, because any Essence# object can be made immutable. And Strings have traditionally been mutable in Smalltalk. Interestingly, IronSmalltalk and IronPython both decided to adopt CLR Strings as their native String objects, while IronRuby and Essence# both decided to implement their own Strings, although Essence# can and does use CLR Strings also (and I assume IronRuby does the same.)

Just so you know: The primary reason that Essence# doesn’t use CLR Strings and doesn’t use CLR arrays as its “native” implementation of Strings or of Arrays is that the #become: primitive is intrinsically not implementable on the CLR. By implementing both Strings and Arrays as a wrapper over native CLR arrays, much of the pain of not having a #become: primitive is eliminated: The double indirection makes it possible to resize Strings and Arrays “in place” without needing to use #become:. The fact that Smalltalk Strings have traditionally been mutable was a secondary consideration.
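The double indirection can be illustrated with a small C# wrapper. GrowableArray is a hypothetical class written for this post, not the actual Essence# Array or String implementation.

```csharp
using System;

// Hypothetical wrapper; not the actual Essence# class.
sealed class GrowableArray {
    private object[] slots;              // the inner CLR array (second indirection)

    public GrowableArray(int size) { slots = new object[size]; }

    public int Size => slots.Length;

    public object this[int index] {
        get => slots[index];
        set => slots[index] = value;
    }

    // Resizes "in place": the wrapper's identity is unchanged, so every
    // existing reference to it sees the new size without #become:.
    public void Resize(int newSize) {
        Array.Resize(ref slots, newSize);   // swaps in a new inner array, keeping contents
    }
}

static class GrowableArrayDemo {
    static void Main() {
        var a = new GrowableArray(2);
        var alias = a;                   // a second reference to the same object
        a[0] = "first";
        a.Resize(4);
        Console.WriteLine($"{alias.Size} {alias[0]}");  // prints "4 first"
    }
}
```

Only the wrapper holds a reference to the inner array, so replacing that array never leaves a stale reference anywhere else.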

Another example involves the DLR’s dynamic binding protocol: IronSmalltalk does it the way the DLR documentation strongly recommends, and Essence# does not. So what’s the recommended way, and why doesn’t Essence# do it that way?  Glad you asked:

The Canonical DLR DynamicMetaObject Protocol For Binding Abstract Operations To Concrete Behavior

The officially recommended (“canonical”) protocol for dynamic binding using the DLR’s DynamicMetaObject Protocol can be expressed as follows:

1. Each operand of an operation (e.g., a message send, although in other languages there are many other possibilities, because most languages do so many things using special syntax) is asked to provide a DynamicMetaObject to act as its agent for participating in the DLR’s meta-object protocol for dynamic binding. If the object is unable to do so, a default DynamicMetaObject is created for it (sort of like a court-appointed public defender.)

2. The DynamicMetaObject that is the agent for the object that is responsible for dynamically (at run time) determining the semantics of the operation is asked to bind the abstract operation (whatever that may be) to a specific physical implementation. Note that identifying which object that is–or perhaps which set of objects–is the responsibility of each DLR-based language’s dynamic binding subsystem.

For example, in the case of most OO languages, the DynamicMetaObject that is the agent for the object that is the receiver of a message would be asked to provide a binding for that message–in other words, an invocable function and associated “binding restriction.” A “binding restriction” is a predicate that can be evaluated to discover whether the binding (the invocable function) is still valid, given the current set of operands. Typically, if the “type” or “class” of the “receiver” isn’t the same as it was when the binding was computed, then the binding is no longer valid and must be recomputed.

3. If the DynamicMetaObject is able to provide a binding, then that binding will be used as the physical implementation of the abstract (logical) operation, for as long as the binding restriction predicate says that the binding is still valid.

4. However, if the DynamicMetaObject of the object nominally responsible for defining the semantics of an operation is not able to provide a binding (e.g., what’s the semantics of the DLR-standard “DeleteMember” abstract operation when the receiver is a Smalltalk object, or of the DLR-standard “Invoke” abstract operation when the receiver is null?), then the responsibility for providing a binding is redirected to the dynamic binding subsystem of the language that compiled the code that’s undergoing dynamic binding. This “host language” (to use the term the DLR uses) may be able to provide a binding, or it too may fail.

5. If the host language can provide a reasonable binding based on its own rules and semantics, it will do so. Otherwise, as a last resort, the binding that will be used would typically report or raise an error (although the DLR imposes no such requirement.) Note that different host languages may bind the exact same abstract operation on the exact same object differently: For example, one language might bind any operation applied to null by providing an invocable function that raises the NullReferenceException, whereas another language might instead provide a method it finds in the method dictionary of its UndefinedObject class, and yet another language might provide a function that’s just a no-op.
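The order of responsibility in steps 2 through 5, and the role of a binding restriction, can be modeled in a few lines of C#. To be clear, this is a toy model: Binding, CanonicalBinder and CachedSite are hypothetical types invented for illustration and are not part of the DLR API.

```csharp
using System;

// Hypothetical model of a cached binding; these are not DLR types.
sealed class Binding {
    public Func<object, bool> Restriction;    // is the binding still valid?
    public Func<object, object> Target;       // the invocable function
}

static class CanonicalBinder {
    public static int BindCount;              // how often a binding was computed

    // The receiver's agent is asked first; the host language is the fallback.
    public static Binding Bind(object receiver, string operation) {
        BindCount++;
        Type type = receiver?.GetType();
        Func<object, object> target = ReceiverBinding(type, operation)
                                   ?? HostLanguageBinding(operation);
        // The binding stays valid while the receiver's type is unchanged.
        return new Binding { Restriction = r => r?.GetType() == type, Target = target };
    }

    // The receiver's agent: in this toy model it only understands "Length" on strings.
    static Func<object, object> ReceiverBinding(Type type, string operation) {
        if (type == typeof(string) && operation == "Length")
            return r => ((string)r).Length;
        return null;
    }

    // The host language's last resort: a function that raises an error.
    static Func<object, object> HostLanguageBinding(string operation) {
        return r => throw new InvalidOperationException("Cannot bind " + operation);
    }
}

// A call site that reuses its binding until the restriction fails.
sealed class CachedSite {
    readonly string operation;
    Binding cached;

    public CachedSite(string operation) { this.operation = operation; }

    public object Invoke(object receiver) {
        if (cached == null || !cached.Restriction(receiver))
            cached = CanonicalBinder.Bind(receiver, operation);
        return cached.Target(receiver);
    }
}

static class CanonicalDemo {
    static void Main() {
        var site = new CachedSite("Length");
        Console.WriteLine(site.Invoke("abc"));          // prints "3"
        Console.WriteLine(site.Invoke("hello"));        // prints "5", reusing the binding
        Console.WriteLine(CanonicalBinder.BindCount);   // prints "1"
    }
}
```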

So that’s the standard binding protocol, and that’s what IronSmalltalk, IronPython and IronRuby do. And it’s what most languages do when they use the DLR’s DynamicMetaObject Protocol.

Of course, the point of that canonical (“officially recommended”) protocol is to ensure that objects have the right of first refusal in determining the concrete semantics of the abstract operations applied to them. And that’s a commendable and worthy goal of any programming language and/or execution environment.

But it’s not what Essence# does.

The Essence# Dynamic Binding Protocol

1. The first step of the Essence# dynamic binding protocol is identical to that of the canonical protocol. It pretty much has to be: The code that implements it is part of the DLR itself. Although it would be possible to circumvent it but still use the DLR’s DynamicMetaObject Protocol for dynamic binding, it would be a lot more work, and would be much more likely to break if the DLR’s API is ever changed.

2. If the receiver of the message is a native Essence# object, then the object’s class will be asked for the method whose selector matches the message that was sent. If the class has such a method (either in its own local method dictionary, in the method dictionary of the trait or composite trait it’s using, or if its superclass can provide a matching method,) then that method will be used as the binding’s invocable function, and the receiver’s CLR type and Essence# class will be used as the binding restriction (actually, the version ID of the class will be used, because the method might be removed from the class at some time in the future.) If no matching method can be provided by the receiver’s class, then the binding subsystem will ask the class for its #doesNotUnderstand: method. If it can provide such a method, then that will be used. Otherwise, it will provide a function for the binding that directly raises the MessageNotUnderstood exception (so defining the #doesNotUnderstand: message is entirely optional.)

3. If the receiver of the message is not a native Essence# object (i.e., it’s an instance of some non-Essence# CLR type,) then the Essence# Object Space for that execution context will be asked to find or dynamically construct the Essence# class that represents instances of that CLR type. That operation cannot fail–a new Essence# class will always be defined, if it does not already exist. And there can be only one such class for each CLR type in any particular Essence# Object Space. The object’s class will then be asked for a method that matches the message selector, which happens in the same way as it would for a native Essence# object. The only difference is what happens if the class cannot provide a matching method:

4. In cases where the Essence# class for a foreign object does not have a method that matches the message selector, the DynamicMetaObject of the receiver will then be asked to provide a binding.  If it is able to do so, then that binding will be used.

5. Otherwise, if the receiver’s DynamicMetaObject fails to provide a binding, the Essence# dynamic binding subsystem takes over the responsibility for doing so. And that’s where much of the magic happens which enables Essence# to interoperate so smoothly and effectively with languages that aren’t primarily based on the DLR (i.e., C#, F#, Visual Basic, etc.) That’s also where you’ll find the bulk of the code that implements the Essence# run time system.
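For a foreign receiver, the lookup order in steps 3 through 5 can be modeled in C# as shown below. This is a simplified sketch under stated assumptions: ModelClass and EssenceStyleBinder are hypothetical names that do not come from the Essence# source code, traits and class version IDs are left out, and the two function parameters merely stand in for the receiver’s DynamicMetaObject and for the language’s own fallback binder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model; names do not come from the Essence# source code.
sealed class ModelClass {
    public readonly Dictionary<string, Func<object, object>> Methods =
        new Dictionary<string, Func<object, object>>();
    public ModelClass Superclass;

    // Searches this class, then its superclass chain.
    public Func<object, object> Lookup(string selector) {
        for (ModelClass c = this; c != null; c = c.Superclass)
            if (c.Methods.TryGetValue(selector, out var method)) return method;
        return null;
    }
}

static class EssenceStyleBinder {
    // One model class per CLR type, created on demand (step 3).
    static readonly Dictionary<Type, ModelClass> classes = new Dictionary<Type, ModelClass>();

    public static ModelClass ClassFor(Type type) {
        if (!classes.TryGetValue(type, out var cls))
            classes[type] = cls = new ModelClass();
        return cls;
    }

    // 'foreignAgent' stands in for the receiver's DynamicMetaObject (step 4);
    // 'languageFallback' stands in for the language's own binder (step 5).
    public static Func<object, object> Bind(
            object receiver, string selector,
            Func<string, Func<object, object>> foreignAgent,
            Func<string, Func<object, object>> languageFallback) {
        return ClassFor(receiver.GetType()).Lookup(selector)   // the class goes first
            ?? foreignAgent(selector)
            ?? languageFallback(selector);
    }
}

static class EssenceStyleDemo {
    static void Main() {
        // Teach the class for zero-argument delegates to answer #value.
        EssenceStyleBinder.ClassFor(typeof(Func<object>)).Methods["value"] =
            r => ((Func<object>)r)();

        Func<object> answer = () => 42;
        var found = EssenceStyleBinder.Bind(answer, "value",
            s => null, s => r => "not understood: " + s);
        Console.WriteLine(found(answer));     // prints "42"

        var missing = EssenceStyleBinder.Bind(answer, "displayOn:at:",
            s => null, s => r => "not understood: " + s);
        Console.WriteLine(missing(answer));   // prints "not understood: displayOn:at:"
    }
}
```

The point of the ordering is that a selector such as value is interpreted by the receiver’s class before the foreign object’s own agent ever sees it.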

So that’s what Essence# does to implement dynamic binding. The remaining question is why it does it that way, even though it’s not what the authors of the DLR strongly recommend.

The reason is both quite simple, and quite profound: The difference between the standard naming conventions of different programming languages in general, and the very stark differences between the naming conventions of Smalltalk-like languages and every other programming language in existence.

You see, the DLR’s protocol is based on the fallacy that the names of operations are the same in different languages, and based on the false assumption that what one language does by using a syntactical construct other languages also do by using some syntactical construct. Neither of those assumptions holds universally even between languages that use the same operation naming syntax, such as identifier + [“(” + {argument} + “)”] + “;”. But of course, Smalltalk-based languages don’t even share that meta-syntax with other languages, let alone the names themselves. And Smalltalk-based languages do almost everything by sending messages, and almost nothing by using special syntax, such as appending “()” to indicate “invoke a function” or prepending “(typeName)” to convert a value from one type to another.

So that’s why the first step Essence# takes in finding a binding when a Smalltalk message is sent to a foreign object is to ask the object’s Essence# class to interpret the message: It’s an Essence# (Smalltalk) message, not a message whose name or syntactical form is highly likely to mean anything to the objects of any other language.

What could or should the objects of any other language do when asked to find a binding for messages such as #displayOn:at:, #~~, or even just #value? Who could reasonably expect a CLR delegate with no arguments to know that it should respond to that message by invoking itself?

Worse, message (method) names such as #~~ and #displayOn:at: aren’t even legal in other languages. Even if they were, they aren’t likely to have the same semantics, because different languages usually have different standard libraries which use different names for the same thing.

Yes, Microsoft’s standard languages for .Net all use the same standard library–but that’s a special case. The DLR was intended to be used by dynamic languages with pre-existing standard libraries that are quite different from .Net’s BCL, and was not intended to be used by languages designed and implemented by Microsoft as native residents of the .Net framework. It therefore was a mistake to design the DLR’s dynamic binding protocol based on that false assumption. And that’s true without even considering the difference between Smalltalk’s keyword message names and the naming conventions used by all other languages.

The bottom line is that the Essence# dynamic binding protocol handles both the difference between Smalltalk’s message name syntax and that of other languages, and the difference between Smalltalk’s traditional and canonical message name semantics and those of the .Net BCL, and does so in a way that minimizes the risk that a message name will be misinterpreted just because it accidentally matches the name of a foreign function.

Essence# also provides a solution to the inverse problem of Essence# (Smalltalk) objects going on excursions to the homelands of foreign programming languages, and having foreign operations applied to them. And I admittedly haven’t documented that yet. Nevertheless, this post is already too long, so I’ll leave that topic to another time.

Announcing the Essence# blog

I’ve launched a new blog dedicated to Essence#. It will be all about Essence# and closely-related topics. So, “all Essence#, all the time.”