Practical Function Inference within the NoSyn Programming Language

Motivation

Function inference allows for a function to have multiple overloads for the same parameter types, but return values of different types depending on the context in which the function was used. Function inference forms an integral part of NoSyn’s extension framework, allowing the user of the language to create a DSL or general purpose language within NoSyn with great capability to customise its syntax.

Usage Example

Consider the following expression:

2:6

The user may wish this to mean concatenating the two numbers into a list of integers. But there are also languages in which this symbol is used to denote the start and the end of an array slice:

//GO CODE
a[2:6]

Within Go, the 2:6 expression is explicitly handled within the compiler’s parser, forming part of the language syntax. In NoSyn however, this is treated like any ordinary expression. The rules for what is a valid expression within square brackets is specified by the user of the language using a combination of operator overloads and special type aliases, called closed aliases.

NoSyn provides the user with the ability to overload all operators and create their own operators from a specified list of possible symbols. Within freestanding NoSyn (the NoSyn language without any of its standard library) no operators are defined by default, including basic maths operators, boolean operators and even variable assignment.

In this example I am adding functionality to a NoSyn program to make the colon operator act as both a shorthand for creating arrays (1:2:3 becomes [1,2,3]) as well as a slicing operator, similar to that featured within Go, when used within square bracket array accessors. As previously stated, by default, NoSyn does not assign any meaning to any operators. This includes using square brackets. So to create array accessors:

T bracketop_[]_<T>([T] array, Int index) {
  //A native function (written in the compiler target language) which returns the value present at the position `index` on `array`
  native_indexAccess(array, index);
};
[T] bracketop[]_<T>([T] array, [Int] index) {
  //A native function which returns a sub-array of `array` defined by the first and second values of the `index` array
  native_sliceArray(array, index);
};

This creates the basic capability of the language to access an array as well as make array slices. Now functionality must be provided for the colon operator:

//NO SYN code
[T] infix_:_<T>(T a, T b) {
  native_formArray(a,b); //Forms an array of [a,b]
};
[T] infix_:_<T>(T x, [T] xs) {
  native_formArray(x,xs); //Forms an array of [x, xs...] where `x` is prefixed onto `xs`
};

This now creates the entire capability that we set out to create, making the colon operator work both as a short hand for array creation, and as an array slicing operator:

// For the purposes of these examples as well as all following examples. An assumption is being made that
// the equals sign has been overloaded to act as the assignment operator.
[Int] createdArray = 1:2:3:4; // : used for array creation
[Int] slicedArray = createdArray[1:3]; // : used for array slicing

This implementation of colon operator functionality comes with a serious flaw however:

[Int] slicedArray = a[2:4:5:6]

This should be considered a syntax error. But it is not.

Since the operator overload for [] is expecting to see an array, 2:4:5:6 is a perfectly plausible expression to put inside the square brackets. Of course the error could be picked up in the native_sliceArray expression but this would mean that the check would only be made at run-time, not compile time.

This is where function inference can become very useful.

Function Inference Example

//NO SYN code
[T] operator_:_<T>(T a, T b) {
  native_formArray(a,b)
}
(Int, Int) operator_:_(Int l, Int r) {
  (l,r)
}
[T] operator_:_<T>(T x, [T] xs) {
  native_formArray(x,xs)
}

T operator_[]_<T>([T] array, Int index) {
  native_indexAccess(array, index)
}
[T] operator_[]_<T>([T] array,  (Int, Int) arraySlicer) {
  native_sliceArray(array, index)
}

//Assuming 'a' is an Integer array.
[Int] slicedArray = a[2:6]

This new implementation allows for both element concatenation and array slicing,while maintaining the ability to throw an error at compile time when a[1:2:3] is provided. This code still has a problem however, the operator overload that has been created returns a tuple of two integers. This is not ideal, since the syntax that was specifically designed for generating an array slice would also be used in other contexts:

//Unwanted ability to create regular tuples
(Int, Int) vector = 20:30
//Unwanted ability to use tuples within the array access
[Int] slicedArray = a[(2,6)]

In order to prevent this usage of the colon operator, a closed alias can be used:

alias closed ArraySlicer = (Int, Int)

ArraySlicer operator_:_(Int a, Int b) {
  (l,r)
}

operator_[]_<T>([T] array, ArraySlicer arraySlicer) {
  native_sliceArray(array, arraySlicer)
}

Using this implementation. The colon operator overload function will only be inferred if the context in which it is used is specifically of the type ArraySlicer and not simply (Int, Int).

Algorithm for Function Inference

Function inference uses a type inference algorithm to work out which function overload to use. Type inference is typically used to save the programmer time by not requiring types on variables or functions. Function inference, by contrast, expects a certain amount of information about the context it is being used in. As such, NoSyn does not allow the user to specify a variable without also specifying the type of that variable. Functions similarly must indicate what the return type is, although template types are still valid. This constraint employed in using function inference is used to reduce the ambiguity which can arise while using such a type system. Later I will explore ways in which type inference of variables may be possible alongside function inference, and why such a feature may not be wanted.

Formal Definition of Function Inference Algorithm

Function Definitions Used Within Algorithm Formula

Ω(r,p)[y′]⇒k
Θz⇒n
Φx⇒m
y†
y′
* is the wildcard operator.
intersection

Algorithm Formula

Λ(r,p)[y]⇒k

The algorithm for function inference can be written as:

Λx[y]:= Λx[y] :=

let p:=[∀(α,β).ΘΛ(α,∗)[β]|zip(ΘΩx[y′],y†)] in

Λ(ΦΩ(x∩(∗,p))[y′],p)[y]

The function Λx[y] calls recursively until Ω(x,p)[y′] reduces to only a single possible function overload. If Ω(x,p)[y′] never reduces to a single function overload, the function call is ambiguous and a compile error should occur.

Example Inferences

Context Deduction

Function inference works on the basis of deducing the context in which a function is being used. All function calls are expressions and can be built up into larger expressions.

Using these rules we can deduce that given the following statement:

foo(10)

A slightly less simple program

//foo_IntNothing
Nothing foo(Int a) {..}
//foo_IntInt
Int foo(Int a) {..}
//bar_Int
Int bar() {..}
//bar_Float
Float bar() {..}

foo(foo(bar())) //Expression A

Expression A is an example of where function inference is required to find the correct function to be used. If you take the sub-expressions of expression A out of context, the functions they refer to cannot be known:

Horizontal Inference

With the previous example, the correct function overloads could be inferred by working in a top down fashion from the parent expression foo(foo(bar())) down to the leaf sub-expression bar(). This can be referred to as vertical inference in the sense that by looking at the context of an expression or its sub-expressions it is possible to infer the type of the expression.

Horizontal Inference means that the type of a sub-expression on the same level as the current one has an effect on the type which this sub-expression could be. Such inference is achieved by working up and down the expression tree gradually eliminating the possible types of expressions until all are resolved down to a single type.

Nothing foo(Int a, Double a) {..} //foo_IntDoubleNothing
Nothing foo(Int a, Char a) {..} //foo_IntCharNothing
Nothing foo(Double, Int a) {..} //foo_DoubleIntNothing
Int bar() {..} //bar_Int
Char bar() {..} //bar_Char
Int cello() {..} //cello_Int
Double cello() {..} //cello_Double

foo(bar(), cello()) //Expression B

Supporting Type Inference with Function Inference

Local Inference

Local inference is a feature that is now ubiquitous across almost every modern programming language including Java, C++, Go as well as functional languages. Local type inference is common across all languages because it is very easy to implement. Compile time type checking needs to be performed anyway, so the ability to perform local inference usually comes for free from the type checking implementation. Unfortunately local inference poses problems when used in NoSyn. Whereas in other languages the return type of a function with a given set of parameters types is always the same, this is not the case when using NoSyn. As such, the type of a specified variable must be given in order to differentiate the type that is to be returned from the given function. It would of course be possible to have local inference in cases in which there is only one possible return type, but should the user wish to add an additional function overload with a different return type, they would then need to go back and re-fix these assignments to prevent ambiguity. The cost of such a feature would likely outweigh the benefits, unless the programmer was very careful with how the write their code.

An alternative way in which local inference could be obtained in a way which would cut down on ambiguity would be to use control flow analysis. By seeing how a variable is used in the rest of the program, it is possible to deduce what the type actually is. Although this removes ambiguity, it also may make the code more difficult to read. Any programmer that wants to deduce the type of the variable would also have to perform a control flow analysis, checking multiple areas of the code to work out the actual type of the variable rather than looking up one function definition. It also may be the case that performing such a control flow analysis process on all the variables in the source code is quite computationally intensive, resulting in a slow compilation process.

Global Inference

Global inference refers to the ability to infer the parameter and return types of a function without the programmer having to specify any of them. The ability to perform global inference is much more rare among languages, it tending to be reserved only for functional languages that do not allow function overloading. Function overloading cannot be present in such languages because the types of the parameters are computed by the context that they are used within the function body. If a function within the body has overloading, then attempting to deduce the parameter types would not be possible, since it may satisfy multiple overloads. Since NoSyn does have function overloading, global inference is not possible within the language.

Template Types

Global inference is not possible meaning the type of parameters and return types cannot be deduced in advance for functions which do not have type signatures on their return values and parameters. However, writing typeless functions could be done in NoSyn by using templating. Template functions are functions that do not get compiled into the program until they are actually used in the program. The types of a template function do not need to be specified or known in advance. Instead, the compile generates a version of the function based on the context that it is being used. In other languages this is a relatively simple process. In C++, when a function gets used in context, the parameter types can be deduced by the parameter expressions alone. This is because an expression in C++ will always resolve to the same type regardless of the context in which it is being used. The same is not true for NoSyn. Due to function inference, the parameters to a function would ideally know the context in which they are being used in order to reduce the chance of an ambiguity error. Examining the body of the template function can be done in order to minimise ambiguity errors. Functions used within the body of the template function may have more restrictive type signatures allowing for the possible types on the parameters to be reduced. The return context of the function can also be used to restrict possible interpretations of the function further.