mimium: a self-extensible programming language for sound and music

We propose a programming language for music named mimium, which combines temporal-discrete control and signal processing in a single language. mimium has an intuitive imperative syntax and can use stateful functions as Unit Generator in the same way as ordinary function definitions and applications. Furthermore, the runtime performance is made equivalent to that of lower-level languages by compiling the code through the LLVM compiler infrastructure. By using the strategy of adding a minimum number of features for sound to the design and implementation of a general-purpose functional language, mimium is expected to lower the learning cost for users, simplify the implementation of compilers, and increase the self-extensibility of the language. In this paper, we present the basic language specification, semantics for simple task scheduling, the semantics for stateful functions, and the compilation process. mimium has certain specifications that have not been achieved in existing languages. Future works suggested include extending the compiler functionality to combine task scheduling with the functional paradigm and introducing multi-stage computation for parametric replication of stateful functions.


Background
Programming environments for computer music, which are software packages that enable the use of a digital computer to create music programmatically [8], have been continuously developed since the early days of computers; examples include Max [18], Puredata [19], and SuperCollider [11].
Among these programming environments, conventional languages internally contain multiple layers: discrete event control, signal processing by composing Unit Generators (UGens), which are the fundamental components of signal processing, and the description of the UGens themselves [8].
In conventional environments, there is a limitation in terms of extensibility in the lower-level description of signal processing. The user can describe signal processing by combining many built-in UGens, such as filters or oscillators, provided in the language. If the user wants to define a new UGen that cannot be expressed by combining existing UGens (for example, a nonlinear oscillator), they must use general-purpose languages such as C to describe the algorithm in such environments.
To address this type of problem, languages focusing on describing UGens, such as Faust [17], Kronos [14], Soul [27], Vult [21], and Gen~ (an embedded language in Max), were developed. For example, Faust can output UGen binaries for Max, Puredata, and SuperCollider via C++ code, or can be used as an original UGen in Max by compiling Faust code in memory using LLVM. These languages do not have a scheduler for high-level event control. In languages such as Faust, such discrete values are defined as external values controlled from outside the program (for example, via GUI, MIDI, or OSC).
The advantage of using multiple languages is that it maintains a balance between efficiency (in terms of coding by user) and generality of possible expressions. In addition, it allows the user to choose the level of complexity according to the task [8].
Multi-language paradigms, however, lead to other problems. For example, the user must sometimes use slightly different operators for similar expressions, which may reduce the efficiency of the programming process for beginners. For instance, to add two inputs in Max and Puredata, the user must choose between two different objects according to the data type: [+] for control and [+~] for audio. SuperCollider also requires the user to choose among multiple methods of the same SinOsc object that generates a sine wave, such as SinOsc.kr when the required time-domain resolution is low (e.g., LFO) and SinOsc.ar when it is used as an audio signal, depending on the processing load required.
In addition, practically, while a multi-language paradigm can strike a balance between generality of expression and efficiency of programming, the learning cost is high as users must learn separate languages for each domain. Considering that domain-specific-languages (DSLs) have high training costs [26], if the languages can be unified without losing generality and efficiency, the training costs can be reduced.
Improving self-extensibility is one of the important topics in the design of programming languages for sound and music. Dannenberg argues that introducing ready-made solutions to a language specification will ultimately limit the expressiveness of the language itself. The language should therefore increase its expressiveness, and it is better to develop individual solutions as libraries on the language [2].
ChucK allows users to define their own UGen in the ChucK language itself using a language extension, ChuGen [22]; however, as ChucK itself is a virtual-machine-based interpreted language implemented in C, its runtime performance is inferior to that of a UGen written in C++ for the same processing method. Furthermore, the data type of the input/output for a UGen is distinct from a general numeric type, and the user must use the ChucK operator (=>) to represent connections between UGens.
In the Lisp-based live-coding environment Extempore, users can compile native binaries at runtime from a dedicated language called xtlang through the compiler infrastructure LLVM [7], and the entire code, including signal processing equivalent to UGens, can be written within the Extempore environment while maintaining high runtime performance. It is, however, necessary to use two different languages: a dynamically typed language (Scheme) for control processing and a statically typed language (xtlang) for signal processing.
Kronos Meta-Sequencer [15] is an extended specification of Kronos that was developed to unify the Score, Orchestra (composition of UGens), and Instrument (description of UGens) languages by providing syntactic sugar that combines the design pattern of temporal recursion [24] with IO monads; however, Kronos can also be seen as a two-layered design in which a dynamically typed meta-language generates a statically typed program [16, p34].

Introducing mimium
Given the above background, we introduce mimium (minimal-musical-medium), a full-stack music programming language that can describe everything from low-level signal processing to discrete event processing in unified semantics. Table 1 shows a comparison of the language specifications of mimium and existing languages. mimium realizes discrete-time event description and signal processing in unified semantics and, via JIT compilation, achieves an execution speed equivalent to that of UGens written in lower-level languages such as C++. The user does not need to be aware of hardware management such as memory allocation and release, which are determined statically during compilation.
In the following section, we describe the detailed language design and implementation of the running environment of mimium.
First, we introduce the basic syntax, showing that awareness of hardware concerns such as memory management is not required, and that type inference allows users to omit type annotations for variables. Next, we present the general architecture of mimium's running environment (compiler and runtime), showing that mimium code can be immediately compiled into native binaries through LLVM and executed without losing runtime performance, even for signal processing.
In addition, we describe two characteristic features of mimium that allow describing continuous signal processing and discrete control processing in unified semantics. The first is the syntax for a deterministic task scheduling at the sample level and the implementation of the scheduler. The second is a description of the semantics used to define the UGen for signal processing on the language and its compilation process, comparing it to the existing paradigm in terms of the data structure of a pair of functions and internal state variables.
In the discussion section, we address two problems: (1) although mimium can describe discrete control and signal processing in unified semantics, its description of discrete processing tends to be imperative, which differs greatly from the functional paradigm used for signal processing, and (2) the current implementation cannot express a parametric replication of stateful functions for signal processing, unlike Faust and Kronos. We will also explain the possibility of using multi-stage computation as a solution to the aforementioned problems.

Design of mimium

Basic Syntax and Semantics
The basic syntax of mimium is based on that of the Rust language [6] and can be written like a general imperative programming language. The main reason for this is that the syntax of Rust is similar to conventional languages, and its relatively short reserved words are suitable for fields that involve quick prototyping, like music. It also has the side effect that, being close to the syntax of an existing language, it is easy to reuse syntax highlighting from existing languages.
Listing 1 shows the basic syntax. A formal definition of the language is given in Appendix A. A variable is declared automatically by assigning a value to a new name within the scope of a function. When declaring a variable, the value type can be explicitly specified by providing the type name after a colon. When the type name is omitted, it is inferred from context. Data types include void (an empty-value type), numeric (no distinction between integer and decimal types; internally a 64-bit float by default), and string as primitive types, as well as aggregate types such as function, tuple, and array types. User-defined type aliases can also be declared.
The basic syntax includes function definitions, function calls, and conditionals using if-else statements. mimium also incorporates the functional paradigm, allowing if statements to be used as expressions that return values directly. This is achieved by a syntax that allows multiple statements (assignments or function executions) enclosed in {} to be used as an expression whose value is that of the return expression on the last line (return can also be omitted). Similarly, function definitions are defined as syntactic sugar for the assignment of anonymous functions.
mimium is a statically-typed language, which means that the types of all variables and functions are determined during compilation. Type inference is based on Hindley-Milner inference systems (currently monomorphic).
In addition, for faster DSP processing, memory allocation and deallocation are determined statically at compile-time, and the runtime has no garbage collection.

Basic DSP in mimium
In mimium, when the user defines a function named dsp, it becomes an entry point to exchange audio input and output with an audio driver. The example is in Listing 2. In this case, the type of the dsp function must be a function type that takes a tuple of any number of floats and also returns a tuple of any number of floats. Each element of the tuple corresponds to input & output channels of the audio driver. The example of Listing 2 is a code that receives two channels of input from the audio driver, mixes them, and returns duplicated signals for the left and right channels.
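As a rough illustration of the entry-point contract described above (not mimium code, and not the content of Listing 2), the following Python sketch models dsp as a function from a tuple of input samples to a tuple of output samples. The 0.5 mixing gain and the run_driver helper are assumptions introduced only for this example.

```python
from typing import List, Tuple


def dsp(inputs: Tuple[float, float]) -> Tuple[float, float]:
    """Mix two input channels and duplicate the result to left and right."""
    left_in, right_in = inputs
    # Equal-weight mix; the 0.5 gain is an assumption of this sketch.
    mixed = (left_in + right_in) * 0.5
    return (mixed, mixed)


def run_driver(dsp_fn, frames: List[Tuple[float, float]]):
    """Hypothetical stand-in for an audio driver: call dsp once per frame."""
    return [dsp_fn(frame) for frame in frames]


output = run_driver(dsp, [(0.5, 0.25), (1.0, -1.0)])
```

Each tuple element corresponds to one channel, so the arity of the input and output tuples plays the role of the channel count.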
The built-in functions in mimium include basic arithmetic operations, mathematical functions such as trigonometric and exponential functions defined in libc's math.h, built-in stateful functions such as delay and mem (one-sample delay), loadwav function for loading wav files using libsndfile [9], and print function for debugging. The filters and oscillators can all be defined as libraries by combining these functions. Figure 1 shows the architecture of a compiler and runtime of mimium.

Architecture
The structure of the compiler is similar to that of a general functional language; it is based on the implementation of mincaml [28] and is implemented in C++.
The text of the source code is first parsed into an abstract syntax tree (AST), and after removing the syntactic sugar, the AST is transformed into a lambda calculus-based tree structure. Then, type inference and type checking are performed to determine all variable types. The AST is converted, together with the type information, into a single-static-assignment (SSA) form imperative intermediate representation in which all variables are assigned only once. Because nested function definitions are still allowed at this stage, a closure conversion is performed to remove free variables from the function definitions.
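The closure conversion step can be pictured with a small Python sketch. This is a generic illustration of the technique, not the mimium compiler's actual output; all function names here are hypothetical. A nested function with a free variable is rewritten into a top-level function that receives its captured values through an explicit environment.

```python
def make_scaler_nested(gain):
    """Before conversion: `scale` refers to `gain` as a free variable."""
    def scale(x):
        return x * gain
    return scale


def scale_converted(env, x):
    """After conversion: the free variable arrives via an explicit environment."""
    (gain,) = env
    return x * gain


def make_scaler_converted(gain):
    """A closure becomes a pair of (top-level function, environment)."""
    return (scale_converted, (gain,))


def call_closure(closure, x):
    """Apply a converted closure by passing its environment explicitly."""
    fn, env = closure
    return fn(env, x)


nested_result = make_scaler_nested(3.0)(2.0)
converted_result = call_closure(make_scaler_converted(3.0), 2.0)
```

After the conversion no function definition contains free variables, which is what allows the later stages to emit flat, low-level code.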
State variable detection for mimium's unique specification of stateful function (described in Section 3.2) is performed between the closure transformation and the lower-level code (LLVM IR) generation. The transformer outputs the state variables used by the function as data in a tree structure (State Tree in the figure) with the node of the called stateful function names and the type of the state variables of the function, taking the dsp function as the entry point of the signal processing. Finally, the LLVM IR is generated based on the closure transformed IR and the State Tree.
The runtime consists of three parts: the execution engine, which receives the LLVM IR and compiles it into a native binary in memory; the audio driver, which handles input/output communication with the audio device; and the scheduler, which keeps information about the function and the logical time of the specified execution time. The audio driver currently uses RtAudio [23], a cross-platform library for C++ that abstracts audio devices through the operating system's API. The execution engine passes the dsp function, which is the entry point for signal processing, to the audio driver. The audio driver, in turn, commands the scheduler to advance the logical time. The scheduler is responsible for executing tasks as well as responding to requests from the execution engine to register tasks and obtain the internal time.
Only two functions of the LLVM IR compiled in mimium depend on the runtime system: one for registering tasks and another for getting the internal time. Almost all other code is compiled in memory and executed; therefore, it can have the same execution speed as processing written in low-level languages such as C.

Scheduling with @ Operator
To describe events that occur discretely in the temporal direction in mimium, we use a design pattern called temporal recursion, which was introduced in Impromptu (a predecessor of Extempore) [24] and is used in several languages such as Overtone [1] and Kronos Meta-Sequencer [15]. The design pattern describes repetitive event processing as a function that calls itself recursively with a time delay.
A concrete example is shown in Listing 3. When a numeric value is given after the @ operator following a function call, the function is not executed immediately. Instead, it is registered in a priority queue of tasks keyed by time, and execution continues with the next statement. The runtime checks the task queue before processing each sample demanded by the audio driver clock, and if the key of the first task has reached the current time, the runtime executes that task before processing the audio signal. The time is absolute, measured from the moment the runtime started executing. The user can describe relative time using the keyword now to obtain the current logical time from the runtime, in the same way as in prior work.
A function can be executed at regular intervals by calling itself recursively with a time specification inside its own body. In the case of Listing 3, the variable ntrigger is rewritten every time the function Nloop is called. Listing 4 is code in Extempore equivalent to Listing 3. Extempore uses a special function called callback for temporal recursion, while mimium introduces a special operator @ to improve readability.
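The scheduling behavior described above can be sketched in Python as a priority queue keyed by logical time that is checked once per sample. This is a minimal model under stated assumptions, not mimium's runtime: the Scheduler class, the n_loop function, and the interval of 4 samples are all hypothetical.

```python
import heapq
import itertools


class Scheduler:
    """Toy sample-level scheduler with a priority queue keyed by logical time."""

    def __init__(self):
        self.now = 0                     # logical time in samples
        self._queue = []                 # heap of (time, insertion order, task)
        self._order = itertools.count()  # tie-breaker for tasks at the same time

    def schedule(self, time, task):
        heapq.heappush(self._queue, (time, next(self._order), task))

    def tick(self):
        """Run every task that is due, then advance the clock by one sample."""
        while self._queue and self._queue[0][0] <= self.now:
            _, _, task = heapq.heappop(self._queue)
            task()
        # The audio computation for this sample would happen here.
        self.now += 1


sched = Scheduler()
fired_at = []


def n_loop():
    """Temporal recursion: do some work, then re-register 4 samples later."""
    fired_at.append(sched.now)
    sched.schedule(sched.now + 4, n_loop)


sched.schedule(0, n_loop)
for _ in range(10):
    sched.tick()
```

Because the clock only advances inside tick, the times at which n_loop fires are deterministic at the sample level, which is the property the logical-time design is meant to provide.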
mimium uses synchronous scheduling based on logical time, similar to ChucK, for simplicity of implementation. Scheduling based on logical time becomes inaccurate for processes that involve even a single IO exchange, such as sending or receiving MIDI and OSC; however, if the process is closed within the language, accurate processing can be guaranteed on a sample-by-sample basis, the implementation is simple, and the execution cost is relatively low.
In contrast, Extempore uses asynchronous scheduling by dividing the event scheduling threads. In the case of asynchronous scheduling, processes involving IO can be processed accurately in real-time; however, as the implementation depends on the OS task scheduler, it requires individual support for each OS, and the execution cost is relatively high.

Stateful Function for Signal Processing
In this section, we describe appropriate semantics and data structures for expressing UGen.

Comparing Semantics for UGen between Data Structures. A UGen takes a series of input data, processes it in some way, and outputs it. At first glance, this seems to be possible as a pure function, but in reality, we must use a data structure consisting of a function and a set of variables.
For example, a pure function is sufficient if it only adds or multiplies the inputs; however, to represent signals that cannot be expressed as a map f(t) of time t, such as some filters and nonlinear oscillators, a UGen must have an internal state. Therefore, to represent a UGen in a general-purpose language, it must be represented as a data structure that combines functions with internal states, such as an object or a closure; however, if the user wants to use multiple objects or closures, they must instantiate each of them once and then call the actual process. mimium has semantics that allow us to use a UGen as if it were a normal function, without having to create a dedicated data type for signal processing. Below, we will see how to represent phasor, which is a sawtooth-wave-like UGen that increases from 0 to 1 at a constant rate and returns to 0 again, in objects, closures, Faust, and Vult, and then we present semantics for UGen as a stateful function in mimium and its compilation.
Object. An object is a data structure that contains a set of member variables as well as a set of member functions (methods) that modify the variables and send messages to other objects. In the case of an object, the internal state is defined as a member variable. To use it, the user must instantiate it beforehand and then call the main processing method. Listing 5 shows the pseudo-code in C++.
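As a rough Python counterpart of the object-based approach (a sketch for illustration, not the paper's C++ listing), the phase is held in a member variable and the user must instantiate the object before calling its processing method. Taking a per-sample increment as the argument, rather than a frequency, is an assumption made to keep the example short.

```python
class Phasor:
    """Object-style UGen: the internal state lives in a member variable."""

    def __init__(self):
        self.phase = 0.0  # internal state, initialized at instantiation

    def process(self, increment):
        """Return the current phase, then advance it and wrap at 1."""
        out = self.phase
        self.phase = (self.phase + increment) % 1.0
        return out


osc = Phasor()  # explicit instantiation is required before use
samples = [osc.process(0.25) for _ in range(6)]
```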
Closure. Closure is a feature available in languages with a lexical scope that allows function definitions within functions. For example, the user can define function A that defines multiple local variables a, b, c. . . and returns another function B that refers to the variables a, b, c as free variables. Function A is a higher-order function that returns function B, and executing A is equivalent to creating an instance in an object.
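The same phasor can be sketched with a closure in Python (an illustration only, separate from the paper's listing). Here make_phasor plays the role of function A and the returned process plays the role of function B; both names and the per-sample increment argument are hypothetical.

```python
def make_phasor():
    """Function A: defines the local state and returns function B."""
    phase = 0.0  # captured by `process` as a free variable

    def process(increment):
        """Function B: reads and updates the captured phase."""
        nonlocal phase
        out = phase
        phase = (phase + increment) % 1.0
        return out

    return process


phasor_a = make_phasor()  # calling A corresponds to creating an instance
phasor_b = make_phasor()  # a second, independent instance
a_samples = [phasor_a(0.25) for _ in range(5)]
b_samples = [phasor_b(0.5) for _ in range(3)]
```

Each call to make_phasor allocates a fresh phase whose lifetime is tied to the returned function, which is exactly the property that makes static lifetime analysis difficult.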
The problem with using closures to describe signal processing is that it becomes difficult to determine the lifetime of variables at compile time. Languages that can use closures are often implemented with a garbage collection for automatic memory allocation and release, but it is difficult to bring GC to DSP languages [3] where functions are executed 48000 times per second in real-time. SuperCollider implements a GC that can work on real-time systems, and Extempore solves this problem by requiring the user to specify the lifetime with manual memory management. It means that either the user or the developer must bear the implementation cost.
The following example (Listing 6) is pseudo-code in JavaScript (see footnote 3).
Functional Representation. In the description of signal processing in Faust and Kronos, a minimum set of functions with internal states represented by delays and a one-sample delay implicit in recursive connections are prepared as builtin functions to enable algebraic combinatorial expressions in UGen without reading and writing temporary variables.
As shown in Listing 7, unlike objects and closures, there is no need to instantiate them first, and temporary variables for the phasor are automatically allocated for each function call after compilation.
What is symbolized in these languages is, in Faust, a unit generator with input and output (a constant is a function with no input and one output, + operator is a function with two inputs and one output, and so on), and an input/output list of a processor in Kronos. In these languages, the symbols do not correspond to data on a specific memory address as in ordinary languages. For this reason, it is difficult to use these languages as self-extensible systems.
In the Vult language [21], if the user declares a variable with the keyword mem instead of the usual variable declaration var in a function definition, the destructively changed value is kept across samples, so that it can represent the internal state of a UGen. This feature allows the user to represent the connection of a UGen with an internal state as if it were a normal function application, and the UGen does not need to be instantiated in advance, as in Faust.
Footnote 3: It is difficult to use this practically for signal processing because JS relies on GC, but we used JS for the example because it is imperative, easy to read, and supports closures.
In both Faust and Vult, functions with an internal state can be expressed directly without first instantiating them. Instead, the initialization of the internal state is determined at the time of function definition, and the initial value cannot be determined via a constructor when creating an instance. In other words, by taking advantage of the fact that the initial value of the internal state is almost always zero or an array of zeroes, which constitutes domain-specific knowledge in signal processing, functions with internal states can be expressed in the same syntax as normal function definition and application.
By expressing all the stateful functions with a limited number of built-in stateful functions (delays, table lookups) and feedback connections as in Faust, stateful functions can be mixed with the normal function application grammar, eliminating the need to create an instance of the function once; thereby removing redundancy from the code. In addition, the user can use the keyword self in the function definition to refer to the return value returned by the function in the previous sample.
self is a reserved word that can only be used in function definitions. self is initialized with 0 and allows us to get the previous return value of the function. Listing 9 is the simplest use of self: a counter function whose value starts from 0 and increases by 1 per sample.
By applying this method, we can define the UGen phasor, which we have seen as examples in objects and closures and as functions, as shown in Listing 10. In this example, the user does not need to declare variables in the function, and there is no need to instantiate when using the function. Additionally, the use of a recursive connection is closed within the unit of the function, unlike the representation of recursive connections as the infix operator ∼ in Faust.
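To make the meaning of self concrete, the following Python sketch emulates it with a hypothetical stateful decorator that feeds the previous return value, initially 0, back into the function as its first argument. This only mimics the observable behavior: mimium resolves the state statically at compile time, whereas this emulation relies on a runtime closure. The names stateful and self_ and the per-sample increment are assumptions of the example.

```python
def stateful(fn):
    """Hypothetical helper: pass the previous return value as `self_`."""
    previous = 0.0  # plays the role of `self`, initialized with 0

    def wrapper(*args):
        nonlocal previous
        previous = fn(previous, *args)
        return previous

    return wrapper


@stateful
def counter(self_):
    """Previous output plus one."""
    return self_ + 1.0


@stateful
def phasor(self_, increment):
    """Previous output plus the increment, wrapped into [0, 1)."""
    return (self_ + increment) % 1.0


counts = [counter() for _ in range(3)]
phases = [phasor(0.25) for _ in range(5)]
```

At the call site, counter and phasor look like ordinary functions: no variable declaration and no instantiation appears in the user's code.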
Further, for users who are already familiar with the dataflow and functional paradigms, mimium provides the pipeline operator |> as a syntax that makes it easier to interpret stateful functions as connections between processors. The pipeline operator is used in several functional languages such as F# [13] and allows programmers to rewrite a nested function call h(g(f(arg))) as arg |> f |> g |> h. Listing 11 is an example of defining a sine wave oscillator using both regular function calls and pipeline operators.
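The equivalence between pipelining and nested calls can be checked with a short Python sketch. Python has no pipeline operator, so a hypothetical helper pipe stands in for it; the functions f, g, and h are arbitrary placeholders.

```python
from functools import reduce


def pipe(value, *functions):
    """Apply functions left to right: pipe(x, f, g, h) == h(g(f(x)))."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def f(x):
    return x + 1


def g(x):
    return x * 2


def h(x):
    return x - 3


nested = h(g(f(3)))
piped = pipe(3, f, g, h)
```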
Code equivalent to Listing 11 is shown in Listing 12 for Faust and in Figure 2 for Max; both describe the flow of data from left to right in the same way as the graphical connection of UGens. In addition to the sequential composition operator (:) in Faust, there are operators with similar functions in other languages, such as the ChucK operator (=>) in the ChucK language, but the difference is that mimium's pipeline operators are semantically equivalent to function calls.

Compilation Process of Stateful Functions
The transformation is done as follows. First, all the function calls contained in the dsp function definition are searched in order, and if a called function is defined in mimium, the compiler further looks up its definition recursively to create a dependency tree of function calls. Finally, if a function definition refers to self or calls a built-in stateful function such as mem or delay, then the function becomes a stateful one, and any function that calls a stateful function is also determined to be stateful.
After creating the tree, the function definition is rewritten such that the argument of the stateful function is a pointer to a tuple-type variable that lists all the state variables used in the function. The function call part should be rewritten in the same way, that is, to ensure that the state variables become explicit arguments. As an example, the code that uses the built-in delay function and the two types of self is shown in Listing 13.
The pseudo-code converted from this code into a form in which state variables are explicitly given as arguments is shown in Listing 14.
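The effect of this rewriting can be sketched in Python (an illustration of the idea, not the content of Listing 14). Each stateful function receives a mutable container standing in for the pointer to its state tuple, and the caller's state is a tree with one slot per stateful call site. The function names and the nested-list layout are assumptions of this sketch.

```python
def phasor(state, increment):
    """Rewritten stateful function: state[0] holds the previous return value."""
    state[0] = (state[0] + increment) % 1.0
    return state[0]


def dsp(state):
    """Rewritten caller: each call site is handed its own slot of the tree."""
    slow = phasor(state[0], 0.25)
    fast = phasor(state[1], 0.5)
    return (slow, fast)


def make_dsp_state():
    """State tree for dsp: two phasor call sites, each zero-initialized."""
    return [[0.0], [0.0]]


state = make_dsp_state()
frames = [dsp(state) for _ in range(4)]
```

Because the shape of the state tree follows the call tree rooted at dsp, the total amount of state is known before execution and can be allocated once.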

Discussion
To summarize, mimium can describe temporal-discrete control and signal processing in unified semantics, including the definition of UGen as a stateful function, and the user can write code without being aware of the hardware. In addition, almost all of the code is compiled on memory through LLVM, so that the execution speed is equivalent to that of a low-level language. For writing discrete processing, the @ operator can be used to specify the time to execute a function, and by combining it with the temporal recursion design pattern, it is possible to abstract events that occur repeatedly in the time domain. For the description of signal processing, by hiding state variables and combining only feedback connections and limited built-in functions with states as in Faust, functions with internal states can be expressed in the same syntax as normal function definitions and applications.

Comparison to Related Works
Compared to the existing environment, mimium brings the following advantages: By taking an architecture that adds minimal musical features and semantics to the specification and implementation of a general-purpose programming language, it keeps the implementation simple and allows the user to focus on musical tasks without losing the self-extensibility of the programming language.
In fact, mimium can be used like a general-purpose scripting language when the source does not use scheduling or stateful functions. The compiler structure of mimium is the same as that of a general functional language except for the stateful function conversion part.
Extempore is similar to mimium in this respect, allowing all description in a single environment; however, the user must use two different languages: Scheme and xtlang. xtlang requires the user to understand manual memory management and complex type signatures including pointers when defining a UGen as a closure. Although manual memory management is not always a negative point, as Extempore is an environment for full-stack live programming that is not limited to music, for a language made for music it is generally essential to make hardware management such as memory and threads unnecessary or optional in the language specification, so that the user can focus on musical tasks, as McCartney, the developer of SuperCollider, argues [11, p61].
Kronos (and Meta-Sequencer) is also a similar language that focuses on self-extensibility. Kronos is a stricter functional language based on System F, and it is more expressive as it can describe generic signal processors by parameterizing the inputs and outputs of a processor as lists. Its internal representation, however, is a graph structure [16, p23] like that of Faust. The internal representations in mimium are an AST and an SSA-form IR, which are closer to the IR of general programming languages.

Remaining Problems
The following issues remain in mimium when using it practically as a unified language for music. First, the way of describing discrete events using a task scheduler is much more like an imperative paradigm that is apart from the functional design pattern in signal processing.
When using the @ operator, deferred functions are not executed immediately; this inevitably leads to the use of void-type functions with no return value and with side effects (destructive assignment of variables), following the form of imperative programming. This programming style is far from the notation of signal processing, which connects return values to arguments between functions. The combination of closures and temporal recursion, as in Listing 15, would allow us to abstract discrete values as functions and confine side effects within the function; however, this is not possible in the current implementation because the lifetime of a local variable defined in a function is closed within that function definition. If the compiler could statically determine how long a variable captured in a closure can survive by performing lifetime analysis [20], it would be possible to abstract discrete values without changing the language specification itself.
Another problem is that parametric replication of functions with states, which can be realized in Faust using a pattern matching technique, is not possible in the current implementation. Consider the example code in Listing 16, which instantiates an arbitrary number of filters and adds their outputs together. Here, we assume that the function filter is a stateful function of some kind.
In the current implementation, the compiler cannot compile the code correctly because the compiler cannot determine how many instances of the state variable for the filter are needed statically. To solve this problem, partial application of the constant N to the function should be performed before the conversions of state variables.
In the future, the compiler will need to be modified to introduce a constant folding step between type inference and stateful function conversion.
The current semantics, furthermore, has a problem that the type system does not distinguish whether the argument is a constant or not. For example, if a function that returns some time-varying float is passed to N in function filterbank, it is allowed at the type checker level, but it fails at the constant folding stage. Semantically, this constant folding can be seen as describing two stages of computation in a single source code: one that determines the data flow of signal processing at the compile time, and the other that runs at run time.
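The two stages can be made explicit in a Python sketch (an illustration of the staging idea, not mimium code and not Listing 16). The outer function receives the constant N and allocates one state slot per filter instance, which corresponds to what the compiler would need to resolve statically; the inner function is what would run per sample. The one-pole filter, its coefficient of 0.5, and all names are hypothetical.

```python
def one_pole(state, x, coeff):
    """Hypothetical stateful filter: a one-pole low-pass with explicit state."""
    state[0] = state[0] + coeff * (x - state[0])
    return state[0]


def make_filterbank(n):
    """Stage 1: n is a known constant, so n state slots can be allocated."""
    states = [[0.0] for _ in range(n)]

    def filterbank(x):
        """Stage 2: per-sample processing over a fixed number of filters."""
        return sum(one_pole(states[i], x, 0.5) for i in range(n))

    return filterbank


bank = make_filterbank(3)
first = bank(1.0)
second = bank(1.0)
```

If n were a time-varying value instead of a constant, the number of state slots could not be fixed in the first stage, which is the failure case described above.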
This situation is similar to the paradigm of multi-stage computation such as templates/constexpr in C++ and Meta-ML [29]. Introducing a type system for multi-stage computation would solve the problem that the type checker cannot distinguish whether a variable is a constant or not, because it can distinguish the stage of computation (in this case compile-time and runtime) a variable belongs to.
In addition, because multi-stage computation can be used as an expressive macro system [4], it would be possible to build more specialized DSLs for specific expressions on top of mimium, just as DSLs such as TidalCycles [12], FoxDot [5], and IXI [10] are built on top of SuperCollider, but within the same language system rather than through a server/client model.
In addition, mimium's DSP is based on a sample-by-sample format similar to Faust, and it is not possible to write functions such as FFT and granular synthesis that process multiple samples as vectors at once. Considering that in Kronos this can be achieved by adding a built-in function to convert the sample rate, mimium also requires new semantics for block computation.

Conclusion and Future Work
In this paper, we have described the design and implementation of mimium, a new programming language for music. mimium is characterized by combining discrete processing in the time domain with signal processing in a single language, a combination that has been a problem in music programming languages.
As a language specification, music and signal processing can be written without considering the hardware, as memory management is not required and type inference is available.
In other words, mimium can enable the easy distribution of music generated by a program without fixing it via recording or rendering; therefore, it has the potential to serve as an infrastructure for codes as the musical medium.
To make it easy to use it as a practical tool for such applications, we are working on implementing environment variables (values that change depending on the execution environment even for the same code, such as the sample rate), enhancing IO such as MIDI and OSC support, enriching the library, and developing a mechanism to simplify code distribution, such as a package manager.
Additionally, a more formal definition of the language and its type system, as well as the design of a benchmark, remain as open issues.