# Spade Compiler Architecture

Spade is split into several crates with the `main` function residing in
`spade-compiler`. However, this crate is mainly glue code for the code in the
other crates.

An overview of the compilation flow is shown below:

<img style="width:100%;" src="misc/architecture.svg">

Each rounded rectangle is a step in the compilation process, which generally
lives in a crate with a similar name. Each blue "note" is an artefact, created
by one of these steps, which also usually resides in a similarly named crate.

Below is a short description of what each of the artefacts is for, followed by
a description of roughly how the passes work.

## Intermediate Representations

### AST (Abstract Syntax Tree)

The AST is a tree-based representation of the program structure, free of any
details from the source code such as white space or comments. The structure of
the AST is guaranteed to be valid, i.e. you cannot have a keyword in place
of an identifier, if statements have both a condition and a branch, etc.


### HIR (High Level Intermediate Representation)

The structure of the HIR is similar to that of the AST. The primary
difference is that name scoping and 'kind' have been checked here, and variables
have been replaced with IDs. Each ID, `NameID(u64, String)`, consists of a
globally unique number assigned during AST lowering, along with a "descriptive
name" to make debugging the output code easier. There are no guarantees about
this name.

All other rules of the language (apart from those which require type
information) are also guaranteed to be upheld in the HIR, for example, the
number of arguments to a function call is guaranteed correct.
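
As a rough illustration of the ID scheme, here is a minimal sketch of how such
IDs could be handed out. Only the `NameID(u64, String)` shape comes from the
description above; the `NameIdGen` allocator and its `fresh` method are
hypothetical names invented for this example.

```rust
/// Mirrors the `NameID(u64, String)` shape described above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NameID(pub u64, pub String);

/// Hypothetical allocator that hands out a fresh number for every declaration.
#[derive(Default)]
pub struct NameIdGen {
    next: u64,
}

impl NameIdGen {
    /// Returns a new ID. The descriptive name is only carried along for
    /// debugging; uniqueness comes from the number alone.
    pub fn fresh(&mut self, descriptive_name: &str) -> NameID {
        let id = NameID(self.next, descriptive_name.to_string());
        self.next += 1;
        id
    }
}
```

With this scheme, two variables that are both called `x` in different scopes
receive different IDs while keeping the same descriptive name.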

### MIR (Medium Level Intermediate Representation)

The MIR has lost almost all the structure of the program and therefore only
consists of `entities` (and in the future, other top level constructs like
pipelines and functions), which in turn are lists of `statements`. Statements
are `bindings` or `registers`, where each `register` or `binding` has only one
simple expression. For example, `let x = a + b + c` in HIR looks like this in
MIR:

```rust
_0: Int<16> = a + b;
x: Int<16> = _0 + c;
```

MIR as generated by the compiler is guaranteed to be correct. It has no generic
types and no type errors. Error checks might be added, but any failure will
cause a panic and should never be visible to the user.


## Passes

### Parser

The parser goes from input text to the AST. It does this in 2 phases: `lexing`
and `parsing`. Lexing converts characters to `tokens` and almost all of the
work is done by the `logos` crate. To add more tokens, they only need to be
added to the `TokenKind` enum.

The stream of tokens is parsed by the parser, which is a hand-written recursive
descent parser. Each AST node has roughly one parsing function which tries to
parse a node of that kind.
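
To illustrate the "one parsing function per node" idea, here is a minimal
sketch of a recursive descent parser for a toy expression grammar. The token
variants, the `Expression` type and the `Parser` struct are all hypothetical
and much simpler than the real ones, and the tokens are built by hand here
rather than produced by `logos`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier(String),
    Integer(u64),
    Plus,
    OpenParen,
    CloseParen,
}

#[derive(Debug, PartialEq)]
pub enum Expression {
    Identifier(String),
    IntLiteral(u64),
    Add(Box<Expression>, Box<Expression>),
}

pub struct Parser {
    tokens: Vec<TokenKind>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<TokenKind>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<&TokenKind> {
        self.tokens.get(self.pos)
    }

    /// Consumes and returns the next token, if any.
    fn eat(&mut self) -> Option<TokenKind> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// expression := term ('+' term)*
    pub fn expression(&mut self) -> Result<Expression, String> {
        let mut lhs = self.term()?;
        while self.peek() == Some(&TokenKind::Plus) {
            self.eat();
            let rhs = self.term()?;
            lhs = Expression::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// term := identifier | integer | '(' expression ')'
    fn term(&mut self) -> Result<Expression, String> {
        match self.eat() {
            Some(TokenKind::Identifier(name)) => Ok(Expression::Identifier(name)),
            Some(TokenKind::Integer(value)) => Ok(Expression::IntLiteral(value)),
            Some(TokenKind::OpenParen) => {
                let inner = self.expression()?;
                match self.eat() {
                    Some(TokenKind::CloseParen) => Ok(inner),
                    other => Err(format!("expected `)`, got {:?}", other)),
                }
            }
            other => Err(format!("expected an expression, got {:?}", other)),
        }
    }
}
```

Each grammar rule maps to one method, and nesting in the input becomes
recursion between those methods.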

The `#[trace_parser]` proc macro automatically gives a traceback of a parse
session allowing easier debugging.

### Global collection

The global collection pass walks through modules in the AST, collecting items
at the top level, like entities, other modules, types etc. These are added to
the `symbol_table` in the outermost scope.

### AST Lowering

AST lowering checks program semantics to generate HIR. It does this by
recursively traversing the AST, replacing identifiers and paths with `NameID`'s.

Here, the names used are checked to be the right "kind" (`entity`, `type`, `variable`)
by looking them up in the symtab.

AST lowering is done in 3 passes over the AST. The first two collect global symbols,
i.e. symbols visible at the top level of the program, and are done in `global_symbols.rs`.

The first pass collects the "left hand sides" of all declarations, i.e. it adds the names
of types and function-like items to the symtab.

The second pass visits the "right hand side" of the items. Here, enum members are added,
type definitions are lowered to their HIR counterparts, and function-likes are visited to
create "heads", containing the declaration but not the implementation.

In the third pass, function-like items are visited and fully expanded to their HIR
counterparts.
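
The reason for splitting the work into passes is that items may refer to each
other regardless of declaration order. The following is a minimal sketch of
that idea under heavy simplification. The `Item`, `Kind` and `SymbolTable`
types and the `lower` function are hypothetical and do not match the real
compiler; IDs are plain numbers and a function body is reduced to the list of
functions it calls.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified AST items.
pub enum Item {
    Struct { name: String },
    Function { name: String, arg_types: Vec<String>, calls: Vec<String> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Type,
    Function,
}

#[derive(Default)]
pub struct SymbolTable {
    /// Filled in pass 1: name -> (unique ID, kind).
    pub names: HashMap<String, (u64, Kind)>,
    /// Filled in pass 2: function ID -> IDs of its argument types (the "head").
    pub heads: HashMap<u64, Vec<u64>>,
}

impl SymbolTable {
    /// Resolves a name and checks that it refers to the expected kind of thing.
    fn lookup(&self, name: &str, expected: Kind) -> Result<u64, String> {
        match self.names.get(name) {
            Some((id, kind)) if *kind == expected => Ok(*id),
            Some((_, kind)) => {
                Err(format!("`{}` is a {:?}, expected a {:?}", name, kind, expected))
            }
            None => Err(format!("undefined name `{}`", name)),
        }
    }
}

/// Returns the symbol table and, per function ID, the IDs of the functions it calls.
pub fn lower(items: &[Item]) -> Result<(SymbolTable, HashMap<u64, Vec<u64>>), String> {
    let mut symtab = SymbolTable::default();

    // Pass 1: register the "left hand side" (the name) of every item.
    for (id, item) in items.iter().enumerate() {
        let (name, kind) = match item {
            Item::Struct { name } => (name, Kind::Type),
            Item::Function { name, .. } => (name, Kind::Function),
        };
        symtab.names.insert(name.clone(), (id as u64, kind));
    }

    // Pass 2: build heads. Every type name is known by now, even if it is
    // declared after the function that uses it.
    for item in items {
        if let Item::Function { name, arg_types, .. } = item {
            let id = symtab.lookup(name, Kind::Function)?;
            let args = arg_types
                .iter()
                .map(|ty| symtab.lookup(ty, Kind::Type))
                .collect::<Result<Vec<_>, _>>()?;
            symtab.heads.insert(id, args);
        }
    }

    // Pass 3: lower bodies. Calls resolve against names from pass 1.
    let mut bodies = HashMap::new();
    for item in items {
        if let Item::Function { name, calls, .. } = item {
            let id = symtab.lookup(name, Kind::Function)?;
            let callees = calls
                .iter()
                .map(|callee| symtab.lookup(callee, Kind::Function))
                .collect::<Result<Vec<_>, _>>()?;
            bodies.insert(id, callees);
        }
    }
    Ok((symtab, bodies))
}
```

The `lookup` helper also shows the "kind" check: using a function name where a
type is expected is rejected even though the name exists.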

#### Symbols, names and modules



### Type inference

Type inference works on the HIR and is based on a unification algorithm.
Whenever a new typed thing (sub-expression, variable etc.) is found, a new
"type variable" and "type equation" is added. When a typed thing is used, for
example as an operand, the type system "unifies" the type equations associated
with that thing. For example, `x + y` unifies `x` with `int` (for now; in the
future, it should be unified with any `Addable` type), and then `x` is unified
with `y` to ensure that the `lhs` and `rhs` match.
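
The `x + y` example can be sketched with a very small unifier. This is an
illustration under simplifying assumptions, not the compiler's implementation:
the `TypeVar` and `TypeState` definitions below are hypothetical stand-ins that
only share names with the real types, known types are plain strings, and there
are no generic types (so no occurs check is needed).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum TypeVar {
    /// Not yet inferred; the number identifies the variable.
    Unknown(u32),
    /// A fully known type such as `int` or `bool`.
    Known(String),
}

#[derive(Default)]
pub struct TypeState {
    /// Substitutions found so far: unknown ID -> what it stands for.
    replacements: HashMap<u32, TypeVar>,
    next_id: u32,
}

impl TypeState {
    /// Creates a fresh type variable for a newly discovered typed thing.
    pub fn new_unknown(&mut self) -> TypeVar {
        self.next_id += 1;
        TypeVar::Unknown(self.next_id - 1)
    }

    /// Follows substitutions until a known type or an unresolved variable is found.
    pub fn resolve(&self, ty: &TypeVar) -> TypeVar {
        match ty {
            TypeVar::Unknown(id) => match self.replacements.get(id) {
                Some(next) => self.resolve(next),
                None => ty.clone(),
            },
            known => known.clone(),
        }
    }

    /// Makes `a` and `b` the same type, or reports a mismatch.
    pub fn unify(&mut self, a: &TypeVar, b: &TypeVar) -> Result<TypeVar, String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        if a == b {
            return Ok(a);
        }
        match (a, b) {
            // An unknown on either side is replaced by the other side.
            (TypeVar::Unknown(id), other) | (other, TypeVar::Unknown(id)) => {
                self.replacements.insert(id, other.clone());
                Ok(other)
            }
            (a, b) => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
        }
    }
}
```

For `x + y`, calling `unify(&x, &int)` followed by `unify(&x, &y)` leaves both
variables resolved to `int`; a later attempt to unify `y` with `bool` fails.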

### HIR Lowering

HIR lowering goes from HIR to MIR. Here, the tree structure of each entity is
flattened into a series of MIR statements. Each expression and variable is
assigned its own MIR statement and they are all concatenated together for the
whole entity. Any use of generic parameters is detected and reported here.
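
The flattening step can be sketched as a post-order walk over the expression
tree that emits one statement per sub-expression. The `Expr`, `Binding` and
`Flattener` types are hypothetical and far simpler than the real HIR and MIR.
In particular, types are omitted, only addition is supported, and a plain
variable reference emits no statement of its own.

```rust
/// Hypothetical, simplified HIR expression.
pub enum Expr {
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// One flat MIR-like binding: `name = lhs + rhs`.
#[derive(Debug, PartialEq)]
pub struct Binding {
    pub name: String,
    pub lhs: String,
    pub rhs: String,
}

#[derive(Default)]
pub struct Flattener {
    pub statements: Vec<Binding>,
    next_tmp: usize,
}

impl Flattener {
    /// Emits statements for `expr` and returns the name holding its value.
    /// `target` names the result if given; otherwise a fresh `_N` is used.
    pub fn lower(&mut self, expr: &Expr, target: Option<&str>) -> String {
        match expr {
            Expr::Var(name) => name.clone(),
            Expr::Add(lhs, rhs) => {
                // Operands are lowered first, so their statements come first.
                let lhs = self.lower(lhs, None);
                let rhs = self.lower(rhs, None);
                let name = match target {
                    Some(target) => target.to_string(),
                    None => {
                        self.next_tmp += 1;
                        format!("_{}", self.next_tmp - 1)
                    }
                };
                self.statements.push(Binding { name: name.clone(), lhs, rhs });
                name
            }
        }
    }
}
```

Lowering `let x = a + b + c`, i.e. `(a + b) + c` with target `x`, yields the
two bindings `_0 = a + b` and `x = _0 + c`.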


### MIR codegen 

MIR codegen is the final step of the compilation process, where the MIR code is
converted to Verilog. Each MIR statement gets roughly one variable or register
along with an assignment of the correct value.
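
As a rough sketch of what "one variable or register plus an assignment" can
look like, the function below turns a flat statement list into Verilog text.
The `Statement` type and the exact Verilog layout are assumptions made for this
example, and all widths are assumed to be at least 1.

```rust
/// Hypothetical, simplified MIR statements.
pub enum Statement {
    /// Combinational binding: `name = lhs + rhs`.
    Binding { name: String, width: u32, lhs: String, rhs: String },
    /// Clocked register: `name` takes the value of `next` on each rising clock edge.
    Register { name: String, width: u32, clock: String, next: String },
}

/// Emits a declaration and an assignment for every statement.
pub fn codegen(statements: &[Statement]) -> String {
    let mut out = String::new();
    for statement in statements {
        match statement {
            Statement::Binding { name, width, lhs, rhs } => {
                out += &format!("wire [{}:0] {};\n", width - 1, name);
                out += &format!("assign {} = {} + {};\n", name, lhs, rhs);
            }
            Statement::Register { name, width, clock, next } => {
                out += &format!("reg [{}:0] {};\n", width - 1, name);
                out += &format!(
                    "always @(posedge {}) begin\n    {} <= {};\nend\n",
                    clock, name, next
                );
            }
        }
    }
    out
}
```

Because the MIR is already flat and fully typed, this step needs no recursion
and no error handling for user mistakes.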



## Misc. useful stuff

Here is a description of a few things that are seen throughout the compiler.

### The `Loc<T>` struct

Almost all language constructs stem from a location in the original source code.
`Loc<T>` encodes this information in a way that is relatively easy to propagate
through the compiler.

It provides several methods to transform the inner value while keeping the
location information, such as `map`, `map_ref`, `try_map` and so on.

For example, a lowering step from AST to HIR generally looks like this to preserve location info:
```rust
fn visit_some_node(input: ast::SomeNode) -> Result<hir::SomeNode, Error> {
    //...
}

fn some_other_visitor(...) -> Result<..., Error> {
    let hir_some_node = some_node.try_map(visit_some_node)?;
    ...
}
```
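
To make the behaviour of these methods concrete, here is a minimal stand-in
for `Loc<T>`. The method names come from the text above, but the field names,
the span representation and the exact signatures are assumptions made for this
sketch and may differ from the real struct.

```rust
/// Simplified stand-in for the compiler's `Loc<T>`.
#[derive(Debug, Clone, PartialEq)]
pub struct Loc<T> {
    pub inner: T,
    /// Start and end byte offsets in the source file (assumed representation).
    pub span: (usize, usize),
}

impl<T> Loc<T> {
    pub fn new(inner: T, span: (usize, usize)) -> Self {
        Loc { inner, span }
    }

    /// Transforms the inner value, keeping the location.
    pub fn map<U>(self, f: impl FnOnce(T) -> U) -> Loc<U> {
        Loc { inner: f(self.inner), span: self.span }
    }

    /// Like `map`, but only borrows the inner value.
    pub fn map_ref<U>(&self, f: impl FnOnce(&T) -> U) -> Loc<U> {
        Loc { inner: f(&self.inner), span: self.span }
    }

    /// Like `map` for fallible transformations: the error is passed through
    /// unchanged and the location is kept on success.
    pub fn try_map<U, E>(self, f: impl FnOnce(T) -> Result<U, E>) -> Result<Loc<U>, E> {
        Ok(Loc { inner: f(self.inner)?, span: self.span })
    }
}
```

With this shape, `Loc::new("42", (3, 5)).try_map(|s| s.parse::<u32>())` gives
`Ok` of a `Loc<u32>` that still carries the span `(3, 5)`.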

### Error reporting

Error reporting is done using the `codespan` and `codespan_reporting` crates.
Each intermediate step defines its own `Error` type and has its own function
for printing that error using `codespan_reporting`.

### Type representations

Types have several representations in the compiler, both between phases and for
different "kinds" of types.

The kinds include the following:

- Declarations
    - `enum X`
    - `struct Y`
- Generic arguments 
    - `enum Z<T>`: a generic type
    - `enum Z<#T>`: a generic number
- Specifications in type signatures or used to define other types
    - `x: X`
    - `z: bool`
    - `w: Z<int<8>>`
    - `(X, Y)`
- Type inferer internals (partially known types etc. `int<[type variable]>`)

In the AST, declarations contain a name, optionally some generic arguments and
a body. Specs are a `path` followed by 0 or more generic arguments.

During AST -> HIR lowering the different kinds of types are handled quite differently.

### Declarations

Declarations go through 2 passes: collection and elaboration. Type collection
looks at the "left hand side" of the types and creates corresponding symtab
entries, which just contain a name and a list of generic arguments to be passed. In
the future, these will also contain type constraints. After initial HIR lowering,
these are not touched until types are concretised before MIR lowering.

### Specifications

Type specifications appear in the AST and HIR representations. In the AST, they
are only names with optional generic arguments while in the HIR, they are
"Declared" types, i.e. concrete types declared elsewhere, or generics present
in the current scope. Generic arguments have been partially checked for
correctness; in particular, generics can only appear on declared types, and the
number of arguments is correct.

### Type inference types

The type inference code is loosely based on [this excellent lecture][type_inference_lecture].

Type inference runs once per item and tries to infer the type of everything in
the item which is typed. This includes variables, of course, but also each
sub-expression.

Each typed thing (`TypedExpression`) is assigned a type equation, with the LHS
containing the typed thing and the RHS containing a `TypeVar`. These equations
are solved using unification, to (hopefully) remove all `TypeVar::Unknown`
instances. If `Unknown` types remain after this process is done, the user must
specify some type signatures.

[type_inference_lecture]: https://www.youtube.com/watch?v=xJXcZp2vgLs

### Types in HIR to MIR lowering

Types in the HIR and in type inference are "language level types", while types
in the MIR are backend level types. For example, while clocks are their own
type in the HIR, in the MIR they are lowered to booleans.


Type inference does not care about the details of the types; essentially it
only cares about the left hand side of the definition: the type arguments, the
constraints etc., but not whether a type is an enum or a struct. During
HIR to MIR lowering however, this information is needed. Thus, types from the
`TypeState` produced by the type inferer are converted to `ConcreteType`s. At
this point, items will also be monomorphised, i.e. generic items are turned
into concrete items with the right types.

`ConcreteType`s are then mapped to `MIR` types. During this conversion, it can
be assumed that the type arguments are correct and valid, i.e. given a
`ConcreteType::Single{base: Int, params}`, params is guaranteed to be a single
integer.
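
The mapping from `ConcreteType` to MIR types can be sketched as below. Only
the `ConcreteType::Single { base, params }` shape and the clock-to-boolean
lowering are taken from the text; the `Base` and `MirType` enums, the other
variants and the `to_mir_type` function are assumptions made for this example.
Since earlier passes guarantee valid parameters, an invalid parameter list is
treated as a compiler bug and panics instead of producing a user-facing error.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Base {
    Int,
    Bool,
    Clock,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ConcreteType {
    /// A named type applied to its (already concrete) parameters.
    Single { base: Base, params: Vec<ConcreteType> },
    /// A type level integer, e.g. the `8` in `int<8>`.
    Integer(u64),
    Tuple(Vec<ConcreteType>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum MirType {
    Int(u64),
    Bool,
    Tuple(Vec<MirType>),
}

pub fn to_mir_type(ty: &ConcreteType) -> MirType {
    match ty {
        ConcreteType::Single { base: Base::Int, params } => match params.as_slice() {
            // Guaranteed by earlier passes: exactly one integer parameter.
            [ConcreteType::Integer(size)] => MirType::Int(*size),
            other => panic!("int with invalid parameters: {:?}", other),
        },
        ConcreteType::Single { base: Base::Bool, .. } => MirType::Bool,
        // Clocks are a separate language level type but plain booleans in MIR.
        ConcreteType::Single { base: Base::Clock, .. } => MirType::Bool,
        ConcreteType::Tuple(members) => {
            MirType::Tuple(members.iter().map(to_mir_type).collect())
        }
        ConcreteType::Integer(_) => panic!("a bare type level integer has no MIR type"),
    }
}
```

Note that the conversion loses information on purpose: after this step, a
clock and a `bool` are indistinguishable.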
