Processing math: 100%


Intro to Type System

1. Simple Typed λ-calculus

1.1. Syntax

Recall that the syntax of pure λ-calculus formally defined as follows

e:=xλx.ee1 e2

Definition. Basic on this, we can define simple typed lambda calculus (STLC). The biggest difference is that we now have two inductively defined expressions: terms and types.

Typesτ:=unitτ1τ2Termse:=()xλx:τ ee1 e2

where

  • ():unit
  • λx:τ.e is a function that takes one input of type τ and evaluates e.

By convention, the function arrow is right-associative, so τ1τ2τ3 is the same as τ1(τ2τ3). This convention matches left-associative function application.

The operational semantics of this language is unchanged from CBV-calculus, counting () as a value which has no reduction rule. That is, we have the following definitions.

Definition. Values in STLC

v:=()λx:τ.e

Definition. Evaluation Contexts of STLC

E:=[]E ev E

Definition. Reduction Rules of STLC

eeE[e]E[e](λx:τ.e)ve[xv]

One other small change is that the presence of () means there are now closed -terms can get stuck. For example

()(λx:unit.x)

is a perfectly good closed λ-term that is also stuck. We would like to eliminate these, and the type system will allow us to do so.

1.2. Typing Rules

The typing rules for STLC are an inductive relation on three inputs:

  • a typing context Γ:VarType that maps variables to types
  • an expression e; and
  • a type τ.

Γe:τ means that we can prove that expression e has type τ in typing context Γ using the typing rules. Specially, e:τ is a short-hand for e:τ.

Specifically, typing rules for STLC is the four rules as follows,

Γ():unitΓ(x)=τΓx:τΓ,x:τ1e:τ2Γ:λx:τ1.e:τ1τ2Γe1:τ1τ2e2:τ1Γe1 e2:τ2

Γ,x:τ means exactly the same as Γ[xτ]

1.3. Expressive power

It’s reasonable to reconsider the types of natural numbers using Church’s encoding,

ˉ1:=λfx.f x

The application typing rule indicates that f:τ1τ2 and x:τ1, so ˉ1:τ1τ2τ1τ2. Since

ˉ0:=λfx.x

Then we have ˉ0:τ1τ2τ1τ1. which means τ1=τ2. Then we can show using mathematical induction that

nN.ˉn:ττττ

However with the defined function

add:=λmn.n succ m

we could found that add is not of type τττ in this case.

It is for this reason that most typed language include things like numbers and booleans directly as language primitives.

Another example is the Ω operator where

Ω:=(λx.x x)(λx.x x)

One can found that there is no way to give this a type by showing that we cannot give a type to λx.x x. In fact, we will later see that we cannot write down any nonterminating program in STLC (this is called type-soundness)

Definition. Type soundness
A type system is called type-sound if well-typed programs don’t get stuck.

2. Products, Sums, and Datatypes

Typesτ:=unitτ1τ2Termse:=()xλx:τ ee1 e2Valuesv:=()λx:τ e (closed)

Recall that the type system of STLC guaranteed type soundness, but it restricted us substantially. For instance, our encoding of numbers as Church numerals no longer worked the way we wanted. Instead, we will see how to add various other types to the language as primitives.

1.1. Integers

We extend the syntax of both types, terms, and values to STLC by defining integers. We use to denote arithmetic operations (+,,,exp).

τ:= int e:=ne1e2v:=n

we then need to extend corresponding semantics and update the type system with rules for integers where

E:=EevEm=n1n2(mathematically)n1n2n

and

Γn:intΓe1:intΓe2:intΓe1e2:int

By using the manually defined integers, we do not have the full power of Church numerals, which allowed us to define bounded for loops, but we have gotten a substantial feature back.

1.2. Pairs

Another useful data structure that we previously encoded as pairs.

τ:=τ1×τ2e:=(e1,e2)proj1 eproj2 ev:=(v1,v2)

1.3. Records

Labτ:={1:τ1,,n:τn}e:={1=e1,,n=en}e.v:={1=v1,,n=vn}

1.4. Sums

3. Type Inference

Languages like Haskell and OCaml perform type inferences. For instance, in the expression

(f 3)

the language knows that must be a function that has type intα for some type variable α.
If there are other occurrences of f, those will provide different constraints, and these constraints must all have a common solution for the program to have a type.

3.1. Unification

Definition. Unification

Briefly, unification is the process of taking two terms and finding a variable substitution that will take the two input terms to the same result.

Type inference and pattern matching in many languages are instances of unification. Pattern matching is generally done by unifying language expressions, while type inference is done by unifying type expressions.

Definition. Unifier

A unifier is a substitution that unifies a given collection of (unordered) terms. That is, given a collection

{s1t1,s2t2,,sntm}

of terms, a unifier is a substitution

γ()=[u1v1,u2v2,,unvn]

such that i{1,2,m},γ(si)=γ(ti). where {uk}nk=1 and {vk}nk=1 are two collections of subterms.

As a notation, we will write = [xs,yt,] to indicate the parallel substitution, not a sequential one. In particular, given a substitution like x[xg(y),yz] would produce g(y), which is different then the sequential behavior of g(z).

It’s obvious that if we can find a unifier between two terms, we can then find a unifier between any finite number of terms.
Even when unifiers exist, they are not necessarily unique. However, if a unifier does exist, we can show that there is always a most general unifier (MGU) that is unique up to renaming.

Definition. Most General Unifiers (MGU)

A substitution γ is a most general unifier for s and t if

  • is a unifier for s and t, and
  • any other unifier γ for s and t is a refinement of . That is, we can get γ from γ by doing more substitutions.

Algorithm. Robinson’s unification algorithm (1965)

Define a recursive function Unif that takes a set of equality constraints and produces their MGU (if exists) as follows

Unif():=IdUnif({xx}E):=Unif(E)Unif({f(s1,,sn)=f(t1,,tn)}E):=Unif({s1t1,,sntn}E)Unif({xt}E):=[xt];Unif(E[xt]) if xFV(t)

If E is a set of equality constraints, then E[xt] denotes the result of apply the substitution [xt] to all terms in E.

The only nontrivial part if the fourth rule, we actually create part of the unifier substitution. Note the sequential composition operator, so this is the substitution [xt] sequentially composed with the result of Unif(E[xt]).

3.2. Type Inference

Definition. Most General Type (MGT)

A type of a term is called most general if any other type of this term is a substituted instance of this type.

Note that type inference is no more than unification between type expressions. Still using STLC, Robinson’s algorithm gives MGT.

Suppose we want to infer the type of a λ-term e. WLOG, assume eh as no variables bound more than once and no variable with a binding occurrence that also occurs free. (Barendregt variable convention allows us to make such an assumption)

Now let e1,,en be an enumeration of all subterms of e. First create a unique type variable αi to each ei for 1in , and also a different unique type variable βx for each variable x that occurs in e. Then take the following constraints.

Syntactical FormConstraintei=()αiunitei=xαiβxei=λx.ejαiβxαjei=ej ekαjαkαi

If there are more types like int and bool in this language, we can also design rules for these types, which is similar to the ones we’ve shown.