Exact Table Types
Summary
In Luau, we currently support two kinds of table types: sealed, inexact table types (the kind you use as parameter types) and unsealed, exact table types (the kind the type inference engine uses when checking table literals and subsequent assignments to them in the same scope). The RFCs for unsealed table types use terms like "precision", but are unclear about what those terms mean exactly. This RFC aims to clarify the language and syntax around the different facets of table types, and proposes introducing sealed, exact table types along with a migration plan to make them the default table type.
Motivation
In the current design of table types, we have conflated two axes of properties: whether tables are sealed vs. unsealed, and whether tables are exact vs. inexact. To make the language clearer and to make typechecking behavior easier to implement and explain, we want to disentangle these concepts. For the sake of precision, this RFC aims to always specify both facets, but in practice, we believe programmers will only really be concerned with whether or not a table type is exact.
The first distinction, sealed vs. unsealed, deals with state during typechecking. Namely, a sealed table type is one that will not be changed during typechecking. So, if we have a binding with a sealed table type, we know that the type of the binding itself will not change. By contrast, unsealed table types are stateful, and an assignment to a binding can grow the set of properties on that type. All unsealed tables are sealed when they leave the scope they were created in, so unsealed tables should be thought of as a technique used for typechecking imperative-style table initialization code.
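As a minimal sketch of the sealed vs. unsealed distinction (the function name `makePoint` is hypothetical, and the typechecker behavior noted in the comments is what this RFC's description implies, not a verified diagnostic):

```luau
local function makePoint()
    local p = {}   -- unsealed: the inferred type starts with no properties
    p.x = 1        -- the type of `p` grows to include `x: number`
    p.y = 2        -- and then `y: number`
    return p       -- leaving this scope seals the type
end

local q = makePoint()
print(q.x + q.y)

-- q.z = 3
-- The assignment above would be expected to be rejected by the typechecker:
-- `q` has a sealed type, so its set of properties can no longer grow.
```

At runtime the commented-out assignment would succeed, since Luau tables are dynamic; sealing is purely a property of how the type is tracked during typechecking.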
The second distinction, exact vs. inexact, deals with how the table type relates to other table types. In particular, an exact table type indicates that the table has only the properties listed in its type, while an inexact table type indicates that the table has at least the properties listed in its type. This corresponds to whether or not width subtyping is allowed on the type.
With this language, we currently have two table types that are determined contextually and have no syntactic difference. When a table type is used as the type of a function parameter, e.g. `function getx(t: {x: number}) return t.x end`, we treat it as sealed and inexact, meaning the type of the parameter will not change, but call sites to the function will permit tables that have more properties than the ones listed. Within the body of the function, however, the typechecker should only permit the use of properties explicitly listed in the type, since it was annotated. If the type was instead inferred, a dual condition should hold: the properties explicitly listed in the type are precisely the ones whose existence the function depends on. Meanwhile, when a table type is used for a local definition, it is given an unsealed and exact type, meaning we always know precisely the set of properties it has, but the type itself is stateful and assignments are allowed to grow the set of properties the table has.
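The two contextual cases can be sketched together as follows. The variable name `point3` is an assumption made for illustration, and the comments describe the behavior this RFC attributes to the current design rather than output from a specific Luau version:

```luau
-- Parameter annotation: sealed and inexact in the current design.
local function getx(t: { x: number }): number
    -- Only `x` may be used here, because only `x` is listed in the annotation.
    return t.x
end

-- Local definition: unsealed and exact, so the typechecker knows the full
-- set of properties (`x`, `y`, and `z`).
local point3 = { x = 1, y = 2, z = 3 }

-- Permitted under inexact parameter types: the argument has more
-- properties than the annotation lists.
print(getx(point3))
```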
This points to a gap of two sorts of table types: sealed, exact tables and unsealed, inexact tables. We believe the latter is undesirable, in the sense that there does not appear to be a compelling use case for forgetting some of the structure of the table you're building statefully. The main consequence of doing so would be that you would be unable to make further assignments to a property after forgetting it was present. The former, however, is the novel proposal of this RFC.
Design
A sealed, exact table type is effectively the strictest description we could give of a particular table. It says that the table does not allow width subtyping, and therefore must have exactly the properties described by the type. It further says that the type itself is final, meaning assignments to missing properties would subsequently be rejected. To see why this is useful, we can consider a small fragment of a hypothetical vector math library:
type Vec2 = { x: number, y: number }
type Vec3 = { x: number, y: number, z: number }
function length(pt: Vec2)
    return math.sqrt(pt.x^2 + pt.y^2)
end
If the table type `Vec2` here is inexact, we would allow `length` to be applied to a table of type `Vec3`. However, in this case, our domain knowledge tells us that `length`'s behavior is incorrect for `Vec3` since it does not take into account the `z`-component. While we could design a function that instead worked for both, it may be reasonable to instead say that this function only operates on the exact table. While this particular example is admittedly a bit contrived, similar proposals for other languages like TypeScript have collected many examples where API designers wanted to be able to specify exact types. One general pattern of use cases is state machines, where you might define a number of types representing individual states and then a union over all such types. With only inexact types, the type system permits undesirable combinations that can introduce unwanted behavior in code using the state machine.
type StartState = {
    state: "Start",
    startData: number,
}
type OtherState = {
    state: "Other",
    otherData: string,
}
type State = StartState | OtherState
-- but we can construct a `State` that is neither a `StartState` nor an `OtherState`.
local notRight: State = { state: "Start", startData: 42, otherData: "foo" }
Since this proposal would involve having two sorts of sealed table types, exact and inexact, and their difference is not something we can distinguish contextually in general (as was the case with sealed and unsealed), we have to propose a syntax for them. This proposal posits that the best path forward is to make sealed, exact tables the default table type that you get when you write, e.g., `{ x: number, y: number }` as an annotation on a parameter or in a type alias. To annotate a table as inexact, and therefore open to having additional properties, we would include an ellipsis in the type, e.g. `{ x: number, y: number, ... }`. This was originally proposed as an alternative design in the width subtyping proposal.
We propose these syntaxes (`{field: type}` and `{field: type, ...}`) for exact and inexact tables respectively because they are fairly lightweight, support anonymous usage, and represent a fairly mild change from the syntax that we already have. Alternative syntax options like `interface` in TypeScript would represent a bigger deviation from our current syntax of types.
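For illustration, the two proposed forms might be written as follows. This is a sketch of the proposed syntax only: the names `HasXY` and `sumxy` are hypothetical, and the trailing `...` form is what this RFC introduces, so the snippet is not expected to parse in Luau as it exists today.

```luau
-- Sealed, exact under this proposal: values must have exactly `x` and `y`.
type Vec2 = { x: number, y: number }

-- Sealed, inexact under this proposal: values must have at least `x` and `y`.
type HasXY = { x: number, y: number, ... }

-- Anonymous use of the inexact form in a parameter annotation.
local function sumxy(t: { x: number, y: number, ... }): number
    return t.x + t.y
end
```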
There are a few apparent benefits to this approach. First, it means that a programmer designing an API has to make a conscious decision that "yes, in fact, it is sensible to treat all of the unlisted properties of this parameter as being unrelated to the behavior of this function." The type checker should prefer approaches that, by default, rule out more possible bugs in API usage while keeping explicit acceptance of the looser behavior natural and low friction. The motivation for this approach is that, as a programmer, you are forced to notice when a program you want to succeed produces an error, but you can remain blissfully unaware when programs you want to be rejected succeed silently.
Secondly, though relatedly, making all table types inexact-by-default seems to exacerbate confusion about both the meaning of and the subtyping relations between table types. Consider, for example, the inexact table type `{...}` (currently written `{}` in the inexact-by-default world). Right away, the current syntax of the type matches the syntax of the value form for the empty table. This makes it tempting to think of the type as something like "the empty table type", which one might expect has a single inhabiting value, namely the empty table. However, with inexact-by-default, this `{}` type is really the type of every single table, and it functions as a sort of top type for tables. If you do mistakenly read it (even momentarily) as the empty table type, you can then be surprised by the fact that it is not a subtype of any more specific table, such as `{ x: number, y: number }`. This makes sense with the understanding in mind that it is the type of all tables, rather than just the empty table, but then you may be surprised that it is a subtype of `{[any]: any}`, but is not a subtype of, say, `{[any]: number}`. These sorts of surprises and confusions have come up in internal discussions of subtyping for Luau, and the syntax appears to not be doing us any favors here.
Thirdly, we argue that it gives the most coherent and intuitive syntax for distinguishing exact vs. inexact tables, since we believe that being able to distinguish between them and choose the appropriate one is important. Ellipses have an already established connotation of omission, which maps exactly to how sealed, inexact types are omitting some of the properties that the tables actually have (since tables with arbitrarily many additional properties can be accepted at an inexact table type). The author is unaware of existing syntax that carries the same strong connotation, but for exactness. One proposal discussed in the alternatives would be to follow Flow in using `{| x: number, y: number |}` to mean a sealed, exact table.
Though we are proposing to make the default interpretation of table annotations sealed and exact, we maintain that the sensible inference behavior is to default to inexact table types for unannotated table parameters. Consider the following example:
function dostuff(t)
    ...
    t.x = 4
    ...
    t.y = 5
    ...
end
In this case, we expect that `t` will be given the inferred type `{x: number, y: number, ...}`. However, if we also passed `t` to another function that requires an exact table, we can propagate this requirement to `dostuff`.
function dostuff(t)
    ...
    t.x = 4
    ...
    t.y = 5
    ...
    dootherstuff(t)
    ...
end
function dootherstuff(pt: {x: number, y: number})
    ...
end
So, in this case, `t` would be given the exact table type `{x: number, y: number}` during inference. Note that this is still true even if `dostuff` did not refer to some or all of the properties of `t` outside of passing it into `dootherstuff`. That is, the type would be the same even if the code were as follows:
function dostuff(t)
    return dootherstuff(t)
end
function dootherstuff(pt: {x: number, y: number})
    ...
end
If instead the table is passed to a function requiring an exact table type with different properties than the ones used in `dostuff`, we will emit an error indicating that they are in conflict.
function dostuff(t)
    ...
    t.z = 4
    ...
    ...
    dootherstuff(t)
    ...
end
function dootherstuff(pt: {x: number, y: number})
    ...
end
In this case, `t` has two constraints: one indicating that it should be the exact table type `{x: number, y: number}` and the other indicating that it should have the property `z`. These constraints cannot be reconciled, and so we will get an error at the call site of `dootherstuff`.
Drawbacks
The main drawback of doing this is that the existing RFC for sealed table subtyping was already implemented as of March 1st, meaning that this is ultimately a breaking change. As such, it would be important to manage the change as a gradual rollout to mitigate problems. Since annotated table types in Luau are currently always sealed and inexact, we can start by adding the syntax for explicit inexact tables (e.g. `{x: number, y: number, ...}`) and implementing a warning that explains that the default behavior will change in the future, and that the programmer should switch to the explicit inexact syntax to keep the current behavior. At a later date, we can enable exact-by-default, but should provide an explicit suggestion in the error message telling the programmer that, if they intended to allow tables with additional properties, they can change the signature of the function to include the `...`.
Alternatives
There are two primary alternatives to this proposal. First, we could stick with the decision that was already made in the sealed table subtyping proposal and leave sealed, inexact tables as the default table type. We then have the choice of either doing nothing else or adding a new type syntax for sealed, exact tables. If we do nothing, we are reducing the ability of API designers to rule out certain classes of bugs like the one presented with vector types. The implicit assumption of width subtyping, after all, is that properties not listed in the type are orthogonal to the function using that type, but of course, this assumption is sometimes wrong. It is still possible that inexactness is so often necessary that we would see little safety benefit from making exactness available. If we add a new syntax for sealed, exact tables instead, the most sensible syntax would likely follow Flow in including pipes, so `{| x: number, y: number |}` would be the type of a sealed, exact table. This approach does not require a migration path, but would likely reflect the belief that the added safety of exact table types is only rarely necessary. The author believes it is worth considering, however, that Flow has also been migrating to exact-by-default for their object types.