First draft of shared self types RFC

This commit is contained in:
ajeffrey@roblox.com 2023-03-13 17:56:24 -05:00
parent 1fa8311a18
commit 0a84f94634

345
rfcs/shared-self-types.md Normal file
View file

@ -0,0 +1,345 @@
# Shared self types
## Summary
This RFC proposes sharing `self` types between method definitions which share a metatable.
## Motivation
Currently, metamethods are type-inferred independently, and so give
completely separate types to `self`. This has poor ergonomics, as type
errors for inconsistent methods are produced on method calls rather
than method definitions. It also has poor performance, as the independent
self types have separate memory footprints, and there is idiomatic code with
exponential blowup in the size of the type graph.
For example, `Point` class can be simulated using metatables:
```lua
--!strict
local Point = {}
Point.__index = Point
function Point.new()
local result = {}
setmetatable(Point, result)
result.x = 0
result.y = 0
return result
end
function Point:getX()
return self.x
end
function Point:getY()
return self.Y
end
function Point:abs()
return math.sqrt(self:getX() * self:getX() + self:getY() * self:getY())
end
type PointMT = typeof(Point)
type Point = typeof(Point.new())
```
Currently, this code is problematic, since there is no connection between the types
of the `Point` metamethods. For example, the inferred type for `Point.getX`
is `<a>({ x : a }) -> a`, rather than the expected `(Point) -> number`.
Even worse, the method `Point.abs` does not type-check, since the type
of `self.x * self.x` is unknown. If Luau had subtyping constraints and type families
for overloaded operators, the inferred type would be something like:
```lua
type PointMT = {
new : () -> Point,
getX : <a>({ x : a }) -> a,
getY : <a>({ y : a }) -> a,
abs : <a, b, c>(a) -> number where
a <: { getX : (a) -> b, getY : (a) -> c }
Add<Mul<b, b>, Mul<c, c>> <: number,
}
type Point = {
x : number,
y : number,
}
```
but this type is not great ergonomically, since this type may be presented to
users in type hover or type error messages, and will surprise users
expecting a simpler type such as:
```lua
type PointMT = {
new : () -> Point
getX : (Point) -> number,
getY : (Point) -> number,
abs : (Point) -> number
}
type Point = {
x : number,
y : number,
@metatable PointMT
}
```
This is the type inferred by *shared self types*. Rather than
inferring the `self` type separately for each metamethod declared on a
table, and for each use of `setmetatable` the same type is used.
Unfortunately, while this change is fairly straightforward for
monomorphic types like `Point`, it is problematic for generic classes
such as containers. For example:
```lua
local Set = {}
Set.__index = Set
function Set.new()
return setmetatable({
elements={}
}, Set)
end
function Set:add(el)
self.elements[el] = true
end
function Set:contains(el)
return self.elements[el] != nil
end
type SetMT = typeof(Set)
```
In this case, the expected type would be something like:
```lua
type SetMT = {
new : <E>() -> Set<E>,
add : <E>(Set<E>, E) -> (),
contains : <E>(Set<E>, E) -> boolean
}
type Set<E> = {
@metatable SetMT
elements : { [E] : boolean }
}
```
Inferring this type is beyond the scope of this RFC, though. Initially, we propose only inferring `self` monotypes,
in this case:
```lua
type SetMT = {
new : () -> Set,
add : (Set, unknown) -> (),
contains : (Set, unknown) -> boolean
}
type Set = {
@metatable SetMT
elements : { [unknown] : boolean }
}
```
and propose allowing explicit syntax for declaring the shared self type:
```lua
type Set:self<E> = { [E] : boolean }
```
This type (and its generic type parameters) are used as the type of
`self` in methods declared using `function Set:m()` declarations.
In cases where shared self types are just getting in the way, there
are two work-arounds. The shared self type can be declared to be
`any`, which will silence type errors:
```lua
type Foo:self = any
```
The self type can be declared explicitly:
```lua
function Foo.m(self : Bar) ... end
```
## Design
For each table type `T`, intoduce:
* the self type parameters of `T`, a sequence of generic type and typepack variables,
* the self type definition of `T`, a type which can use the type parameters of `T:self`, and
* the self type of `T`, which is the self type definition of `T` extended with `@metatable T`.
These can be declared explicitly:
```lua
type t:self<As> = U
```
which defines (when `t` has type `T`):
* the self type parameters of `T` to be `As`,
* the self type definition of `T` to be `U`, and
* the self type of `T` to be `U` extended with `@metatable T`.
For example,
```lua
type SetMT = typeof(Set)
type Set:self<E> = { [E] : boolean }
```
declares:
* the self type parameters of `SetMT` to be `E`,
* the self type definition of `SetMT` to be `{ [E] : boolean }`, and
* the self type of `SetMT` to be `{ [E] : boolean, @metatable SetMT }`.
If there is no explicit declaration of the self type of `T`, then
* the self type parameters of `T` are empty,
* the self type definition of `T` is a free table type `U`,
* the self type of `T` is a metatable type, whose metatable is `T`, and whose table is `U`.
Then free table type is inferred in the usual fashion.
Self types are used in two ways: in calls to `setmetatable`, and in metamethod declarations.
In calls to `setmetatable(t, mt)`:
* if `mt` has type `MT`,
* and `MT` has self type parameters `As`, self type definition `T` and self type `S`,
* and `T` has (final) type `T [ Ts/As ]`
* then `setmetatable(t, mt)` has type `S [ Ts/As ]`.
As currently, this has a side-effect of updating the type state of `t` from
'T [ Ts/As ]` to `S [ Ts/As ]`.
For example, in `setmetatable(Set, { elements = {} }`, we have:
* `Set` has type `SetMT`
* and `SetMT` has self type parameter `E`, self type definition` { elements : { [E] : boolean } }`, and self type `{ elements : { [E] : boolean }, @metatable SetMT }`,
* and `{ elements = {} }` has type `{ elements : { [X] : boolean } }` for a free `X`,
* so `setmetatable(Set, { elements = {} }` has type `{ elements : { [X] : boolean }, @metatable SetMT }`.
as a result `Set.new` has type `<a>() -> { elements : { [a] : boolean }, @metatable SetMT }`.
As another example, in `setmetatable(Point, result)`:
* `Point` has type `PointMT`,
* and `PointMT` has no self type parameters, a free self type definition (call it `T`), and a self type whose table type is `T` and whose metatable is `PointMT`,
* and `result` has final type `{ x : number, y : number }`,
* so (by unifying `T` with `{ x : number, y : number }`) `setmetatable(Point, result)` has type `{ x : number, y : number, @metatable PointMT }`.
As a side-effect, the type state of `result` is updated to be `{ x : number, y : number, @metatable PointMT }`,
and unification causes the self type of `PointMT` to be `{ x : number, y : number, @metatable PointMT }`.
Note that this relies on the type of `result` being `{ x : number, y : number }`, which is why we use the final
long-lived type of `t` rather than its current type state.
In method declarations `function mt:m`:
* if `mt` has type `MT`,
* and `MT` has self type parameters `As` and self type `S`,
* then give the `self` parameter to `MT.m` the type `S [ Xs/As ]` for fresh free Xs (these types are quantified as all other free types are).
For example, in the method `Set:add(el)`:
* `Set` has type `SetMT`,
* and `SetMT` has self type parameter `E`, and self type `{ elements : { [E] : boolean }, @metatable SetMT }`,
* so `self` has type `{ elements : { [X] : boolean }, @metatable SetMT }` when type checking the body of `Set:add(el)`.
At this point, type inference proceeds as usual:
* `el` id given fresh free type `Y`,
* the statement `self.elements[el] = true` will unify `X` and `Y`,
* quantifying will result in type `<a>({ elements : { [X] : boolean }, @metatable SetMT }) -> ()`.
## Drawbacks
### Partially-constructed objects
Shared self types capture an idiom where tables with metatables have
all of their fields initialized *before* any methods are called. In
cases where methods are called before all the fields are initialized,
this will result in optional types being inferred. For example:
```lua
function Point.new()
local result = setmetatable(Point, {})
result.x = 0
print(result:getX())
result.y = 0
print(result:getY())
return result
end
```
the call to `result:getY()` uses the *current* type state of `result`, which is `{ x : number, @metatable PointMT }`.
Unification will then cause `Point` to consider `y` to be optional:
```lua
type PointMT = {
new : () -> Point
getX : (Point) -> number,
getY : (Point) -> number?,
abs : (Point) -> number -- with a type error!
}
type Point = {
x : number,
y : number?,
@metatable PointMT
}
```
Since `y` has type `number?` rather than `number`, the `abs` method will fail to type check.
As a workaround, developers can declare different self types for different methods:
```lua
function Point.getX(self : { x : number }) : number
return self.x
end
function Point.getY(self : { y : number }) : number
return self.y
end
```
resulting in:
```lua
type PointMT = {
new : () -> Point
getX : ({ x : number }) -> number,
getY : ({ y : number }) -> number,
abs : (Point) -> number -- without a type error!
}
type Point = {
x : number,
y : number,
@metatable PointMT
}
```
or can switch off type checking `self` by declaring `type Point:self = any`.
### Methods called on both tables and metatables.
This is a similar problem, caused by calling methods directly on
metatables as well as tables. For example calling `Point:abs()` will
result in inferring that both `x` and `y` are optional,
## Alternatives
There are the usual bike shedding opportunities for new syntax.
We could do nothing, but at a performance and ergonomics cost.
We could introduce special syntax for classes or records, though this doesn't address type checking current code.