mesh-normal-scalar correctly fills sequential values in the output for
triangle cone function, but mesh-normal-vector accidentally reuses the
loop index, which results in writes to every third index of the array
(1, 4, etc.).
This is both slower (as the table turns into a hash map), and incorrect,
especially as we have a scalar version of the benchmark that does the
right thing.
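To make the difference concrete, here is a minimal C++ sketch of the two write patterns. It is not the benchmark's Luau code: `fillSequential`, `fillStrided` and `writtenSlots` are hypothetical names, and a plain vector with 0 meaning "never written" stands in for the output table.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the output table: 1-based slots, 0 = never written.
using Slots = std::vector<int>;

// Intended pattern: a separate output cursor, so values land in slots 1, 2, 3, ...
Slots fillSequential(std::size_t triangles)
{
    Slots out(triangles * 3 + 1, 0);
    std::size_t next = 1;
    for (std::size_t i = 1; i <= triangles * 3; i += 3)
        out[next++] = 1;
    return out;
}

// Buggy pattern: the loop index (stepping by 3) is reused as the output index,
// so only slots 1, 4, 7, ... are written and the rest stay empty.
Slots fillStrided(std::size_t triangles)
{
    Slots out(triangles * 3 + 1, 0);
    for (std::size_t i = 1; i <= triangles * 3; i += 3)
        out[i] = 1;
    return out;
}

// Collects the indices of all slots that were written.
std::vector<std::size_t> writtenSlots(const Slots& slots)
{
    std::vector<std::size_t> result;
    for (std::size_t i = 1; i < slots.size(); ++i)
        if (slots[i] != 0)
            result.push_back(i);
    return result;
}
```

With three triangles the first variant writes slots 1, 2, 3 while the second writes 1, 4, 7, which is the sparse layout described above.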
Note: there's a bunch of inefficiencies in the benchmark code that I
have not fixed (around field access mostly, e.g. writing to `v.n` and
then immediately reading it again). These are not ideal for performance,
but they can be valuable to keep as is because this redundancy is common
in real-world code, and it would be nice to see codegen optimizations
eliminating most of that overhead. This one, however, is a straight up
bug, and sparse arrays should not really be the thing this benchmark
Instead of doing the dot product related math in scalar IR, we lift the
computation into a dedicated IR instruction.
On x64, we can use VDPPS which was more or less tailor made for this
purpose. This is better than manual scalar lowering that requires
reloading components from memory; it's not always a strict improvement
over the shuffle+add version (which we never had), but this can now be
adjusted in the IR lowering in an optimal fashion (maybe even based on
CPU vendor, although that'd create issues for offline compilation).
On A64, we can either use naive adds or paired adds, as there is no
dedicated vector-wide horizontal instruction until SVE. Both run at
about the same performance on M2, but paired adds require fewer
instructions and temporaries.
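As a rough illustration of the reduction strategies mentioned above, here is a C++ sketch (not the actual IR lowering). `Vec3`, `dotNaive`, `dotPaired` and `dotDpps` are hypothetical names; the paired variant only mimics the shape of a pairwise-add sequence in scalar code, and the SSE4.1 path is compiled only when the compiler enables that instruction set.

```cpp
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

struct Vec3
{
    float x, y, z;
};

// Naive reduction: multiply per lane, then add the lanes one after another.
float dotNaive(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Paired reduction: pad the products to four lanes and add adjacent pairs,
// which is the shape a pairwise-add sequence would compute.
float dotPaired(Vec3 a, Vec3 b)
{
    float p[4] = {a.x * b.x, a.y * b.y, a.z * b.z, 0.0f};
    float lo = p[0] + p[1];
    float hi = p[2] + p[3];
    return lo + hi;
}

#if defined(__SSE4_1__)
// Dedicated instruction: mask 0x71 multiplies lanes 0..2 and writes the sum to lane 0.
float dotDpps(Vec3 a, Vec3 b)
{
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0x71));
}
#endif
```

All variants compute the same three-component dot product; they differ only in how the horizontal sum is formed, which is what the lowering can now choose per target.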
I've measured this using the mesh-normal-vector benchmark, changing the
benchmark to just report the time of the second loop inside
`calculate_normals`, testing master vs #1504 vs this PR, also increasing
the grid size to 400 for more stable timings.
On Zen 4 (7950X), this PR is comfortably ~8% faster vs master, while I
see neutral to negative results in #1504.
On M2 (base), this PR is ~28% faster vs master, while #1504 is only
~10% faster.
If I measure the second loop in `calculate_tangent_space` instead, I
On Zen 4 (7950X), this PR is ~12% faster vs master, while #1504 is ~3%
On M2 (base), this PR is ~24% faster vs master, while #1504 is only
~13% faster.
Note that the loops in question are not quite optimal, as they store and
reload various vectors to dictionary values due to inappropriate use of
locals. The underlying gains in individual functions are thus larger
than the numbers above; for example, changing the `calculate_normals`
loop to use a local variable to store the normalized vector (but still
saving the result to the dictionary value), I get a ~24% performance
increase from this PR on Zen 4 vs master instead of just ~8% (#1504 is
~15% slower in this setup).
### What's New?
* Fragment Autocomplete: a new API allows for type checking a small
fragment of code against an existing file, significantly speeding up
autocomplete performance in large files.
### New Solver
* E-Graphs have landed: this is an ongoing approach to make the new type
solver simplify types in a more consistent and principled manner, based
on similar work (see:
* Adds support for exporting / local user type functions (previously
they were always exported).
* Fixes a set of bugs in which the new solver will fail to complete
inference for simple expressions with just literals and operators.
### General Updates
* Requiring a path with a ".lua" or ".luau" extension will now have a
bespoke error suggesting to remove said extension.
* Fixes a bug in which whether two `Luau::Symbol`s are equal depends on
whether the new solver is enabled.
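For illustration only, the extension check behind such an error could look like the C++ sketch below. `endsWith` and `suggestWithoutExtension` are hypothetical helpers, not the actual implementation, and the real error message wording is not reproduced here.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Returns true if `s` ends with `suffix`.
static bool endsWith(std::string_view s, std::string_view suffix)
{
    return s.size() >= suffix.size() && s.substr(s.size() - suffix.size()) == suffix;
}

// If `path` carries a ".lua" or ".luau" extension, returns the path with the
// extension removed (the suggestion to show); otherwise returns nullopt.
std::optional<std::string> suggestWithoutExtension(std::string_view path)
{
    std::string_view ext;
    if (endsWith(path, ".luau"))
        ext = ".luau";
    else if (endsWith(path, ".lua"))
        ext = ".lua";
    else
        return std::nullopt;

    return std::string(path.substr(0, path.size() - ext.size()));
}
```

A caller would report the error and include the returned path as the suggested replacement.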
Internal Contributors:
Co-authored-by: Aaron Weiss <>
Co-authored-by: Andy Friesen <>
Co-authored-by: David Cope <>
Co-authored-by: Hunter Goldstein <>
Co-authored-by: Varun Saini <>
Co-authored-by: Vighnesh Vijay <>
Co-authored-by: Vyacheslav Egorov <>
> What's new?
* Fragment Autocomplete: a new API allows for type checking a small
fragment of code against an existing file, significantly speeding up
autocomplete performance in large files.
> New Solver
* E-Graphs have landed: this is an ongoing approach to make the new type solver
simplify types in a more consistent and principled manner, based on
similar work (e.g.:
* Adds support for exported / local user type functions.
* Fixes a set of bugs in which the new solver will fail to complete
inference for simple expressions with just literals and operators.
> General
* It is now an explicit runtime error to `require` a path with a ".lua" or
".luau" extension, and the error message will suggest removing the extension.
* Fixes a bug in which whether two `Symbol`s are equal depends on
whether the new solver is enabled.
Tested and working with the test case in the aforementioned issue, along
with the full defs of luau-lsp with no issues or type errors
In normal Luau files, you can use type aliases and type functions before
they are declared. The same extends to declaration files, **except** in
the new solver. The old solver perfectly allows this, and in fact
intentionally adds it:
db809395bf/Analysis/src/TypeInfer.cpp (L1711-L1717)
This causes *much* headache and pain for external projects that make use
of declaration files; namely, luau-lsp generates them from MaximumADHD's
API dump, which is not ordered by dependency. This means silent
error-types popping up everywhere because types are used before they are
declared. The workaround would be to manually reorder class
definitions based on their dependencies, but this is clearly not ideal,
and won't work for classes dependent on each other.
The solution used here is the same as is used for type aliases - the
name binding for the class is given a blocked type before running the
rest of constraint generation on the block. Questions remain:
- Should the logic be split off of `checkAliases`?
- Should a bound type be used, or should the (blocked) binding type be
directly emplaced with the class type? What are the ramifications of
emplacing with the bound versus the raw type? One ramification was
initially hit through an assertion because the class
`superTy`/`parent` was bound, and several pieces of code assume it is
not, so it had to be followed.
- Is following `superTy` to set `parent` the correct workaround for the
assertions thrown, or should the code expecting `parent` to be a
ClassType without following it be modified instead to follow `parent`?
- Should `scope->privateTypeBindings` also be checked for the duplicate
error? I would presume so, since having a class with the same name as a
private alias or type function should error as well?
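The pre-binding idea described above can be sketched in a heavily simplified form. This is not the constraint generator's code: `ClassDecl`, `Binding`, `resolveInOrder` and `resolveWithPrebinding` are hypothetical, and a "blocked" state on a map entry stands in for the blocked type.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified declaration: a class name and its parent ("" if none).
struct ClassDecl
{
    std::string name;
    std::string parent;
};

enum class BindingState { Blocked, Resolved };

struct Binding
{
    BindingState state = BindingState::Blocked;
    std::string parent;
};

// Single pass: a parent must already be bound when its child is visited.
// Returns how many parent references could not be found.
int resolveInOrder(const std::vector<ClassDecl>& decls)
{
    std::unordered_map<std::string, Binding> scope;
    int unresolved = 0;
    for (const ClassDecl& d : decls)
    {
        if (!d.parent.empty() && scope.find(d.parent) == scope.end())
            ++unresolved;
        scope[d.name] = Binding{BindingState::Resolved, d.parent};
    }
    return unresolved;
}

// Two passes: every name first gets a blocked placeholder binding, then the
// declarations are processed, so declaration order no longer matters.
int resolveWithPrebinding(const std::vector<ClassDecl>& decls)
{
    std::unordered_map<std::string, Binding> scope;
    for (const ClassDecl& d : decls)
        scope.emplace(d.name, Binding{});

    int unresolved = 0;
    for (const ClassDecl& d : decls)
    {
        if (!d.parent.empty() && scope.find(d.parent) == scope.end())
            ++unresolved;
        Binding& binding = scope[d.name];
        binding.state = BindingState::Resolved;
        binding.parent = d.parent;
    }
    return unresolved;
}
```

With declarations listed child-first, the single-pass version misses the parent while the pre-binding version finds every name, including classes that refer to each other.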
The extraneous whitespace changes are clang-format ones done
automatically that should've been done in the last release - I can
remove them if necessary and let another sync or OSS cleanup commit fix
Brings behavior to parity with the old solver by filling in
definitionLocation and definitionModuleName for Luau-consuming
programs/libraries to use.
* New `vector` library! See
for details
* Replace the use of non-portable `strnlen` with `memchr`. `strnlen` is
not part of any C or C++ standard.
* Introduce `lua_newuserdatataggedwithmetatable` for faster tagged
userdata creation of userdata with metatables registered with
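For reference, a bounded length computed with `memchr` can be sketched as follows. `boundedLength` is a hypothetical name used for illustration and is not necessarily how the change is written in the codebase.

```cpp
#include <cstddef>
#include <cstring>

// Number of bytes before the first '\0' within the first `maxlen` bytes of `s`,
// or `maxlen` if no terminator is found in that range.
std::size_t boundedLength(const char* s, std::size_t maxlen)
{
    const void* end = std::memchr(s, '\0', maxlen);
    return end ? static_cast<std::size_t>(static_cast<const char*>(end) - s) : maxlen;
}
```

This mirrors the usual `strnlen` contract while relying only on `memchr`, which is part of the standard library.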
Old Solver
* It used to be the case that a module's result type would
unconditionally be inferred to be `any` if it imported any module that
participates in any import cycle. This is now fixed.
New Solver
* Improve inference of `table.freeze`: We now infer read-only properties
on tables after they have been frozen.
* We now correctly flag cases where `string.format` is called with 0
* Fix a bug in user-defined type functions where table properties could
be lost if the table had a metatable
* Reset the random number seed for each evaluation of a type function
* We now retry subtyping arguments if it failed due to hidden variadics.
Co-authored-by: Aaron Weiss <>
Co-authored-by: Alexander McCord <>
Co-authored-by: Vighnesh <>
Co-authored-by: Aviral Goel <>
Co-authored-by: David Cope <>
Co-authored-by: Lily Brown <>
Co-authored-by: Vyacheslav Egorov <>
Co-authored-by: Junseo Yoo <>
## What's new
* Added `` function to the standard library, based on
* `FileResolver` can provide an implementation of
`getRequireSuggestions` to provide auto-complete suggestions for
## New Solver
* In user-defined type functions, `readproperty` and `writeproperty`
will return `nil` instead of erroring if the property is not found
* Fixed incorrect scope of variadic arguments in the data-flow graph
* Fixed multiple assertion failures
Internal Contributors:
Co-authored-by: Aaron Weiss <>
Co-authored-by: Hunter Goldstein <>
Co-authored-by: Varun Saini <>
Co-authored-by: Vighnesh Vijay <>
Co-authored-by: Vyacheslav Egorov <>
# General Updates
Fix an old solver crash that occurs in the presence of cyclic
## New Solver
- Improvements to Luau user-defined type function library
- Avoid asserting on unexpected metatable types
- Properties in user defined type functions should have a consistent
iteration order - in this case it is insertion ordering
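One common way to get insertion-ordered iteration is to keep entries in a vector and index them with a hash map. The C++ sketch below shows that technique under assumed names (`OrderedProps`, `set`, `keys`); it is not the data structure used by the type function runtime.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical property table that iterates in insertion order.
class OrderedProps
{
public:
    // Inserts a new property at the end, or overwrites an existing one in place
    // (overwriting does not change its position).
    void set(const std::string& key, std::string value)
    {
        auto it = index_.find(key);
        if (it != index_.end())
        {
            entries_[it->second].second = std::move(value);
            return;
        }
        index_.emplace(key, entries_.size());
        entries_.emplace_back(key, std::move(value));
    }

    // Property names in the order they were first inserted.
    std::vector<std::string> keys() const
    {
        std::vector<std::string> result;
        for (const auto& entry : entries_)
            result.push_back(entry.first);
        return result;
    }

private:
    std::vector<std::pair<std::string, std::string>> entries_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

Iterating `keys()` yields the same order on every run, unlike iterating the hash map directly.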
# Runtime
- Track VM allocations for telemetry
Co-authored-by: Aaron Weiss <>
Co-authored-by: Andy Friesen <>
Co-authored-by: Hunter Goldstein <>
Co-authored-by: James McNellis <>
Co-authored-by: Varun Saini <>
Co-authored-by: Vighnesh Vijay <>
Co-authored-by: Vyacheslav Egorov <>
Co-authored-by: Aaron Weiss <>
Co-authored-by: Alexander McCord <>
Co-authored-by: Andy Friesen <>
Co-authored-by: Aviral Goel <>
Co-authored-by: David Cope <>
Co-authored-by: Lily Brown <>
Co-authored-by: Vyacheslav Egorov <>
Co-authored-by: Junseo Yoo <>