luau/CodeGen/include/Luau/AssemblyBuilderA64.h

275 lines
11 KiB
C
Raw Normal View History

// This file is part of the Luau programming language and is licensed under MIT License; see LICENSE.txt for details
#pragma once
#include "Luau/RegisterA64.h"
#include "Luau/AddressA64.h"
#include "Luau/ConditionA64.h"
#include "Luau/Label.h"
#include <string>
#include <vector>
namespace Luau
{
namespace CodeGen
{
Sync to upstream/release/566 (#853) * Fixed incorrect lexeme generated for string parts in the middle of an interpolated string (Fixes https://github.com/Roblox/luau/issues/744) * DeprecatedApi lint can report some issues without type inference information * Fixed performance of autocomplete requests when suggestions have large intersection types (Solves https://github.com/Roblox/luau/discussions/847) * Marked `table.getn`/`foreach`/`foreachi` as deprecated ([RFC: Deprecate table.getn/foreach/foreachi](https://github.com/Roblox/luau/blob/master/rfcs/deprecate-table-getn-foreach.md)) * With -O2 optimization level, we now optimize builtin calls based on known argument/return count. Note that this change can be observable if `getfenv/setfenv` is used to substitute a builtin, especially if arity is different. Fastcall heavy tests show a 1-2% improvement. * Luau can now be built with clang-cl (Fixes https://github.com/Roblox/luau/issues/736) We also made many improvements to our experimental components. For our new type solver: * Overhauled data flow analysis system, fixed issues with 'repeat' loops, global variables and type annotations * Type refinements now work on generic table indexing with a string literal * Type refinements will properly track potentially 'nil' values (like t[x] for a missing key) and their further refinements * Internal top table type is now isomorphic to `{}` which fixes issues when `typeof(v) == 'table'` type refinement is handled * References to non-existent types in type annotations no longer resolve to 'error' type like in old solver * Improved handling of class unions in property access expressions * Fixed default type packs * Unsealed tables can now have metatables * Restored expected types for function arguments And for native code generation: * Added min and max IR instructions mapping to vminsd/vmaxsd on x64 * We now speculatively extract direct execution fast-paths based on expected types of expressions which provides better optimization opportunities inside a single basic block * Translated existing math fastcalls to IR form to improve tag guard removal and constant propagation
2023-03-03 20:21:14 +00:00
namespace A64
{
enum FeaturesA64
{
Feature_JSCVT = 1 << 0,
};
class AssemblyBuilderA64
{
public:
explicit AssemblyBuilderA64(bool logText, unsigned int features = 0);
~AssemblyBuilderA64();
// Moves
void mov(RegisterA64 dst, RegisterA64 src);
void mov(RegisterA64 dst, int src); // macro
// Moves of 32-bit immediates get decomposed into one or more of these
void movz(RegisterA64 dst, uint16_t src, int shift = 0);
void movn(RegisterA64 dst, uint16_t src, int shift = 0);
void movk(RegisterA64 dst, uint16_t src, int shift = 0);
// Arithmetics
void add(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void add(RegisterA64 dst, RegisterA64 src1, uint16_t src2);
void sub(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void sub(RegisterA64 dst, RegisterA64 src1, uint16_t src2);
void neg(RegisterA64 dst, RegisterA64 src);
// Comparisons
// Note: some arithmetic instructions also have versions that update flags (ADDS etc) but we aren't using them atm
void cmp(RegisterA64 src1, RegisterA64 src2);
void cmp(RegisterA64 src1, uint16_t src2);
void csel(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, ConditionA64 cond);
Sync to upstream/release/572 (#899) * Fixed exported types not being suggested in autocomplete * `T...` is now convertible to `...any` (Fixes https://github.com/Roblox/luau/issues/767) * Fixed issue with `T?` not being convertible to `T | T` or `T?` (sometimes when internal pointer identity is different) * Fixed potential crash in missing table key error suggestion to use a similar existing key * `lua_topointer` now returns a pointer for strings C++ API Changes: * `prepareModuleScope` callback has moved from TypeChecker to Frontend * For LSPs, AstQuery functions (and `isWithinComment`) can be used without full Frontend data A lot of changes in our two experimental components as well. In our work on the new type-solver, the following issues were fixed: * Fixed table union and intersection indexing * Correct custom type environments are now used * Fixed issue with values of `free & number` type not accepted in numeric operations And these are the changes in native code generation (JIT): * arm64 lowering is almost complete with support for 99% of IR commands and all fastcalls * Fixed x64 assembly encoding for extended byte registers * More external x64 calls are aware of register allocator * `math.min`/`math.max` with more than 2 arguments are now lowered to IR as well * Fixed correctness issues with `math` library calls with multiple results in variadic context and with x64 register conflicts * x64 register allocator learnt to restore values from VM memory instead of always using stack spills * x64 exception unwind information now supports multiple functions and fixes function start offset in Dwarf2 info
2023-04-14 19:06:22 +01:00
void cset(RegisterA64 dst, ConditionA64 cond);
// Bitwise
void and_(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void orr(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void eor(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void bic(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
void tst(RegisterA64 src1, RegisterA64 src2, int shift = 0);
void mvn(RegisterA64 dst, RegisterA64 src);
Sync to upstream/release/572 (#899) * Fixed exported types not being suggested in autocomplete * `T...` is now convertible to `...any` (Fixes https://github.com/Roblox/luau/issues/767) * Fixed issue with `T?` not being convertible to `T | T` or `T?` (sometimes when internal pointer identity is different) * Fixed potential crash in missing table key error suggestion to use a similar existing key * `lua_topointer` now returns a pointer for strings C++ API Changes: * `prepareModuleScope` callback has moved from TypeChecker to Frontend * For LSPs, AstQuery functions (and `isWithinComment`) can be used without full Frontend data A lot of changes in our two experimental components as well. In our work on the new type-solver, the following issues were fixed: * Fixed table union and intersection indexing * Correct custom type environments are now used * Fixed issue with values of `free & number` type not accepted in numeric operations And these are the changes in native code generation (JIT): * arm64 lowering is almost complete with support for 99% of IR commands and all fastcalls * Fixed x64 assembly encoding for extended byte registers * More external x64 calls are aware of register allocator * `math.min`/`math.max` with more than 2 arguments are now lowered to IR as well * Fixed correctness issues with `math` library calls with multiple results in variadic context and with x64 register conflicts * x64 register allocator learnt to restore values from VM memory instead of always using stack spills * x64 exception unwind information now supports multiple functions and fixes function start offset in Dwarf2 info
2023-04-14 19:06:22 +01:00
// Bitwise with immediate
// Note: immediate must have a single contiguous sequence of 1 bits set of length 1..31
void and_(RegisterA64 dst, RegisterA64 src1, uint32_t src2);
void orr(RegisterA64 dst, RegisterA64 src1, uint32_t src2);
void eor(RegisterA64 dst, RegisterA64 src1, uint32_t src2);
void tst(RegisterA64 src1, uint32_t src2);
// Shifts
void lsl(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void lsr(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void asr(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void ror(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void clz(RegisterA64 dst, RegisterA64 src);
void rbit(RegisterA64 dst, RegisterA64 src);
// Shifts with immediates
// Note: immediate value must be in [0, 31] or [0, 63] range based on register type
void lsl(RegisterA64 dst, RegisterA64 src1, uint8_t src2);
void lsr(RegisterA64 dst, RegisterA64 src1, uint8_t src2);
void asr(RegisterA64 dst, RegisterA64 src1, uint8_t src2);
void ror(RegisterA64 dst, RegisterA64 src1, uint8_t src2);
// Load
// Note: paired loads are currently omitted for simplicity
void ldr(RegisterA64 dst, AddressA64 src);
void ldrb(RegisterA64 dst, AddressA64 src);
void ldrh(RegisterA64 dst, AddressA64 src);
void ldrsb(RegisterA64 dst, AddressA64 src);
void ldrsh(RegisterA64 dst, AddressA64 src);
void ldrsw(RegisterA64 dst, AddressA64 src);
void ldp(RegisterA64 dst1, RegisterA64 dst2, AddressA64 src);
// Store
void str(RegisterA64 src, AddressA64 dst);
void strb(RegisterA64 src, AddressA64 dst);
void strh(RegisterA64 src, AddressA64 dst);
void stp(RegisterA64 src1, RegisterA64 src2, AddressA64 dst);
// Control flow
void b(Label& label);
void bl(Label& label);
void br(RegisterA64 src);
void blr(RegisterA64 src);
void ret();
// Conditional control flow
void b(ConditionA64 cond, Label& label);
void cbz(RegisterA64 src, Label& label);
void cbnz(RegisterA64 src, Label& label);
void tbz(RegisterA64 src, uint8_t bit, Label& label);
void tbnz(RegisterA64 src, uint8_t bit, Label& label);
// Address of embedded data
void adr(RegisterA64 dst, const void* ptr, size_t size);
void adr(RegisterA64 dst, uint64_t value);
void adr(RegisterA64 dst, double value);
// Address of code (label)
void adr(RegisterA64 dst, Label& label);
// Floating-point scalar moves
// Note: constant must be compatible with immediate floating point moves (see isFmovSupported)
void fmov(RegisterA64 dst, RegisterA64 src);
void fmov(RegisterA64 dst, double src);
// Floating-point scalar math
void fabs(RegisterA64 dst, RegisterA64 src);
void fadd(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void fdiv(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void fmul(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
void fneg(RegisterA64 dst, RegisterA64 src);
void fsqrt(RegisterA64 dst, RegisterA64 src);
void fsub(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2);
// Floating-point rounding and conversions
void frinta(RegisterA64 dst, RegisterA64 src);
void frintm(RegisterA64 dst, RegisterA64 src);
void frintp(RegisterA64 dst, RegisterA64 src);
void fcvt(RegisterA64 dst, RegisterA64 src);
void fcvtzs(RegisterA64 dst, RegisterA64 src);
void fcvtzu(RegisterA64 dst, RegisterA64 src);
void scvtf(RegisterA64 dst, RegisterA64 src);
void ucvtf(RegisterA64 dst, RegisterA64 src);
// Floating-point conversion to integer using JS rules (wrap around 2^32) and set Z flag
// note: this is part of ARM8.3 (JSCVT feature); support of this instruction needs to be checked at runtime
void fjcvtzs(RegisterA64 dst, RegisterA64 src);
// Floating-point comparisons
void fcmp(RegisterA64 src1, RegisterA64 src2);
void fcmpz(RegisterA64 src);
void fcsel(RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, ConditionA64 cond);
// Run final checks
bool finalize();
// Places a label at current location and returns it
Label setLabel();
// Assigns label position to the current location
void setLabel(Label& label);
Sync to upstream/release/568 (#865) * A small subset of control-flow refinements have been added to recognize type options that are unreachable after a conditional/unconditional code block. (Fixes https://github.com/Roblox/luau/issues/356). Some examples: ```lua local function f(x: string?) if not x then return end -- x is 'string' here end ``` Throwing calls like `error` or `assert(false)` instead of 'return' are also recognized. Existing complex refinements like type/typeof and tagged union checks are expected to work, among others. To enable this feature, `LuauTinyControlFlowAnalysis` exclusion has to be removed from `ExperimentalFlags.h`. If will become enabled unconditionally in the near future. * Linter has been integrated into the typechecker analysis so that type-aware lint warnings can work in any mode `Frontend::lint` methods were deprecated, `Frontend::check` has to be used instead with `runLintChecks` option set. Resulting lint warning are located inside `CheckResult`. * Fixed large performance drop and increased memory consumption when array is filled at an offset (Fixes https://github.com/Roblox/luau/issues/590) * Part of [Type error suppression RFC](https://github.com/Roblox/luau/blob/master/rfcs/type-error-suppression.md) was implemented making subtyping checks with `any` type transitive. --- In our work on the new type-solver: * `--!nocheck` mode no longer reports type errors * New solver will not be used for `--!nonstrict` modules until all issues with strict mode typechecking are fixed * Added control-flow aware type refinements mentioned earlier In native code generation: * `LOP_NAMECALL` has been translated to IR * `type` and `typeof` builtin fastcalls have been translated to IR/assembly * Additional steps were taken towards arm64 support
2023-03-17 19:20:37 +00:00
// Extracts code offset (in bytes) from label
uint32_t getLabelOffset(const Label& label)
{
LUAU_ASSERT(label.location != ~0u);
return label.location * 4;
}
void logAppend(const char* fmt, ...) LUAU_PRINTF_ATTR(2, 3);
uint32_t getCodeSize() const;
// Resulting data and code that need to be copied over one after the other
// The *end* of 'data' has to be aligned to 16 bytes, this will also align 'code'
std::vector<uint8_t> data;
std::vector<uint32_t> code;
std::string text;
const bool logText = false;
const unsigned int features = 0;
// Maximum immediate argument to functions like add/sub/cmp
static constexpr size_t kMaxImmediate = (1 << 12) - 1;
// Check if immediate mode mask is supported for bitwise operations (and/or/xor)
static bool isMaskSupported(uint32_t mask);
// Check if fmov can be used to synthesize a constant
static bool isFmovSupported(double value);
private:
// Instruction archetypes
void place0(const char* name, uint32_t word);
Sync to upstream/release/572 (#899) * Fixed exported types not being suggested in autocomplete * `T...` is now convertible to `...any` (Fixes https://github.com/Roblox/luau/issues/767) * Fixed issue with `T?` not being convertible to `T | T` or `T?` (sometimes when internal pointer identity is different) * Fixed potential crash in missing table key error suggestion to use a similar existing key * `lua_topointer` now returns a pointer for strings C++ API Changes: * `prepareModuleScope` callback has moved from TypeChecker to Frontend * For LSPs, AstQuery functions (and `isWithinComment`) can be used without full Frontend data A lot of changes in our two experimental components as well. In our work on the new type-solver, the following issues were fixed: * Fixed table union and intersection indexing * Correct custom type environments are now used * Fixed issue with values of `free & number` type not accepted in numeric operations And these are the changes in native code generation (JIT): * arm64 lowering is almost complete with support for 99% of IR commands and all fastcalls * Fixed x64 assembly encoding for extended byte registers * More external x64 calls are aware of register allocator * `math.min`/`math.max` with more than 2 arguments are now lowered to IR as well * Fixed correctness issues with `math` library calls with multiple results in variadic context and with x64 register conflicts * x64 register allocator learnt to restore values from VM memory instead of always using stack spills * x64 exception unwind information now supports multiple functions and fixes function start offset in Dwarf2 info
2023-04-14 19:06:22 +01:00
void placeSR3(const char* name, RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, uint8_t op, int shift = 0, int N = 0);
void placeSR2(const char* name, RegisterA64 dst, RegisterA64 src, uint8_t op, uint8_t op2 = 0);
void placeR3(const char* name, RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, uint8_t op, uint8_t op2);
void placeR1(const char* name, RegisterA64 dst, RegisterA64 src, uint32_t op);
void placeI12(const char* name, RegisterA64 dst, RegisterA64 src1, int src2, uint8_t op);
void placeI16(const char* name, RegisterA64 dst, int src, uint8_t op, int shift = 0);
void placeA(const char* name, RegisterA64 dst, AddressA64 src, uint8_t op, uint8_t size, int sizelog);
void placeB(const char* name, Label& label, uint8_t op);
void placeBC(const char* name, Label& label, uint8_t op, uint8_t cond);
void placeBCR(const char* name, Label& label, uint8_t op, RegisterA64 cond);
void placeBR(const char* name, RegisterA64 src, uint32_t op);
void placeBTR(const char* name, Label& label, uint8_t op, RegisterA64 cond, uint8_t bit);
void placeADR(const char* name, RegisterA64 src, uint8_t op);
void placeADR(const char* name, RegisterA64 src, uint8_t op, Label& label);
void placeP(const char* name, RegisterA64 dst1, RegisterA64 dst2, AddressA64 src, uint8_t op, uint8_t opc, int sizelog);
Sync to upstream/release/572 (#899) * Fixed exported types not being suggested in autocomplete * `T...` is now convertible to `...any` (Fixes https://github.com/Roblox/luau/issues/767) * Fixed issue with `T?` not being convertible to `T | T` or `T?` (sometimes when internal pointer identity is different) * Fixed potential crash in missing table key error suggestion to use a similar existing key * `lua_topointer` now returns a pointer for strings C++ API Changes: * `prepareModuleScope` callback has moved from TypeChecker to Frontend * For LSPs, AstQuery functions (and `isWithinComment`) can be used without full Frontend data A lot of changes in our two experimental components as well. In our work on the new type-solver, the following issues were fixed: * Fixed table union and intersection indexing * Correct custom type environments are now used * Fixed issue with values of `free & number` type not accepted in numeric operations And these are the changes in native code generation (JIT): * arm64 lowering is almost complete with support for 99% of IR commands and all fastcalls * Fixed x64 assembly encoding for extended byte registers * More external x64 calls are aware of register allocator * `math.min`/`math.max` with more than 2 arguments are now lowered to IR as well * Fixed correctness issues with `math` library calls with multiple results in variadic context and with x64 register conflicts * x64 register allocator learnt to restore values from VM memory instead of always using stack spills * x64 exception unwind information now supports multiple functions and fixes function start offset in Dwarf2 info
2023-04-14 19:06:22 +01:00
void placeCS(const char* name, RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, ConditionA64 cond, uint8_t op, uint8_t opc, int invert = 0);
void placeFCMP(const char* name, RegisterA64 src1, RegisterA64 src2, uint8_t op, uint8_t opc);
void placeFMOV(const char* name, RegisterA64 dst, double src, uint32_t op);
Sync to upstream/release/572 (#899) * Fixed exported types not being suggested in autocomplete * `T...` is now convertible to `...any` (Fixes https://github.com/Roblox/luau/issues/767) * Fixed issue with `T?` not being convertible to `T | T` or `T?` (sometimes when internal pointer identity is different) * Fixed potential crash in missing table key error suggestion to use a similar existing key * `lua_topointer` now returns a pointer for strings C++ API Changes: * `prepareModuleScope` callback has moved from TypeChecker to Frontend * For LSPs, AstQuery functions (and `isWithinComment`) can be used without full Frontend data A lot of changes in our two experimental components as well. In our work on the new type-solver, the following issues were fixed: * Fixed table union and intersection indexing * Correct custom type environments are now used * Fixed issue with values of `free & number` type not accepted in numeric operations And these are the changes in native code generation (JIT): * arm64 lowering is almost complete with support for 99% of IR commands and all fastcalls * Fixed x64 assembly encoding for extended byte registers * More external x64 calls are aware of register allocator * `math.min`/`math.max` with more than 2 arguments are now lowered to IR as well * Fixed correctness issues with `math` library calls with multiple results in variadic context and with x64 register conflicts * x64 register allocator learnt to restore values from VM memory instead of always using stack spills * x64 exception unwind information now supports multiple functions and fixes function start offset in Dwarf2 info
2023-04-14 19:06:22 +01:00
void placeBM(const char* name, RegisterA64 dst, RegisterA64 src1, uint32_t src2, uint8_t op);
void placeBFM(const char* name, RegisterA64 dst, RegisterA64 src1, uint8_t src2, uint8_t op, int immr, int imms);
void place(uint32_t word);
struct Patch
{
enum Kind
{
Imm26,
Imm19,
Imm14,
};
Kind kind : 2;
uint32_t label : 30;
uint32_t location;
};
void patchLabel(Label& label, Patch::Kind kind);
void patchOffset(uint32_t location, int value, Patch::Kind kind);
void commit();
LUAU_NOINLINE void extend();
// Data
size_t allocateData(size_t size, size_t align);
// Logging of assembly in text form
LUAU_NOINLINE void log(const char* opcode);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, int shift = 0);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, RegisterA64 src1, int src2);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, RegisterA64 src);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, int src, int shift = 0);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, double src);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, AddressA64 src);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst1, RegisterA64 dst2, AddressA64 src);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 src, Label label, int imm = -1);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 src);
LUAU_NOINLINE void log(const char* opcode, Label label);
LUAU_NOINLINE void log(const char* opcode, RegisterA64 dst, RegisterA64 src1, RegisterA64 src2, ConditionA64 cond);
LUAU_NOINLINE void log(Label label);
LUAU_NOINLINE void log(RegisterA64 reg);
LUAU_NOINLINE void log(AddressA64 addr);
uint32_t nextLabel = 1;
std::vector<Patch> pendingLabels;
std::vector<uint32_t> labelLocations;
bool finalized = false;
bool overflowed = false;
size_t dataPos = 0;
uint32_t* codePos = nullptr;
uint32_t* codeEnd = nullptr;
};
Sync to upstream/release/566 (#853) * Fixed incorrect lexeme generated for string parts in the middle of an interpolated string (Fixes https://github.com/Roblox/luau/issues/744) * DeprecatedApi lint can report some issues without type inference information * Fixed performance of autocomplete requests when suggestions have large intersection types (Solves https://github.com/Roblox/luau/discussions/847) * Marked `table.getn`/`foreach`/`foreachi` as deprecated ([RFC: Deprecate table.getn/foreach/foreachi](https://github.com/Roblox/luau/blob/master/rfcs/deprecate-table-getn-foreach.md)) * With -O2 optimization level, we now optimize builtin calls based on known argument/return count. Note that this change can be observable if `getfenv/setfenv` is used to substitute a builtin, especially if arity is different. Fastcall heavy tests show a 1-2% improvement. * Luau can now be built with clang-cl (Fixes https://github.com/Roblox/luau/issues/736) We also made many improvements to our experimental components. For our new type solver: * Overhauled data flow analysis system, fixed issues with 'repeat' loops, global variables and type annotations * Type refinements now work on generic table indexing with a string literal * Type refinements will properly track potentially 'nil' values (like t[x] for a missing key) and their further refinements * Internal top table type is now isomorphic to `{}` which fixes issues when `typeof(v) == 'table'` type refinement is handled * References to non-existent types in type annotations no longer resolve to 'error' type like in old solver * Improved handling of class unions in property access expressions * Fixed default type packs * Unsealed tables can now have metatables * Restored expected types for function arguments And for native code generation: * Added min and max IR instructions mapping to vminsd/vmaxsd on x64 * We now speculatively extract direct execution fast-paths based on expected types of expressions which provides better optimization opportunities inside a single basic block * Translated existing math fastcalls to IR form to improve tag guard removal and constant propagation
2023-03-03 20:21:14 +00:00
} // namespace A64
} // namespace CodeGen
} // namespace Luau