mirror of
https://github.com/luau-lang/luau.git
synced 2025-04-04 10:50:54 +01:00
Create profile.md
Add profiler documentation.
This commit is contained in:
parent
e0c4f33217
commit
7c76a5a70a
1 changed files with 61 additions and 0 deletions
61
docs/_pages/profile.md
Normal file
61
docs/_pages/profile.md
Normal file
|
@ -0,0 +1,61 @@
|
|||
---
|
||||
permalink: /profile
|
||||
title: Profiling
|
||||
toc: true
|
||||
---
|
||||
|
||||
One of main goals of Luau is to enable high performance code. To help with that goal, we are relentlessly optimizing the compiler and runtime - but ultimately, performance of their
|
||||
code is in developers' hands, and is a combination of good algorithm design and implementation that adheres to the strengths of the language. To help write efficient code, Luau
|
||||
provides a built-in profiler that samples the execution of the program and outputs a profiler dump that can be converted to an interactive flamegraph.
|
||||
|
||||
To run the profiler, make sure you have an optimized build of the intepreter (otherwise profiling results are going to be very skewed) and run it with `--profile` argument:
|
||||
|
||||
```
|
||||
$ luau --profile tests/chess.lua
|
||||
OK 8902 rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
|
||||
OK 2039 r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 0
|
||||
OK 2812 8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 0
|
||||
OK 9467 r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1
|
||||
OK 1486 rnbq1k1r/pp1Pbppp/2p5/8/2B5/8/PPP1NnPP/RNBQK2R w KQ - 1 8
|
||||
OK 2079 r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10
|
||||
Profiler dump written to profile.out (total runtime 2.034 seconds, 20344 samples, 374 stacks)
|
||||
GC: 0.378 seconds (18.58%), mark 46.80%, remark 3.33%, atomic 1.93%, sweepstring 6.77%, sweep 41.16%
|
||||
```
|
||||
|
||||
The resulting `profile.out` file can be converted to an SVG file by running `perfgraph.py` script that is part of Luau repository:
|
||||
|
||||
```
|
||||
$ python tools/perfgraph.py profile.out >profile.svg
|
||||
```
|
||||
|
||||
This produces an SVG file that can be opened in a browser (the image below is clickable):
|
||||
|
||||
[](/assets/images/chess-profile.svg)
|
||||
|
||||
In a flame graph visualization, the individual bars represent function calls, the width represents how much of the total program runtime they execute, and the deepness of the
|
||||
red color represents how expensive they are based on their speed and how many times they are called: Dark Red bars are often the best targets for optimization, followed by
|
||||
Bright Red, Dark Orange, Light Orange, Dark Yellow, Light Yellow. This is a fantastic visualization technique that allows you to hone in on the specific bottlenecks affecting
|
||||
your program performance, optimize those exact bottlenecks, and then re-generate the profile data and visualizer, and look for the next set of true bottlenecks (if any).
|
||||
|
||||
Hovering your mouse cursor over individual sections will display detailed function information in the status bar and in a tooltip. If you want to Search for a specific named
|
||||
function, use the Search field in the upper right, or press Ctrl+F.
|
||||
|
||||
Notice that some of the bars in the screenshot don't have any text. In some cases, there isn't enough room in the size of the bar to display the name.
|
||||
You can hover your mouse over those bars to see the name and source location of the function in the tool tip, or double-click to zoom in on that part of the flame graph.
|
||||
|
||||
Some tooltips will have a source location for the function you're hovering over, but no name. Those are anonymous functions, or functions that were not declared in a way that
|
||||
allows Luau compiler to track the name. To fill in more names, you may want to make these changes to your code:
|
||||
|
||||
`local myFunc = function() --[[ work ]] end` -> `local function myFunc() --[[ work ]] end`
|
||||
|
||||
Even without these changes, you can hover over a given bar with no visible name and see it's source location.
|
||||
|
||||
As any sampling profiler, this profiler relies on gathering enough information for the resulting output to be statistically meaningful. It may miss short functions if they
|
||||
aren't called often enough. By default the profiler runs at 10 kHz, this can be customized by passing a different parameter to `--profile=`. Note that higher
|
||||
frequencies result in higher profiling overhead and longer program execution, potentially skewing the results.
|
||||
|
||||
This profiler doesn't track leaf C functions and instead attributes the time spent there to calling Luau functions. As a result, when thinking about why a given function is
|
||||
slow, consider not just the work it does immediately but also the library functions it calls.
|
||||
|
||||
This profiler tracks time consumed by Luau thread stacks; when a thread calls another thread via `coroutine.resume`, the time spent is not attributed to the parent thread that's
|
||||
waiting for resume results. This limitation will be removed in the future in the future.
|
Loading…
Add table
Reference in a new issue