Add back thread identity section as thread userdata instead

Thread userdata shouldn't be overlooked - even though security contexts are
Roblox-specific, the thread userdata can still be used to implement a
permissions model for sandboxing, whether this is based on security contexts
(like Roblox, or like Ring 0/3), on capabilities, or maybe even something else.
I think it is worth it to document this in the sandboxing page.

For everyone's future reference:

* Every lua_State has an associated userdata pointer. It is simply the last
  field of the struct:

    void* userdata;

  This pointer is never accessed or set by Luau itself, except for when
  first initializing the state, where it is set to NULL.

  This userdata can be set by the host after creating the thread, and is 100%
  invisible to Luau code. C functions exposed by the host can access the
  userdata by L->userdata, and, safe in the knowledge that it cannot be tampered
  with, use it to validate the thread.

* The global_State (the VM, as opposed to just the main thread) contains a field
  called cb:

    lua_Callbacks cb;

  The struct is currently defined as:

    /* Callbacks that can be used to reconfigure behavior of the VM dynamically.
     * These are shared between all coroutines.
     *
     * Note: interrupt is safe to set from an arbitrary thread but all other
     * callbacks can only be changed when the VM is not running any code */
    struct lua_Callbacks
    {
        /* gets called at safepoints (loop back edges, call/ret, gc) if set */
        void (*interrupt)(lua_State* L, int gc);

        /* gets called when an unprotected error is raised (if longjmp is
         * used) */
        void (*panic)(lua_State* L, int errcode);


        /* gets called when L is created (LP == parent) or destroyed (LP ==
         * NULL) */
        void (*userthread)(lua_State* LP, lua_State* L);

        /* gets called when a string is created; returned atom can be retrieved
         * via tostringatom */
        int16_t (*useratom)(const char* s, size_t l);


        /* gets called when BREAK instruction is encountered */
        void (*debugbreak)(lua_State* L, lua_Debug* ar);

        /* gets called after each instruction in single step mode */
        void (*debugstep)(lua_State* L, lua_Debug* ar);

        /* gets called when thread execution is interrupted by break in another
         * thread */
        void (*debuginterrupt)(lua_State* L, lua_Debug* ar);

        /* gets called when protected call results in an error */
        void (*debugprotectederror)(lua_State* L);
    };

  Assuming you cache the global_State when creating the main thread (which you
  should - it's just L->global), you can set g->cb.userthread (cb is a struct,
  not a pointer) to a C function to define what should happen when a new thread
  is created or destroyed.

  This can be used to:
  * Inherit userdata from parent threads; great for permission models.
  * Run destructors on userdata when threads are collected; great for resources.
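
  To make the two bullets above concrete, here is a minimal sketch of the
  pattern. It does NOT use the real Luau headers: lua_State and global_State
  below are stand-ins that contain only the fields mentioned in this message,
  and ThreadInfo, thread_info_new, on_userthread and thread_has_identity are
  hypothetical names made up for illustration. Treat it as one possible shape
  for a permissions model, not as how Roblox or Luau actually does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types: they mirror only the fields discussed in this message and
 * are NOT the real Luau definitions. */
typedef struct lua_State lua_State;

typedef struct global_State
{
    struct
    {
        /* called when L is created (LP == parent) or destroyed (LP == NULL) */
        void (*userthread)(lua_State* LP, lua_State* L);
    } cb;
} global_State;

struct lua_State
{
    global_State* global;
    void* userdata;
};

/* Hypothetical per-thread record the host stores in L->userdata. */
typedef struct ThreadInfo
{
    int identity; /* assumption: higher number == more trusted */
} ThreadInfo;

int live_infos = 0; /* number of ThreadInfo records currently allocated */

ThreadInfo* thread_info_new(int identity)
{
    ThreadInfo* info = malloc(sizeof *info);
    if (!info)
        abort();
    info->identity = identity;
    live_infos++;
    return info;
}

/* userthread callback: copy the parent's record into a new thread, and free
 * the record when a thread is destroyed. */
void on_userthread(lua_State* LP, lua_State* L)
{
    assert(L != NULL);
    if (LP)
    {
        ThreadInfo* parent = LP->userdata;
        L->userdata = parent ? thread_info_new(parent->identity) : NULL;
    }
    else if (L->userdata)
    {
        free(L->userdata);
        L->userdata = NULL;
        live_infos--;
    }
}

/* What a host-exposed C function might call before doing privileged work.
 * A thread with no record at all is treated as having no permissions. */
int thread_has_identity(lua_State* L, int required)
{
    ThreadInfo* info = L->userdata;
    return info != NULL && info->identity >= required;
}
```

  The sketch copies the record into each child instead of sharing the parent's
  pointer, so the destroy path can free it without reference counting. A host
  that wants children to observe later changes to the parent's permissions
  would share the pointer and track ownership some other way. In a real host
  the callbacks struct can also be reached through lua_callbacks(L) from lua.h
  rather than through L->global directly.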

That is a lot of information to digest, but any Luau dev reading this will
probably say "yep, exactly" at least once. Documenting this small aspect of the
VM is another step towards making it friendlier for people who are starting to
use Luau.

Obviously, the more technical documentation will remain here (in this commit &
pull request), but even the public facing documentation (which developers, like
me, will reference) should contain information just like the thread identity
section that was removed.

Hopefully, this is a suitable replacement, as thread userdata IS available in
the open-source version of Luau - and is used by Roblox for its "extra space" to
store things like thread identity.
LoganDark 2021-11-07 07:20:42 -08:00 committed by GitHub
parent c6de3bd2e4
commit 4373f65fbd
Signed by: DevComp
GPG key ID: 4AEE18F83AFDEB23

@@ -46,6 +46,16 @@ This is using the VM feature that is not accessible from scripts, that prevents
By itself this would mean that code that runs in Luau can't use globals at all, since assigning globals would fail. While this is feasible, in Roblox we solve this by creating a new global table for each script, that uses `__index` to point to the builtin global table. This safely sandboxes the builtin globals while still allowing writing globals from each script. This also means that short of exposing special shared globals from the host, all scripts are isolated from each other.

## Thread userdata

Environment-level sandboxing is sufficient to implement separation between trusted code and untrusted code, assuming that `getfenv`/`setfenv` are either unavailable (removed from the globals), or that trusted code never interfaces with untrusted code (which prevents untrusted code from ever getting access to trusted functions). When running trusted code, it's possible to inject extra globals from the host into that global table, providing access to special APIs.

However, in some cases it's desirable to restrict access to functions that are exposed both to trusted and untrusted code. For example, both may have access to the `game` global, but `game` may expose methods that should only work from trusted code.

To help with this, each thread in Luau has a userdata pointer, which can only be set or accessed by the host. This pointer can identify the calling thread in a unique way, and also store the thread's permissions. Trusted functions can read the userdata to validate the permissions of the calling thread and allow or deny access accordingly. The host can also set up a callback for newly created threads to inherit their userdata from the parent thread. This makes it possible to provide APIs to trusted code while limiting the access from untrusted code.

For example, Roblox uses thread userdata to assign a security identity to each thread, which is a numerical constant that defines which types of trusted functions the thread is allowed to call. Untrusted code receives an untrusted context, trusted code receives a trusted context, and trusted functions verify the identity of the calling thread by checking the userdata. Other permission models are possible as well, due to the flexibility of the API (userdata can be anything).

## `__gc`

Lua 5.1 exposes a `__gc` metamethod for userdata, which can be used on proxies (`newproxy`) to hook into the garbage collector. Later versions of Lua extend this mechanism to work on tables.