TODO

  • BAD Bytecode VM should check whether we have the right values on the stack. Can be removed in release mode.
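A minimal sketch of what such a debug-only check could look like in C. The `Value` layout, the `Tag` enum, and the `CHECK_TAG` macro are hypothetical names for illustration, not Tiny's actual definitions. Because the check is built on `assert`, defining `NDEBUG` for release builds removes it entirely.

```c
#include <assert.h>

/* Hypothetical tags and value layout, for illustration only. */
typedef enum { TAG_INT, TAG_FLOAT } Tag;

typedef struct {
    Tag tag;
    union { int i; float f; } as;
} Value;

/* In debug builds, verify the operand's tag. With NDEBUG defined,
   assert() expands to nothing, so the check has no runtime cost. */
#define CHECK_TAG(v, expected) assert((v).tag == (expected))

/* Pops two ints from the stack and pushes their sum. */
static void op_add_int(Value *stack, int *sp)
{
    Value b = stack[--*sp];
    Value a = stack[--*sp];
    CHECK_TAG(a, TAG_INT);
    CHECK_TAG(b, TAG_INT);
    stack[(*sp)++] = (Value){ .tag = TAG_INT, .as.i = a.as.i + b.as.i };
}
```

The same macro could wrap every instruction that assumes a particular operand type, so a miscompiled program fails loudly in debug builds instead of silently reinterpreting bits.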

  • IDEA Make structs non-gc'd and make object a GC'd equivalent? Objects can be used to handle by-ref situations but structs can be used in most cases

    • Objects can also perhaps implement interface type stuff as below
  • IDEA Rather than having arrays or indexing syntax built in, maybe just have a primitive "slice" type? Pretty much the Zig approach

    • A slice + encoding/iterator covers like 90% of cases
    • If we want these things to be efficiently encoded, we should just start having multiple values on the stack
    • A slice is just a length int value + a TINY_VAL_SLICE_PTR so we don't have to box every slice and we also don't have to blow up the size of every Tiny_Value
    • But if we're gonna support unboxed structs and stuff, then we might as well use an OCaml-like value representation (use a high bit to represent whether anything is an Object* vs a primitive)
      • There are some complications with this model; can no longer have true 64-bit ints without boxing
      • Might also be troublesome for embedded systems where we'd want integer type not to be the same width as pointer type
        • Oh but actually Tiny_Value is already always at least pointer width
    • We can support unboxed arrays via the extern system
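A rough sketch of the OCaml-like representation in C, under stated assumptions: `Value` and the helper names are hypothetical, and the tag lives in the low bit (as OCaml itself does) rather than a high bit, since aligned pointers always have a zero low bit. It also shows the cost mentioned above: integers lose one bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-word value: low bit 1 = integer, low bit 0 = pointer. */
typedef uintptr_t Value;

static bool value_is_int(Value v) { return (v & 1u) != 0; }

/* Shifts i up by one bit and sets the tag; the top bit of i is lost,
   so integers are 63-bit on a 64-bit target. */
static Value value_from_int(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }

/* Relies on arithmetic right shift of negative values, which is
   implementation-defined in C but provided by mainstream compilers. */
static intptr_t value_to_int(Value v) { return (intptr_t)v >> 1; }

/* Pointers must be at least 2-byte aligned so the low bit is free. */
static Value value_from_ptr(void *p)
{
    assert(((uintptr_t)p & 1u) == 0);
    return (uintptr_t)p;
}

static void *value_to_ptr(Value v) { return (void *)v; }
```

With this layout a value is exactly pointer width, which matches the observation that Tiny_Value is already at least that wide.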
  • IDEA First class namespaces/modules would be pretty sick; easy way to soup up the macro system by making them truly hygienic

    • Basically like OCaml, but I believe we can make the language simpler somehow using them
    • Types can be namespaces? Or namespaces can have types?
    • Basically formalize the "protocol" thing we've been doing with the indexing syntax
  • IDEA What if all structs are structurally typed? Assuming structs

  • IDEA Since it's become such a common pattern to call methods like arr->aint_len(), maybe a shorthand like arr:len() which just compiles to {type(arr)}_len() would simplify a lot of code

  • BAD No way to signal errors. Would be nice if we had error in addition to options like int!

  • REFACTOR Unify symbol types for foreign and regular functions. Both of them have indices, argument types, and return types. The only difference is ellipsis and the way that the compiled code interacts with them

    • Actuallyyyyy, this is not that straightforward; foreign functions just track tags for their args, no names, for example
    • Could do with some simplifying helpers though
  • BAD No for..in type thing

    • Another protocol thing? Just anything that has a _len and a _get_index would work for now
    • Could also do a true "iterator" thing where it expects a _iter, _iter_next to be defined and goes from there?
      • 90% of use cases should work with just an integer IMO; the iterator case should probably be explicit
      • Nullable types could make it easier to do the _iter_next thing
      • I could have an unboxed integer iterator easily by just stuffing it into a pointer (light native) or even literally just having it be an integer at runtime (since Tiny_Value technically boxes it anyways)
  • BAD No function pointers

    • Allow specifying types like func(int, int) void which just get turned into func__int__int__void in line with the simple flat type system
    • You can then use functions like values
    • No closures for now; this means that function pointers can be unboxed
    • Only non-ellipsis foreign functions can be used as values
    • We can add closures on top of this easily later
      • How would the upvalues work? In some other languages, all upvalues work as though they were created in the same scope as the closure, so you can have things like a counter in the outer scope
      • Golang deals with this by allocating the upvalues on the heap; we may have to box upvalues too to make this work
      • Could stuff all upvalues into a single (anonymous?) struct and then have the function receive that as the first parameter
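The "single struct as first parameter" idea can be sketched in C roughly as follows. `Closure`, `CounterUpvalues`, and the helper functions are hypothetical names; the sketch assumes upvalues are boxed on the heap so that every copy of the closure shares the same captured state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical upvalue struct: every captured variable lives here, on the heap. */
typedef struct { int count; } CounterUpvalues;

/* A closure is a plain function pointer plus a pointer to its upvalues. */
typedef struct {
    int (*fn)(void *upvalues);
    void *upvalues;
} Closure;

/* The compiled body receives the upvalue struct as its first parameter. */
static int counter_next(void *upvalues)
{
    CounterUpvalues *u = upvalues;
    u->count += 1;
    return u->count;
}

/* Returns a closure whose upvalues pointer is NULL if allocation fails. */
static Closure make_counter(int start)
{
    CounterUpvalues *u = malloc(sizeof *u);
    if (u) u->count = start;
    return (Closure){ .fn = counter_next, .upvalues = u };
}

static int closure_call(Closure c)
{
    assert(c.fn != NULL && c.upvalues != NULL);
    return c.fn(c.upvalues);
}
```

A function pointer without captures is the degenerate case where `upvalues` is unused, which is why plain function values can stay unboxed.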
  • BAD All types are nullable?

  • BAD No debugger of any kind

  • BAD No language server

    • Thinking I'll just write this one in Python (I know) and bind to Tiny using (autogenerated) ctypes
    • There are a lot of reasonable libraries for LSP in python and as a first pass this is fine
    • A lot of functions come from C bindings so we need some way to bring those in easily
      • Could have a function which dumps bound functions/symbols to a buffer which the language server can load up and then provide completions for
        • This doesn't work with macros, which are the primary mechanism for e.g. generic data structures
      • Maybe the people embedding tiny are responsible for compiling a DLL that has a CreateLanguageServerState function exposed. Our python language server loads this DLL, calls that function, and uses the resulting state to figure out what functions/types are exposed.
        • I feel like this is the most correct approach; macros can do a lot of stuff (introduce new types, new functions) in unpredictable ways so there's no getting around some sort of live approach
      • Alternatively, you dump all the definitions every time your executable runs? Then we don't even need to do C bindings at all, we just use the definition JSON?
        • Could be useful for other tools too
        • What happens if the code fails to compile though? Can't dump symbols then
          • I guess that's fine, you need to have a baseline which compiles
        • Just point the LSP to an executable that dumps the symbols you care about for a given file, and the LSP will run it as you make changes
  • COOL Trailing commas in function calls are fine! That makes it really easy to do the "parameter pack" thing below kinda generically

  • BAD In macros, when we're doing codegen on top of functions, sometimes we want to "forward" parameter packs, and this ends up being really annoying (see CallWaitMacro and Delegate stuff)

  • BAD Would be nice to have a generic AST visitor function that takes in some context

  • IDEA Be able to launch other StateThread and cooperatively yield from them

    • They don't share heaps though so what do we do about that?
    • Should we instead just have an actual coroutine/fiber thing that just has a stack?
  • IDEA Adopt Zig's break value syntax to support expression-oriented blocks

a := 10
x := {
    if a > 5 {
        break 10
    } else {
        break 20
    }
}
  • TEST Short circuiting &&

  • TEST Short circuiting ||

  • TEST Break and continue in for loops

  • TEST Got rid of null-terminated strings but didn't really add too many tests

  • BAD Right now always crashes if the same symbol is bound twice

  • BAD Cannot store any context void* with modules/bound functions

  • BAD You can return any, but you cannot assign any to anything; it is more like "unknown" I guess? Except in the event that you're returning it lol???

  • BAD Array out of bounds asserts in debug and segfaults in release

  • BAD Tiny_Value is 16 bytes because we store type tag even though technically we can get away with only boxing "any" or reference types (see the snow-common branch)

    • Likely a big performance win
    • Will need typed bytecode
  • BAD Cannot cast reference types to any

  • BAD Accessing null values causes segfault

  • BAD No type aliases (strong)

  • BUG The following compiles without error (no checking of whether function returns value)

    • This would require some amount of control flow analysis though (for the non-trivial case)
    • Could just always require the top-level block to contain a return expression? Could catch 90% of these bugs
func test(): int {}
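A conservative version of that control flow check could look like the sketch below. The `Stmt` type is a hypothetical stand-in, far simpler than Tiny's real AST, and loops are treated as never returning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement AST, for illustration only. */
typedef enum { STMT_RETURN, STMT_IF, STMT_BLOCK, STMT_OTHER } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    struct Stmt *then_branch;  /* STMT_IF */
    struct Stmt *else_branch;  /* STMT_IF; NULL when there is no else */
    struct Stmt **children;    /* STMT_BLOCK */
    size_t child_count;        /* STMT_BLOCK */
} Stmt;

/* True only if every path through s reaches a return statement.
   Anything the check does not understand counts as "does not return". */
static bool always_returns(const Stmt *s)
{
    if (s == NULL) return false;
    switch (s->kind) {
    case STMT_RETURN:
        return true;
    case STMT_IF:
        /* Both branches must return; a missing else never does. */
        return always_returns(s->then_branch) && always_returns(s->else_branch);
    case STMT_BLOCK:
        /* A block returns if any statement in it always returns. */
        for (size_t i = 0; i < s->child_count; i++)
            if (always_returns(s->children[i])) return true;
        return false;
    default:
        return false;
    }
}
```

Running this on the body of every function with a non-void return type would reject `func test(): int {}`, at the price of also rejecting some valid functions (for example, one that only returns from inside a loop).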
  • BAD No error types/error info
    • Add "error" primitive type which is just an error code (int? string? its own primitive type?) and a pointer to a context object
  • BAD No runtime type information on struct object (could probably stuff a uint16_t struct ID somewhere at least?)
    • Could even just get rid of the nfields since I think that's only used for pretty printing and debug checks
  • BAD No interface to allocate/deallocate using the thread's context allocator
  • BAD No standard regex
  • BAD No syntax highlighting
  • BAD Pretty printing %q in printf
  • BAD No panics??
  • BAD Memory unsafety introduced by varargs functions (unless we do runtime type checking)
    • For example, i64_add_many
  • BAD Sometimes error messages point to the wrong line (off by one?)
  • BAD No functions-as-values (not even without captures)
    • Could patch this hole with runtime polymorphism but ehhhhh
    • Could also technically implement this with a C library haha
  • BAD No ranges/range-based loops
  • BAD No multiline comments
  • BAD No builtin array or dict
    • Mainly for type safety; parametric polymorphism (at the library level only?) could solve this
    • The library-only parapoly prevents the script code from becoming too complex
    • Builtin array or dict will probably cover most use cases though (see Golang)
  • BAD No 64-bit integers
  • BAD No char type
  • BAD No polymorphism of any kind
  • BAD We don't pass in alignment to user-provided allocation function
  • Refactor VM from compiler

  • First class types (store a "type" which is just an integer in the VM)

  • Once "any" is safe, we can have typed bytecode instructions; OP_ADD_INT, OP_EQUAL_STRING etc

    • Untagged values
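To illustrate what typed instructions buy, here is a small sketch of an interpreter loop over untagged slots. Only `OP_ADD_INT` and `OP_EQUAL_STRING` come from the note above; the other opcodes, the `Slot` union, and `run` are hypothetical, and strings are plain C strings here for brevity. It assumes the compiler has already type-checked the program, so no slot carries a tag.

```c
#include <string.h>

/* OP_ADD_INT and OP_EQUAL_STRING are from the TODO; the rest is hypothetical. */
typedef enum { OP_PUSH_INT, OP_PUSH_STRING, OP_ADD_INT, OP_EQUAL_STRING, OP_HALT } Op;

/* Untagged slot: the instruction, not the value, knows the type. */
typedef union { int i; const char *s; } Slot;

/* Runs well-formed bytecode and returns the top of the stack at OP_HALT.
   Operands follow their opcode inline; strings are indices into `strings`.
   No bounds or type checks: the compiler is trusted to emit valid code. */
static Slot run(const int *code, const char *const *strings)
{
    Slot stack[16];
    int sp = 0;
    for (int pc = 0;;) {
        switch ((Op)code[pc++]) {
        case OP_PUSH_INT:
            stack[sp++].i = code[pc++];
            break;
        case OP_PUSH_STRING:
            stack[sp++].s = strings[code[pc++]];
            break;
        case OP_ADD_INT:
            sp--;
            stack[sp - 1].i += stack[sp].i;
            break;
        case OP_EQUAL_STRING:
            sp--;
            stack[sp - 1].i = strcmp(stack[sp - 1].s, stack[sp].s) == 0;
            break;
        case OP_HALT:
            return stack[sp - 1];
        }
    }
}
```

Each slot is only as large as its widest member, so on a typical 64-bit target it is half the size of a tagged 16-byte value.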
  • Function overloading

  • "Method" sugar (first argument to func matches type of x => x.func())

  • Do not exit anywhere in the library; user code must be able to handle errors

  • Less haphazard allocation: Allow the user to supply a malloc, use Arenas

  • Function return type inference (Kind of unsafe at times tho)

  • Interfaces? There is currently no way of doing polymorphism and I think the golang approach to interfaces is pretty nice:

interface Stringable
{
    to_string(): str
}

struct Vec2
{
    x: float
    y: float
}

func to_string(v: Vec2): str {
    return strcat(ntos(v.x), ",", ntos(v.y))
}

func stringify(s: Stringable): str {
    return s.to_string()
}

func thing(v: Vec2) {
    // This still works because there might be functionality that was created to work with certain
    // interfaces that you don't want to repeat for your particular struct
    v.stringify()
}

  • Anonymous functions (closures)

HARD TO IMPLEMENT AND MAYBE OVERSTEPPING THE BOUNDARIES OF A SCRIPTING LANGUAGE:

  • SIMPLE GENERICS (See how C# does it):
// NOTE: This is not a C++ template; the types are just used for type-checking at the call site;
// the use of a generic type inside the body of the function is simply not allowed (i.e. trying to
// access a property on an object of that type for example)
// SYNTAX FOR COMPILE-TIME PARAMETER IS NOT FINAL
func get(e: entity, $t): t { ... }

// e is an entity; this will produce an ItemComponent; cast is not necessary
e.get(ItemComponent)

Of course, if it's implemented at the tiny level, then generics must also be implemented at the C level:

// $$t means pass in the actual type value as well (so we can inspect it in the C code to get the appropriate component for example)
Tiny_BindFunction(state, "get(entity, $$t): t", Lib_GetComponent);
Tiny_RegisterType(state, "array($t)");

// $t is matched with the t of the array passed in
Tiny_BindFunction(state, "get(array($t), int): t");

Done

  • BAD No array syntax, no way to iterate arrays
  • Make CurTok local or at least thread local
  • BAD Cannot access type information in C binding functions
    • Would be mega useful to make things type-safe
    • For example, could allow you to succinctly define a delegate binding
      • Just need the function name, nothing else
  • BAD No ternary operator
  • IDEA Rescript style pipe operator
arr := array_int()

// Equivalent to array_int_push(arr, 10)
arr->array_int_push(10)
arr->array_int_push(20)

arr->array_int_get(10)
  • BAD Cannot do . subscript on cast expression values
  • BAD No designated struct init
  • BUG Comparing structs does nothing
  • BUG Assigning to arguments doesn't seem to work, repro
func mutate_arg(i: int) {
    i = 10
    // Prints 5
    printf("%i\n", i)
}

mutate_arg(5)
  • BAD NULL TERMINATED STRINGS??
  • BUG The following compiles without error
func split_lines(s: str): array {
    return ""
}
  • BAD || does not short circuit

  • BAD %c is not handled as a format specifier

  • BUG && doesn't short circuit?

  • BUG continue doesn't seem to run the "step" part of the for loop

  • Make sure VM bytecode instructions and data are aligned properly

  • Safer "any" type: must be explicitly converted

  • NOTE Slices no longer needed since index sugar added

    • BAD No slices! I think the quickest way to support indexing arbitrary sequences is to allow them to produce a slice

      • Actually this may not be that great after all because this necessitates having generic types (e.g. one for slice)

      • What if I just allow native values to be indexable (along with string)? Just add getIndex and setIndex to Tiny_NativeProp

      • Alternatively (and this works better with our macro system) allow for operator overloading (maybe just the indexing operator)

        • Something like func __index_str(s: str, i: int) { return stridx(s, i) }, nothin fancy

          • "hello"[0] -> stridx("hello", 0)
          module M = struct
              let ( a + b ) = add a b
          end
          
          open M
          
          let open M in
          10 + 10
          
          array_push(a, 10)
          
          a->array_push(10) // turns
          
        • Would be nice if we inlined these (a function that just consists of returning the result of another function)

        • Maybe for now only allow adding operator overloads in C (but they should be BindFunction, not via NativeProp; maybe BindOperator)

        • In order to make it explicit that you're calling an overloaded operator, we can have them be like str.[0] (note the .). We can do this for + and - so a .+ b (this could also solve the problem of which side we get the operator from like +. would mean grab the overload from the rhs, and so on)