Skip to content

Prototypes for a system of experimental Forth/PostScript-like languages

License

Notifications You must be signed in to change notification settings

TOGoS/TScript34

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TScript34

Sub-projects

P0001
Java…tokenizer, thing
  • Status: Some unit tests that pass
P0002
C# even-more-simplified interpreter
  • Status: Works; used to implement parts of MapTranslator35
P0003
Test cases
P0004
Some compiler written in Janet, or something
  • Status: ???
P0005
A minimal lisp, implemented in Java
  • Status: incomplete
P0006
Tiny single-class VM challenge
  • Status: Works, but minimally useful
  • A forthlike VM using String[] to store the program
P0007
Demonstration of dynamically generated/loaded class files
  • Status: Concept proved to work.
P0008
‘Java Command Runner 36’
  • Exists both separately and as a subproject of TScript34 because I can’t make up my mind which it should be. Currently the standalone project is ahead, and has a cleaner history than the p0008/master branch in this repository, so probably use that.
P0009
Single-class Java implementation of TScript34.2 interpreter
  • Program encoded as int[] with a table of Object[] constants
  • Stack is Object[], with Object[] { SPECIAL_MARKER, tag, … } used to denote non-literal values (i.e. values represented in an indirect way, e.g. ‘value of expression X’, ’
  • Status: Concept proved, has some unit tests.
  • May be expanded upon to provide more operations, though I would still like to keep the core very small, so need to come up with some extension mechanism.
P0010
Java classes to formalize schemas for ‘indirect value’ / ‘thunk’ / ‘meaning-tagged value’ objects
P0011
Hopefully-more-minimal-than-P0005 lisp-in-Java, because I still want one, and P0005 got bogged down with continuation stuff.
P0012
Proof-of-concept Maven library and thing to use it
P0013
CPS-based interpreter demo
P0014
TEF parser java library
P0015
RDF parser java library
P0016
An implementation of Lox
P0017
A design for a hybrid command/functional language
P0018
Java library to help publish Maven packages
P0019
A TScript34.2 interpreter based on P0009
P0020
Process[Like] management experiment in Java

OIDs

1.3.6.1.4.1.44868.261.34
This project
1.3.6.1.4.1.44868.261.34.n
Sub-project t
1.3.6.1.4.1.44868.261.34.10.t
Indirect value representation t; see ./P0010/src/main/java/net/nuke24/tscript34/p0010/IndirectValueTags.java for the list of tags!

Language(s)

I should probably specify it, huh? With test cases and stuff.

Well, maybe for now it’s enough to say that the language is intended to be compatible with PostScript. i.e. a program that is valid in either PostScript or in TScript34 should mean the same thing and have the same result in either. That said, some PostScript programs might not be valid TScript34 programs, and vice-versa.

Common Syntax

Languages aside from the ‘syntaxless’ TS34.2 (which is line-based) and PostScript clones (though they may extend the syntax to support line comments) may share common tokenization rules.

#!/shebang/line
# line comment
#SPECIAL-DIRECTIVE

foo-bar:baz/quux#quuux # Bareword, includinbg some punctuation characters;
                       # Note that '#' only starts a line comment when precedded
		       # by whitespace or the beginning of a line

[abc 123] # Square braces are self-delimiting: `[` `abc` `123` `]`
(asd 123) # So are parentheses
{asd 123} # And so are curly braces, except in TCL-like languages,
          # where they act like nestable double quotes.

# Single and double quotes follow the same tokenization rules

'quoted symbol\n' # Single quotes mean 'treat as a symbol'
                  # (except in Lispy languages, where 'foo means (quote foo)
"quoted string\n" # Double quotes mean literal string

‹hello \ ‹there›› # Nestable symbol quoting without escapes
«hello \ «there»» # Nestable literal quoting without escapes

‹› and «» are called ’guillaments’.

Alternate quote styles

The single and double regular and nestable quotes are the same characters with the semantics as defined by the TOGVM-PHP language and SchemaSchema. Other unicode quotes might allow nesting with escape sequences, or other permutaions of nestable/escapable/supporting interpolations or not (see https://github.com/TOGoS/TOGVM-Spec/blob/master/test-vectors/tokens/quotes.txt).

However, that seems to lead to some ambiguity: at which level are the escapes decoded? The answer is probably: at the outermost quotation, since that is the most straightforward. But that might seem surprising and/or not the most useful interpretation to someone writing with them. Therefore I am punting by simply disallowing them, for now. The following quote characters should be reserved; i.e. recoignized but unsupported (for now):

`backticks`
‘nestable single quotes’
“nestable double quotes”
「Japanese single quote」
『Japanese double quote』
〈Japanese angle quote〉
《Japanese double-angle quote》
【Whatever this is】
〔This other one〕
〖More crazy unicode quotes〗
〘Yet more of them!〙
〚Holy crap, so many weird quote characters〛

(the last few were simply copied from https://en.wikipedia.org/wiki/CJK_Symbols_and_Punctuation for completeness; I have never thought about using them or what they would mean)

FAQ

What the %!&*@ is this?

A collection of projects, some entirely experimental, that are vaguely related in that they share the goal of defining minimal, cross-platform programming language interpreters, VMs, or compilers.

PostScript?

Some of the sub-projects attempt to define or implement a small PostScript-based language specification.

The goal is to have a very easy-to-implement cross-platform core that can bootstrap nicer languages (e.g. scheme, more fleshed-out PostScript, etc).

Why PostScript and not Forth, Scheme, TCL …

Being a concatenative stack-based language means very little ‘parsing’ is needed; tokens are tokenized and fed directly to the interpreter.

Feel free to implement higher-level languages using TScript34. Actually that’s kind of its purpose.

PostScript seems like a more elegant language than Forth, with ‘{ procedures }’ as first-class objects, somewhat more conventional operation names, symmetrical string syntax ‘(foo)’ instead of ‘” foo”’, and fewer assumptions that it is running very close to the metal.

Why the focus on state machines / ‘reactive’ / ‘push-based’ parsers?

Because I want them to be ‘stackless’.

(See https://kyju.org/blog/piccolo-a-stackless-lua-interpreter/)

Basically I was bitten by the continuation-passing-style bug long long ago, and find the idea of a parsing function taking a whole thread hostage distasteful.

Why?

  • A non-IO function blocking on I/O breaks the single-responsibility principle; now callers need to know not only your functional API, but also the blocking behavior of I/O streams
  • Not relying on any given I/O system makes processing functions more generally useful

The ‘Danducer’ pattern breaks all stream-processing routines down into state machines that ‘never block’ (except to do computation), but only handle input by returning output, an updated version of themselves, and whether they are waiting for more input.

Other languages/VMs to consider implementing

WebAssembly

Might be slightly less ‘minimal’ than what I’m going for, here, though admittedly I haven’t tried it.

TODO: Read https://www.javaadvent.com/2022/12/webassembly-for-the-java-geek.html

It is compelling.

The Uxn/Varvara ecosystem is a personal computing stack based on a small virtual machine that lies at the heart of our software, and that allows us to run the same application on a variety of systems.

Sounds very similar to what I’m going for, so why not!

Related

He’s the author of PuTTY. He talks about what I call ‘the reader-writer problem’ and how coroutines solve it in ’use cases

Some guy on HN seems to be after something similar

I’ve been working on something centered around extensibility, or metaprogramming, coming from a strictly imperative angle, with the belief that anything else (functional, relational/logic based, whatever) can be built on top of that.

A few guiding principles are:

  • simplicity above all, with as few fundamental elements as possible
  • the parser is a separate issue, just write your own syntax to avoid the most divisive bikeshed element of PL design, or pick the C like or ALGOL like one out of the box. You very likely want your own syntax anyway as you write extensions.
  • every language element, from modules down to function calls, are first class, ie have an (implementing) type, can be stored in variables and used in expressions, be introspected and evaluated/deployed.
  • runs at compile time, compiles at run time (code generation/partial evaluation/dynamic code)
  • generates C, Java, Python and various bytecodes to maximise interoperability, code availability and deployability
  • has no standard runtime or standard library of its own, is entirely parasitic on other environments

Even if it ends up being completely useless, it’s a really interesting exercise in design.

(HN comment)

Candy - a functional language with assertions in place of types

Seems similar to what I was thinking w.r.t. a scheme-like where you could define constraints like so:

(define (divide a b)
  (assert (is-number a))
  (assert (is-number b))
  (assert (is-nonzero b))
  (...logic to do the division here))

That said, not sure if it follow the other principles I have in mind about there being no types distinct from behavior. The README indicates there are some ‘predefined types’, such as int, text, list, struct. Can I define my own thing that ‘looks like’ a list?

(My current thinking is that lists should just be values that can be ~car~red and ~cdr~ed and ~cons~ed.)

Rye - A mostly-pure, low-syntax homoiconic scriping langyuage

Relevant to my thought that we can “just use monads” for I/O:

These are not the only language features that can be used to model effects, and other features also fall into one of these buckets. For example, monads are also statically typed and lexically scoped. However, a major objection to monads is that they model effects in a specifically layered way, so that for example there is a distinction between an IO<Result<T, E>> and a Result<IO<T>, E>. Coroutines on the other hand are order-independent: all coroutine that yield Pending and Exception have the same type, there is no distinction of order. The same is true of effect handlers.

A Forthlike language with some interesting ideas. $foo defines (in a lexical scope delimited by..parens, I think) the name foo to mean the thing on top of the stack. 'foo quotes the symbol, and ^foo puts the thing identified by foo onto the stack instead of executing it. 'foo pop and 'foo push store and load the value defined by foo, respectively.