Virtual machine, - revisited

After implementing the first version of the virtual machine, here is some of my own notes for the next version.

Virtual machine

I work with a clear division of words and chunks.

Currently the chunks are handled as the offset of the chunk into the heap. It would be nicer to change this to be a word pointer, to the beginning of the content of the chunk.

The chunk has three words of meta-data

Notice that the gc-word is always zero, except during garbage collection. This means that the first byte after the data is always zero, and strings in chunks can be read as zero-terminated c-strings.

Atoms

At the beginning/back of the stack, is where global atoms/defines, which can be changed, are saved. This works well with the garbage collector, as the stack is the root / only non-persistent/immutable part of the memory.

Functions/code

The code/functions lives on the heap, and may thus be relocated during garbage collection. Instead of just having an instruction pointer, it thus makes sense to see a code address as a function pointer plus an offset into the function code.

A frame pointer will also be useful, as it allows RET-instruction, and ARG1, ARG2, … LOCAL1, … as single byte instructions.

Interaction with the virtual machine

Interaction is done by sending a block of bytecodes to the virtual machine, which is then executed. The virtual machine should have instructions for building function definitions etc. in the heap.

PUSH_BYTE CLOSURE 0                ; write bytecodes for reading first object in closure
PUSH_BYTE CALL_CLOSURE 1           ; write bytecode for printing a value
PUSH_BYTE RET                      ; write bytecode return, - total number of bytecodes are 4
"Hello world!"                     ; pushes reference to heap object "Hello world!" 
GLOBAL #0 "print" DICT_GET         ; get print-function in the table of globals
5                                  ; length of bytecodes
2                                  ; length of closure
FUNCTION_TYPE NEW                  ; create a function object
GLOBAL #0
"hello"                            ; intended global name of function
LOCAL0                             ; the created function
ASSOC
SET_GLOBAL #0
RET

Create a new function using the global function “print” to print hello world, and save it as “hello” in global0.

The above code is written in a semi-assembler. #4 means 4 as a literal bytecode. 17 means the bytecodes that pushes 17 onto the stack (in this case it could be PUSH_BYTE #17). "foo" means the bytecode that pushes the string “foo” to the heap (in this case 102 111 111 3 0 STRING_TYPE NEW).

The bytecode instruction set has not yet stabilised.

List of globals

Not finalised, - lives at the bottom of the stack.

Dictionary data type

The language should have a built-in persistent dictionary data types, I intend to make a couple of different implementations to measure which is best.

Simple tree implementation

As a base test case.

TRIE (with path compression)

Each chunk already has the type, the number of bytes in the data section bytelen, and the number of pointers ptrlen encoded, so a Compressed Trie node can be implemented as:

HAMT

The implementation would be similar to CHAMP.

I would probably make a slight change in the memory layout, so instead of having a separate nodemap and datamap, I would have a single bitmap, with two bits per entry:

In addition to the bitmap, there would also be an array of references, - where the position for the current hash-part can be found with popcnt as in the standard HAMT/CHAMP.

This layout makes it easier to compact the hamt on delete, in my opinion. The general characteristics should be similar to CHAMP.