RASMO

RASMO - Transforming Assembly into BrainFuck

BrainFuck is an esoteric programming language consisting of a linear memory traditionally of 30,000 one-byte cells, and 8 instructions allowing to manipulate this memory:

• + : Increments the current cell by 1
• - : Decrements the current cell by 1
• > : Moves the pointer indicating the current cell to the right
• < : Moves the pointer indicating the current cell to the left
• . : Prints the content of the current cell as an ASCII character to the standard output
• , : Reads a character from the standard input and stores it in the current cell
• [ : If the currently pointed cell contains the value zero, the program jumps directly to the corresponding ]
• ] : Returns to the corresponding [

As you can see, this language is extremely minimalist, and yet Turing-complete (if we omit the finite memory), which means that theoretically, any program can be written in the Brainfuck language. Nevertheless, as you probably suspect (and as the name of the language suggests), programming in Brainfuck quickly becomes a really hard task if you plan to make programs even a little advanced.
RASMO allows you to write programs in a language close to a kind of assembly language, and then transform this code directly into Brainfuck code.

But before being an assembler for Brainfuck, RASMO was a simple virtual machine created for the humorous purpose of programming directly with 0s and 1s in a text file, a program that was natively understood by the RASMO virtual machine.
I then decided, for an unknown and probably dangerous reason for my mental health, to start a project aiming to automatically create a Brainfuck program from a RASMO program.

Download the final tools:
• Windows Tools
• Linux Tools

The RASMO Virtual Machine

A RASMO binary program is organized into instructions. It is a sequence of instructions. An instruction itself is composed of 3 numbers: the opcode (indicating which instruction to execute), the first argument, and the second argument.

The opcode is a number between 0 and 31 and is therefore coded on 5 bits. Each of the two arguments is a number between 0 and 255 and is therefore coded on 8 bits. An instruction is therefore ultimately coded on 5 + 8 + 8 = 21 bits.

The RASMO language:

Since it is not easy to program in binary, I have defined a "language" that substitutes for raw binary and allows for easier programming, but therefore requires transformation before execution.

Thus, instead of writing:

00111 00000000 01100001

00101 00000000 00000000

we write:

set 0 97
printnum 0 0

As the arguments of the instructions are numbers between 0 and 255, all the numbers you will manipulate in RASMO will be positive integers between 0 and 255.

Here is the entirety of the RASMO instructions:

00000 ADD : sets the cell x to x + y
00001 SUB : sets the cell x to x - y
00010 MUL : sets the cell x to x y
00011 DIV : sets the cell x to x // y
00100 MOD : sets the cell x to x % y
00101 PRINTNUM : displays x as a number
00110 PRINT : displays x as a character (according to the ASCII table)
00111 SET : puts y in cell x
01000 COP : copies the content of the cell pointed to by the cell at address x into the cell pointed to by the cell at address y
01001 INPUT : copies the character entered by the user into the cell at address x
01010 INPUTNUM : copies the number entered by the user into the cell at address x
01011 PUSH : puts the cell pointed to by x in the stack
01100 POP : puts the first element of the stack in the cell pointed to by x and removes it from the stack
01101 LABEL : registers the label with the name x,y
01110 GOTO : Goes to the instruction following the label with the name x,y
01111 GT : sets the cell x to (number at the cell pointed to by x) < (number at the cell pointed to by y)
10000 LT : sets the cell x to (number at the cell pointed to by x) > (number at the cell pointed to by y)
10001 GTE : sets the cell x to (number at the cell pointed to by x) >= (number at the cell pointed to by y)
10010 LTE : sets the cell x to (number at the cell pointed to by x) <= (number at the cell pointed to by y)
10011 EQ : sets the cell x to (number at the cell pointed to by x) == (number at the cell pointed to by y)
10100 NOTEQ : sets the cell x to (number at the cell pointed to by x) != (number at the cell pointed to by y)
10101 AND : sets the cell x to (number at the cell pointed to by x) and (number at the cell pointed to by y)
10110 OR : sets the cell x to (number at the cell pointed to by x) or (number at the cell pointed to by y)
10111 XOR : sets the cell x to (number at the cell pointed to by x) xor (number at the cell pointed to by y)
11000 NOT : sets the cell x to not (number at the cell pointed to by x)
11001 JUMP : jumps to label y if number at adress x is zero
11010 CALL : goto label x,y and returns here if a RETURN instrction was encountered
11011 RETURN : returns to label if call was used
11100 INCR : adds y to the cell pointed to by x
11101 DECR : subtracts y from the cell pointed to by x
11110 RAND : sets the cell pointed to by x to a random number between MEMOIRE[256CHUNK + x] and MEMOIRE[256CHUNK + y]. Note: Does not work during Brainfuck transformation
11111 SETCHUNK : sets the current chunk to the chunk indicated by the value in cell x

x and y respectively designate the first and second arguments given after the instruction. As you have noticed, some instructions (like pop for example) only require one argument. However, in reality, they must still receive a second argument, but they will not use it. So, make it a habit to put 0, or even -. The syntax '-' is strictly equivalent to '0' but clearly indicates that it is an unused argument.

The instructions performing operations take two addresses as arguments, perform the operation with the numbers located at these addresses, and put the result in the address indicated by the first argument.

Thus, the following program displays 7:

set 0 2
set 1 5
add 0 1
printnum 0 -

The stack:

The stack is a data structure integrated into RASMO and allowing to store several values. To add an element to the stack, you must use the push instruction. A stack is a special data structure, and when using a stack, you can only retrieve the last element pushed. Putting numbers into a stack literally consists of stacking them, and to access a number in the middle of the stack, there is no other choice than to unstack all the elements above.

Labels and gotos allow you to jump anywhere in a program. When you put a label, it's as if you planted a flag in the program at the location where the label appears. The goto instruction allows you to jump directly to the label number indicated in the parameter. A supplement on labels will come after.

First, let's talk about jump. The jump is the conditional jump. It is one of the most important instructions, without which IF, WHILE, FOR loops would not exist. However, the jump instruction is not complicated. It is used as follows: jump address label. Here is the behavior of the jump: if the condition represented by the number at the address indicated in the jump is false (that is, the address points to zero), then the program jumps directly to the indicated label. Otherwise, nothing happens.

Example of a simple IF using a JUMP:

set 12 condition ;
jump 12 5
PRINTSTRING \La condition est vraie !\ 1
goto 1 1
label 5 -
PRINTSTRING \La condition est fausse !\ 1
label 1 1

If the condition is true, jump 12 5 will do nothing, and the program will display The condition is true. Then it will jump to the end to avoid the second block. If, on the contrary, the condition was false, the program will directly jump to label 5 and display The condition is false.

Here is a small supplement on labels and jumps:

The label instruction taking 2 numbers as arguments, it is important to understand that the labels are represented by these two numbers (between 0 and 255), and that there are therefore theoretically 256256 = 65536 usable labels.

Some instructions like goto or label allow you to enter a complete/large label name (two arguments), but some like WHILE only accept label names between 0 and 255 (represented by a single argument). For these particular instructions, the labels you enter will be of the form x (a single number between 0 and 256), but the real labels that you will use will be labels in the form x 0 or x - instead of x y. Thus, to use these labels on instructions taking large labels (with two arguments), replace simply the second argument by 0 or -

The call instruction and the return instruction go in pairs. call labelpartie1 labelpartie2 is almost like goto labelpartie1 labelpartie2, except that once the jump is made, if the program encounters at some point or another a return - - (the returns do not take arguments), it will return to the location where the call was made. This opens the door to the creation of functions.

The incr and decr instructions take as arguments an address and a number and increment or decrement the value located at the indicated address by .

The cop instruction is particularly tedious to use, but I will explain it to you in a little more detail because it is essential and still very powerful.

To explain it to you, I will take a number: 48. This number, I put it in cell 1. Finally, in cell 0, I put 1. Concretely:

Address	Value
0	1
1	48

In addition, in cell 11, I put 10. In total, we have:

Address	Value
0	1
1	48

11	10
10	?

what can also be represented by:

0 -> 1 -> 48

11 -> 10 -> ?

Now, if I execute cop 0 11, cell 10 will contain 48.

Remember this:

If x points to a and a contains b and y points to c, cop x y will copy the number b into cell c

If we translate the example from above into code:

set 0 1
set 1 48
set 11 10
cop 0 11
printnum 10 -

This displays 48.

Macros:

The macros defined below simplify the life of the programmer and are transformed into code during assembly/compilation. That's why you saw me write PRINTSTRING even though it's not an instruction. In reality, PRINTSTRING and all the other macros are transformed into several instructions during compilation.

However, for the sake of simplicity of the parser, the entire program must be divided into blocks of 3 "words" in a row. This is why we find two types of macros:

- Macros with 2 arguments

- Macros with 5 arguments

So that these macros (arguments + keyword) are blocks of size a multiple of 3.

However, it is important to note that some instructions do not always require 2 parameters, but sometimes only 1. This is why we complete the parameters with zeros or '-', in such a way as to always have blocks of 3 "words" and thus the keywords are always at the beginning of a block of 3.

Syntax of macros:

PRINTSTRING string mem : string is a sequence of glued characters and mem must be a free memory cell that will be used to copy the characters before displaying them

PRINTSTRING can be used with two different syntaxes, in two different cases:

Or you want to display a string of characters without spaces, in which case you can for example write PRINTSTRING coucou 0

Or your string of characters has spaces, and you must surround it with \ characters. Example: PRINTSTRING \La condition est vraie !\ 1

FOR label1 label2 variant nombremax mem

ENDFOR label1 label2 variant pas - <- five-argument instruction

These two macros allow you to simply write for loops.

You put the FOR macro, then you define the body of your for loop, and finally you put the ENDFOR.

label1 and label2 must be free label numbers and which will be used exclusively for this FOR loop. As you have noticed, these are labels of type [0-255] 0 and not [0-255] [0-255] as I explained in the paragraph on labels and jumps.

variant is the memory cell that will be used for the FOR loop counter

FOR loops also use an extra cell, the mem cell.

The number pas is by how much the loop variant will be incremented each turn.

nombremax is the upper bound for the loop variant.

WHILE label1 label2 condition - -

ENDWHILE label1 label2

The principle is about the same, except that you must take care of calculating the condition of the while loop yourself (with logical instructions).

FUNCTIONDEF id label : defines a routine that can be called by call id -. label is a label used by the FUNCTIONDEF macro because it defines gotos

FUNCTIONEND label -

Between the two, you can write the body of your function. Note that there is no need to put a return, because it is integrated into the FUNCTIONEND.

How chunks work:

We are coming to the end and there is only one slightly mysterious instruction left: setchunk.

As you have seen, the instructions only take arguments between 0 and 255. However, at this stage, I have also said two other things that may seem contradictory: RASMO supports direct addressing and the accessible memory range has a size of 65,536 bytes. How is it then possible to access memory cell n°35,451 for example?

The answer is: chunks. It will not have escaped you, 65,536 = 256 256. And it's not for nothing. In reality, the memory range of 65,536 bytes is divided into 256 chunks of 256 bytes. Inside a chunk, you can directly address any byte of the chunk you are in, by numbers between 0 and 255. The setchunk instruction allows you to change chunks among the 256 possible chunks.

Notable ASCII codes:

Space: 32
Line feed: 10
Backspace: 8
The digit zero: 48
The uppercase 'A': 65
The lowercase 'a': 97

Defines:

As you have noticed, in RASMO, we only manipulate numbers for memory addresses and labels. Defines allow you to use words instead of numbers. For example, if I write:

define myVariable 56

I can write myVariable everywhere there is a 56 normally. This makes the code more readable. I encourage you to use defines massively.

Comments:

The only type of comment that exists is the single-line comment, which starts with ; and ends at the end of the line

Note:

Thanks to calls and the stack, it is perfectly possible to create functions taking arguments, and even to program recursion. You will find examples of these advanced concepts in the files fact.rsm (where I implemented a recursive factorial function) and print.rsm (where I implemented a function taking a string of characters as an argument and displaying it).

To send arguments to a function, the best idea is to stack them on the stack, and then call the function with call. The first thing the function does is then to unstack the arguments to use them and do its calculations. To return one (or more values), the function can also stack numbers on the stack before exiting. After calling the function, you unstack the return values.

To implement recursive functions, there's a slight subtlety. Before launching the recursive call, you must stack all the variables you'll need after the recursive call, and then stack the arguments of the recursive call. Indeed, otherwise, the recursive calls will overwrite the value of the boxes you've used (all variables are global).

Thanks to the fact that you have stacked your important values, after the end of the recursive call, you can unstack the result returned by the recursive call, and then retrieve the values you had stacked, and continue your calculations. This little sleight of hand has allowed your important values not to be overwritten by the same values during recursive calls.

The RASMO ⇨ BrainFuck transformation

Let's now see how the BrainFuck transformation tools work.

As you may suspect, it is not possible to create a pure compiler from RASMO to Brainfuck. Indeed, the fundamental difference between these two languages is that in RASMO, you have the possibility to control the program's execution pointer (via goto, call). You can jump from one place to another in the program as you see fit, which is impossible in Brainfuck. The only possible jumps are loop jumps, which cannot be combined as desired.

For this reason, rather than making a simple compiler, I wrote a RASMO interpreter in Brainfuck, which reads a RASMO program written in memory and executes it. For this reason, the memory (the brainfuck tape) is divided into three main parts:

The RASMO program loaded into memory
The arithmetic, logic and control unit
The heap and the two stacks

Each of these parts of the memory is itself subdivided into smaller and repetitive parts named chunks (which is not a well-chosen word because there are many types of chunks).

The memory area reserved for the program is subdivided into 256 large chunks of 256 small chunks. Each of these small chunks is organized as follows:

| OP  | ARG1 | ARG2 | AD  | DEST | CAR |

OP, ARG1 and ARG2 is the program instruction, AD is the address of the small chunk within the large chunk, and DEST and CAR are cells allowing to perform copies when moving values.

In addition, each beginning of a large chunk is organized in this way:

| OP  | ARG1 | ARG2 | AD  | CHUNKNO | CHUNKAD |

where CHUNKAD is the large chunk number in memory. CHUNKNO is used to move values.

Next, the ALCU is simply composed of about twenty cells, the first 7 of which are reserved for calculating the elif on the opcodes, and the others for performing operations such as division, multiplication, etc.

Finally, for what could be called RAM (the heap and the two stacks), the memory is also divided into 256 large chunks of 256 small chunks. Here's how a small chunk is organized:

| CAR  | DEST | AD  | PILE I | PILE LS | TAS |

CAR and DEST are used to move values, AD corresponds to the place of the small chunk in the large chunk, and the last three cells are memory cells, respectively reserved for the Integrated Stack (called PILE I), the Self-Service Stack (called PILE LS), and the Heap (called TAS).

Finally, each of the large chunks starts with a special small chunk organized as follows:

| AD  | CAR0 | DEST0 | PILE LS | PILE I |

where AD is the large chunk number, CAR0 and DEST0 allow to store temporary values when moving from large chunk to large chunk, and the last two cells contain the size of the division of the PILE I and the PILE LS within the large chunk.

Finally, the entire RAM starts with the following three cells:

| NBCHUNK | PILE I | PILE LS |

which respectively allow to store the current chunk number, the active chunk of stack I and the active chunk of stack LS.

As you notice, the construction of this interpreter capable of executing RASMO code is similar to the construction of a processor architecture. We have reserved places (an ALCU, a RAM, and a ROM), and we spend our time moving values from one to the other and vice versa to execute the code.

If we imagine that the RASMO program is already loaded into memory, the execution scheme will always be the same:

Pointer to the current instruction
Moving OP, ARG1 and ARG2 to the beginning of the ALCU as well as a reminder of the address of the current instruction to return to it.
Execution of the elif on OP to launch the part of the ALCU corresponding to the requested instruction
This instruction very often requires going to fetch values in the middle of the RAM to copy them into the ALCU, then doing calculations with them and bringing them back into the RAM. The instruction can also modify the reminder of the current program address if it is a goto, a call, etc.
Thanks to the reminder, the pointer returns to the current instruction of the program, moves to the next one, and repeats the operation.

This large while loop ends when the current opcode is 41.

As you can see, once the program is loaded into memory, all that remains is to execute the interpreter (the large while loop).

How the RASMO -> Brainfuck compiler works

With the above being said, it becomes extremely simple to create the compiler (once of course we have finished the RASMO interpreter, which is more than 99% of the work). Indeed, it suffices to concatenate three Brainfuck programs in the same file that will fit together:

The program responsible for writing the RASMO executable in the 'ROM'
The program responsible for organizing all the memory and chunks (this makes several hundred thousand cells to browse and write)
The RASMO interpreter

The role of the compiler is therefore only to calculate program n°1 according to the RASMO binary, then to concatenate it with the other two programs which, for their part, never change.