Playing with compilers

Over the past few weeks I’ve been working on a rewrite of Shogun. It is still a stack machine, but it now has two registers, a sane assembler, a mostly complete set of opcodes, and even a language that compiles to it!


Shogun has been rewritten in its entirety, though it does reuse snippets of the original here and there (specifically, Shogun::Object is almost a cut-and-paste of Shogun::Data). Binaries still use the same header, but the version has been bumped to 0.2.x. New versions of Shogun will refuse to run old binaries, and old versions will refuse to run new binaries. This break is necessary because the opcodes have changed and, more importantly, the entire layout of the binary has changed.

The single biggest change to both the VM and the binary layout is how Shogun now handles memory. There are only two segments of memory: the stack and the heap. There is no separate program memory; Shogun simply reads the program from the heap. For binaries, this means there are only two sections: the header and raw memory. Nothing distinguishes an opcode from data in memory. The nice consequence is that you can pre-store data your program needs inside the binary, and it will be available in the VM immediately, as long as you know which address to read it from.
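To make the header-plus-raw-memory idea concrete, here is a minimal C++ sketch of a loader under several assumptions: the three-byte Header struct, its field order, and the names Vm, load, peek and isCompatible are all hypothetical, not Shogun's actual header format or API. It rejects anything outside the 0.2.x series and then copies everything after the header straight into the heap, so opcodes and pre-stored data land in the same memory and are read the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical header: the post only says the header format is unchanged and
// that the version is now 0.2.x, so these three version bytes are illustrative.
struct Header {
    std::uint8_t verMajor, verMinor, verPatch;
};
constexpr std::size_t kHeaderSize = 3;

struct Vm {
    std::vector<std::uint8_t> heap;  // one flat memory holding code and data alike
};

// Only 0.2.x binaries are accepted; older ones use different opcodes and layout.
bool isCompatible(const Header& h) { return h.verMajor == 0 && h.verMinor == 2; }

// A binary is [header][raw memory]: everything after the header is copied into
// the heap verbatim, so pre-stored data is addressable right after loading.
Vm load(const std::vector<std::uint8_t>& binary) {
    if (binary.size() < kHeaderSize) throw std::runtime_error("truncated header");
    Header h{binary[0], binary[1], binary[2]};
    if (!isCompatible(h)) throw std::runtime_error("unsupported binary version");
    Vm vm;
    vm.heap.assign(binary.begin() + static_cast<std::ptrdiff_t>(kHeaderSize), binary.end());
    return vm;
}

// Opcodes and data are fetched the same way: by address into the heap.
std::uint8_t peek(const Vm& vm, std::size_t addr) {
    assert(addr < vm.heap.size());
    return vm.heap[addr];
}
```

Because the heap is a verbatim copy of the file body in this sketch, a value stored at offset N after the header is simply heap address N.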

Now, as for the registers I mentioned: there are only two. Each can hold only an unsigned integer (internally known as the address type). The first is PRI, or program index, which stores the address of the next instruction for the VM to execute. The second is MMX, which stores a local position in memory. When a program is loaded, MMX is set to the first unallocated position in memory. The VM never uses MMX itself, so it is up to the programmer how to use it. Sholan uses it to mark the current scope’s local memory.
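Below is a rough C++ sketch of how two registers like these could drive a fetch-execute loop. The opcode set (HALT, PUSH, ADD, JUMP), the word-addressed heap, the 64 spare cells and the Machine type are hypothetical stand-ins, not Shogun's real opcodes or code. PRI is advanced on every fetch and a jump simply overwrites it; MMX is initialised to the first cell past the loaded image and never touched again by the VM, leaving it free for the program (in Sholan's case, to mark the current scope's locals).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Addr = std::uint32_t;  // the "address type": what both registers hold

// Hypothetical opcodes for illustration; not Shogun's real opcode set.
enum Op : std::uint32_t { HALT = 0, PUSH = 1, ADD = 2, JUMP = 3 };

struct Machine {
    std::vector<std::uint32_t> heap;   // word-addressed (an assumption); code and data share it
    std::vector<std::uint32_t> stack;
    Addr pri = 0;  // PRI: address of the next instruction to execute
    Addr mmx = 0;  // MMX: left entirely to the program; step() never reads it

    // On load, MMX is set to the first unallocated cell, just past the image.
    explicit Machine(std::vector<std::uint32_t> image)
        : heap(std::move(image)), mmx(static_cast<Addr>(heap.size())) {
        heap.resize(heap.size() + 64);  // hypothetical spare memory beyond the image
    }

    // Fetch the opcode at PRI, advance PRI, then execute. Returns false on HALT.
    bool step() {
        assert(pri < heap.size());
        switch (static_cast<Op>(heap[pri++])) {
        case PUSH:
            stack.push_back(heap[pri++]);  // immediate operand follows the opcode
            return true;
        case ADD: {
            assert(stack.size() >= 2);
            std::uint32_t b = stack.back(); stack.pop_back();
            std::uint32_t a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            return true;
        }
        case JUMP:
            pri = heap[pri];  // a jump just overwrites PRI with its operand
            return true;
        case HALT:
        default:
            return false;
        }
    }

    void run() { while (step()) {} }
};
```

Running the image PUSH 2, PUSH 3, ADD, HALT leaves 5 on the stack, while MMX still holds 6, the first cell after the six-cell image.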

I’m probably not going to update the online Shogun sandbox, as I don’t want to bolt the horribly written memory limiters onto the new version of SVM. In fact, I’ll probably take the sandbox down at some point, simply because it is badly outdated.

Some other miscellaneous improvements:

  • Labels and JUMPing are now sane
  • No more global variables: you can’t define variables in your assembly, so you have to hardcode addresses (or use the stack)
    • This was done mostly because supporting them added complexity to the assembler and is of no use to an actual compiler
  • New filename extensions: *.shasm for assembly and *.sx for binaries.
    • On that note, Sholan files are *.sl
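One common way to make labels and jumps behave sanely is a two-pass assembler; the post doesn't say how Shogun's assembler does it, so treat this as an illustrative C++ sketch. The one-instruction-per-line syntax with "name:" labels, the mnemonic table and the assemble function are all hypothetical. Pass 1 only counts cells to learn each label's address; pass 2 emits code and substitutes those addresses, which is what lets a JUMP target a label that appears later in the file. Numeric operands pass through unchanged, matching the hardcoded-address style described above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mnemonic table for illustration; not Shogun's real opcodes.
const std::map<std::string, std::uint32_t> kOpcodes{
    {"HALT", 0}, {"PUSH", 1}, {"ADD", 2}, {"JUMP", 3}};

// Split "OP [ARG]" into its mnemonic and optional operand.
static void split(const std::string& line, std::string& op, std::string& arg) {
    op.clear();
    arg.clear();
    std::istringstream in(line);
    in >> op >> arg;
}

std::vector<std::uint32_t> assemble(const std::vector<std::string>& lines) {
    std::map<std::string, std::uint32_t> labels;
    std::string op, arg;
    std::uint32_t addr = 0;

    // Pass 1: record the address of every "name:" label.
    for (const auto& line : lines) {
        if (line.empty()) continue;
        if (line.back() == ':') {
            labels[line.substr(0, line.size() - 1)] = addr;
            continue;
        }
        split(line, op, arg);
        if (op.empty()) continue;        // whitespace-only line
        addr += arg.empty() ? 1 : 2;     // opcode cell plus an optional operand cell
    }

    // Pass 2: emit cells, replacing label operands with their addresses.
    std::vector<std::uint32_t> code;
    for (const auto& line : lines) {
        if (line.empty() || line.back() == ':') continue;
        split(line, op, arg);
        if (op.empty()) continue;
        code.push_back(kOpcodes.at(op));  // throws std::out_of_range on unknown mnemonics
        if (arg.empty()) continue;
        auto it = labels.find(arg);
        code.push_back(it != labels.end()
                           ? it->second
                           : static_cast<std::uint32_t>(std::stoul(arg)));  // literal address
    }
    assert(code.size() == addr);  // both passes must agree on the layout
    return code;
}
```

For example, assembling the four lines "JUMP end", "PUSH 7", "end:" and "HALT" yields the cells 3, 4, 1, 7, 0: the JUMP operand resolves to address 4, which is where HALT lands.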


Sholan is a new project that was added to Shogun a couple of weeks ago. It is my experimental compiler for a language, also called Sholan, that I’ve been designing around Shogun, and I’ve finally decided to build a toolchain for it. The compiler is written in C#, much to my chagrin, because the lexer/parser generator ANTLR4 doesn’t currently have a C++ target. Luckily, Sholan should run fine on Mono.

As for using the language itself… I’ll be writing a whole lot more about it in some follow-up posts.

Moving on from here

Looking to the future, I’m mostly focusing on Sholan development. Building something that uses SVM, rather than building the VM itself, forces me to think about what I really need from it. Here are some of the things in the works for both SVM and Sholan:

  • [SVM] Dynamic library support – some way to tell AsmReader to import external dependencies
    • Currently Sholan just throws every imported file into one binary. Woohoo monolithic code!
    • I want to do this without adding much to the VM itself.
  • [Sholan] Not all operators are supported
    • Stuff like bitwise operations
  • [Sholan] Actually use the heap as a heap and not as a stack
  • [Sholan] Some sort of OOP system
    • Not sure how I want to do this, as it is a huge beast to tackle.

Now I’ll leave you with a small snippet of Sholan code:

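// FizzBuzz
import "lib/"

entry {
	// Read the upper bound from input
	var count = readline()

	for(var i = 1; i <= count; i = i + 1) {
		var output = ""

		// Multiples of 3 start with "Fizz"
		if(i % 3 === 0) {
			output = "Fizz"
		}

		// Multiples of 5 append "Buzz" ("FizzBuzz" for multiples of 15)
		if(i % 5 === 0) {
			output = output .. "Buzz"
		}

		// Everything else is just the number itself
		if(i % 3 !== 0 && i % 5 !== 0) {
			output = i
		}
	}
}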
