Turing Adventures: Memory and GC

What is Turing?

It’s a Perl-like language where memory management is solely up to the user. It’s barely object-oriented, allowing for classes, inheritance (and abstract classes) and objects - nothing beyond that.

Backstory

A while ago, I started on my Game Engine in Turing - along with a preliminary 2D physics engine. Unbeknownst to me, OpenTuring is an object-oriented programming language. Wow! However, spoiled by C#, I didn’t understand the concept of “clean up your own trash” and found “free after use” quite irritating and annoying.

Before we begin I’ll assume you have a solid grasp of object-oriented concepts, and a solid understanding of pointers and references. This article aims to demonstrate the furthest limits of the Turing language, with no practical goal in mind.

Memory Allocation

Let’s suppose that we want to create an object. We may do something like this (in C++):

1
Object* obj = new Object();

Which is just (in C):

1
Object* obj = (Object*) malloc(sizeof(Object));

Note that every malloc must have a free or else you’re just allocating resources without freeing.
Thus, to destroy the object, we can:

1
delete obj;

Which is just:

1
free(obj);

Ok, but what happens when you don’t free resources? Well, a memory leak occurs! Consider this code:

1
2
while(true)
new Object();

When run, the program will continuously allocate memory for Object without freeing it. This is terrible news as our precious memory will run out the program will crash.

Memory leaks happen when you forget to free objects and the program’s memory usage slowly blows up. Solution? Enter garbage collection (GC)!

How to GC 101

Garbage collection is a form of automatic memory management, where memory and objects no longer used by the program are released. So how do we implement a GC? We need to somehow:

  1. Keep track of all objects
  2. Figure out which ones are not in use by the program

Before we continue… Reflection!

In computer science, reflection is the ability of a program to examine and modify its structure and behaviour (preferably during runtime). Foundry from the (probably dead) compsci.ca forums posted about achieving reflection in Turing. How did he do it? That’s beyond me. Here’s my version of his Invoke.tu (modified to fit my purpose):

Tracking Objects

This is relatively easy to do. Since Turing provides no easy way to override operators, we can instead create a proxy method called New that creates a new object. We can then use reflection to create a new object!

So I defined a new procedure called New which will 2 parameters: an address of the variable that will store the reference of the object we’ll create and a class type:

Now, how the hell do we pass a type as a parameter in Turing? Well, we force Turing to interpret it as a pointer to something. Since Turing allows raw pointers, the object pointed will be in the form of:

We can then pass the baseClass as a parameter to the NEWCLASS JIT opcode and the object should be created. Then we manually craft the JIT code and force Turing to execute it!

Checking Object in Use

Notice that we pass a pointer to the reference of the object to the New procedure. That is, if we store the object in the variable obj, then New accepts the address of obj. This is useful as if obj goes out of scope, so does the object.

Using this, we can track the number of references (in-scope pointers) to the object and when the number of references reaches zero, we may assume the object is no longer being used by the program (it is no longer accessible). However, this brings up the problem of assignment - if ptr1 points to our Object and we allow ptr1 = ptr2 then leave the scope of ptr1, our GC does not know that ptr2 still has a reference to the object. This can lead the GC to potentially free an object still in use,

In C++, the solution would be to create a custom pointer type and override the assignment operators to keep track of such assignments. For instance, an assignment of ptr2 = ptr1 would increase the object’s reference count by 1. Likewise, an assignment of NULL (or similar value) to ptr1 would result in a decrease of the object’s reference count by 1.

Unfortunately in Turing, this is not a viable solution. Instead, when a new object is created, the address of the pointer to that object and the object’s address are stored in a list. When another object is created or destroyed, we loop through that list once and count the references to each object. Then, we loop through a list of allocated objects and objects with references of zero are destroyed.

However, (just like in C++) we would have to create custom pointer types. For instance…

1
var ptr : ^MyObject := existingPtr

… will not work. Instead, we have to cook up some crazy shit like:

1
var ptr : ^MyObject Equate(ptr, existingPtr)

Finally, we must tackle the question of knowing when a pointer falls out of scope. To do this, I’ll briefly introduce the stack frame. When a function is called, it allocates a stack frame at the top of the stack (stack grows down from higher memory addresses) - this stack frame is where most local variables of that function are stored. This will allow us to determine the scope of a variable by comparing its address with the top of the stack. When a function exits, the stack frame is popped and the frame below is restored.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
           + +-----------------------+
| | | bar()
| | Stack frame for foo() | {
| | | ptr2
| | Contains ptr1 | }
Decreasing | | |
Address | +-----------------------+ foo()
| | | {
| | Stack frame for bar() | ptr1
| | | bar()
| | Contains ptr2 | }
| | |
v +-----------------------+

Stack top

Notice that as we enter bar() from for(), the addresses of pointers within bar() will be lower than the pointers within foo(). So, when we exit the scope of foo() all pointers with addresses lower than ptr1 will be out of scope too. Thus, when we allocate a new object and we obtain the address of the current pointer, all current pointers whose address is lower than the current one is out of scope and can be freed. A better solution would be to obtain the current top of the stack and free all pointers that way. But this works so far.

Implementation (so far…)

Here is a proof of concept of the algorithm outlined above. We take advantage of Turing’s flex (flexible) arrays to store a list of pointers/references.

Here is an example script to test the GC:

And a screenshot of the results:

Results

Oh yeah, you can find the full set of files here.

Next Steps? (Function Parameters)

This is not possible 😰