What is Turing?
It’s a Perl-like language where memory management is solely up to the user. It’s barely object-oriented, allowing for classes, inheritance (and abstract classes) and objects - nothing beyond that.
A while ago, I started on my Game Engine in Turing - along with a preliminary 2D physics engine. Unbeknownst to me, OpenTuring is an object-oriented programming language. Wow! However, spoiled by C#, I didn’t understand the concept of “clean up your own trash” and found “free after use” quite irritating and annoying.
Before we begin I’ll assume you have a solid grasp of object-oriented concepts, and a solid understanding of pointers and references. This article aims to demonstrate the furthest limits of the Turing language, with no practical goal in mind.
Let’s suppose that we want to create an object. We may do something like this (in C++):
Which is just (in C):
Note that every
malloc must have a
free or else you’re just allocating resources without freeing.
Thus, to destroy the object, we can:
Which is just:
Ok, but what happens when you don’t free resources? Well, a memory leak occurs! Consider this code:
When run, the program will continuously allocate memory for
Object without freeing it. This is terrible news as our precious memory will run out the program will crash.
Memory leaks happen when you forget to free objects and the program’s memory usage slowly blows up. Solution? Enter garbage collection (GC)!
How to GC 101
Garbage collection is a form of automatic memory management, where memory and objects no longer used by the program are released. So how do we implement a GC? We need to somehow:
- Keep track of all objects
- Figure out which ones are not in use by the program
Before we continue… Reflection!
In computer science, reflection is the ability of a program to examine and modify its structure and behaviour (preferably during runtime). Foundry from the (probably dead)
compsci.ca forums posted about achieving reflection in Turing. How did he do it? That’s beyond me. Here’s my version of his
Invoke.tu (modified to fit my purpose):
This is relatively easy to do. Since Turing provides no easy way to override operators, we can instead create a proxy method called
New that creates a new object. We can then use reflection to create a new object!
So I defined a new procedure called
New which will 2 parameters: an address of the variable that will store the reference of the object we’ll create and a class type:
Now, how the hell do we pass a type as a parameter in Turing? Well, we force Turing to interpret it as a pointer to something. Since Turing allows raw pointers, the object pointed will be in the form of:
We can then pass the
baseClass as a parameter to the
NEWCLASS JIT opcode and the object should be created. Then we manually craft the JIT code and force Turing to execute it!
Checking Object in Use
Notice that we pass a pointer to the reference of the object to the
New procedure. That is, if we store the object in the variable
New accepts the address of
obj. This is useful as if
obj goes out of scope, so does the object.
Using this, we can track the number of references (in-scope pointers) to the object and when the number of references reaches zero, we may assume the object is no longer being used by the program (it is no longer accessible). However, this brings up the problem of assignment - if
ptr1 points to our
Object and we allow
ptr1 = ptr2 then leave the scope of
ptr1, our GC does not know that
ptr2 still has a reference to the object. This can lead the GC to potentially free an object still in use,
In C++, the solution would be to create a custom pointer type and override the assignment operators to keep track of such assignments. For instance, an assignment of
ptr2 = ptr1 would increase the object’s reference count by 1. Likewise, an assignment of
NULL (or similar value) to
ptr1 would result in a decrease of the object’s reference count by 1.
Unfortunately in Turing, this is not a viable solution. Instead, when a new object is created, the address of the pointer to that object and the object’s address are stored in a list. When another object is created or destroyed, we loop through that list once and count the references to each object. Then, we loop through a list of allocated objects and objects with references of zero are destroyed.
However, (just like in C++) we would have to create custom pointer types. For instance…
… will not work. Instead, we have to cook up some crazy shit like:
Finally, we must tackle the question of knowing when a pointer falls out of scope. To do this, I’ll briefly introduce the stack frame. When a function is called, it allocates a stack frame at the top of the stack (stack grows down from higher memory addresses) - this stack frame is where most local variables of that function are stored. This will allow us to determine the scope of a variable by comparing its address with the top of the stack. When a function exits, the stack frame is popped and the frame below is restored.
Notice that as we enter
for(), the addresses of pointers within
bar() will be lower than the pointers within
foo(). So, when we exit the scope of
foo() all pointers with addresses lower than
ptr1 will be out of scope too. Thus, when we allocate a new object and we obtain the address of the current pointer, all current pointers whose address is lower than the current one is out of scope and can be freed. A better solution would be to obtain the current top of the stack and free all pointers that way. But this works so far.
Implementation (so far…)
Here is a proof of concept of the algorithm outlined above. We take advantage of Turing’s
flex (flexible) arrays to store a list of pointers/references.
Here is an example script to test the GC:
And a screenshot of the results:
Oh yeah, you can find the full set of files here.
Next Steps? (Function Parameters)
This is not possible 😰