Garbage collectors (GC) and Resource Acquisition Is Initialisation (RAII) are two concepts that share a similar goal: they want to make it easier for programmers to write code that does not leak resources. A resource leak can occur in many places. The most common one is memory allocation: when a programmer needs some memory to store data in, they allocate it. Allocating memory is a relatively slow process that tries to find some unused memory in the application's share of the RAM. When the programmer is done with the data, e.g. because the results have been computed or the user has closed a window, the memory has to be deallocated. If it isn't, the allocator (and ultimately the operating system) will assume it is still needed. With large blocks of memory, for example the ones needed to display a bitmap, allocated memory that isn't needed anymore can pile up and make the program slower or even crash it because the RAM is full.
Here's where the garbage collector comes in: the GC scans the application's memory in the background, finds those unused pieces of memory and deallocates them for you. This effectively means that you don't have to care about deallocation anymore. Assuming that a normal-size application of about 100,000 lines of code needs over 1,000 deallocations, this is a big productivity increase for the programmer.
RAII doesn't need this "background" process of scanning your memory. RAII is a technique that keeps the programmer from creating memory leaks in the first place. It solves the problem at its root instead of curing its symptoms. RAII relies on objects with constructors and destructors, so it is tied to object-oriented programming (OOP). In OOP, every small part of your program is an "object" that will be created (allocated and constructed) and that will be freed (destructed and deallocated).
There are programming languages that let you create objects and force you to free them explicitly. In these languages, displaying a bitmap would look similar to this piece of pseudo-code:
function ShowMyBitmap() {
    Bitmap mybmp = Bitmap.Create("test.bmp");
    mybmp.Show();
    mybmp.Free;
}
As you can see, besides showing the bitmap the programmer needs two additional steps: creating a bitmap
object from the bitmap file and freeing the bitmap object after it has been shown to the user.
The last step is especially easy to forget because you won't directly notice if it is missing.
You will notice if you didn't create the bitmap, because there will be nothing to show,
and you will notice if you forget to show the bitmap because, well, it won't appear when you test the
application. But if you forget to free the bitmap, you won't notice anything obvious. The
bitmap object will simply stay in the application's memory, although it isn't needed anymore.
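The manual variant maps fairly directly onto C++ with new and delete. The following is only a minimal sketch: the Bitmap class, its live_count counter and the function names are hypothetical and exist purely so that the leak can be observed. In real code there is no such counter, which is exactly why the forgotten delete goes unnoticed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the article's Bitmap class. The static counter
// exists only so the leak can be observed in this sketch.
class Bitmap {
public:
    explicit Bitmap(const std::string& path) : path_(path) { ++live_count; }
    ~Bitmap() { --live_count; }
    void show() const { /* drawing omitted in this sketch */ }
    const std::string& path() const { return path_; }
    static int live_count;  // number of Bitmap objects currently alive
private:
    std::string path_;
};
int Bitmap::live_count = 0;

// Correct: every new is paired with a delete.
void show_bitmap_and_free() {
    Bitmap* bmp = new Bitmap("test.bmp");
    bmp->show();
    delete bmp;
}

// Buggy on purpose: compiles and "works", but the object is never freed.
void show_bitmap_and_forget() {
    Bitmap* bmp = new Bitmap("test.bmp");
    bmp->show();
    // missing: delete bmp;
}

// Usage sketch: the counter exposes what a tester would never see on screen.
void demo() {
    show_bitmap_and_free();
    assert(Bitmap::live_count == 0);
    show_bitmap_and_forget();
    assert(Bitmap::live_count == 1);  // one Bitmap has leaked
}
```

Both functions show the bitmap equally well; only the counter tells them apart.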
Garbage-collected languages make this last step obsolete: you create the bitmap, show it to the user and then, on the next garbage collector run, it is freed by your friendly GC. This obviously saves time otherwise spent maintaining your application and fixing memory leaks. Here's the pseudo-code for a garbage-collected programming language:
function ShowMyBitmap() {
    Bitmap mybmp = Bitmap.Create("test.bmp");
    mybmp.Show();
}
There's a third kind of programming language: one that supports auto-objects and therefore RAII. In those languages, you can leave out the last step as well. But in contrast to the GC's technique, nothing needs to scan the application's memory. There's another bonus here: you know the exact time when the object will be freed. This always happens when the object goes "out of scope", that is, when execution leaves the block in which the object's variable was declared. Here's an example:
function ShowMyBitmap() {
    Bitmap mybmp("test.bmp");
    mybmp.Show();
    // Comment: mybmp.Free() will happen automatically when the function finishes.
}
This solution has the same effect as a GC, with the two bonuses mentioned: you know the time when the object
is freed, and you save the background scanning.
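In C++, this behaviour comes from constructors and destructors. Here is a small sketch of the auto-object example; the Bitmap class and the event_log() helper are made up for illustration, and the log exists only to make the order of events visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log so the order of operations can be inspected.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical Bitmap: acquires its resource in the constructor and
// releases it in the destructor, which is the essence of RAII.
class Bitmap {
public:
    explicit Bitmap(const std::string& path) { event_log().push_back("load " + path); }
    ~Bitmap() { event_log().push_back("free"); }
    void show() const { event_log().push_back("show"); }
    // Copying is disabled so the resource has exactly one owner.
    Bitmap(const Bitmap&) = delete;
    Bitmap& operator=(const Bitmap&) = delete;
};

void show_my_bitmap() {
    Bitmap mybmp("test.bmp");  // auto-object: lives on the stack
    mybmp.show();
}   // mybmp goes out of scope here; ~Bitmap() runs at exactly this point

// Usage sketch: "free" is logged before control returns to the caller.
void demo() {
    show_my_bitmap();
    event_log().push_back("after call");
    assert(event_log().size() == 4);
    assert(event_log()[2] == "free");
}
```

The point of the log is the ordering: the bitmap is freed before the caller executes its next statement, not at some later collector run.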
You might think that the most important point is that RAII happens "automatically" and doesn't cost you additional CPU cycles. Well, CPU cycles are cheap these days. New CPUs are coming out that are absolute overkill for most applications. The most important point is a conceptual one. I will make it clear with another example. In this example, a class named "File" is used. Let's imagine that File opens a file when it is created and closes the file automatically when it is freed. Take these two functions: Example() calls ReadFile() to read some information from a file and then appends data to the file previously read by ReadFile(). This example is for garbage-collected languages:
function ReadFile() {
    File ini1 = File.Create("Options.ini",openReadOnly);
    ini1.ReadOptions();
}
function Example() {
    ReadFile();
    File ini2 = File.Create("Options.ini",openReadWrite);
    ini2.Append("ScreenSize=1024x768");
}
Can you spot a potential problem with this example code? It is very error-prone because the file Options.ini is
opened in ReadFile(). It will be closed when the object ini1 is freed. This will happen eventually, thanks to our
garbage collector. But when will it happen? This is of interest if you don't want your application to misbehave.
It is quite possible that ini1 will not be freed - and therefore the file will not be closed - before ini2 is
created and tries to write to the file. This won't succeed because the file is locked (on Windows operating
systems, at least) by ini1. Therefore either an exception will be thrown (and your program might crash) or the
file operations executed by ini2 will be ignored. Neither option is very nice: if this happens during
program initialisation, for example, the program won't even start if an exception is thrown. If the operations
are ignored, you won't notice that the program is misbehaving, and it might silently ruin your calculations.
But of course, this can be fixed easily. You need to add an ini1.Close() call after the ReadOptions() call. But if you have to do this, one question arises: where is the gain? In a non-garbage-collected, non-RAII language you would use .Free instead of .Close, because this frees the object and closes the file at the same time. So there's effectively zero gain in this situation. RAII solves this problem easily, though: ini1 goes out of scope when ReadFile() finishes, and therefore the file is automagically closed before ini2 tries to access it.
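To make the RAII variant concrete, here is a rough C++ sketch. It does not touch real files: the File class and the locked_files() registry are hypothetical and merely imitate an operating system that locks open files exclusively, so that the effect of a handle that is still open can be observed.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an OS that locks open files exclusively.
// It only tracks names; no real file is touched in this sketch.
std::set<std::string>& locked_files() {
    static std::set<std::string> names;
    return names;
}

class File {
public:
    explicit File(const std::string& name) : name_(name) {
        // insert() reports whether the name was newly added.
        if (!locked_files().insert(name_).second) {
            throw std::runtime_error("file is locked: " + name_);
        }
    }
    ~File() { locked_files().erase(name_); }  // "close": release the lock
    File(const File&) = delete;
    File& operator=(const File&) = delete;
private:
    std::string name_;
};

void read_file() {
    File ini1("Options.ini");
    // ... read options here ...
}   // ini1 is closed here, before the caller continues

// Returns true if the second open succeeded.
bool example() {
    read_file();
    try {
        File ini2("Options.ini");  // succeeds: ini1 is already closed
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// For contrast: the first handle is still open, as it might be under a GC.
bool open_twice() {
    File first("Options.ini");
    try {
        File second("Options.ini");
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

void demo() {
    assert(example());      // RAII closed ini1 in time
    assert(!open_twice());  // a still-open handle blocks the second open
}
```

open_twice() roughly models what happens when ini1 has not been finalized yet; example() shows that scope-based closing avoids the situation without an explicit Close() call.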
The unpredictable timing of finalization is a huge conceptual flaw in garbage collection. It limits the GC to one resource only: memory. But in fact, memory leaks will not always cause bugs in your software. Most often they will only make your software eat more memory and therefore run slowly. Maybe it will crash if you consume more memory than your system has (this is especially important for server software that may be running for hundreds or thousands of days). But garbage collection doesn't help with the "real" bugs, those that are hard to reproduce, trace and find. RAII, on the other hand, helps the programmer to track the lifecycle of an object. RAII helps to produce well-defined behaviour for your programs and solves the memory leak issues as well.
Things get more interesting once exceptions are involved. Modern programming languages use "exceptions" to signal an error condition in the program flow. A simple example is a file that the program tries to read but that is missing, or a file that cannot be opened with the desired access (e.g. write access when only read access is possible at the filesystem level). Such common error conditions "throw" exceptions up the function-call hierarchy. For example, let's assume function Test() calls function ReadOptions(), which in turn calls function ReadFile(). The function-call hierarchy looks like this:
Test()
    calls ReadOptions()
        calls ReadFile()
Now ReadFile() can throw an exception if the file cannot be opened because it either does not exist or the user running the program does not have sufficient permissions to open it. The exception is then thrown up the function-call hierarchy. The advantage of exceptions is that either Test() or ReadOptions() can now handle the exception in a sane way:
function Test() {
    try {
        ReadOptions();
    } on Exception do {
        ShowMessage("Options could not be read because the file was inaccessible");
        return;
    }
    DoSomething();
}
But there is another way that this error condition can be handled: ReadOptions() can catch the exception
and set all options to sane default values. This could look like this:
function Test() {
    ReadOptions();
    DoSomething();
}
function ReadOptions() {
    try {
        ReadFile();
    } on Exception do {
        SetDefaults();
    }
}
The day is saved! Even if the file is inaccessible, the application can run with default values.
There is, however, a big problem here. An exception terminates the current function instantly. So there
might be parts of the function that will not be executed anymore if an exception is thrown.
Let's imagine the ReadFile() function looks like this:
function ReadFile() {
    OptionReader or = OptionReader.Create;
    File f = File.Create("Options.ini");
    or.ReadFromFile(f);
    f.Free; // calls .Close
    or.Free;
}
As you can see, this example is for languages that don't have a GC but still enforce heap allocation. The
call or.ReadFromFile(f) may throw an exception. In that case execution continues in the catching
block ("on Exception do") of the pseudocode example above. You're right: f.Free will never be executed, and
neither will or.Free. In this kind of bad language, which enforces heap allocation but doesn't even provide a
GC, you have to do quite a lot of work to make this simple function error-proof. It would need to look like this:
function ReadFile() {
    OptionReader or = OptionReader.Create;
    File f = null;
    try {
        f = File.Create("Options.ini");
        or.ReadFromFile(f);
    } on Exception do {
        if (f != null) f.Free; // calls .Close
        or.Free;
        throw_on; // throws the caught exception again
    }
    f.Free; // calls .Close
    or.Free;
}
throw_on is a construct that rethrows the exception that has just been caught. It is needed in this situation:
we have to catch the exception to free the file and the OptionReader, but we don't want to handle it ourselves.
Instead, we want the ReadOptions() function to handle the original exception, hence we throw it again.
In a garbage-collected language, it is much easier to write resource-leak-free code:
function ReadFile() {
    OptionReader or = OptionReader.Create;
    File f = File.Create("Options.ini");
    or.ReadFromFile(f);
}
If an exception occurs, `or' and `f' will eventually be finalized and deallocated. But let's remember the problem
we were facing before: we don't know the exact time when f will be closed. Therefore f might still be open when
ReadFile() has already finished and ReadOptions() tries to write to the file (OK, a function called "ReadOptions"
should not write to any files, but let's assume it does):
function ReadOptions() {
    ReadFile();
    File f2 = File.Create("Options.ini",openReadWrite); // potential exception!
    f2.Write("Bla=Foo");
}
function ReadFile() {
    OptionReader or = OptionReader.Create;
    File f = File.Create("Options.ini",openReadOnly);
    or.ReadFromFile(f); // may throw exceptions!
    f.Close(); // may not be called if exception has been thrown!
}
In the case of Java, how and when finalization occurs actually depends on the virtual machine (VM) the program
is running in. So the VM used during development might behave differently from the one on the
deployed machine. This causes bugs that can't be reproduced on the development machine and are
therefore hard to trace, debug and fix. Of course, it is not impossible to write this piece of code safely,
so that it behaves correctly on any system and any VM. You would have to write it like this:
function ReadOptions() {
    try {
        ReadFile();
    } on Exception do {
        SetDefaults();
    }
    File f2 = File.Create("Options.ini",openReadWrite);
    f2.Write("Bla=Foo");
}
function ReadFile() {
    OptionReader or = OptionReader.Create;
    // File.Create may throw an exception, but that doesn't matter here
    // because the file will not be locked if it can't be opened.
    File f = File.Create("Options.ini",openReadOnly);
    try {
        or.ReadFromFile(f);
    } on Exception do {
        f.Close();
        throw_on;
    }
    f.Close();
}
This looks awfully similar to the solution for the non-garbage-collected language, doesn't it? Regardless of
whether you are using a GC or not, to write really *resource*-safe code (and resources don't only mean
memory here!), you need to enforce the finalisation of an object, especially if it holds operating-system
resources like files and sockets (network connections).
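For comparison, here is a sketch of how the same pair of functions might look with RAII in C++. The File and OptionReader classes, the open_files counter and the fail parameter are all hypothetical: the counter only makes cleanup observable, and the parameter simulates a read error. The thing to look at is that read_file() contains no try block at all; the destructor of f runs during stack unwinding when the exception travels up to read_options().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical counter of open files, used only to observe cleanup.
int open_files = 0;

class File {
public:
    explicit File(const std::string& name) : name_(name) { ++open_files; }
    ~File() { --open_files; }  // "close"
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

class OptionReader {
public:
    // Hypothetical reader that fails on demand to simulate a read error.
    void read_from_file(const File& f, bool fail) {
        if (fail) throw std::runtime_error("cannot parse " + f.name());
    }
};

void read_file(bool fail) {
    OptionReader reader;
    File f("Options.ini");
    reader.read_from_file(f, fail);  // may throw
}   // f is closed here on normal return AND during stack unwinding

// Returns true if defaults had to be used.
bool read_options(bool fail) {
    try {
        read_file(fail);
        return false;
    } catch (const std::exception&) {
        return true;  // SetDefaults() would go here
    }
}

void demo() {
    assert(!read_options(false));
    assert(open_files == 0);  // closed after a normal return
    assert(read_options(true));
    assert(open_files == 0);  // closed even though an exception was thrown
}
```

The cleanup code is written once, in the destructor, instead of being repeated in every function that uses a File.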
Of course you also need dynamically allocated memory in your application. RAII by itself only helps you with auto-objects, which can only live within a certain scope. If you need a dynamic amount of memory to store data in, or need an object to be created and destroyed at specific times, you need dynamic objects. Those dynamic objects have to be created and freed manually. There's a relatively easy fix for this, too: in good languages such as C++ it is possible to build user-defined types that behave just like pointers. (You need pointers to hold the address of a dynamically created object.) There are C++ classes that make an object live exactly as long as there is *any* such pointer pointing to it. This is the ideal solution for dynamic memory handling. It does, of course, raise a problem similar to the GC's: you might not have an overview of all pointers pointing to your object, and therefore it might not be clear when the object will be freed. Both methods guarantee *that* it will be freed, though (in the case of shared pointers, as long as the pointers don't form a cycle). The advantage of C++'s "shared" pointers is that they are easy to track and debug. You do need disciplined programmers, or someone who enforces the use of shared pointers, because nothing "automatically" or rather forcefully solves the problem for you.
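Since C++11 the standard library ships such a class as std::shared_ptr. Here is a brief sketch of the "lives as long as any pointer points to it" behaviour; the Connection type and its live counter are made up so that destruction can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource; the counter only makes destruction observable.
struct Connection {
    static int live;
    Connection() { ++live; }
    ~Connection() { --live; }
};
int Connection::live = 0;

void demo() {
    std::shared_ptr<Connection> a = std::make_shared<Connection>();
    assert(a.use_count() == 1);
    {
        std::shared_ptr<Connection> b = a;  // second owner
        assert(a.use_count() == 2);
    }   // b is destroyed; the Connection survives because a still owns it
    assert(Connection::live == 1);
    a.reset();  // last owner gone: the Connection is destroyed right here
    assert(Connection::live == 0);
}
```

Unlike with a GC, the destruction still happens at a well-defined moment: when the last owner lets go.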
With scripting languages, complete RAII is, unfortunately, not possible. One of the strengths of RAII is that most of the memory consumed lies on the stack. Stack allocation is far faster than heap allocation, because memory on the stack doesn't need to be searched for: when the program is started by the operating system, it gets a fixed chunk of memory assigned as its stack. Objects that reside on the stack (so-called "auto-objects", the type of objects that have RAII semantics) are therefore allocated much faster and don't suffer from the slowness of heap allocation (malloc/new). Scripting languages could simulate this behaviour easily by allocating an initial memory pool (much like the OS allocates the stack for a program) and using it for auto-objects. I have not seen a scripting language that differentiates between heap and stack allocation, because most of them have their own, intelligent allocation mechanisms. That said, I haven't seen a scripting language supporting RAII semantics either.