code complete redux + RAII

As I looked back over past posts, I realized that I’ve never (at least to my memory) written an article explaining RAII.  I’ve done it in person quite a few times, to the various classes that I have TAed or graded, but I haven’t written it down.  So, here’s an attempt at it.

A buddy of mine, from RIT, mentioned that he showed his friend, a “true C#/XAML/Code Complete believer”, the article about Steve McConnell making us dumb.  As you would expect, his buddy, we’ll call him Fred (I don’t know his real name, I just hate using ambiguous pronouns), says that I missed the point, that you need to run the cleanup method, because you don’t know what it does.

There is a problem we are trying to solve in the general case by using RAII.  There are a few other solutions to the same problem, namely Java’s (and other languages’) try/catch/finally paradigm and C#’s using(…) {} paradigm.  This problem is the problem of cleaning up resources in a syntactically simple way.  This problem itself is really either a subproblem or related to the pattern of running code unconditionally before or after your “real” code is run, which is itself a subproblem of writing imperative code out-of-order, i.e. writing control statements and structures that aren’t strictly iterative.

Anywhoo, the problem, in this case, is a really common one in imperative languages, especially those in which you manage your own memory and resources.  The most common examples of this are memory allocation and file handling.

With memory allocation in C, for example, we malloc some memory at the start of the block.  Now, being that we are good little coders, we need to free the memory once we are finished using it (provided that no-one else is using it).  This is simple enough when our code paths are obvious:


void* mem = malloc(someSize);
statementA;
statementB;
statementC;
free(mem);

But this can get rather complicated when your statements are not so simple.  What if, for example, statementB has a few conditions where it wants to return early?  What if you are using a language like C++, and statementB might throw an exception?

Now, finally deals with the issue pretty directly.  If I have a try/catch/finally block, I’m guaranteed that the code in the finally block will be run no matter how I exit the try block.  This means that I can put my clean-up code in the finally block and forget about it.  Right?

Well, not quite.  If you write enough C code, you realize that even that gets a bit complicated, since you need to know, often times, at what point your code failed.  If you have three resources, A, B, and C, and only A and B were allocated, then you certainly don’t want to try and do something about C, or your cleanup code needs to have some way of knowing how to handle an uninitialized C.

In this sense, Java is really no different from C.  In effect, Java’s try/catch/finally is semantically equivalent (in the syntactic-sugar sense) to just replacing each possible exit point with a goto to the “finally” label, where all the clean-up code is.  You still need, in certain circumstances, to keep track of how far you made it through initialization.
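To make that bookkeeping concrete, here is a minimal sketch of the goto-to-a-cleanup-label style, written as C-style code that also compiles as C++. The function name acquireThree and its failAt parameter are made up for illustration; failAt just simulates an allocation failing at a given step. Each pointer doubles as the “did we get this far?” flag.

```cpp
#include <cstdlib>

// Hypothetical helper: tries to acquire three buffers. `failAt` (1, 2 or 3)
// simulates a failure at that step; 0 means every step succeeds.
// Returns how many buffers were actually acquired and therefore released.
int acquireThree(int failAt) {
    // Declared up front so the gotos below never jump over an initialization.
    void* a = nullptr;
    void* b = nullptr;
    void* c = nullptr;
    int released = 0;

    a = (failAt == 1) ? nullptr : std::malloc(16);
    if (!a) goto cleanup;

    b = (failAt == 2) ? nullptr : std::malloc(16);
    if (!b) goto cleanup;

    c = (failAt == 3) ? nullptr : std::malloc(16);
    if (!c) goto cleanup;

    // ... real work with a, b and c would go here ...

cleanup:
    // The cleanup code has to check what was actually acquired.
    if (c) { std::free(c); ++released; }
    if (b) { std::free(b); ++released; }
    if (a) { std::free(a); ++released; }
    return released;
}
```

A finally block ends up with the same shape: one cleanup section that has to ask, resource by resource, whether there is anything to release.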

On the other hand, C++ solves the problem in a different, more general, and much more powerful way.  Deterministic destruction solves a whole slew of problems, of which this is one of the more prominent ones.  In C++, we have RAII – Resource Acquisition is Initialization.

Think about a vector in C++ – you don’t worry about calling some free method on the vector, since it just takes care of itself.  Files, too – you don’t really need to explicitly close an ofstream, since the destructor will do this for you automatically when the object is destructed (the standard file streams close the underlying file in their destructors).
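A small sketch of both cases, assuming a made-up helper called writeThenRead and a scratch file path of your choosing. The ofstream lives in an inner scope, so by the time the file is reopened for reading it has already been flushed and closed, without any explicit close() call.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes `lines` to `path`, then reads them back.
std::vector<std::string> writeThenRead(const std::string& path,
                                       const std::vector<std::string>& lines) {
    {
        std::ofstream out(path);
        for (const auto& line : lines) {
            out << line << '\n';
        }
    }   // out goes out of scope: its destructor flushes and closes the file

    std::vector<std::string> result;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        result.push_back(line);
    }
    return result;   // `in` is closed by its destructor; no manual cleanup
}
```

Note that nothing here frees the vector’s storage or closes either stream by hand; every exit from the function, including an exception from push_back, releases them.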

Stroustrup gives an example for a resource handle for a FILE*, on his page here (note that the example is explaining why C++ doesn’t need try/catch/finally, so it is pertinent in more than one way).  Things to note are how the constructor grabs the resource, the destructor frees the resource, and the operator FILE* which means you can treat the handle just like a FILE*.  In other words, you change this:


{
   FILE* fp = howeverYouGetAFilePointer();
   [...]
   fclose(fp);
}

To this:

{
   File_handle fp(howeverYouGetAFilePointer());
   [...]
   // No need to close, since the destructor does it for you!
}
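For reference, a handle along those lines might look like the sketch below. It is written in the spirit of Stroustrup’s File_handle rather than copied from it, so treat the details (the exception type, the deleted copy operations, the roundTrip helper) as my own assumptions.

```cpp
#include <cstdio>
#include <cstring>
#include <stdexcept>

// A sketch of a FILE* handle: acquire in the constructor, release in the
// destructor, and convert implicitly so it can be used where a FILE* is expected.
class File_handle {
public:
    File_handle(const char* name, const char* mode)
        : fp_(std::fopen(name, mode)) {
        if (!fp_) throw std::runtime_error("File_handle: could not open file");
    }
    explicit File_handle(std::FILE* fp) : fp_(fp) {
        if (!fp_) throw std::runtime_error("File_handle: null FILE*");
    }
    ~File_handle() { std::fclose(fp_); }

    // Copying would lead to the same FILE* being closed twice.
    File_handle(const File_handle&) = delete;
    File_handle& operator=(const File_handle&) = delete;

    operator std::FILE*() const { return fp_; }

private:
    std::FILE* fp_;
};

// Hypothetical usage: write a word, let the handle close the file, read it back.
bool roundTrip(const char* path) {
    {
        File_handle out(path, "w");
        std::fputs("hello", out);   // implicit conversion to FILE*
    }   // file closed here by ~File_handle
    File_handle in(path, "r");
    char buf[16] = {};
    return std::fgets(buf, sizeof buf, in) != nullptr
        && std::strcmp(buf, "hello") == 0;
}
```

If a constructor throws, the object never existed, so its destructor does not run and there is no attempt to close a file that was never opened.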

To put it in context of my previous article, say you want to do this for a “purge file list”.  Also, say you can’t modify FileList, or say that FileList is really just a typedef for FILE** (a pointer to an array of FILE handles).  Instead of doing the goto junk, you instead write a handle class, like Stroustrup’s File_handle:


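class PurgeFileListHandle
{
   private:
      FileList _fileList;
      int _numfiles;
      // Not copyable: a copy would call DeletePurgeFileList twice on the same list.
      PurgeFileListHandle(const PurgeFileListHandle&);
      PurgeFileListHandle& operator=(const PurgeFileListHandle&);
   public:
      // Acquire the list on construction...
      PurgeFileListHandle() : _numfiles(0) { MakePurgeFileList(_fileList, _numfiles); }
      // ...and release it on destruction, however the scope is left.
      ~PurgeFileListHandle() { DeletePurgeFileList(_fileList, _numfiles); }
      // Lets the handle be used wherever a FileList is expected.
      operator FileList&() { return _fileList; }
      int getNumFiles() const { return _numfiles; }
};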

Now, making no other changes to other functions (i.e. MakePurgeFileList will still modify both of its arguments, meaning that the last argument is passed as a reference and the first as either a reference or a pointer, another no-no, especially since the function doesn’t use, you know, the return value to return anything), plus some syntactic changes, what was originally:


void PurgeFiles( ErrorCode & errorCode ) {
   FileList fileList;
   int numFilesToPurge = 0;
   MakePurgeFileList( fileList, numFilesToPurge );
   errorCode = FileError_Success;
   int fileIndex = 0;
   while ( fileIndex < numFilesToPurge ) {
      DataFile fileToPurge;
      if ( !FindFile( fileList[ fileIndex ], fileToPurge ) ) {
         errorCode = FileError_NotFound;
         goto END_PROC;
      }
      if ( !OpenFile( fileToPurge ) ) {
         errorCode = FileError_NotOpen;
         goto END_PROC;
      }
      if ( !OverwriteFile( fileToPurge ) ) {
         errorCode = FileError_CantOverwrite;
         goto END_PROC;
      }
      if ( !Erase( fileToPurge ) ) {
         errorCode = FileError_CantErase;
         goto END_PROC;
      }
      fileIndex++;
   }
   END_PROC:
   DeletePurgeFileList( fileList, numFilesToPurge);
}

…is now:


void PurgeFiles( ErrorCode & errorCode ) {
   PurgeFileListHandle fileList;
   errorCode = FileError_Success;
   for(int fileIndex = 0; fileIndex < fileList.getNumFiles(); fileIndex++) {
      DataFile fileToPurge;
      if ( !FindFile( fileList[ fileIndex ], fileToPurge ) ) {
         errorCode = FileError_NotFound;
         return;
      }
      if ( !OpenFile( fileToPurge ) ) {
         errorCode = FileError_NotOpen;
         return;
      }
      if ( !OverwriteFile( fileToPurge ) ) {
         errorCode = FileError_CantOverwrite;
         return;
      }
      if ( !Erase( fileToPurge ) ) {
         errorCode = FileError_CantErase;
         return;
      }
   }
}
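The property the rewritten function relies on can be seen in isolation with a sketch like the following. CountingHandle is a hypothetical stand-in for PurgeFileListHandle that only counts how many handles are alive, and the Exit enum and work function are likewise invented for the demonstration.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a resource handle: it counts live instances so
// we can observe whether cleanup happened.
struct CountingHandle {
    static int live;
    CountingHandle() { ++live; }
    ~CountingHandle() { --live; }
    CountingHandle(const CountingHandle&) = delete;
    CountingHandle& operator=(const CountingHandle&) = delete;
};
int CountingHandle::live = 0;

enum class Exit { Normal, EarlyReturn, Throw };

// Leaves the scope in one of three ways; the handle's destructor runs in all.
void work(Exit how) {
    CountingHandle handle;
    if (how == Exit::EarlyReturn) return;
    if (how == Exit::Throw) throw std::runtime_error("simulated failure");
    // ... normal processing would go here ...
}

// Runs work() and reports how many handles are still alive afterwards.
int liveAfter(Exit how) {
    try {
        work(how);
    } catch (const std::runtime_error&) {
        // Swallowed on purpose: we only care about the cleanup.
    }
    return CountingHandle::live;
}
```

Falling off the end, returning early, and throwing all leave zero live handles, which is why the goto END_PROC label can simply disappear.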

Now, if we could, we would also make the function return the ErrorCode, since setting it as a reference parameter is oh-so-dumb. Regardless, what we have now is still an ugly function, but at least we don’t have to worry about exit points. Since this is C++, if one of the functions underneath us is changed to throw an exception, our FileList will still get cleaned up, provided that it was constructed. Examples like this:


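{
   Res A = getSomeResource();
   if(!A)
      { return; }                      // nothing acquired yet

   Res B = getSomeResource();
   if(!B)
      { free(A); return; }             // B failed: release A only

   Res C = getSomeResource();
   if(!C)
      { free(B); free(A); return; }    // C failed: release B, then A

   //and so on
}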

…which, as you’ll notice, grows quadratically (n cleanup blocks, each containing frees for everything acquired so far, so the cleanup code is O(n²)), is replaced in C++ by doing:


{
   Res A = getSomeResource();
   Res B = getSomeResource();
   //and so on
}

Note that we no longer have the bloat, and we no longer have error detection code in the middle of our function. The function reads in terms of what it functionally accomplishes, not in terms of the exact imperative code that is getting run.
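Here is a sketch of what the language does behind that short version, assuming Res is a class with a destructor. The event log, the fail flag on the constructor, and the useThree and run helpers are all invented so the order of events can be observed.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log, used only to observe construction/destruction order.
std::vector<std::string> events;

// Hypothetical resource type: logs when it is acquired and released.
struct Res {
    std::string name;
    explicit Res(const std::string& n, bool fail = false) : name(n) {
        if (fail) throw std::runtime_error("cannot acquire " + n);
        events.push_back("acquire " + name);
    }
    ~Res() { events.push_back("release " + name); }
    Res(const Res&) = delete;
    Res& operator=(const Res&) = delete;
};

void useThree(bool failOnC) {
    Res a("A");
    Res b("B");
    Res c("C", failOnC);   // if this throws, only b and then a are released
    // ... real work would go here ...
}

// Runs useThree() and returns the recorded sequence of events.
std::vector<std::string> run(bool failOnC) {
    events.clear();
    try {
        useThree(failOnC);
    } catch (const std::runtime_error&) {
        // Swallowed on purpose: we only care about the recorded order.
    }
    return events;
}
```

Objects are destroyed in reverse order of construction, and an object whose constructor threw is never destroyed, so the “how far did we get?” bookkeeping is handled by the compiler instead of by us.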

As I said to my friend, it’s not that I love C++, or that I have anything against people who dislike or even hate C++.  I just have a problem with people who hate it for the wrong reasons.  It would be like saying, “God, I hate Java because it doesn’t have for loops.”  Say what?

So remember, kids: bad code is bad code in any language.  Especially if you are Steve McConnell.

P.S. – C# has a mechanism for doing the same type of thing.  If you make your class implement IDisposable, then you can write:


using(MyType obj = new MyType(someResourceThing))
{
   ...
}  // obj's Dispose() method is called here

Which means that you can write generally equivalent stuff in C#. You can also do something like:


   using(MyType obj = new MyType(someResource))
   using(OtherType obj2 = new OtherType(anotherResource))
   using(YetAnotherType obj3 = new YetAnotherType(lastResource))
   {
      ...
   }  // all Dispose()ed here, in reverse order

… which works for the same reasons you can chain fors or ifs – if there is no block, then the next statement is considered to be the body of the control structure (in this case, a using statement followed by a block is itself considered to be a statement).

  • Corry
    Another great article. And I believe that the IDisposable pattern of C# was integral to the language from the very beginning, including the using(){} construct.
  • Corry:

    Looks like you are absolutely correct (my C# history is rather crappy, as you can imagine :) ). Peter Hallam wrote an article about it here, which seems to say that using has always been in the language (the article describes how, for a short period of time, it would accept types that didn't implement IDisposable, and only needed to have a Dispose method). Sorry about that :)
