Saturday, April 26, 2008

code without free() is code without freedom

Programmers are lazy. If we weren't, we'd choose to do the repetitive task by hand rather than automating it. But programming takes discipline to do correctly. Garbage collection is a wonderful invention that can vastly reduce bugs and increase productivity. But combined with that laziness, it's a dangerous thing.

Memory allocation is intimately connected with another topic: ownership and lifetime. Being freed from releasing memory, programmers create references willy-nilly, assuming they'll be released "when they are no longer needed". What is missing is the realization that "no longer needed" is a semantic decision that no environment is in a position to make. The only deterministic decision to be made is that a particular chunk of memory is no longer referenced. By ignoring memory management issues, programmers create huge webs of references that lure unsuspecting pointers into their clutches, never to be released. Java programmers have a wonderful euphemism for this: "unintentional object retention". The rest of the world calls this a "memory leak".
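A minimal Python sketch of how that kind of retention happens. It assumes a hypothetical EventBus that keeps strong references to its subscribers; the class names are illustrative and not from any library. The programmer deletes the last obvious reference to the widget. The bound method stored in the bus still points at it, so the collector correctly concludes that the widget is still referenced:

```python
import gc
import weakref


class EventBus:
    """Hypothetical publish/subscribe hub that holds strong references."""

    def __init__(self):
        # Strong references: every listener lives as long as the bus does.
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)


class Widget:
    def on_event(self, event):
        pass


bus = EventBus()
widget = Widget()
bus.subscribe(widget.on_event)  # a bound method keeps a strong reference to widget

probe = weakref.ref(widget)     # lets us observe whether widget gets collected
del widget                      # the programmer considers widget "no longer needed"
gc.collect()

# bus -> bound method -> widget, so the collector cannot reclaim it:
# the object has been unintentionally retained.
print(probe() is not None)  # → True
```

The collector did exactly what it was asked to do. What it cannot know is that the widget was semantically dead. Only an explicit ownership decision fixes that, for example an unsubscribe call made by whoever owns the subscription.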

More insidious than simple object retention are the API decisions, or, more precisely, the lack thereof. Ownership issues are ignored in both interface and documentation. So even if a conscientious coder comes along and attempts to manage lifetime properly, they find themselves without the information or the means to do so. Before long, they too become ensnared. Faced with this problem, programmers have three paths they can take: ignore it and write unmanageable code with no rhyme or reason to its memory usage, rewrite the offending code with clear ownership and lifetime semantics, or move to New Zealand and raise sheep. Of the three, New Zealand is the most appealing solution.
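One way to put ownership into the interface itself is sketched below in Python. It assumes a hypothetical ConnectionPool, with plain strings standing in for real connections. The pool owns the connections, and callers can only borrow one for the span of a with block, so the lifetime is visible in the signature rather than buried in (or missing from) the documentation:

```python
import contextlib


class ConnectionPool:
    """Hypothetical pool whose interface makes ownership explicit.

    The pool owns every connection. Callers only borrow one for the
    duration of a `with` block and must not keep it afterwards.
    """

    def __init__(self, size):
        # Strings stand in for real connection objects.
        self._free = [f"conn-{i}" for i in range(size)]
        self._in_use = set()

    @contextlib.contextmanager
    def borrow(self):
        conn = self._free.pop()  # raises IndexError if the pool is exhausted
        self._in_use.add(conn)
        try:
            yield conn
        finally:
            # The loan ends here, even if the block raised an exception.
            self._in_use.remove(conn)
            self._free.append(conn)

    def available(self):
        return len(self._free)


pool = ConnectionPool(size=2)
with pool.borrow() as conn:
    print(conn, pool.available())  # → conn-1 1
print(pool.available())            # → 2
```

The design choice here is that a loan cannot outlive its block unless the caller deliberately smuggles the reference out, and the try/finally returns the connection to its owner even when the block fails.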

Similarly, memory allocation is a potentially expensive operation. Hiding it from the programmer eliminates awareness of an important behavior in any program and adds a dangerous amount of nondeterminism to memory usage and performance. Frankly, these costs are often less important than the poor design decisions. With a little extra work, implementations can be fixed to avoid allocations; repairing a faulty design is often much more difficult. Different languages provide different levels of support for this. Java (typically) provides more visibility into allocations than, say, Python. Conversely, and somewhat unexpectedly, Python provides significantly more support for lifetime management, particularly with 2.6. The 'with' statement provides RAII-like semantics, and weakrefs can invoke a callback that releases related or non-memory resources when the owning object goes away. In contrast, Java provides only finalize() methods (which are fundamentally broken). Its weak references offer no callback when the referent is released, only a ReferenceQueue that must be polled or waited on.
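A rough sketch of both Python mechanisms, written for modern Python 3 rather than 2.6: weakref.finalize arrived in Python 3.4, whereas 2.6-era code would pass a callback to weakref.ref and keep that weak reference alive itself. ScratchFile is a hypothetical class that owns a temporary file, a non-memory resource:

```python
import os
import tempfile
import weakref


class ScratchFile:
    """Hypothetical owner of a temporary file (a non-memory resource)."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        # Remove the file when this object is collected. The finalizer holds
        # only a weak reference to self, so it does not keep self alive.
        self._finalizer = weakref.finalize(self, os.remove, self.path)

    def close(self):
        self._finalizer()  # runs os.remove at most once

    # Context-manager protocol: gives `with` deterministic, RAII-like release.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions


# Deterministic: the file is gone as soon as the block exits.
with ScratchFile() as scratch:
    path1 = scratch.path
    print(os.path.exists(path1))  # → True
print(os.path.exists(path1))      # → False

# Tied to the owner's lifetime: the finalizer fires when the owner is collected.
owner = ScratchFile()
path2 = owner.path
del owner  # CPython's reference counting collects the object right here
print(os.path.exists(path2))      # → False
```

The with block releases the file at a point you can see in the source. The second case is only prompt because CPython uses reference counting; under a tracing collector the finalizer runs whenever the collector gets around to it, which is exactly the nondeterminism described above.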

Remember those three paths I mentioned above? Want to write reliable, deterministic code? Yes? Want to rewrite Java? No? That just leaves sheep.