Advantages and Disadvantages of Conservative Garbage Collection

Advantages and Disadvantages of Conservative Garbage Collection

Here are some issues to be considered in deciding whether or not to use conservative GC with C or C++. Some of these no doubt reflect personal biases of the author. Some of the observations are based on Detlefs, Dosser and Zorn's Memory Allocation Costs in Large C and C++ Programs. We refer specifically to our garbage collector, which can be retrieved from gc.tar.Z in the gc_source directory.

A number of other interesting issues arise when comparing this style of conservative garbage collection, or malloc/free memory management with more traditional garbage collection techniques. They are not discussed here.

Garbage collection is not an all-or-nothing proposition. It is possible to explicitly deallocate some common kinds of objects, and thus correspondingly decrease the frequency of garbage collections and garbage collection cost. It is possible to accommodate libraries that expect explicit deallocation calls. It is also possible to use the collector only as a debugging tool to locate leaks and remove it completely from the end product. But for the sake of clarity, we'll look at the tradeoffs between exclusive use of explicit memory management (malloc/free or new/delete) and exclusive use of garbage collecting allocation.


Development effort
Garbage collected applications often require significantly less development time than their counterparts with explicit memory management. It is hard to measure this, but commonly proposed estimates are that 30% or 40% of development time devoted to storage management for programs that manipulate complex linked data structures. (See, for example, Rovner, ``On Adding Garbage Collection and Runtime Types to a Strongly-Typed, Statically Checked, Concurrent Language'', PARC CSL-84-7.) C++ constructors and destructors can help with the simple bookkeeping parts of the problem, but don't help much with shared pieces of data structures.
Well-written interface descriptions for C and C++ often invest some effort in assigning deallocation responsibility and providing the needed extra functionality; less well-designed interfaces introduce ambiguities that lead to bugs. Does a container object deallocate its elements when it is destroyed? May a function retain pointers to its arguments for caching purposes? In the best case, the result is significant added complexity.
Multiple threads or exception handling further add to the complexity of manual deallocation. For example, various C++ newsgroups and mailing lists have carried extensive discussions about the proper way to deallocate partially constructed C++ objects in response to an exception. Such issues disappear in the presence of a garbage collector.
Note that the additional development effort required for manual memory management could be used for a substantial amount of performance tuning of a different kind Thus one way to phrase the issue is whether explicit memory management is the best place, or even a reasonable place, to invest that effort.

Reusability
To cope with manual storage management, programmers end up restricting and specializing their designs for the task at hand. For example, an interface might require that the only pointer to an object be passed, so that the called function can deallocate the object. Or it might insist that the called function not reference an argument on a subsequent call once it returns, so that the caller can deallocate the object. The first one would be expensive or inappropriate for some potential users. The second one would prohibit caching in a reimplementation of the interface. In either case, we end up with less reusable code, and fewer reusable interfaces.

Functionality
In general, it is considerably easier to implement and use sophisticated data structures in the presence of a garbage collector. As a result, it is easier to avoid imposing arbitrary limits on program parameters (e.g. string lengths), and to ensure that performance scales reasonably for large inputs. The cord string package included with our garbage collector is an example of a useful abstraction that could not be provided without some form of garbage collection, and would be more expensive with user-provided reference counting, or the like. A program based on something like the cord package has some chance of continuing to run reasonably if it receives a 10 megabyte input string where the programmer expected a line of text. This is much less likely with a string package that uses a conventional representation.

Safety
Conservative garbage collectors require mild restrictions on both the source code and the compiler. The source program must not ``hide'' pointers in such a way that they are invisible to the collector. There are very few ways in which a strictly conforming ANSI C program could hide pointers. Writing and retrieving them from a file is one way. For some collector configurations, using memcpy to copy pointers to unaligned locations is another. Casts from pointers to integers and back to pointers can cause problems. The most common problem is probably to refer to an object by pointing to a known offset before the beginning of the object. The last one is clearly not legal C code. All are uncommon even in existing code. It would be easy to build a tool that identifies possibly dangerous code, except that it is somewhat expensive to detect violations of the ANSI requirements on pointer arithmetic.
Similarly, C compilers may not hide pointers in the generated object code. In our experience, standard commercial compilers obey this restriction in unoptimized code. Most aggressive optimizing compilers do not obey this restriction for all optimized code. For details and examples see papers/pldi96.ps.gz. However, it is difficult to construct examples for which they violate it, especially for single-threaded code. In our experience, the only examples we have found of a failure with the current collector, even in multi-threaded code, were contrived. (The most serious one was the garbage collector test program compiled on an ARM with the vendor compiler.) In contrast, we have found many genuine optimizer bugs in various compilers over the same time period. The structure of some compilers (e.g. gcc) should make it easy to guarantee GC-safety in the future.

Debugging
If a module A explicitly and prematurely deallocates an object P and then continues to write to it after the object is reallocated by module B, module B is likely to fail, in spite of the fact that the fault is in module A. By the time the fault becomes noticeable, there may be no trace of A's actions. The programmer responsible for module B is likely to be held responsible for the problem, but has no reasonable way of attacking it.
Memory access checkers, if available, will often detect such faults, but only if module A also writes to object P before it is reallocated.
This often makes it difficult to debug large multi-person projects using explicit deallocation. A garbage collector can eliminate premature deallocation errors. (It cannot eliminate pointer arithmetic and subscripting errors, which may induce similar problems. But it does facilitate higher level abstraction which confine pointer arithmetic primarily to a few low level modules.)
Similarly, memory leaks induced by failing to deallocate unreferenced memory may be hard to trace and eliminate. Garbage collectors automatically eliminate such leaks.
In either case, memory overwrite errors (e.g. indexing errors) may invalidate the allocators data structures, causing mysterious faults in the allocator. This is arguably somewhat worse in the garbage collected case, since the data structures tend to be more complex. Some limited debugging tools are provided to address this problem; more would be desirable. Availability of allocator source code is sometimes an advantage.

Space Performance
The presence of garbage collection typically allows the user to allocate fewer, and smaller, data structures. Putting this aside temporarily, a leak-free program written for malloc/free or new/delete is likely to use more space when these are replaced by GC_malloc/noop, especially if space is measured in total allocated address space. See the above cited paper by Detlefs, Dosser and Zorn for details. The primary reason for this is that the collector needs to maintain a certain amount of empty space in the heap, so that it does not have to garbage collect after every allocation.
On the other hand, programs written for explicit memory management often incur space overheads in order to preserve properties needed for deallocation. Such overhead might include reference count fields, extra copying of structures to guarantee that an object has a single owner, fragmentation introduced by maintaining separate memory pools for different modules, or cyclic garbage leaked by reference counting. Such overhead is avoided by programs that directly use garbage collected allocation. How this compares to GC-induced space overhead is very application-dependent.
There are some other factors that sometimes increase space used by our collector. The one that appear to have been most significant in Detlefs, Dosser and Zorn's Memory Allocation Costs in Large C and C++ Programs is that it is not well tuned for small heap sizes, and has a relatively large, nearly fixed, space overhead. Others are:
(1) The collector expands the heap in relatively large chunks, parts of which might never be used, and thus may never need to be in physical memory.
(2) The collector sometimes avoids using certain pages it allocates because it detects integers that appear to point to the page. This also results in allocated virtual memory that is never physically resident.
(3) Memory might appear to be referenced by integers and the like.
(4) Memory might still be referenced through an uncleared pointer even though it was deallocated in the original program.
Based on limited measurements by the author and Zhong Shao, the last two appear insignificant in practice for most programs, though (4) may occasionally be a factor.
Neither our garbage collector nor malloc/free implementations provide generally useful guaranteed space usage bounds, since worst-case fragmentation overhead is generally unacceptable. (Note that typical fragmentation overhead is quite acceptable for both.)

CPU-time performance
The overall processor time used by the collector for small objects allocation and collection is comparable to a good malloc/free implementations. In a multi-threaded environment, it may be substantially less, since there are no calls that acquire lock acquisitions; thus the number of lock acquisitions may be effectively halved. (The same may apply on platforms on which the malloc/free implementation is not particularly time-efficient, e.g. some Sun and PC environments. Note that such malloc/free implementation may be either poorly designed or tuned for space.) However, the time required for larger objects is effectively proportional to the object size, both since the collector typically initializes it, and because larger allocations increase GC frequency correspondingly. This is unlikely to drastically affect program running time; after all, the client will presumably also take time at least proportional to object size to fill in the object. But it does put the collector at a disadvantage for programs that allocate primarily larger objects.
For multi-threaded programs, there are cases in which a garbage collected program requires drastically less locking than the equivalent program with explicit memory management. Here is an example.
Unlike reference counting, conservative garbage collection does not affect the performance of pointer assignments and the like.

Latency
Neither the collector nor most malloc/free implementations provide hard guarantees on the amount of time it takes an allocation to complete. In most cases, maximum observed latencies will be appreciably shorter with a malloc/free implementation than with our collector. (This is not necessarily true for GC implementations that use more compiler support.) Our collector provides the option of incremental collection on many platforms with sufficient virtual memory support. This is normally sufficient to provide good interactive response even with very large heaps. Latencies are normally comparable to those introduced by the VM system.
Somewhat better latencies can be achieved by triggering collections during idle time, and aborting them in response to user input. This requires enough idle time for a collection to complete before excessive heap expansion. This facility does not rely on VM support.

Paging locality
A common concern about garbage collection, or any form of dynamic memory allocation, is its interaction with a virtual memory system. Accesses to virtual memory should be such that the traffic between disk and memory is small, i.e. most access should be to pages that were already recently accessed.
On modern computers, where disks are so much slower than CPUs, many programs page very little, and most of their heaps reside in the working set. But even for those programs in which significant parts of the heap do not reside in the working set, there are a number of techniques which dramatically increase the locality of reference of a mark-and-sweep collector.
The fundamental problem is that all memory that may possibly contain pointers has to be examined during every full collection. We use several techniques to lessen the impact of this. Often a large fraction of a programs heap can be allocated as pointer-free memory using GC_malloc_atomic. Such memory is not examined during collections. (Mark bits are separated from data pages to ensure this.) The collector does not have a separate sweep phase during which the heap is touched. Generational garbage collectors, including ours in incremental mode, examine primarily recently modified memory during most collections. Such memory is nearly certain to be resident in physical memory. It is possible to mark the heap in sections, so that the collector mark phase itself exhibits better locality. (This was implemented with some success in older versions of Xerox PCR by Carl Hauser. It was not included in more modern versions of the collector since the other measures appeared to suffice.) Nearly all of the page faults that do occur during a full collection will occur during a collection phase in which the client can still make some progress, thus making paging somewhat less visible to the user. For some details and a few measurements see the paper "Mostly Parallel Garbage Collection". (The paper "The measured Cost of Conservative Garbage Collection" by B. Zorn, Software - Practice and Experience, July 1993, includes some measurements of paging behavior. Unfortunately they are of necessity based on C programs written for malloc/free, and a very old and naive version (1.6) of our collector. Unlike later versions, this collector was also found to yield unacceptable paging performance at PARC.)
Unlike many malloc/free implementations, our collector will normally allocate available objects from the same page consecutively. This is likely to ensure that successively allocated similar objects reside on the same page. Such objects are likely to be linked to each other, and examined at similar times, both by the collector and by its client. As a result, the client may exhibit better reference locality than a malloc/free client.

Portability
Our collector is not, and cannot, be implemented as completely portable C code. However, it has been ported to all common workstation and PC platforms, excluding PCs with segmented addressing. It has been ported to several flavors of Microsoft's win32 environment.


Hans-J. Boehm
(boehm@acm.org)
These are the author's personal opinions.

I would like to thank John Ellis for his comments on earlier drafts of this page.