root/trunk/docsrc/garbage.dd

Revision 2153, 12.3 kB (checked in by walter, 2 years ago)

bugzilla 3112 Specification on what operations call the GC is missing

  • Property svn:eol-style set to native
Line 
1 Ddoc
2
3 $(SPEC_S Garbage Collection,
4
5     $(P D is a fully garbage collected language. That means that it is never
6     necessary
7     to free memory. Just allocate as needed, and the garbage collector will
8     periodically return all unused memory to the pool of available memory.
9     )
10
11     $(P C and C++ programmers accustomed to explicitly managing memory
12     allocation and
13     deallocation will likely be skeptical of the benefits and efficacy of
14     garbage collection. Experience both with new projects written with
15     garbage collection in mind, and converting existing projects to garbage
16     collection shows that:
17     )
18
19     $(UL
20
21     $(LI Garbage collected programs are faster. This is counterintuitive,
22     but the reasons are:
23
24     $(UL
25         $(LI Reference counting is a common solution to solve explicit
26         memory allocation problems. The code to implement the increment and
27         decrement operations whenever assignments are made is one source
28         of slowdown. Hiding it behind smart pointer classes doesn't help
29         the speed. (Reference counting methods are not a general solution
30         anyway, as circular references never get deleted.)
31         )
32
33         $(LI Destructors are used to deallocate resources acquired by an object.
34         For most classes, this resource is allocated memory.
35         With garbage collection, most destructors then become empty and
36         can be discarded entirely.
37         )
38
39         $(LI All those destructors freeing memory can become significant when
40         objects are allocated on the stack. For each one, some mechanism must
41         be established so that if an exception happens, the destructors all
42         get called in each frame to release any memory they hold. If the
43         destructors become irrelevant, then there's no need to set up special
44         stack frames to handle exceptions, and the code runs faster.
45         )
46
47         $(LI All the code necessary to manage memory can add up to quite a
48         bit. The larger a program is, the less in the cache it is,
49         the more paging it does, and the slower
50         it runs.
51         )
52
53         $(LI Garbage collection kicks in only when memory gets tight. When
54         memory is not tight, the program runs at full speed and does not
55         spend any time freeing memory.
56         )
57
58         $(LI Modern garbage collectors are far more advanced now than the
59         older, slower ones. Generational, copying collectors eliminate much
60         of the inefficiency of early mark and sweep algorithms.
61         )
62
63         $(LI Modern garbage collectors do heap compaction. Heap compaction
64         tends to reduce the number of pages actively referenced by a program,
65         which means that memory accesses are more likely to be cache hits
66         and less swapping.
67         )
68
69         $(LI Garbage collected programs do not suffer from gradual deterioration
70         due to an accumulation of memory leaks.
71         )
72     )
73     )
74
75     $(LI Garbage collectors reclaim unused memory, therefore they do not suffer
76     from "memory leaks" which can cause long running applications to gradually
77     consume more and more memory until they bring down the system. GC programs
78     have longer term stability.
79     )
80
81     $(LI Garbage collected programs have fewer hard-to-find pointer bugs. This
82     is because there are no dangling references to freed memory. There is no
83     code to explicitly manage memory, hence no bugs in such code.
84     )
85
86     $(LI Garbage collected programs are faster to develop and debug, because
87     there's no need for developing, debugging, testing, or maintaining the
88     explicit deallocation code.
89     )
90
91     $(LI Garbage collected programs can be significantly smaller, because
92     there is no code to manage deallocation, and there is no need for exception
93     handlers to deallocate memory.
94     )
95     )
96
97     $(P Garbage collection is not a panacea. There are some downsides:
98     )
99
100     $(UL
101
102     $(LI It is not predictable when a collection gets run, so the program
103     can arbitrarily pause.
104     )
105
106     $(LI The time it takes for a collection to run is not bounded. While in
107     practice it is very quick, this cannot be guaranteed.
108     )
109
110     $(LI All threads other than the collector thread must be halted while
111     the collection is in progress.
112     )
113
114     $(LI Garbage collectors can keep around some memory that an explicit
115     deallocator
116     would not. In practice, this is not much of an issue since explicit
117     deallocators usually have memory leaks causing them to eventually use
118     far more memory, and because explicit deallocators do not normally
119     return deallocated memory to the operating system anyway, instead just
120     returning it to its own internal pool.
121     )
122
123     $(LI Garbage collection should be implemented as a basic operating
124     system
125     kernel service. But since they are not, garbage collecting programs must
126     carry around with them the garbage collection implementation. While this
127     can be a shared DLL, it is still there.
128     )
129     )
130
131     $(P These constraints are addressed by techniques outlined
132     in <a href="memory.html">Memory Management</a>.
133     )
134
135 <h2>How Garbage Collection Works</h2>
136
137     $(P The GC works by:)
138
139     $(OL
140     $(LI Looking for all the pointer $(SINGLEQUOTE roots) into GC allocated memory.)
141
142     $(LI Recursively scanning all allocated memory pointed to by
143     roots looking for more pointers into GC allocated memory.)
144
145     $(LI Freeing all GC allocated memory that has no active pointers
146     to it.)
147
148     $(LI Possibly compacting the remaining used memory by copying the
149     allocated objects (called a copying collector).)
150     )
151
152 <h2>Interfacing Garbage Collected Objects With Foreign Code</h2>
153
154     $(P The garbage collector looks for roots in:)
155     $(OL
156     $(LI its static data segment)
157     $(LI the stacks and register contents of each thread)
158     $(LI any roots added by std.gc.addRoot() or std.gc.addRange())
159     )
160
161     $(P If the only root of an object
162     is held outside of this, then the collecter will miss it and free the
163     memory.
164     )
165
166     $(P To avoid this from happening,)
167
168     $(UL
169     $(LI Maintain a root to the object in an area the collector does scan
170     for roots.)
171
172     $(LI Add a root to the object using std.gc.addRoot() or std.gc.addRange().)
173
174     $(LI Reallocate and copy the object using the foreign code's storage
175     allocator
176     or using the C runtime library's malloc/free.
177     )
178     )
179
180 <h2>Pointers and the Garbage Collector</h2>
181
182     $(P Pointers in D can be broadly divided into two categories: those that
183     point to garbage collected memory, and those that do not. Examples
184     of the latter are pointers created by calls to C's malloc(), pointers
185     received from C library routines, pointers to static data,
186     pointers to objects on the stack, etc. For those pointers, anything
187     that is legal in C can be done with them.
188     )
189
190     $(P For garbage collected pointers and references, however, there are
191     some
192     restrictions. These restrictions are minor, but they are intended
193     to enable the maximum flexibility in garbage collector design.
194     )
195
196     $(P Undefined behavior:)
197
198     $(UL
199
200     $(LI Do not xor pointers with other values, like the
201     xor pointer linked list trick used in C.
202     )
203
204     $(LI Do not use the xor trick to swap two pointer values.
205     )
206
207     $(LI Do not store pointers into non-pointer variables using casts and
208     other tricks.
209
210 ------
211 void* p;
212 ...
213 int x = cast(int)p;   // error: undefined behavior
214 ------
215
216     The garbage collector does not scan non-pointer types for roots.
217     )
218
219     $(LI Do not take advantage of alignment of pointers to store bit flags
220     in the low order bits:
221
222 ------
223 p = cast(void*)(cast(int)p | 1);  // error: undefined behavior
224 ------
225     )
226
227     $(LI Do not store into pointers values that may point into the
228     garbage collected heap:
229
230 ------
231 p = cast(void*)12345678;   // error: undefined behavior
232 ------
233
234     A copying garbage collector may change this value.
235     )
236
237     $(LI Do not store magic values into pointers, other than $(TT null).
238     )
239
240     $(LI Do not write pointer values out to disk and read them back in
241     again.
242     )
243
244     $(LI Do not use pointer values to compute a hash function. A copying
245     garbage collector can arbitrarily move objects around in memory,
246     thus invalidating
247     the computed hash value.
248     )
249
250     $(LI Do not depend on the ordering of pointers:
251
252 ------
253 if (p1 < p2)        // error: undefined behavior
254     ...
255 ------
256     since, again, the garbage collector can move objects around in
257     memory.
258     )
259
260     $(LI Do not add or subtract an offset to a pointer such that the result
261     points outside of the bounds of the garbage collected object originally
262     allocated.
263
264 ------
265 char* p = new char[10];
266 char* q = p + 6;    // ok
267 q = p + 11;     // error: undefined behavior
268 q = p - 1;      // error: undefined behavior
269 ------
270     )
271
272     $(LI Do not misalign pointers if those pointers may
273     point into the gc heap, such as:
274
275 ------
276 align (1) struct Foo
277 {   byte b;
278     char* p;    // misaligned pointer
279 }
280 ------
281
282     Misaligned pointers may be used if the underlying hardware
283     supports them $(B and) the pointer is never used to point
284     into the gc heap.
285     )
286
287     $(LI Do not use byte-by-byte memory copies to copy pointer values.
288     This may result in intermediate conditions where there is
289     not a valid pointer, and if the gc pauses the thread in such a
290     condition, it can corrupt memory.
291     Most implementations of $(TT memcpy()) will work since the
292     internal implementation of it does the copy in aligned chunks
293     greater than or equal to a pointer size, but since this kind of
294     implementation is not guaranteed by the C standard, use
295     $(TT memcpy()) only with extreme caution.
296     )
297
298     $(LI Do not have pointers in a struct instance that point back
299     to the same instance. The trouble with this is if the instance
300     gets moved in memory, the pointer will point back to where it
301     came from, with likely disastrous results.
302     )
303
304     )
305
306     $(P Things that are reliable and can be done:)
307
308     $(UL
309
310     $(LI Use a union to share storage with a pointer:
311
312 ------
313 union U { void* ptr; int value }
314 ------
315     )
316
317     $(LI A pointer to the start of a garbage collected object need not
318     be maintained if a pointer to the interior of the object exists.
319
320 ------
321 char[] p = new char[10];
322 char[] q = p[3..6];
323 // q is enough to hold on to the object, don't need to keep
324 // p as well.
325 ------
326     )
327     )
328
329     $(P One can avoid using pointers anyway for most tasks. D provides
330     features
331     rendering most explicit pointer uses obsolete, such as reference
332     objects,
333     dynamic arrays, and garbage collection. Pointers
334     are provided in order to interface successfully with C APIs and for
335     some low level work.
336     )
337
338 <h2>Working with the Garbage Collector</h2>
339
340     $(P Garbage collection doesn't solve every memory deallocation problem.
341     For
342     example, if a root to a large data structure is kept, the garbage
343     collector cannot reclaim it, even if it is never referred to again. To
344     eliminate this problem, it is good practice to set a reference or
345     pointer to an object to null when no longer needed.
346     )
347
348     $(P This advice applies only to static references or references embedded
349     inside other objects. There is not much point for such stored on the
350     stack to be nulled, since the collector doesn't scan for roots past the
351     top of the stack, and because new stack frames are initialized anyway.
352     )
353
354 <h2>Object Pinning and a Moving Garbage Collector</h2>
355
356     $(P Although D does not currently use a moving garbage collector, by following
357     the rules listed above one can be implemented. No special action is required
358     to pin objects. A moving collector will only move objects for which there
359     are no ambiguous references, and for which it can update those references.
360     All other objects will be automatically pinned.
361     )
362
363 <h2>D Operations That Involve the Garbage Collector</h2>
364
365     $(P Some sections of code may need to avoid using the garbage collector.
366     The following constructs may allocate memory using the garbage collector:
367     )
368
369     $(UL
370     $(LI $(GLINK2 expression, NewExpression))
371     $(LI Array appending)
372     $(LI Array concatenation)
373     $(LI Array literals (except when used to initialize static data))
374     $(LI Associative array literals)
375     $(LI Any insertion, removal, or lookups in an associative array)
376     $(LI Extracting keys or values from an associative array)
377     $(V2
378     $(LI Taking the address of (i.e. making a delegate) a nested function that
379      accesses variables in an outer scope)
380     $(LI A function literal that access variables in an outer scope)
381     )
382     $(LI An $(GLINK2 expression, AssertExpression) that fails its condition)
383     )
384
385 <h2>References</h2>
386
387     $(UL
388     $(LI $(LINK2 http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29, Wikipedia))
389     $(LI $(LINK2 http://www.iecc.com/gclist/GC-faq.html, GC FAQ))
390     $(LI $(LINK2 ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps, Uniprocessor Garbage Collector Techniques))
391     $(LI $(LINK2 http://www.amazon.com/exec/obidos/ASIN/0471941484/classicempire,
392     Garbage Collection : Algorithms for Automatic Dynamic Memory Management))
393     )
394
395 )
396
397 Macros:
398     TITLE=Garbage Collection
399     WIKI=Garbage
Note: See TracBrowser for help on using the browser.