root/trunk/docsrc/const.dd

Revision 2040, 13.2 kB (checked in by walter, 1 year ago)

Ddoc

$(D_S Here A Const$(COMMA) There A Const,

$(COMMENT

$(P $(I by Walter Bright))


$(P
In a small, experimental program, it's great to benefit from a programming
system that's flexible, permissive, and not too pedantic.
As the complexity of a program increases, it gets more beneficial
to specify the semantics of a declaration in the code itself.
Programmers want to carve subdomains in the large application and confine
specific state changes to small sections of code. Doing so rids them of
long-distance coupling among portions of code that modify the same data.
Documentation is unreliable for this purpose, as it is inevitably wrong,
misleading, incomplete, out of date, or just plain missing.
The notion of constness is of significant utility here.
C and C++ have added the ability to specify the constness of variables
and functions, and experience has clearly demonstrated over time that it is
popular and useful, and many consider it crucial for developing
large programs.
In an attempt to simplify, Java dropped const. Its handling of
immutable strings and the often-used technique of preemptive copy-out are
awkward at best. As a consequence, putting const back into the language has
become a favorite indoor sport for industry and academia alike.
But C++'s const has a number of important shortcomings, so D took the
opportunity to reengineer the concept from top to bottom.
This article explores what constness is good for, how C++ constness
addresses it, and how D addresses it.
)


<h2>What Do We Want From Const?</h2>

$(P
There are a number of benefits that can be derived from knowing something
is constant, including benefits to optimization and code generation:
)

$(OL
$(LI    Constant data need never be copied! It can be infinitely shared (e.g.
    via pointers and references) as there is never contention on it. This
    leads to programs that are both correct and efficient.
)

$(LI    The most obvious is to just be able to name a manifest constant
    or string.
)

$(LI    Constant data can be placed into ROM (read only memory).
)

$(LI    Const parameters indicate that a function will not modify whatever
    its arguments refer to, with a direct positive effect on modularity.
)

$(LI    Constant data indicates that other threads or other aliases to the data
    cannot modify it.
)

$(LI    A constant can be propagated and folded, which pulls operations
    from run time into compile time.
)

$(LI    Data flow analysis is aided when there's a guarantee that constant
    data will not change as a side effect of other operations.
)

$(LI    Constant data can be cached or mirrored in registers without
    needing to synchronize them with memory.
)

$(LI    Const reduces the cognitive load on the programmer - by looking at
    constness in the declaration, he can learn things about whatever
    uses that declaration without having to slog through that code.
)
)


<h2>How Does C++ Const Stack Up?</h2>


$(P
C++ const comes in two forms: const as a storage class, and const
as a type attribute.
)

$(P
Const as a storage class is most useful for
declaring manifest constants, such as:
)

$(CPPCODE
const int X = 3;
)

$(P
and the language guarantees that $(CODE X) will never be anything but 3.
$(CODE X) can be put into ROM, and the optimizer can reliably replace all
rvalues of $(CODE X) with 3. Const is a storage class when it applies to the
top level type of the declaration. <a href="#note1">[1]</a>
)
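
$(P
As a minimal sketch of what the storage-class form buys, assuming a C++11 or
later compiler for $(CODE static_assert): both uses of $(CODE X) below require a
compile-time constant, so they compile only because $(CODE X) can never be
anything but 3. The names $(CODE table) and $(CODE table_size) are hypothetical
and exist only for this illustration.
)

```cpp
#include <cassert>
#include <cstddef>

// A manifest constant: const applies to the top level type.
const int X = 3;

// Both uses below need a compile-time constant.
static_assert(X == 3, "X is known at compile time");
int table[X];

// Hypothetical helper: number of elements in table.
std::size_t table_size()
{
    return sizeof(table) / sizeof(table[0]);
}
```

$(P
Calling $(CODE table_size()) should report three elements, since the array
bound was taken directly from $(CODE X).
)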

$(P
Const as a type attribute is different. It becomes a type attribute
when it does not apply to the top level type of a declaration:
)

$(CPPCODE
int x = 3;
const int *p = &x;
)

$(P
Here the const applies to the int that $(CODE p) is pointing to, not $(CODE p).
Const as a type attribute means that a read only view of data is taken.
It doesn't mean that the data is constant. For example:
)

$(CPPCODE
int x = 3;
const int *p = &x;
*p = 4;     // error, read-only view

const int *q = &x;
int z = *q; // z is set to 3
x = 5;      // ok
int y = *q; // y is set to 5
)

$(P
$(CODE z) is not equal to $(CODE y), even though $(CODE *q) is const.
This is one instance of the so-called aliasing problem. While the above
snippet is trivial, the existence of such aliases can be very hard
to detect in a complex program, and it is impossible for the compiler to
detect them reliably. This means that the compiler cannot cache 3 in
a register and reuse the cached value to replace $(CODE *q); it must
go back and actually dereference $(CODE q) again.
)
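
$(P
The same effect can be sketched across a function boundary. The function
$(CODE read_write_read) below is hypothetical and not part of the original
example; it only illustrates why the second read of $(CODE *q) cannot be
replaced by the first when another pointer may alias the same int.
)

```cpp
#include <cassert>

// Hypothetical function: reads *q, writes through out, reads *q again.
// q is only a read-only view, and out may refer to the same int.
int read_write_read(const int *q, int *out)
{
    int before = *q;    // first read
    *out = before + 2;  // may change the int that q views
    int after = *q;     // has to be read again
    return after - before;
}
```

$(P
With two distinct ints the function returns 0. With the same int passed for
both parameters it returns 2, although the int was only ever read through a
pointer to const.
)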

$(P
Consider a function defined as:
)

$(CPPCODE
void foo(const int *p);
)

$(P
Ostensibly, it looks like I can safely pass references to my int variables to
$(CODE foo()) and be assured that $(CODE foo()) won't be changing my ints.
But that isn't true:
)

$(CPPCODE
void foo(const int *p)
{
    int *q = const_cast&lt;int *&gt;(p);
    *q = 4;
}
)

$(P
$(CODE foo()) has not only cast away the constness, but it has gone and modified
my precious int variable, even though $(CODE foo())'s interface promised it
would not.
Even worse, as long as the int was not itself declared const, this is legal
and well-defined C++, and must be supported by
any C++ compiler. While writing such code is frowned upon by professional
C++ programmers, the fact that it is legal means that the compiler
is of no help in enforcing the promise.
)
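
$(P
A small driver makes the consequence visible from the caller's side. This is
only a sketch: $(CODE foo()) is repeated from above, and the caller
$(CODE call_foo_on_mutable_int) is a hypothetical name added for illustration.
)

```cpp
#include <cassert>

// Same shape as foo() above: the interface promises a read-only view.
void foo(const int *p)
{
    int *q = const_cast<int *>(p);  // the promise is cast away
    *q = 4;
}

// Hypothetical caller. x is not declared const, so the write inside
// foo() is well-defined and the caller observes the new value.
int call_foo_on_mutable_int()
{
    int x = 3;
    foo(&x);
    return x;
}
```

$(P
The caller gets back 4, not 3. Had $(CODE x) been declared
$(CODE const int), the same write would be undefined behavior, as discussed in
note <a href="#note1">[1]</a>.
)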

$(P
So, if someone is doing a code review, and sees a function parameter declared
as a pointer to const, he must carefully review all the code in that function,
and all the code in functions called by that function that take the parameter
as an argument, to see if it is modified or not. This defeats much
of the purpose in declaring a parameter as const.
)

$(P
But there are more problems with C++ const. Consider a class:
)

$(CPPCODE
class C;
void foo(const C *p);
...
C c;
foo(&c);
)

$(P
Does $(CODE foo()) modify the contents of $(CODE c)?
Sure, through the $(CODE const_cast), but
there's another legal way. class $(CODE C) could have mutable members:
)

$(CPPCODE
class C
{
    public: mutable int x;
};

void foo(const C *p)
{
    p-&gt;x = 3;    // ok, C::x is mutable
}
)

$(P
So our beleaguered code reviewer now has to search the definition of $(CODE C)
for mutable members to see if $(CODE foo()) could modify $(CODE c).
)

$(P
The justification for mutable is the concept called $(I logical const), where
an object appears to be const to an external viewer, but internally can
change. An example would be a class that maintains a cached internal result
of an expensive operation. The difficulty with this is two-fold. First,
there is no language support at all to ensure that mutable is not used for
something other than logical constness. It can be very difficult for a code
reviewer to determine if mutable is used correctly in this manner or not.
It is impossible to do automated detection of logical constness.
Mutable can be and is used for other purposes, and that is completely
legal and well-defined C++.
Second, once const references may refer to mutable data, it is no longer
possible to rely on data seen through a const reference not being modified,
which has unfortunate
consequences for optimization and writing inherently threadsafe code.
It also makes it impossible to write generic code that must
not modify anything referenced by its parameters.
)
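
$(P
The caching case mentioned above might look like the following sketch. The
class $(CODE Squarer), its members, and the function $(CODE use_twice) are all
hypothetical; squaring an int stands in for the expensive operation. Nothing in
the language ties the mutable members to caching, which is exactly the first
difficulty.
)

```cpp
#include <cassert>

// Hypothetical class that caches the result of a pretend expensive operation.
class Squarer
{
    int value;
    mutable bool cached;       // mutable: may change inside const members
    mutable int cache;
    mutable int computations;  // instrumentation only, counts recomputations
public:
    explicit Squarer(int v)
        : value(v), cached(false), cache(0), computations(0) {}

    // Looks const from outside, but updates the cache on first use.
    int squared() const
    {
        if (!cached)
        {
            cache = value * value;
            cached = true;
            ++computations;
        }
        return cache;
    }

    int times_computed() const { return computations; }
};

// Takes a const reference, yet the object it refers to changes state.
int use_twice(const Squarer &s)
{
    return s.squared() + s.squared();
}
```

$(P
Calling $(CODE use_twice) on a $(CODE Squarer) holding 7 yields 98 while the
computation runs only once, so the object was modified through a const
reference.
)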

$(P
There's one more problem. Suppose class $(CODE C) is the root of a collection,
which we'll trivially represent as $(CODE T*):
)

$(CPPCODE
class C
{
    public: T *q;
};
)

$(P
and a function $(CODE foo()) which reads the collection, and returns some
information about it:
)

$(CPPCODE
int foo(const C *p);
)

$(P
The $(CODE const) only applies to the contents of class $(CODE C); it does not
apply to whatever $(CODE q) points to:
)

$(CPPCODE
int foo(const C *p)
{
    *p-&gt;q = ...; // ok, we can modify whatever C::q points to
    return 0;
}
)

$(P
There is no way to specify in $(CODE foo())'s interface that it promises not to
modify anything through its parameters. In other words, const is not transitive.
This is especially troublesome when attempting to write generic function
APIs based on unknown types:
)

$(CPPCODE
template&lt;class T&gt; int foo(const T *p) { ... }
)

$(P
Without knowing the instantiated type of $(CODE T), it is impossible to know
if $(CODE foo()) is modifying things through its parameter or not.
)
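
$(P
A compilable sketch of the generic case, under the assumption that the
collection is reduced to a single int: the struct $(CODE Root) and the driver
$(CODE demo) are hypothetical names, and the template body is one possible
filling-in of the elided code, not the original.
)

```cpp
#include <cassert>

// Hypothetical collection root, reduced to a pointer to one int.
struct Root
{
    int *q;
};

// The parameter is a pointer to const T, but const stops at T's own
// members: p->q is a const pointer to a mutable int.
template<class T>
int foo(const T *p)
{
    *p->q = 42;  // compiles, the int behind q is not const
    return 0;
}

// Hypothetical driver showing the caller's data being changed.
int demo()
{
    int payload = 1;
    Root r = { &payload };
    foo(&r);
    return payload;
}
```

$(P
$(CODE demo()) returns 42: the data reachable through the const parameter was
modified without any cast and without mutable.
)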


$(P
To summarize the difficulties with C++ const:
)

$(OL
$(LI    Const type attributes do not mean immutable data, they only mean
    a read-only view of the data. Other references to the same data
    can modify it at any time.
)

$(LI    It is legal and defined behavior to cast away const-ness and change
    the data anyway if the data was originally mutable.
)

$(LI    Mutable members override the constness of the declaration.
)

$(LI    Const is not transitive; there is no way to specify the constness
    of a complex type at the point of use of it.
)
)


$(P
C++ const is not a good match with the goals listed at the beginning of this
article. That means that it's worth a redesign.
)



<h2>Constness In D</h2>

$(P
Clearly, there are two distinct meanings
to constant - meanings that are routinely conflated. One is that constant
data really is constant. It never changes. It's different enough that
it needs a different name. In D, this kind of constant is called an
invariant.
)

$(P
Invariant data solves the aliasing problem, because even if there are
other aliases to the same data, since it is invariant, those references
cannot alter the data. The more invariant data a program uses, the
easier it is to understand. Invariants form a touchstone,
a reference point, for exploring the meaning of the rest of the code.
If the value of an invariant does change, it is a clear indication of
a severe program bug.
It's helpful to have this constraint statically enforced.
)

$(P
The second kind of constant is a read-only view of data, even
though the data may be changed through another mutable reference to that
same data. This is called const, and is an invaluable modularity aid. One
function wants to look at some data; a module has the data, but wants to control
changes to it; all they need is a little protocol that allows the function to
look at the data, with the assurance that the function can't change it.
)

$(P
Mutable references can be implicitly converted to const (as in C++).
Invariant references can also be implicitly converted to const.
But const cannot be implicitly converted to invariant, and neither can
mutable references.
Essentially, const is a weaker form of invariant because it says: $(DOUBLEQUOTE you can't
change this data; someone else may or may not be able to change it.)
)
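
$(P
Only the C++ half of the first rule can be checked in C++, since C++ has no
counterpart to invariant. As a sketch, assuming a C++11 or later compiler, the
standard $(CODE std::is_convertible) trait confirms that mutable converts
implicitly to const but not the other way round. The function name
$(CODE conversions_match_text) is hypothetical.
)

```cpp
#include <cassert>
#include <type_traits>

// Mutable to const is implicit.
static_assert(std::is_convertible<int *, const int *>::value,
              "int* converts implicitly to const int*");

// Const to mutable is not; it needs an explicit cast.
static_assert(!std::is_convertible<const int *, int *>::value,
              "const int* does not convert implicitly to int*");

// Hypothetical runtime mirror of the two compile-time checks.
bool conversions_match_text()
{
    return std::is_convertible<int *, const int *>::value
        && !std::is_convertible<const int *, int *>::value;
}
```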

$(P
Const references are usually used in function APIs, where the function
is guaranteeing it will not change any data reachable through that const
reference.
)

$(P
Which brings up another aspect of const in D - it's transitive.
Const in C++ is not transitive, which means one can have a pointer
to const pointer to mutable int. To declare a variable that is const
at each level, one must write:
)

$(CPPCODE
int const *const *const *p;   // C++
)
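
$(P
How that C++ declaration reads can be checked level by level with the standard
type traits. This is a sketch assuming a C++11 or later compiler; the typedef
names and $(CODE every_pointee_is_const) are hypothetical and exist only for
the illustration.
)

```cpp
#include <cassert>
#include <type_traits>

// The declaration from the text, as a type.
typedef int const *const *const *PtrType;

// Peel off one pointer level at a time.
typedef std::remove_pointer<PtrType>::type Level1;  // int const *const *const
typedef std::remove_pointer<Level1>::type  Level2;  // int const *const
typedef std::remove_pointer<Level2>::type  Level3;  // int const

static_assert(!std::is_const<PtrType>::value, "p itself is a mutable pointer");
static_assert(std::is_const<Level1>::value, "p points to a const pointer");
static_assert(std::is_const<Level2>::value, "which points to a const pointer");
static_assert(std::is_same<Level3, const int>::value, "which points to const int");

// Hypothetical runtime mirror of the checks above.
bool every_pointee_is_const()
{
    return std::is_const<Level1>::value
        && std::is_const<Level2>::value
        && std::is_const<Level3>::value;
}
```

$(P
Dropping any one of the three $(CODE const) keywords makes the corresponding
assertion fail, which is the bookkeeping that a single transitive qualifier
avoids.
)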

$(P
The $(CODE const) is left associative, so the declaration is a pointer to const
pointer to const pointer to const int. Const being transitive in D means
that every reference reachable through the const is also const.
An entire logical region of an application can be protected by placing only one
qualifier.
To reflect that, the syntax is different, using constructor-like notation:
)

---
const(int **)* p;   // D
---

$(P
Here the $(CODE const) applies to the part of the type that is in parentheses.
Note that the syntax makes it impossible to declare things like
a pointer to a const pointer to a mutable type.
This slight loss in expressiveness is justifiable by the considerable power of
transitive protection.
)

$(P
Transitive const solves the problem of specifying function interfaces
to data structures that truly are read only, even if they are generic functions
dealing with unknown types.
)

$(P
Analogously to const, invariant types are transitive and follow the same
syntactical pattern as const.
)


$(P
Because a static type system can be a straitjacket, there needs to be
a way to circumvent it for special cases. Like C++, D allows the casting away
of constness and invariantness. Unlike C++, if the programmer then
subverts the const or invariant guarantee and changes the underlying data,
then undefined behavior results.
)


<h2>References</h2>

$(UL
    $(LI $(LINK2 http://en.wikipedia.org/wiki/Const, Const-correctness) Wikipedia)
)

<h2>Acknowledgments</h2>

$(P
Many thanks to Andrei Alexandrescu, Bartosz Milewski, Brad Roberts,
David Held, Eric Niebler and many other members of the D community for their
major contributions to the design of the new const system.
)

$(P
Many thanks in particular to Andrei Alexandrescu for reviewing this
article and making many invaluable suggestions for improving it.
)

<h2>Notes</h2>

$(P
<a name="note1">[1]</a> Several people have questioned this, arguing
that const_cast allows a const object to be legitimately changed.
The relevant standard paragraph is C++98 7.1.5.1-4:

<blockquote>
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
</blockquote>
)

)
)

Macros:
    TITLE=Here a Const, There a Const
    WIKI=Const
    D_CODE = <pre class="d_code2">$0</pre>
    CPPCODE2 = <pre class="cppcode2">$0</pre>
    ERROR = $(RED $(B error))
    COMMA=,
META_KEYWORDS=D Programming Language, const, final, invariant, mutable,
logical constness, C++
META_DESCRIPTION=Why const was redesigned in D.