root/trunk/docsrc/hijack.dd

Revision 724, 12.2 kB (checked in by walter, 4 years ago)

2.030 update

Ddoc

$(D_S Function Hijacking Mitigation,

<div align="right">$(DIGG)</div>

$(P
As software becomes more complex, we become more reliant on module
interfaces. An application may import and combine modules from multiple
sources, including sources from outside the company. The module
developers must be able to maintain and improve those modules without
inadvertently stepping on the behavior of modules that they cannot
have knowledge of. The application developer needs to be notified if
any module changes would break the application. This talk covers
function hijacking, where adding innocent and reasonable declarations
in a module
can wreak arbitrary havoc on an application program in C++ and Java. We'll then
look at how
modest language design changes can largely eliminate the problem in the D
programming language.
)


$(SECTION2 Global Function Hijacking,

$(P Let's say we are developing an application that imports two modules:
X from the XXX Corporation, and Y from the YYY Corporation.
Modules X and Y are unrelated to each other, and are used for completely
different purposes.
The modules look like:
)

----
module X;

void foo();
void foo(long);
----

----
module Y;

void bar();
----

$(P The application program would look like:
)

----
import X;
import Y;

void abc()
{
    foo(1);  // calls X.foo(long)
}

void def()
{
    bar();   // calls Y.bar();
}
----

$(P So far, so good. The application is tested and works, and is shipped.
Time goes by, the application programmer moves on, the application is
put in maintenance mode. Meanwhile, YYY Corporation, responding to
customer requests, adds a type $(CODE A) and a function $(CODE foo(A)):
)

----
module Y;

void bar();
class A;
void foo(A);
----

$(P The application maintainer gets the latest version
of Y, recompiles, and no problems. So far, so good.
But then, YYY Corporation expands the functionality of $(CODE foo(A)),
adding a function $(CODE foo(int)):
)

----
module Y;

void bar();
class A;
void foo(A);
void foo(int);
----

$(P Now, our application maintainer routinely gets the latest version of Y,
recompiles, and suddenly his application is doing something unexpected:
)

----
import X;
import Y;

void abc()
{
    foo(1);  // calls Y.foo(int) rather than X.foo(long)
}

void def()
{
    bar();   // calls Y.bar();
}
----

$(P because $(CODE Y.foo(int)) is a better overloading match than $(CODE X.foo(long)).
But since $(CODE X.foo) does something completely different from
$(CODE Y.foo), the application now has a potentially very serious bug in it.
Even worse, the compiler offers NO indication that this happened, and it cannot,
because, at least for C++, this is how the language is supposed to work.
)

$(P In C++, some mitigation can be done by using namespaces or (hopefully)
unique
name prefixes within the modules
X and Y. This doesn't help the application programmer, however, who probably
has no control over X or Y.
)

$(P The first stab at fixing this problem in the D programming language was
to add the rules:
)

$(OL
$(LI by default functions can only overload against other functions in the same
module)
$(LI if a name is found in more than one scope, it must be fully qualified
in order to be used)
$(LI in order to overload functions from multiple modules together, an alias
statement is used to merge the overloads)
)

$(P So now, when YYY Corporation adds the $(CODE foo(int)) declaration, the
application
maintainer gets a compilation error that foo is defined in both module
X and module Y, and has an opportunity to fix it.
)

$(P This solution works, but is a little restrictive. After all, there's no
way $(CODE foo(A)) would be confused with $(CODE foo()) or $(CODE foo(long)),
so why have the compiler
complain about it? The solution turned out to be to introduce the notion
of overload sets.
)

$(SECTION3 Overload Sets,

$(P An overload set is formed by a group of functions with the same name
declared
in the same scope. In the module X example, the functions $(CODE X.foo()) and
$(CODE X.foo(long)) form a single overload set. The functions
$(CODE Y.foo(A)) and $(CODE Y.foo(int))
form another overload set. Our method for resolving a call to foo becomes:
)

$(OL
$(LI Perform overload resolution independently on each overload set)
$(LI If there is no match in any overload set, then error)
$(LI If there is a match in exactly one overload set, then go with that)
$(LI If there is a match in more than one overload set, then error)
)

$(P The most important thing about this is that even if there is a BETTER match
in one overload set over another overload set, it is still an error.
The overload sets must not overlap.
)

$(P In our example:
)

----
void abc()
{
    foo(1);  // matches Y.foo(int) exactly, X.foo(long) with conversions
}
----

$(P will generate an error, whereas:
)

----
void abc()
{
    A a;
    foo(a);  // matches Y.foo(A) exactly, nothing in X matches
    foo();   // matches X.foo() exactly, nothing in Y matches
}
----

$(P compiles without error, as we'd intuitively expect.
)

$(P If overloading of $(CODE foo) between X and Y is desired, the following can be done:
)

----
import X;
import Y;

alias X.foo foo;
alias Y.foo foo;

void abc()
{
    foo(1);  // calls Y.foo(int) rather than X.foo(long)
}
----

$(P and no error is generated. The difference here is that the user
deliberately combined the overload sets in X and Y, and so presumably
both knows what he's doing and is willing to check the $(CODE foo)'s when
X or Y is updated.
)
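
$(P Both behaviors, separate overload sets and deliberately merged ones, can be
checked mechanically. What follows is only a minimal sketch: it assumes four
hypothetical source files, $(CODE x.d), $(CODE y.d), $(CODE merged.d) and
$(CODE app.d), that mirror modules X and Y above. The function bodies, the
returned strings, the lower-case module names and the helper
$(CODE viaMerged) are all invented for illustration.
)

----
// x.d (hypothetical stand-in for module X)
module x;

string foo()     { return "x.foo()"; }
string foo(long) { return "x.foo(long)"; }
----

----
// y.d (hypothetical stand-in for module Y)
module y;

class A { }

string bar()    { return "y.bar()"; }
string foo(A)   { return "y.foo(A)"; }
string foo(int) { return "y.foo(int)"; }
----

----
// merged.d: deliberately merges the two overload sets
module merged;

import x;
import y;

alias x.foo foo;
alias y.foo foo;

// with the sets merged, ordinary overload resolution picks the best match
string viaMerged() { return foo(1); }
----

----
// app.d: the two overload sets stay separate here
import x;
import y;
import merged : viaMerged;   // selective import, merged's foo stays out of scope

// a match in both overload sets is an error, better match or not
static assert(!__traits(compiles, foo(1)));

void main()
{
    assert(foo() == "x.foo()");            // only the set in x matches
    assert(foo(new A) == "y.foo(A)");      // only the set in y matches
    assert(x.foo(1) == "x.foo(long)");     // full qualification selects a set
    assert(viaMerged() == "y.foo(int)");   // merged sets: exact match wins
}
----

$(P With a D compiler such as dmd, the four files would be built together, for
example with $(CODE dmd app.d merged.d x.d y.d).
)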

)

)

$(SECTION2 Derived Class Member Function Hijacking,

$(P There are more cases of function hijacking. Imagine a class $(CODE A) coming
from AAA Corporation:
)

----
module M;

class A { }
----

$(P and in our application code, we derive from $(CODE A) and add a virtual
member function $(CODE foo):
)

----
import M;

class B : A
{
    void foo(long);
}

void abc(B b)
{
    b.foo(1);   // calls B.foo(long)
}
----

$(P and everything is hunky-dory. As before, things go on, AAA Corporation
(who cannot know about $(CODE B)) extends $(CODE A)'s functionality a bit by
adding $(CODE foo(int)):
)

----
module M;

class A
{
    void foo(int);
}
----

$(P Now suppose we're using Java-style overloading rules, where base class
member functions overload right alongside derived class functions. Our
application call now resolves differently:
)

----
import M;

class B : A
{
    void foo(long);
}

void abc(B b)
{
    b.foo(1);   // calls A.foo(int), AAAEEEEEIIIII!!!
}
----

$(P The call to $(CODE B.foo(long)) was hijacked by the base class $(CODE A)
to call $(CODE A.foo(int)),
which likely has no meaning whatsoever in common with $(CODE B.foo(long)).
This is why I don't like Java overloading rules.
C++ has the right idea here in that functions in a derived class hide
all the functions of the same name in a base class, even if the functions
in the base class might be a better match. D follows this rule.
And once again, if the user desires them to be overloaded against each other,
this can be accomplished in C++ with a using declaration, and in D with
an analogous alias declaration.
)
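
$(P The hiding rule and the alias opt-in can be tried in a single file. This is
a minimal sketch, not code from the modules above: the class $(CODE C), the
function bodies and the returned strings are invented, and the parameter types
$(CODE string) and $(CODE int) are chosen so that no implicit conversion
between the two overloads is involved.
)

----
class A
{
    string foo(string) { return "A.foo(string)"; }
}

class B : A
{
    // declaring any foo here hides every foo inherited from A
    string foo(int) { return "B.foo(int)"; }
}

class C : A
{
    alias A.foo foo;   // opt in: overload A's foo alongside C's foo
    string foo(int) { return "C.foo(int)"; }
}

void main()
{
    auto b = new B;
    auto c = new C;

    assert(b.foo(1) == "B.foo(int)");
    // A.foo(string) is hidden in B, so this call does not compile
    static assert(!__traits(compiles, b.foo("text")));

    // with the alias, both overloads are reachable through C
    assert(c.foo("text") == "A.foo(string)");
    assert(c.foo(1) == "C.foo(int)");
}
----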

)


$(SECTION2 Base Class Member Function Hijacking,

$(P I bet you suspected there was more to it than that, and you'd be right.
Hijacking can go the other way, too. A derived class can hijack a base
class member function!
)

$(P Consider:
)

----
module M;

class A
{
    void def() { }
}
----

$(P and in our application code, we derive from $(CODE A) and add a virtual
member function $(CODE foo):
)

----
import M;

class B : A
{
    void foo(long);
}

void abc(B b)
{
    b.def();   // calls A.def()
}
----

$(P AAA Corporation once again knows nothing about $(CODE B), and adds a
function
$(CODE foo(long)) and uses it to implement some needed new functionality of
$(CODE A):
)

----
module M;

class A
{
    void foo(long);

    void def()
    {
        foo(1L);   // expects to call A.foo(long)
    }
}
----

$(P but, whoops, $(CODE A.def()) now calls $(CODE B.foo(long)).
$(CODE B.foo(long)) has hijacked
$(CODE A.foo(long)). So, you might say, the
designer of $(CODE A) should have had the foresight to make
$(CODE foo(long)) a non-virtual function. The problem is that $(CODE A)'s
designer
may very easily have intended $(CODE A.foo(long)) to be virtual, as it's a new
feature of $(CODE A). He cannot have known about $(CODE B.foo(long)).
Take this to the logical conclusion, and we realize that under this system
of overriding, there is no safe way to add any functionality to $(CODE A).
)

$(P The D solution is straightforward. If a function in a derived class
overrides a function in a base class, it must use the storage class
override. If it overrides without using the override storage class
it's an error. If it uses the override storage class without overriding
anything, it's an error.
)

----
class C
{
    void foo();
    void bar();
}
class D : C
{
    override void foo();  // ok
    void bar();           // error, overrides C.bar()
    override void abc();  // error, no C.abc()
}
----

$(P This eliminates the potential of a derived class member function hijacking
a base class member function.
)
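
$(P The three cases can also be written as compile-time checks. The sketch
below assumes a compiler that enforces the override rules exactly as stated
above; it wraps each derived class in a function literal and asks
$(CODE __traits(compiles, ...)) whether the declaration is accepted. The empty
function bodies are added only so that the file builds on its own.
)

----
class C
{
    void foo() { }
    void bar() { }
}

// overriding with the override storage class is accepted
static assert( __traits(compiles, { class D : C { override void foo() { } } }));

// overriding C.bar() without override is rejected
static assert(!__traits(compiles, { class D : C { void bar() { } } }));

// override with no C.abc() to override is rejected
static assert(!__traits(compiles, { class D : C { override void abc() { } } }));

void main() { }
----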

)


$(SECTION2 Derived Class Member Function Hijacking #2,

$(P There's one last case of base member function hijacking a derived
member function. Consider:
)

----
module A;

class A
{
    void def()
    {
        foo(1);
    }

    void foo(long);
}
----

$(P Here, $(CODE foo(long)) is a virtual function that provides a specific
functionality.
Our derived class designer overrides $(CODE foo(long)) to replace that behavior
with one suited to the derived class' purpose:
)

----
import A;

class B : A
{
    override void foo(long);
}

void abc(B b)
{
    b.def();   // eventually calls B.foo(long)
}
----

$(P So far, so good. The call to $(CODE foo(1)) inside $(CODE A)
winds up correctly calling
$(CODE B.foo(long)). Now $(CODE A)'s designer decides to optimize things, and
adds
an overload for $(CODE foo):
)

----
module A;

class A
{
    void def()
    {
        foo(1);
    }

    void foo(long);
    void foo(int);
}
----

$(P Now,
)

----
import A;

class B : A
{
    override void foo(long);
}

void abc(B b)
{
    b.def();   // eventually calls A.foo(int)
}
----

$(P Doh! $(CODE B)'s programmer thought he was overriding the behavior of $(CODE A)'s
$(CODE foo), but did not.
He needs to add another function to $(CODE B):
)

----
class B : A
{
    override void foo(long);
    override void foo(int);
}
----

$(P to restore correct behavior. But there's no clue he needs to do that.
Compile time is of no help at all, as the compilation of $(CODE A) has no
knowledge of what $(CODE B) overrides.
)

$(P Let's look at how $(CODE A) calls the virtual functions, which it
does through the vtbl[]. $(CODE A)'s vtbl[] looks like:
)

----
A.vtbl[0] = &A.foo(long);
A.vtbl[1] = &A.foo(int);
----

$(P $(CODE B)'s vtbl[] looks like:
)

----
B.vtbl[0] = &B.foo(long);
B.vtbl[1] = &A.foo(int);
----

$(P and the call in $(CODE A.def()) to $(CODE foo(int))
is actually a call to vtbl[1].
We'd really like $(CODE A.foo(int)) to be inaccessible from a $(CODE B) object.
The solution is to rewrite $(CODE B)'s vtbl[] as:
)

----
B.vtbl[0] = &B.foo(long);
B.vtbl[1] = &error;
----

$(P where, at runtime, an error function is called which will throw an
exception. It isn't perfect since it isn't caught at compile time,
but at least the application program won't blithely be calling the wrong
function and continue on.
)
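
$(P The effect of the error entry can be modelled by hand. The sketch below is
not the compiler's actual vtbl[] layout or runtime hook: it stands in for the
two tables with plain arrays of function pointers, and every name in it
($(CODE Slot), $(CODE hiddenFunc), $(CODE aFooInt) and so on) is hypothetical.
)

----
import std.exception : assertThrown;

alias string function() Slot;   // one hand-rolled vtbl[] entry

string aFooLong() { return "A.foo(long)"; }
string aFooInt()  { return "A.foo(int)"; }
string bFooLong() { return "B.foo(long)"; }

// stands in for the error entry placed in B's table
string hiddenFunc() { throw new Exception("hidden function called"); }

// A.def() calls foo(1), which binds to foo(int), that is, to slot 1
string def(Slot[] vtbl) { return vtbl[1](); }

void main()
{
    Slot[] aVtbl = [&aFooLong, &aFooInt];
    Slot[] bVtbl = [&bFooLong, &hiddenFunc];

    assert(def(aVtbl) == "A.foo(int)");
    // through B's table the call throws instead of silently running A.foo(int)
    assertThrown(def(bVtbl));
}
----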

$(P $(I Update: A compile time warning is now generated whenever the
vtbl[] gets an error entry.)
)

)

$(SECTION2 Conclusion,

$(P Function hijacking is a pernicious and particularly nasty problem in
complex C++ and Java programs because there is no defense against it
for the application programmer. Some small modifications to the language
semantics can defend against it without sacrificing any power or performance.
)

)

$(SECTION2 References,

$(UL
$(LI $(LINK2 http://www.digitalmars.com/d/archives/digitalmars/D/Hijacking_56458.html, digitalmars.D - Hijacking))
$(LI $(LINK2 http://www.digitalmars.com/d/archives/digitalmars/D/Re_Hijacking_56505.html, digitalmars.D - Re: Hijacking))
$(LI $(LINK2 http://www.digitalmars.com/d/archives/digitalmars/D/aliasing_base_methods_49572.html#N49577, digitalmars.D - aliasing base methods))
$(LI Eiffel, Scala and C# use override or something analogous)
)

$(P Credits:)

$(UL
$(LI Kris Bell)
$(LI Frank Benoit)
$(LI Andrei Alexandrescu)
)

)

)

Macros:
    TITLE=Hijack
    WIKI=Hijack