The Problem
Object files may contain 'communal' or 'shared' record data. This presents a problem for the current design of the linker and the overall data model for DDL. The assumption was that there were two types of symbols: those that were defined and those that weren't. Now it's apparent that, with a third type, the model will have to adapt to accommodate a few new behaviors.
Basically, a module can have a symbol that is both defined and undefined simultaneously. The idea is to have a single instance of that symbol within the entire program, used by all modules that reference it. In order for the linker to cope, it has to allow some symbols to fully link and others to be 'self-resolved' from within the module itself. The trick is to only allow self-resolution *after* the linker has exhausted all other possibilities. Right now, the OMF loader is self-resolving *first*, which is causing some major headaches.
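The intended ordering can be sketched as a lookup that consults the shared registry first and falls back to the module's own definition only when nothing else is found. This is a minimal illustration in current D, not DDL code: `resolve`, `registry`, and `localDef` are hypothetical names, and the registry is assumed to be a plain associative array from symbol name to address.

```d
// Hypothetical sketch: resolve, registry and localDef are assumed names.
// Returns the address a reference to 'name' should bind to.
void* resolve(string name, void*[string] registry, void* localDef)
{
    // 1. Prefer an instance that some other module has already provided.
    if (auto p = name in registry)
        return *p;

    // 2. Only now fall back to the module's own copy (self-resolution).
    //    This may be null, meaning the symbol is still unresolved.
    return localDef;
}
```

Self-resolving first would correspond to swapping the two steps, which lets every module bind to its own private copy instead of the single shared instance.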
Proposed Revisions
enum SymbolType: ubyte{ // ubyte to save space
    Weak,   // defined but not relied upon for linking
    Strong, // defined, can be linked against
    Extern  // undefined, needs a strong reference
}
// 'type' is added to help show the status of the symbol
struct ExportSymbol{
    char[] name;
    void* address; // zero is only valid for 'strong' symbols
    SymbolType type;
}
// these methods are either added to or changed in the existing DynamicModule class
class DynamicModule{
    ExportSymbol[] getSymbols();
    ExportSymbol getSymbol(char[] name); // quick lookup - can use internal hash if need be
    void resolveFixups(); // handles self-resolution
    bool isResolved();
    deprecated char[][] getDependencies();
    deprecated void resolveDependency(char[] name,void* address);
    deprecated ExportSymbol[] getExports();
}
The ramifications here are profound. Instead of maintaining dependencies, exports and defined symbols separately, they are now all folded into one unified set of symbols. Not only does this streamline the linking process, but it should help reduce the complexity of aggregate DynamicModule types.
As a rule: any weak symbol must already point to its local counterpart and be flagged as 'Weak'. If it does not have a local counterpart, then it is an 'Extern'.
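A loader might apply this rule with a helper along the following lines. This is only a sketch: `classify`, `hasLocalDef`, and `isCommunal` are hypothetical names, and it assumes the loader already knows whether the module carries a definition for the symbol and whether that definition came from a communal record.

```d
// Hypothetical sketch: classify and its parameters are assumed names.
enum SymbolType : ubyte { Weak, Strong, Extern }

// hasLocalDef: the module contains a definition for the symbol.
// isCommunal:  that definition came from a communal/shared record.
SymbolType classify(bool hasLocalDef, bool isCommunal)
{
    // No local counterpart to point at: the symbol must come from elsewhere.
    if (!hasLocalDef)
        return SymbolType.Extern;

    // A communal definition is usable but should yield to a shared instance.
    return isCommunal ? SymbolType.Weak : SymbolType.Strong;
}
```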
The algorithm for resolveFixups() is simple:
- resolve any fixups the module may have
- flag isResolved as true if all symbols are now 'Strong'
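The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the DDL implementation: `SketchModule`, `symbols`, `resolved`, and `applyFixups` are hypothetical names, the format-specific fixup work is left as an empty placeholder, and the sketch is written in current D (so `string` stands in for `char[]`).

```d
// Hypothetical sketch: SketchModule and applyFixups are assumed names.
enum SymbolType : ubyte { Weak, Strong, Extern }

struct ExportSymbol
{
    string name;
    void* address;
    SymbolType type;
}

class SketchModule
{
    ExportSymbol[] symbols;
    private bool resolved = false;

    // Placeholder for the loader-specific work of patching addresses
    // into the module's code and data (format dependent, omitted here).
    protected void applyFixups() {}

    void resolveFixups()
    {
        // Step 1: resolve any fixups the module may have.
        applyFixups();

        // Step 2: the module is resolved only if every symbol is Strong.
        resolved = true;
        foreach (sym; symbols)
        {
            if (sym.type != SymbolType.Strong)
            {
                resolved = false;
                break;
            }
        }
    }

    bool isResolved() { return resolved; }
}
```

Keeping the flag as cached state means isResolved() stays cheap, at the cost of going stale if symbols change without another resolveFixups() call.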
The linker will also undergo a few changes:
// revised link routine
// canSelfResolve is passed as 'true' for registration variants of link routines (explained below)
public void link(DynamicModule mod,bool canSelfResolve){
    // protect against infinite recursion here by returning early;
    // by this, we count on the module being resolved further up the call stack
    if(mod.isLinking) return;
    mod.isLinking = true;
    // iterate by reference so that updates land on the module's own symbols
    foreach(ref symbol; mod.getSymbols){
        if(symbol.type == SymbolType.Strong) continue;
        // resolve a dependency from out of the registry
        foreach(lib; this.libraries){
            auto libMod = lib.getModuleForExport(symbol.name);
            if(libMod){
                if(!libMod.isResolved()){
                    this.link(libMod,true);
                }
                auto otherSymbol = libMod.getSymbol(symbol.name);
                assert(otherSymbol.type == SymbolType.Strong);
                symbol.address = otherSymbol.address;
                symbol.type = SymbolType.Strong;
                goto nextSymbol;
            }
        }
        // throw if we absolutely *must* have this symbol resolved on this pass
        if(symbol.type == SymbolType.Extern || !canSelfResolve){
            throw new DDLLinkException("cannot resolve symbol '%s'",symbol.name);
        }
        nextSymbol:
        {} // satisfy compiler
    }
    // anything still weak falls back to its local definition
    if(canSelfResolve){
        foreach(ref sym; mod.getSymbols){
            if(sym.type == SymbolType.Weak) sym.type = SymbolType.Strong;
        }
    }
    mod.resolveFixups();
    mod.isLinking = false;
    if(!mod.isResolved()){
        throw new LinkModuleException(mod);
    }
    // dig up the ModuleInfo (if applicable) and initialize it!
    foreach(ExportSymbol sym; mod.getExports){
        if(getSymbolType(sym.name) == SymbolType.ModuleInfo){
            // found it, now get it and run the constructor
            ModuleInfo moduleInfo = cast(ModuleInfo)(sym.address);
            debug debugLog("init! %0.8X",sym.address);
            initModule(moduleInfo,0);
        }
    }
}
There is one caveat here. Weak symbols will only be self-resolved on registered libs. Therefore, if a module is passed in on a basic 'link' call, and it contains weak symbols that aren't already registered, then the linker will throw. This is one behavior that can only be worked around with an unannounced central registry of shared symbols - stiff drawbacks apply here.
