
Ticket #1368 (closed enhancement: fixed)

Opened 15 years ago

Last modified 15 years ago

Add stacktracing

Reported by: larsivi Assigned to: fawzi
Priority: major Milestone: 0.99.9
Component: Core Functionality Version: 0.99.7 Dominik
Keywords: Cc:

Description

There are two available components that should be integrated, h3's tangotrace and Hxal's jive.

The feature set should be commonalized.

Change History

11/19/08 00:24:56 changed by Hxal

Can implement for Linux

  • Adding stack traces to regular exceptions
  • Printing a stack trace on crash
  • Resolving symbols through libbfd (part of binutils)

Cannot implement for Linux at this time

It's impossible to throw exceptions on crashes due to compiler deficiencies:

  • DMD doesn't expect exceptions out of random instructions and such exceptions leak out of immediate exception handlers
  • GDC and LDC don't expect that either, and the GCC unwinding code they use aborts the process
  • The GCC backend supports asynchronous unwind tables for handling such exceptions, but it's not the default; no unwind tables at all are generated for C code by default, which also makes GCC abort on exceptions propagating through C code
  • LLVM doesn't support non-call exception handlers at all; support is planned for an unspecified future

Need a decision on:

Whether handling crashes should be threadsafe - to prevent an infinite signal loop, the signal handler needs to maintain a counter to know whether it crashed itself.

If the counter is global, it will cause a race condition that could prevent a stack trace from being output; if the counter is thread-local, memory has to be allocated in the signal handler.

The latter solution would be required for throwing exceptions out of signal handlers, but for simply dumping and exiting maybe we can leave the race condition be.
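A minimal sketch of the global-counter variant described above, assuming a plain "dump and exit" handler. The names crash_depth, crash_handler_enter, crash_handler and install_crash_handler are made up for illustration and are not taken from jive or tangotrace. The race mentioned above is deliberately left in: a second thread crashing at the same moment would see the counter already set and take the "giving up" path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Global re-entry counter (hypothetical name). sig_atomic_t is the integer
   type the C standard allows a signal handler to read and write. */
static volatile sig_atomic_t crash_depth = 0;

/* Returns 1 for the first crash being handled, 0 if the handler was already
   entered (it crashed itself, or another thread got here first). */
static int crash_handler_enter(void)
{
    if (crash_depth != 0)
        return 0;
    crash_depth = 1;
    return 1;
}

static void crash_handler(int sig)
{
    static const char first[]  = "crash: dumping stack trace\n";
    static const char nested[] = "crash inside the crash handler, giving up\n";
    ssize_t n;

    (void)sig;
    if (!crash_handler_enter()) {
        /* Partial information is better than looping forever. */
        n = write(STDERR_FILENO, nested, sizeof nested - 1);
        (void)n;
        _exit(2);
    }
    n = write(STDERR_FILENO, first, sizeof first - 1);
    (void)n;
    /* ... a real handler would emit the stack trace here ... */
    _exit(1);
}

/* Installs the handler for SIGSEGV; returns 0 on success, -1 on error. */
static int install_crash_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER; /* let a nested SIGSEGV re-enter the handler */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Only write() and _exit() are called inside the handler, both of which POSIX lists as async-signal-safe. A thread-local counter would remove the race but, as noted above, needs storage to be set up for each thread.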

Implementation details Fawzi might not like

  • Mutex operations and textual output need to be performed in the signal handler; it's possible to get rid of the output in favor of more thread synchronization, but that'd really be overengineering.

Things I have no idea about

  • How to set up per-thread secondary stacks; a single secondary stack as specified by sigaltstack is useless for a multi-threaded program.

Out of the scope of this ticket

  • Symbol demangling

11/19/08 12:00:06 changed by fawzi

Thanks Hxal, now I have a better idea of the issues.

I think that one main design principle should be that in extreme situations it is OK to give only partial information (even simply "ehm, something is wrong"), but not to risk locking up and giving no info at all.

I will look at http://team0xf.com:8080/tangoTrace and http://zygfryd.net/hg/jive

and try to come up with a meaningful proposal.

11/19/08 18:20:35 changed by h3r3tic

Just a quick note before I delve into writing an article like Hxal's ;) My stuff sits at http://team0xf.com:1024/tangoTrace3/ . The one you linked is old and lacks features.

11/19/08 18:42:13 changed by Hxal

Disregard what I said about alternate signal stacks; they couldn't be used anyway. I rely on the glibc backtrace() to get a raw stack trace, and it probably wouldn't work with an alternate stack.

I rely on backtrace() instead of walking the stack manually, because on x86_64 frame pointers seem to be omittable by default. backtrace() uses the GCC unwinding code to make use of exception handling frames in addition to frame pointers; without it, tracing on x86_64 would work poorly.
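For reference, a minimal sketch of collecting a raw trace with glibc's backtrace() and printing it with backtrace_symbols_fd(). The helper names capture_trace and dump_trace and the MAX_FRAMES limit are assumptions for illustration, not code from jive.

```c
#include <assert.h>
#include <execinfo.h> /* glibc-specific: backtrace(), backtrace_symbols_fd() */
#include <unistd.h>

enum { MAX_FRAMES = 64 };

/* Stores up to MAX_FRAMES return addresses of the calling thread in `frames`
   and returns how many entries were written. */
static int capture_trace(void *frames[MAX_FRAMES])
{
    return backtrace(frames, MAX_FRAMES);
}

/* Writes one line per frame to stderr. backtrace_symbols_fd() is used instead
   of backtrace_symbols() because it does not allocate the result with malloc(). */
static void dump_trace(void)
{
    void *frames[MAX_FRAMES];
    int n = capture_trace(frames);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}
```

Function names only show up in this output for symbols present in the dynamic symbol table (for example when linking with -rdynamic); otherwise only addresses are printed. The glibc documentation also notes that the first call may load libgcc dynamically, so calling backtrace() once at startup is a common precaution before relying on it in a crash path.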

11/20/08 10:13:49 changed by larsivi

Considering the GPL nature of libbfd, it may turn out to be difficult to actually use the library. I don't think we can even support linking it for open source applications without covering Tango itself in GPL, and that isn't exactly an option.

The only possibility is if the stack tracing component is a wholly independent plugin to Tango, that the user can enable himself if GPL is ok for him to use.

However, this is probably not a long term solution, so we need to think about a more freely licensed alternative to libbfd.

11/21/08 17:31:22 changed by kris

What about the code from Thomas?

11/21/08 17:44:55 changed by fawzi

Indeed, the license of libbfd is problematic; if we use it, it should be through a plugin that the user has to install, i.e. via a plugin or callback interface.

I looked for other libraries, I found:

http://sourceforge.net/projects/libcwd/ threadsafe but still ugly license: QPL

and

http://reality.sgiweb.org/davea/dwarf.html

LGPL, and used in commercial debuggers... (zero)

I don't know how good it is, but it could be a solution (I checked since XCode 2.4 the default debug format is DWARF also on mac).

You use it only to find the debugging symbols and the IPC -> function/file/line number mappings, or did I miss something?

demangling is another issue, so I have added a separate ticket: #1370

11/21/08 20:59:05 changed by kris

flectioned is the stack-trace I was thinking of (thanks to larsivi for the prompt)

11/22/08 02:27:55 changed by h3r3tic

http://team0xf.com:1024/tangoTrace3/

Current status on DMD+Windows

  • Trace info for regular and special exceptions
    • Except out-of-memory exceptions which have special treatment in the runtime
  • Extra info for Access Violations: operation type (read/write/code-execution-prevention violation), address
  • Stack tracing that tries to be smart
    • Sometimes there's a better trace from ESP than from EBP
    • The tracing code attempts to find a trace from ESP first
    • When it can't reach _Dmain or Fiber.run, it does the trace from EBP
    • It can be disabled using a version switch
  • Stack tracing within and out of fibers
    • Reaches Fiber.run, switches to the outer stack
  • Seems thread-safe

Quirks

  • It needs a patch to the runtime (in the repository - tangoRuntimeMod.patch)
    • Most critical - creating trace info for special exceptions
    • Nice to have - the extra info for Access Violations
    • Required for Fibers - exposing core.Thread.topContext to the tracing code, so it may get the outer stack when tracing exceptions in fibers
  • When genobj.d and tango.core.Thread.d are compiled with -O, the top-most function in the trace may be missing due to the "omit frame pointer" optimization
  • I haven't investigated stack tracing for/in DLL-related crashes yet

Yet another licensing issue

The code in DbgInfo.di which parses executable files and extracts debug symbols is a modified version from the code found in the 'phobos backtrace hack', which in turn, seems to be a direct port of a module in Wine. The most recent version can be found here: http://cvs.winehq.org/cvsweb/wine/tools/winedump/debug.c?rev=1.28&content-type=text/x-cvsweb-markup&hideattic=0 , whereas it's a modified version of http://cvs.winehq.org/cvsweb/wine/tools/cvdump/Attic/cvdump.c?rev=1.2&content-type=text/x-cvsweb-markup&hideattic=0 . It's a bit unclear to me whether the original version simply lacked the LGPL license or if it was used in an LGPL project despite the Copyright. Either way, we'll either take it as LGPL (which could be a pain), somehow obtain a compatible license, or rewrite it. There is some well-licensed code to parse CodeView debug symbols in DDL's COFF backend, although it would require some refactoring. There's also Kong: http://dsource.org/projects/kong which does some PE parsing and has a very nice license, but I don't know much more besides that.

Notes

I'm not sure what's the purpose of the void* ptr in genobj.traceContext, so I use it to pass the CONTEXT structure from WinAPI to the trace handler.

I have exposed a C API to manipulate the registered debug symbols, so they may be provided externally, for instance when loading/unloading DDL modules:

ModuleDebugInfo ModuleDebugInfo_new();
void ModuleDebugInfo_addDebugInfo(ModuleDebugInfo minfo, size_t addr, char* file, char* func, ushort line);
char* ModuleDebugInfo_bufferString(ModuleDebugInfo minfo, char[] str);
void GlobalDebugInfo_addDebugInfo(ModuleDebugInfo minfo);
void GlobalDebugInfo_removeDebugInfo(ModuleDebugInfo minfo);

The ModuleDebugInfo class can be treated as an opaque pointer by any code using this externally. The ModuleDebugInfo_bufferString function allocates memory in module-dependent storage that is automatically freed when releasing the module. I believe the rest of the functions should be pretty self-explanatory.

I'd like to propose an extension to the Exception.TraceInfo class, so it can provide the trace info as the five fields:

  • char[] functionName
  • char[] file
  • int line
  • size_t debugSymbolOffset // or ptrdiff_t <- offset from the trace address to the located debug symbol - might be useful to determine the accuracy of the located symbol, e.g. when the debug info is not generated for every single line in the source code.
  • size_t address // or void*

It could be provided in an alternative version of opApply.

The way I'm currently printing stack traces is:

object.Exception: Access Violation - Write at address 0xdeadbeef
    at Main.baz(Test.d:11) +3 [402024]
    at Main.bar(Test.d:17) +0 [402037]
    at Main.foo(Test.d:21) +0 [402043]
    at main(Test.d:30) +0 [40207d]
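To make the proposed five fields and the output format above concrete, here is a small sketch in C. The TraceFrame struct and format_frame function are invented for illustration and are not part of the patch; the sketch assumes the offset is printed in decimal and the address in hexadecimal, which matches the sample lines but is not confirmed by them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One resolved stack frame, mirroring the five proposed fields
   (hypothetical C rendering of the proposal). */
typedef struct {
    const char *function_name;
    const char *file;
    int         line;
    size_t      debug_symbol_offset; /* trace address minus matched symbol address */
    size_t      address;
} TraceFrame;

/* Formats one frame as "    at func(file:line) +offset [address]".
   Returns the snprintf() result: the length that was (or would have been)
   written, excluding the terminating NUL. */
static int format_frame(char *buf, size_t cap, const TraceFrame *f)
{
    assert(buf != NULL && f != NULL);
    return snprintf(buf, cap, "    at %s(%s:%d) +%zu [%zx]",
                    f->function_name, f->file, f->line,
                    f->debug_symbol_offset, f->address);
}
```

Handing each frame to the caller as one value like this also avoids an iteration callback with five separate parameters.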

11/25/08 10:08:06 changed by fawzi

I think we can handle LGPL without big problems (these are normal functions, not templates, so we can just compile them as separate libraries and link them). It seems that such an approach is needed in any case.

GPL code, on the other hand, cannot be linked in any automatic way, as it "taints" the whole of Tango. To allow it, linking has to be performed manually by the user, either by installing a plugin that is loaded if found (I like this) or even by compiling it in statically, but only upon explicit user action.

So it would be nice if Hxal could comment on the viability of libDWARF

http://reality.sgiweb.org/davea/dwarf.html

The problem with these libraries is how well they actually work in practice, not only what they should do in theory.

It seems that the really problematic part (license wise) is getting the debugging symbols, so we need to define a unique interface for it.


h3r3tic thanks for the nice overview of the Windows side.

I would say LGPL is OK. I would not try to use code from a project "dead due to lack of interest" :) In these low-level parts there are many small details that can make the difference between working and inexplicably dying in some situations.

As for -O optimization: as long as the fiber frame is there and the code works, it is OK; omitted frames have to be expected.

Over the next few days I will look at the patch. I do not much like an opApply with so many parameters; I would prefer a structure that contains all the fields, or something similar, but I haven't made up my mind yet. The fields in Exception.TraceInfo seem reasonable to me though.

(follow-up: ↓ 12 ) 11/25/08 19:11:01 changed by h3r3tic

As for compiling into separate libraries, am I correct to assume that there will simply be two libraries that implement the functionality of tango-base-dmd, one with and one without the stack tracing features? If this is so, then the -O issue may also be solved swiftly, since the two modules (genobj and Thread) might be compiled without -O just for the lib with stack tracing. It's not strictly necessary, but not seeing the top-most frame might be misleading.

There shouldn't be issues with the Fiber frame either way, since the stack tracing code simply finds the bounds of the function at the init time, by walking it until a 'ret' instruction is found. Tracing out of fibers might fail if DMD somehow decided to inline the Fiber.run() function into fiber_entryPoint(). I'm yet to find out whether that "are we in Fiber.run()" test during the trace is required at all. The code might simply walk down all the way and then check if there's a next stack in Thread.getThis...

If we can go with the LGPL code for now, that's cool. Perhaps someone will require stack tracing in release code and is willing to donate a rewritten module. I might do it at some point.

As for the opApply - you're probably right with the struct. I bet there would be folks forgetting about the number of values to iterate on or messing up their order. A struct will solve that nicely indeed :)

(in reply to: ↑ 11 ) 11/25/08 19:17:29 changed by larsivi

Replying to h3r3tic:

As for compiling into separate libraries, am I correct to assume that there will simply be two libraries that implement the functionality of tango-base-dmd, one with and one without the stack tracing features?

No, when talking about separate libraries, it means that the stacktracing lives in a separate library that is able to hook into the runtime - there can not be any LGPL code inside the runtime library itself.

If stacktracing won't work with the user just adding the stack tracing lib to the link line, then there cannot be any LGPL code whatsoever.
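The separation described here can be pictured as a function-pointer hook: the runtime only knows a callback type, and the separately licensed tracing library installs its implementation. The following is a sketch under that assumption. All names (trace_hook_fn, runtime_set_trace_hook, runtime_collect_trace, tracelib_init) are hypothetical and are not Tango's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of a tracing backend: fills `out` with up to `max` addresses
   and returns how many were stored. */
typedef size_t (*trace_hook_fn)(void **out, size_t max);

/* --- runtime side: contains no tracing code, only the hook --- */
static trace_hook_fn g_trace_hook = NULL;

/* Installs a backend and returns the previously installed one (NULL if none). */
trace_hook_fn runtime_set_trace_hook(trace_hook_fn hook)
{
    trace_hook_fn old = g_trace_hook;
    g_trace_hook = hook;
    return old;
}

/* Called by the runtime when it wants a trace. Without a backend it
   reports zero frames instead of failing. */
size_t runtime_collect_trace(void **out, size_t max)
{
    return g_trace_hook ? g_trace_hook(out, max) : 0;
}

/* --- separate tracing library side --- */
static int dummy_frame; /* stand-in for a real return address */

static size_t demo_backend(void **out, size_t max)
{
    /* A real backend would walk the stack; this one records a single
       placeholder entry so the wiring can be exercised. */
    if (max == 0)
        return 0;
    out[0] = &dummy_frame;
    return 1;
}

/* Explicit entry point of the tracing library: registers its backend. */
void tracelib_init(void)
{
    runtime_set_trace_hook(demo_backend);
}
```

In this sketch something still has to call tracelib_init(); putting the library on the link line does not by itself run any of its code.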

11/25/08 22:14:16 changed by h3r3tic

That's a problem because static module ctors are not run if you simply link to a lib containing them. I've found a way that allows this approach, but I think it will still require a separate tango-base lib for the runtime. It could use the win32 dbghelp API to lookup an initialization function for the LGPL'd stuff in the host app. But if the runtime should not do this active lookup by default, we'll need a secondary lib.

01/18/09 04:38:51 changed by kris

Apparently the team0xf version also (currently) exhibits some licensing issues. Awaiting further news ...

03/29/09 13:15:48 changed by larsivi

  • milestone changed from 0.99.8 to 0.99.9.

03/31/09 21:49:31 changed by fawzi

r4491 introduced the first version of tango native stacktracing. Features are still rough, and code should be cleaned, but the structure is (I hope) sound.

The main functions are rt_addrBacktrace and rt_symbolizeFrameInfo: one creates a backtrace, the other loads symbols for a frame.

I would like to remove TraceInfo (and thus rt_setTraceHandler) and use BasicTraceInfo (converted to a struct) instead.

The trace output is ugly, I am open to suggestions.

Hxal, about Linux: if you adapt something to this interface, I am willing to check in any reasonably clean code that does not use GPL stuff; at the limit I can accept a post-processing GPL tool that completes the info with line numbers...

h3r3tic, I tried to add your tango trace code; it is still quite messy and should be cleaned up...

05/21/09 10:44:47 changed by larsivi

How much is left before we can close this one?

06/03/09 07:26:41 changed by fawzi

  • status changed from new to closed.
  • resolution set to fixed.

with r4727 linux gets symbolification (through ELF parsing).
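How r4727 does the lookup is not described in this ticket. As a general illustration: once function symbols have been read from an ELF symbol table and sorted by address, resolving a trace address comes down to finding the nearest preceding symbol. The sketch below assumes such a sorted table; the Symbol struct and find_symbol are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uintptr_t   start; /* address of the first byte of the function */
    const char *name;
} Symbol;

/* Returns the symbol with the greatest start <= addr, or NULL if addr lies
   before the first symbol. `syms` must be sorted by ascending start.
   On success, *offset (if non-NULL) receives addr - start. */
static const Symbol *find_symbol(const Symbol *syms, size_t count,
                                 uintptr_t addr, size_t *offset)
{
    size_t lo = 0, hi = count; /* search window is [lo, hi) */

    assert(syms != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= addr)
            lo = mid + 1; /* mid is a candidate; look for a later one */
        else
            hi = mid;     /* mid starts after addr; discard it */
    }
    if (lo == 0)
        return NULL;      /* every symbol starts after addr */
    if (offset)
        *offset = (size_t)(addr - syms[lo - 1].start);
    return &syms[lo - 1];
}
```

This ignores symbol sizes. ELF symbol entries also carry a size field, which a more careful lookup could use to reject addresses that fall in a gap between two functions.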

Line numbers on linux/mac are still missing, and can be retrieved through addr2line, but I see no easy way to add that in a safe way and with an acceptable license.

So I think that stacktracing is at an acceptable level, and I will close this ticket.

Further stacktracing issues should go to new tickets.