sean
Joined: 24 Jun 2004 Posts: 609 Location: Bay Area, CA
Posted: Thu Feb 02, 2006 11:13 pm
kris wrote:
> sean wrote:
>> Done and done. The intrinsics now have inline asm implementations and I've altered the rounding functions as described above.
> Interesting! Does the compiler actually inline them optimally?
Not really. I think the presence of an asm block prevents inlining, though it would be nice if inlining were allowed so long as the asm code didn't do any explicit register manipulation. Also, the functions don't use naked asm, so each one contains a bit of extra wrapper code (the prologue and epilogue) that could be done away with. That said, here's a quick comparison using this source file:
Code:
import std.math;

void main()
{
    real x = 10.0;
    x = sqrt( x );
    printf( "%Lf\n", x );
}
Building against Phobos with -release -inline set yields this:
Code:
__Dmain comdat
assume CS:__Dmain
L0: enter 0Ch,0
fld tbyte ptr FLAT:_DATA[00h]
fstp tbyte ptr -0Ch[EBP]
fld tbyte ptr -0Ch[EBP]
fsqrt
fstp tbyte ptr -0Ch[EBP]
push dword ptr -4[EBP]
push dword ptr -8[EBP]
push dword ptr -0Ch[EBP]
push offset FLAT:_DATA[0Ch]
call near ptr _printf
add ESP,010h
leave
ret
__Dmain ends
However, building against Ares (and importing std.math.core) with the same options yields this:
Code:
__Dmain comdat
assume CS:__Dmain
L0: enter 0Ch,0
fld tbyte ptr FLAT:_DATA[00h]
fstp tbyte ptr -0Ch[EBP]
push dword ptr -4[EBP]
push dword ptr -8[EBP]
push dword ptr -0Ch[EBP]
call near ptr _D3std4math4core4sqrtFeZe
fstp tbyte ptr -0Ch[EBP]
push dword ptr -4[EBP]
push dword ptr -8[EBP]
push dword ptr -0Ch[EBP]
push offset FLAT:_DATA[0Ch]
call near ptr _printf
add ESP,010h
leave
ret
__Dmain ends
With this as the approximate code generated for sqrt:
Code:
_D4test4sqrtFeZe comdat
assume CS:_D4test4sqrtFeZe
push EBP
mov EBP,ESP
fld tbyte ptr 8[EBP]
fsqrt
fstp tbyte ptr 8[EBP]
fld tbyte ptr 8[EBP]
pop EBP
ret 0Ch
_D4test4sqrtFeZe ends
So the intrinsic version is essentially straight inlined assembler, while the other requires a function call plus some stack manipulation for parameter passing. I imagine this is still better than calling a C routine, but the intrinsic is clearly the better choice if performance is critical. I may still go back and make all those asm blocks naked, but I'm trying to avoid confusing the casual reader any more than necessary :p
[edit]
I got the sqrt function down to this by trimming out a few lines. I didn't realize the x87 floating-point stack is used to pass return values:
Code:
_D4test4sqrtFeZe comdat
assume CS:_D4test4sqrtFeZe
push EBP
mov EBP,ESP
fld tbyte ptr 8[EBP]
fsqrt
pop EBP
ret 0Ch
_D4test4sqrtFeZe ends
Don Clugston
Joined: 05 Oct 2005 Posts: 91 Location: Germany (expat Australian)
Posted: Mon Feb 06, 2006 2:10 am Post subject: FYI: Naked floating point
Code:
// An example of a naked asm floating-point function.
real sin(real x)
{
    asm {
        naked;
        fld real ptr [ESP+4];
        fsin;
        ret x.sizeof + x.alignof;
    }
}
This works for DMD-Windows, but I'm not sure if it's correct for Linux.
It would be better to get Walter to make it an intrinsic, of course.
(I'd also like to see a few more intrinsics, such as rot and fsincos.)
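The `ret x.sizeof + x.alignof` operand must equal the number of bytes the caller pushed for `x`; the Ares listing earlier in the thread shows three dword pushes and a `ret 0Ch` for one `real` argument. A minimal sketch of that arithmetic, assuming 32-bit x86 pushes arguments in 4-byte slots: the hypothetical helper `stackArgBytes` (not part of Phobos or Ares) rounds an argument size up to the slot size. For DMD-Windows' 10-byte `real` it gives 12 (0Ch), which `x.sizeof + x.alignof` also yields only when `real.alignof` is 2.

```d
/// Hypothetical helper, not part of Phobos or Ares: the number of bytes an
/// argument of `size` bytes occupies on the stack when arguments are pushed
/// in `slot`-byte units. The default slot of 4 assumes 32-bit x86.
size_t stackArgBytes(size_t size, size_t slot = 4) @safe pure nothrow
{
    // Round `size` up to the next multiple of `slot` (must be a power of two).
    return (size + slot - 1) & ~(slot - 1);
}

// A 10-byte real is pushed as three dwords, so a naked callee must `ret 0Ch`.
static assert(stackArgBytes(10) == 0x0C);
// A double needs two dwords, so the callee would `ret 8`.
static assert(stackArgBytes(8) == 8);
```

Rounding gives 12 for both a 10-byte and a 12-byte `real`, whereas `sizeof + alignof` exceeds 12 for a 12-byte `real` with any nonzero alignment. If `real` is 12 bytes on Linux (an assumption here), that would be one reason the expression may not carry over unchanged.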
sean
Joined: 24 Jun 2004 Posts: 609 Location: Bay Area, CA
Posted: Mon Feb 06, 2006 1:09 pm
This is the current implementation:
Code:
real sin(real x) /* intrinsic */
{
    version(D_InlineAsm_X86)
    {
        asm
        {
            fld x;
            fsin;
        }
    }
    else
    {
        return std.c.math.sinl(x);
    }
}
Letting the compiler sort out stack issues should avoid alignment problems, no?
However, I'll test-build the naked version and see what the difference in code generation is. If it allows the call to be inlined, then it's definitely better.
[edit]
Looks like neither version is inlined with -inline specified, but your version obviously compiles to fewer instructions within the call itself. The naked version would need to be modified for 64-bit machines, but perhaps it's worth using anyway?
Don Clugston
Joined: 05 Oct 2005 Posts: 91 Location: Germany (expat Australian)
Posted: Tue Feb 07, 2006 1:45 am
I just noticed that the Phobos docs for std.intrinsic have been updated with the latest release to include the functions from std.math (e.g. fabs, sin, etc.). But the file itself is unchanged. Perhaps Walter is going to put them in there in the next release.
Here's a potentially intrinsic function for std.math.ieee.
Many functions in std.math.core will use it for calculations involving complex numbers. (But on non-x86 platforms there may be a problem with the use of sin() and cos() in std.math.ieee. Aargh.)
Code:
/*************************************
 * Calculate cos(y) + i sin(y).
 *
 * On x86 CPUs, this is a very efficient operation;
 * almost twice as fast as calculating sin(y) and cos(y)
 * separately, and is the preferred method when both are required.
 */
creal fcis(ireal y)
{
    version(D_InlineAsm_X86) {
        asm {
            naked;
            fld real ptr [esp+4];
            fsincos;           // now ST(0) = cos(y), ST(1) = sin(y)
            fxch st(1), st(0); // creal is returned as ST(1) = re, ST(0) = im
            ret y.sizeof + y.alignof;
        }
    } else {
        return cos(y.im) + sin(y.im)*1i;
    }
}
unittest {
    assert(fcis(1.3e5Li)==cos(1.3e5L)+sin(1.3e5L)*1i);
    assert(fcis(0.0Li)==1L+0.0Li);
}
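Later versions of D deprecated the builtin complex types (`creal`, `ireal`) in favour of `std.complex`, so the non-x86 branch above no longer compiles cleanly as written. A minimal sketch of the same fallback without them, for illustration only: the struct `Cis` and function `fcisPortable` are hypothetical names, and the code computes sin and cos separately, which is exactly the double cost that `fsincos` avoids on x86.

```d
import std.math : cos, sin;

/// Hypothetical (re, im) pair standing in for the old creal result.
struct Cis
{
    real re; // cos(y)
    real im; // sin(y)
}

/// Hypothetical portable stand-in for fcis: cos(y) + i sin(y),
/// taking y as a plain real instead of an ireal.
Cis fcisPortable(real y) @safe pure nothrow
{
    return Cis(cos(y), sin(y));
}
```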
sean
Joined: 24 Jun 2004 Posts: 609 Location: Bay Area, CA
Posted: Thu Feb 16, 2006 4:32 pm
This will be fine so long as the math modules don't all have static ctors (which seems unlikely). Will there be any alignment issues on non-Windows platforms? I tried "fld y" instead of "naked; fld real ptr [esp+4]" and got an access violation, so I'm leaving it as provided for now.
Don Clugston
Joined: 05 Oct 2005 Posts: 91 Location: Germany (expat Australian)
Posted: Tue Feb 21, 2006 1:58 am
Quote:
> Will there be any alignment issues on non-Windows platforms?
It should work fine on Linux (on 32 bits, anyway; the [esp+4] might need to be [esp+8] on x86-64).
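The [esp+4] versus [esp+8] question comes down to the width of the return address that `call` pushes. A minimal sketch of that reasoning, assuming the argument is passed on the stack at all (the x86-64 System V ABI passes `long double`, which `real` corresponds to there, in memory; other conventions may differ). The constant `firstStackArgOffset` is a hypothetical name used only for illustration; note that 64-bit code would also address through RSP rather than ESP.

```d
/// Hypothetical constant, for illustration only: on entry to a naked
/// function the return address sits at the top of the stack, so the first
/// stack-passed argument starts one pointer width above the stack pointer:
/// [ESP+4] on 32-bit x86, [RSP+8] on x86-64.
enum size_t firstStackArgOffset = (void*).sizeof;

// In D, size_t has the same width as a pointer on 32- and 64-bit targets.
static assert(firstStackArgOffset == size_t.sizeof);
```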
Quote:
> I tried "fld y" instead of "naked; fld real ptr [esp+4]" and got an access violation, so I'm leaving it as provided for now.
You'd also need to remove the ret instruction.
sean
Joined: 24 Jun 2004 Posts: 609 Location: Bay Area, CA
Posted: Tue Feb 21, 2006 10:28 am
Don Clugston wrote:
> You'd also need to remove the ret instruction.
Doh! Must have been a long day that day.