Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #1032 (closed wishlist: fixed)

Opened 16 years ago

Last modified 16 years ago

How about add a Big Int implementtation in D

Reported by: yidabu Assigned to: Don Clugston
Priority: major Milestone: 1.0
Component: Packaging Version: 0.99.5 Jascha
Keywords: Cc: sean

Description

Python, ruby, haskell... has a big int implementation. how about add a D version in Tango ?

Change History

04/10/08 01:57:16 changed by kris

I'd really like to see that too :)

04/27/08 09:17:34 changed by larsivi

  • milestone changed from 0.99.6 to 0.99.7.

04/28/08 09:10:58 changed by Don Clugston

I really like the idea too, but it's a heck of a lot of work to do from scratch. The obvious thing to do is to make a wrapper for the gnu GMP library (http://gmplib.org/), since they've done a fantastic job, but unfortunately it's under the GPL. So I think the optimal approach would be to create an interface which would be a wrapper for GMP if GMP is available, otherwise fallback to a simple, less-optimised implementation provided with Tango.

But I don't have the time for this at the moment.

05/07/08 07:28:19 changed by Don Clugston

I've had a bit of a look at this. I think we could create standard primitives for multi-byte add, subtract, shift, and long multiply, without too much work. They are really easy in asm, and really difficult in every other language that I know of.

I whipped up a quick asm multibyte add & subtract, and guess what -- it's 25% faster than GMP! Tested on Pentium M. I expect it to also be faster on PPro, P2, P3, and PCore. Obviously GMP isn't as heavily optimised as I thought (and I'm still not sure that my code is optimal).

I'm still not committing to do this for 1.0.

05/07/08 08:01:48 changed by larsivi

Great stuff Don! 1.0 is still some way off, so we'll see what's the status when we get closer.

05/08/08 02:21:56 changed by kris

Yeah, nice one Don

06/10/08 09:20:52 changed by Don Clugston

In a nice bit of synergy, Janice posted a bigint implementation into Phobos a few days after I started work on the asm routines for Tango. I stuck my routines in the end of her code so she can test them for us. It's rather good because I really didn't want to have to write D versions of the low-level functions.

I have now written asm routines for all of the basic primitives. You can see them in the Phobos2 repository in std.bigint. I'm currently considering how to put them into Tango. Ideally, we'd have multiple versions for the different CPUs, and detect which one to use at run time.

This could be done as an array of function pointers, but how to make it thread safe? Here's a suggestion: All function ptrs could point to a 'not yet initialized' function, which would determine the CPU type, write all the func pointers accordingly (with a locked write), and then jump to the new function address. Note that every thread would write them to the same value, so there shouldn't be a problem if multiple cores happen to be in the 'not yet initialized' function simultaneously.

Does that sound reasonable? (Sean's probably the best person to answer this).

I would use the same CPUDISPATCH facility for the basic matrix operations, as well.

06/10/08 09:36:13 changed by larsivi

  • cc set to sean.

06/10/08 16:36:23 changed by kris

  • milestone changed from 0.99.7 to 1.0.

06/26/08 11:25:10 changed by Don Clugston

The asm routines were checked in, in svn #3674. A nasty issue is, how to deal with I/O without creating a messy dependency on the io system? Basically it needs to do itoa() and atoi() with the same flexibility that existing integers are done, but with thousands of digits. There's no way that a simple toString() style function can work.

Since conversion between base 10 and base 2 is very slow, might be best to have the class itself work only on ASCII strings of digits. If there's other stuff like digit separators, that would need to be added and removed at a higher level.

08/25/08 07:47:37 changed by Don Clugston

  • status changed from new to closed.
  • resolution set to fixed.

tango.math.BigInt? supports basic arithmetic. Although the I/O design needs some thought, the basic functionality is there, and the performance is quite respectable. So, although there will be further BigInt? enhancement requests (and bugs, no doubt), I consider the basic task to be complete (the initial request was pretty vague), so I'm closing this ticket. Tango now has BigInt?!

08/26/08 03:10:35 changed by kris

We'd talked with Walter about "formatted output" issues, related to your I/O concern.

Didn't get very far with the suggestions we had, but given that we'd suggested passing an output buffer (and an optional format string) as toString() arguments, I suspect that could have addressed the issue here?

Nice work, Don, though that heap-allocation thing for 'x++' is a nagging concern

08/26/08 07:54:18 changed by Don Clugston

Didn't get very far with the suggestions we had, but given that we'd suggested passing an output buffer (and an optional format string) as toString() arguments, I suspect that could have addressed the issue here?

Yes. Although as someone suggested on the newsgroup, some kind of output stream is probably a better design than returning a char []. (Would be nice if displaying an array of 10000 ints didn't require allocating a string to store them all).

(A more general solution might be to do something a bit like COM: instead of toString(), have a built-in function for structs vaguely like:

CObject getInterface(CObject desiredInterface) {

if (typeid(desiredInterface) == typeid(Iformatable)) return BigIntFormatter?; return null; // everything else is not supported

}

So that an arbitrary number of interfaces could be supported. (Maybe using function pointers instead of interfaces). )

But even so, this doesn't solve the problem of what to do right now. If we were to decide that BigInt? was fundamental, one solution would be to include a typeinfo pointer and a pointer to a BigInt? formatting function somewhere, both initialized to null. This could be set in the BigInt? constructor (so that it's not linked in, if never used). Then, the formatting code could compare the typeinfo, and call the formatting function when a match occurs.

If it's not fundamental, it can be just be done through member functions; and maybe that's not too bad. But I'd like some help in knowing what the signatures should be, to fit in with Tango's I/O style.

that heap-allocation thing for 'x++' is a nagging concern

Yeah. It's easily fixable in D2 where you can have opAssign(const BigInt?). If it turns out that x++ is important, I can think of three solutions: (1) add some in-place member functions for such operations; (2) add an extra member for storing the lowest bits. (this only helps the ++ and -- cases, though); (3) provide BigIntRef?, with all operations in-place. But I'm hoping none of this is urgent.