ICU bindings for D!
kris



Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Wed Nov 03, 2004 9:52 pm    Post subject: ICU bindings for D!

Update: Mango.icu now requires ICU v3.2 libraries

Mango has a set of bindings to the extensive ICU project at http://www-306.ibm.com/software/globalization/icu/started.jsp. This effort does not have any reliance whatsoever on other Mango classes, so could easily live elsewhere (mango.icu is just a convenient place to host the files for now).

Currently included:
Character classification
Conversion to and from UTF16, via all the supported transcoders
Preliminary Locale support
Number formatting, and parsing
Resource bundles
UnicodeString class
Message formatting
Numeric spellout
Calendars
Date & time formatting, and parsing
Preliminary time zone support
Character properties
String searching
Text-boundary analysis
Transliteration (with issues)
Normalization
StringPrep
International domain names
Collation
Unicode Set
Regular Expressions

Under way:
(none)

To do:
Character Iteration
Bidi text
Arabic shaping
Universal Timescale (ICU 3.2)


It would be great if some of you good folks have an interest in working on these, since many can be wrapped in isolation (given a few basic classes).

The 'adoption' itself is actually quite mechanical:

1) take a look at the C code and headers, and decide whether to wrap the functionality or expose it directly. Many C functions will be wrapped individually; however, character classification (for example) forgoes the notion of individual wrappers. It's useful to look at the C++ classes to get some ideas too.

2) set up a bunch of class members to hold a pointer to the relevant C function. These are effectively function pointers, and are necessary to support using the ICU DLLs on Windows (due to the lib format mismatches)

3) for Win32, provide a set of static mappings for a dll-method-loader to bind the appropriate dll methods. For linux, provide a different set of static mappings for a shared-lib loader to munge.
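
As a rough illustration of steps 2 and 3, here is a minimal sketch, written in C rather than D so it compiles without the Mango sources. Every name in it (`Binding`, `bind_all`, `demo_resolve`, the `demo_*` and `wrap_*` functions) is hypothetical, and `demo_resolve` merely stands in for whatever a real loader would call (GetProcAddress on Win32, dlsym on linux). It shows the general shape of a name-to-slot mapping, not how mango.icu is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type used for storage; cast back to the
 * real signature before calling. */
typedef void (*generic_fn)(void);

/* One row of the static mapping: exported symbol name -> slot to fill. */
typedef struct {
    const char *name;
    generic_fn *slot;
} Binding;

/* Resolver callback; a real loader would wrap GetProcAddress or dlsym. */
typedef generic_fn (*Resolver)(const char *name);

/* Step 2: slots that hold the bound entry points (hypothetical names). */
static generic_fn slot_is_alpha;
static generic_fn slot_to_upper;

/* Step 3: the static mapping handed to the loader. */
static const Binding demo_table[] = {
    { "demo_is_alpha", &slot_is_alpha },
    { "demo_to_upper", &slot_to_upper },
};

/* Fill every slot; returns the number of symbols that failed to resolve. */
static int bind_all(const Binding *table, size_t count, Resolver resolve)
{
    int missing = 0;
    for (size_t i = 0; i < count; i++) {
        *table[i].slot = resolve(table[i].name);
        if (*table[i].slot == NULL)
            missing++;
    }
    return missing;
}

/* Stand-in "library": two plain C functions plus a lookup by name. */
static int demo_is_alpha(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int demo_to_upper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

static generic_fn demo_resolve(const char *name)
{
    if (strcmp(name, "demo_is_alpha") == 0) return (generic_fn)demo_is_alpha;
    if (strcmp(name, "demo_to_upper") == 0) return (generic_fn)demo_to_upper;
    return NULL;
}

/* Wrappers: the only place that knows the real signatures. */
static int wrap_is_alpha(int c)
{
    assert(slot_is_alpha != NULL);
    return ((int (*)(int))slot_is_alpha)(c);
}

static int wrap_to_upper(int c)
{
    assert(slot_to_upper != NULL);
    return ((int (*)(int))slot_to_upper)(c);
}
```

A caller would run `bind_all(demo_table, 2, demo_resolve)` once at startup and treat a non-zero result as a missing entry point. Swapping platforms then only means swapping the resolver (and possibly the name strings); the wrappers stay the same.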

You don't need to wholly understand any of the ICU library to do this, so it's not a huge learning curve. It's perfectly reasonable to take a stab at any particular area, and expose only the ICU methods that appear to be most useful.

Testing would also be a great asset! Anyone up for that? Some ongoing compilations by someone with a linux box would also be really helpful ...

Lastly, I will be updating these lists to reflect progress.


Last edited by kris on Sun Mar 06, 2005 2:02 am; edited 15 times in total

kris

Posted: Thu Nov 04, 2004 12:58 am

Thinking about this some more, it should probably be completely isolated from Mango from the start ~ I'll ask Brad to create a new project for it.

teqdruid

Joined: 11 May 2004
Posts: 390
Location: UMD

Posted: Thu Nov 04, 2004 12:23 pm    Post subject: Redundant?

I thought AJ was working on this project? Although I don't believe we've heard from her lately...

kris

Posted: Thu Nov 04, 2004 12:33 pm

I thought so too. Have tried a number of times to contact her regarding progress over the last couple of months (after we had some initial discussions), but there's been no response. Others have apparently tried also, with a similar result. Sometimes people just move on to different things.

I don't much care for letting time slip away, so picked this one up myself. Some help would be very much appreciated though! And, if AJ still has interest in this arena, it would be great to collaborate.

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Thu Nov 04, 2004 1:34 pm    Post subject: Re: ICU port!

kris wrote:


...

<snip>

The 'adoption' itself is actually quite mechanical:

1) take a look at the C code and headers, and decide whether to wrap the functionality or expose it directly. Many C functions will be wrapped individually; however, character classification (for example) forgoes the notion of individual wrappers. It's useful to look at the C++ classes to get some ideas too.


What do you mean by wrapping individually? From what I understand, the ICU interface has two interfaces to it, one C callable and one C++ class based, to facilitate working with either procedural or OO paradigms.

So is the D version going to wrap the C functions in D classes (to model the C++ classes), wrap them in D procedural functions, or do both? Maybe I misunderstood what ICU does. Are the C++ and C interfaces mutually exclusive or dependent on each other?

kris wrote:
2) set up a bunch of class members to hold a pointer to the relevant C function. These are effectively function pointers, and are necessary to support using the ICU DLLs on Windows (due to the lib format mismatches)


So, relative to my previous question, is this suggesting that the functions will be integrated into a class, or is this a separate step such that we have both D wrapper functions and D class members to accommodate different programming styles? Do we need both? Also, if these members are function pointers to the DLL C functions, how do we represent the C "WChar" parameters in D? Does D's "wchar" map properly to it, or do we have to use "short int"?

kris wrote:
3) for Win32, provide a set of static mappings for a dll-method-loader to bind the appropriate dll methods. For linux, provide a different set of static mappings for a shared-lib loader to munge.


Static mappings should be fairly consistent between the two platforms, right? On linux, I guess the "so" libraries will have to be included with the project package also. Either that or users can compile and install the ICU libraries themselves?

kris wrote:
You don't need to wholly understand any of the ICU library to do this, so it's not a huge learning curve. It's perfectly reasonable to take a stab at any particular area, and expose only the ICU methods that appear to be most useful.


That's a good thing!

kris wrote:
Testing would also be a great asset! Anyone up for that? Some ongoing compilations by someone with a linux box would also be really helpful ...

I will be starting on the second list tomorrow so, if you can jump in, let's coordinate here?


I should be able to contribute a bit on both Linux and Windows.

Later,

John

JJR

Posted: Thu Nov 04, 2004 1:42 pm

One other question...

This project appears more of an ICU "interface" than an ICU "port." Do you think it would be possible to create an actual D port of ICU or would that amount to being a gargantuan project?

For example, perhaps many of the ICU C functions and types could be rapidly ported to D or remodeled as D class members.

This would cut down on a lot of the C++ baggage that the current dll's or so's are carrying. Of course, if there already are ready-made libraries that are practically bug free, maybe such an effort would be totally pointless.

kris

Posted: Thu Nov 04, 2004 2:05 pm    Post subject: Re: ICU port!

JJR wrote:
What do you mean by wrapping individually? From what I understand, the ICU interface has two interfaces to it, one C callable and one C++ class based, to facilitate working with either procedural or OO paradigms.

This particular wrapper is (so far) class-based, with some static methods sprinkled around where appropriate.

JJR wrote:
So is the D version going to wrap the C functions in D classes (to model the C++ classes), wrap them in D procedural functions, or do both? Maybe I misunderstood what ICU does. Are the C++ and C interfaces mutually exclusive or dependent on each other?

I've been wrapping the C functions, and making them look something like the C++ classes. In ICU, the C++ interface does leverage the C API, but there's also quite a bit of reimplementation. It's very deep in functionality, but it's also clear that a lot of 'rethinking' has gone on over the years ... and there are some notable 'holes' in the OO exposure.

kris wrote:
2) set up a bunch of class members to hold a pointer to the relevant C function. These are effectively function pointers, and are necessary to support using the ICU DLLs on Windows (due to the lib format mismatches)


JJR wrote:
So, relative to my previous question, is this suggesting that the functions will be integrated into a class, or is this a separate step such that we have both D wrapper functions and D class members to accommodate different programming styles? Do we need both? Also, if these members are function pointers to the DLL C functions, how do we represent the C "WChar" parameters in D? Does D's "wchar" map properly to it, or do we have to use "short int"?

I've stuck with the assumption that the D wrappers are all class-based (with some static members). I'll check in the current code, so you can see what I mean. No need for 'short int': So far, everything is working well (even the callbacks from C to D).
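
For anyone wondering why no 'short int' is needed: ICU's UChar is a 16-bit UTF-16 code unit, and D's wchar is likewise a 16-bit UTF-16 code unit, so a buffer can cross the boundary unchanged. The C sketch below only illustrates that 16-bit code-unit view. `Utf16Unit` and `utf16_count_code_points` are made-up names for illustration, not ICU or Mango API.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias for a UTF-16 code unit, mirroring the 16-bit
 * width shared by ICU's UChar and D's wchar. */
typedef uint16_t Utf16Unit;

_Static_assert(sizeof(Utf16Unit) * CHAR_BIT == 16,
               "a UTF-16 code unit must be exactly 16 bits");

/* Count code points in a UTF-16 buffer. A high surrogate followed by
 * a low surrogate counts as one code point; an unpaired surrogate is
 * counted on its own. */
static size_t utf16_count_code_points(const Utf16Unit *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        int is_high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        if (is_high && i + 1 < len &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i++; /* skip the low half of the pair */
        }
        count++;
    }
    return count;
}
```

The practical consequence is that string lengths passed to the C side are counts of 16-bit units, not of characters, which matters as soon as text leaves the Basic Multilingual Plane.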

kris wrote:
3) for Win32, provide a set of static mappings for a dll-method-loader to bind the appropriate dll methods. For linux, provide a different set of static mappings for a shared-lib loader to munge.


JJR wrote:
Static mappings should be fairly consistent between the two platforms, right? On linux, I guess the "so" libraries will have to be included with the project package also. Either that or users can compile and install the ICU libraries themselves?

The exposed API will be identical, and yes, the internal mappings will be very close: Win32 has a mapping of each function pointer to a char[] literal representing its DLL entry-point name, whereas linux (I think) will map each function pointer to a statically linked lib-entry instead. As I understand it, GDC can link directly to linux shared-libs. This is far from the case under Win32, hence the weird indirection step. I highly recommend that users grab the precompiled libraries from the ICU site.
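
To make the difference between the two mappings concrete, here is a small C sketch under some loud assumptions: `toupper` and `msvcrt.dll` are stand-ins for an ICU function and an ICU DLL (so the example builds without ICU installed), and `bind_to_upper` and `to_upper_fn` are hypothetical names. On Win32 the pointer is looked up by its entry-point name at run time; elsewhere it is simply the address of a function the linker has already resolved, which is roughly what the linux plan above describes.

```c
#include <ctype.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Signature shared by both bindings (hypothetical typedef). */
typedef int (*to_upper_fn)(int);

/* Return a callable pointer, or NULL if the Win32 lookup fails. */
static to_upper_fn bind_to_upper(void)
{
#ifdef _WIN32
    /* Run-time binding: entry point found by its name in the DLL. */
    HMODULE lib = LoadLibraryA("msvcrt.dll");
    if (lib == NULL)
        return NULL;
    return (to_upper_fn)GetProcAddress(lib, "toupper");
#else
    /* Link-time binding: the linker already resolved the symbol,
     * so the "mapping" is just taking the function's address. */
    return toupper;
#endif
}
```

Either way the caller ends up holding the same kind of function pointer, which is why the exposed API can stay identical across platforms.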

JJR wrote:
I should be able to contribute a bit on both Linux and Windows.

Welcome aboard! Shall we switch to email?

kris

Posted: Thu Nov 04, 2004 2:16 pm

JJR wrote:
One other question...

This project appears more of an ICU "interface" than an ICU "port." Do you think it would be possible to create an actual D port of ICU or would that amount to being a gargantuan project?

For example, perhaps many of the ICU C functions and types could be rapidly ported to D or remodeled as D class members.

This would cut down on a lot of the C++ baggage that the current dll's or so's are carrying. Of course, if there already are ready-made libraries that are practically bug free, maybe such an effort would be totally pointless.

There's a bit of both going on: while the UChar and ULocale classes are simply a wrapper, the UConverter class does not have an ICU OO equivalent, and the UString class reimplements the C++ UnicodeString in D, since D is better at that sort of thing. Those particular classes were done first, since most everything else expects them to be available. I think the other functionality will be more straightforward.

Reimplementing the entire library in D would be a huge undertaking, and the maintenance alone should be enough to strike fear into the heart of even the bravest soul.

("answer me these questions three ... ere the other side you see")

JJR

Posted: Thu Nov 04, 2004 5:24 pm

Kris,

Thanks for all the info. We can continue this via email. I've done an svn update on my system for mango, so I now see your D ICU files.

I'll look at how you've got it structured.

Looks like Mango could grow into quite the swiss army knife. Good show!

I'm still waiting to get my Linux up and running, but I should have it working next week. I've got a bit of fiddling to do with Derelict also.

- John

JJR

Posted: Thu Nov 04, 2004 5:26 pm

kris wrote:

("answer me these questions three ... ere the other side you see")


Is that Shakespeare? Perhaps "Macbeth?" Sounds familiar but can't place it.

JJR

Posted: Thu Nov 04, 2004 5:28 pm

Ha ha.... I guess I must have been a little off.... Looks like a "Monty Python" quip.

Just a little off...

kris

Posted: Thu Nov 04, 2004 9:02 pm

JJR wrote:
Ha ha.... I guess I must have been a little off.... Looks like a "Monty Python" quip

... from the "bridge of death" scene. Only the bravest and most righteous will prevail: http://www.intriguing.com/mp/_scripts/bridge.txt

Just checked-in UNumberFormat ~ turned out that the C API actually wraps the C++ OO model (with loads of flags, switches, and so on). Wrapping said C API with another external OO model (for D) just makes one's hair stand on end. I think it may be the same for some other formatters also ...

kris

Posted: Sat Nov 06, 2004 1:06 am

Checked-in these two wrappers:

Number formatting and parsing
Resource bundles


Also changed the design a bit, which noticeably cleaned up some code and will make it simpler to hook up the shared-libs on a linux platform. The ULocale module was reworked, and is now a struct instead of a class.

Tomorrow should see MessageFormat and a bit of Calendar done ...

kris

Posted: Sat Nov 06, 2004 4:13 am

Split part of UString into UText, for immutable references. Added read-only support to UString, so it'll operate faster in many cases. Also added toHash(), opCmp() and friends, along with remove() and extract() methods.

The UResourceBundle now returns immutable UText objects.
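
The split between a read-only reference and a mutable string can be pictured with the C sketch below. It is only an analogy for the UText/UString relationship: `TextView`, `TextBuf` and the helper functions are invented names, the fixed 64-unit capacity is arbitrary, and the hash is an FNV-1a-style mix applied per 16-bit unit (the published FNV-1a works per byte), chosen here purely for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Read-only view over UTF-16 code units (rough analogue of UText). */
typedef struct {
    const uint16_t *data;
    size_t len;
} TextView;

/* Mutable fixed-capacity buffer (rough analogue of UString). */
typedef struct {
    uint16_t data[64];
    size_t len;
} TextBuf;

/* Expose a buffer's current content without allowing writes. */
static TextView buf_view(const TextBuf *b)
{
    TextView v = { b->data, b->len };
    return v;
}

/* Append one code unit; returns 0 on success, -1 when full. */
static int buf_append(TextBuf *b, uint16_t unit)
{
    if (b->len >= sizeof b->data / sizeof b->data[0])
        return -1;
    b->data[b->len++] = unit;
    return 0;
}

/* Hash over the code units (FNV-1a-style, one step per unit). */
static uint32_t view_hash(TextView v)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < v.len; i++) {
        h ^= v.data[i];
        h *= 16777619u;
    }
    return h;
}

/* Three-way comparison by code unit, shorter string first on a tie. */
static int view_cmp(TextView a, TextView b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    for (size_t i = 0; i < n; i++) {
        if (a.data[i] != b.data[i])
            return a.data[i] < b.data[i] ? -1 : 1;
    }
    if (a.len == b.len)
        return 0;
    return a.len < b.len ? -1 : 1;
}
```

Because hashing and comparison only need read access, they live on the view type, and anything that hands out text it does not want modified can return a view rather than a buffer.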

kris

Posted: Sat Nov 06, 2004 5:41 pm

UMessageFormat is checked-in. I ran into a problem with this one, where the module-dtor was being invoked before all object-dtors had been invoked. The result was a method being called in a DLL that had already been unloaded.

For now, I've disabled all library unloads for this package. Win32 seems to handle the refcount appropriately anyway (even for explicitly loaded DLLs) so perhaps it's not really an issue. What's of more concern is the complete lack of order inherent in the D dtor design ...

Also reorganized some methods in UString to make them more uniform and more helpful for other modules which format content into them.
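
One possible way around the unload-ordering problem described above, other than disabling unloads, is to let every object hold its own reference on the library so that whichever destructor runs last triggers the unload. The C sketch below simulates that with a flag instead of a real DLL. `Library`, `Formatter` and the functions are hypothetical names, and this is not what the package currently does.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated shared library with a reference count. */
typedef struct {
    int loaded; /* 1 while the (simulated) library is mapped */
    int refs;   /* number of holders still using it */
} Library;

static void library_retain(Library *lib)
{
    if (lib->refs == 0)
        lib->loaded = 1; /* real code would load the DLL here */
    lib->refs++;
}

static void library_release(Library *lib)
{
    assert(lib->refs > 0);
    if (--lib->refs == 0)
        lib->loaded = 0; /* real code would unload the DLL here */
}

/* An object whose destructor must still be able to call the library. */
typedef struct {
    Library *lib;
} Formatter;

static Formatter formatter_create(Library *lib)
{
    Formatter f = { lib };
    library_retain(lib);
    return f;
}

static void formatter_destroy(Formatter *f)
{
    /* Safe even if the module-level reference is already gone. */
    library_release(f->lib);
    f->lib = NULL;
}
```

With this shape, a module-level release that happens before an object is destroyed only drops the count; the library stays loaded until the last `formatter_destroy` call.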

All times are GMT - 6 Hours
Page 1 of 4