
Unicode upper lower and whatever Case

Posted: 09/18/07 00:33:36

Hi !

I just coded toUpper, toLower, and toTitle (and nearly a toFold) for Unicode, and I'd really like to submit them to Tango, but I have a few design questions.

Since any case mapping in Unicode can change the length of the string, I need a dynamically resizing buffer. There are three ways to solve this:

1) Trivial: Just allocate whatever you need. Since Tango aims for speed, this is not really a good option.

2) Class: Make a converter class that has the buffer as a member. This would avoid a new allocation every time the function is called.

3) User driven: Do it the way toUtf8 does. Let the user pass the buffer to the converter function, so it can be reused.

Option 1 is kind of useless.

My personal take would be 2, but then again, I coded a lot of Java in the last year and would probably make anything a class.

Option 3 would be consistent with the toUtfX methods, which already deal with Unicode stuff.
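To make options 2 and 3 a bit more concrete, here is a rough sketch in Python rather than D. The names `to_upper` and `CaseConverter` are made up for illustration and are not Tango API, and Python's built-in `str.upper()` only stands in for the real mapping-table lookup. The function writes into a caller-supplied list and returns it together with the number of valid entries, roughly the way a D function would return a slice of the destination buffer.

```python
def to_upper(text, buf=None):
    """Uppercase `text` into `buf`; return (buf, count of valid entries)."""
    if buf is None:
        buf = []                # no buffer passed: allocate, as in option 1
    n = 0
    for ch in text:
        # str.upper() stands in for the mapping-table lookup; it may
        # return more than one code point for a single input code point.
        for up in ch.upper():
            if n < len(buf):
                buf[n] = up     # overwrite a slot left over from an earlier call
            else:
                buf.append(up)  # grow only when the buffer is too small
            n += 1
    return buf, n


class CaseConverter:
    """Option 2: the converter object owns the buffer between calls."""

    def __init__(self):
        self._buf = []

    def to_upper(self, text):
        buf, n = to_upper(text, self._buf)
        return "".join(buf[:n])
```

With option 3 the caller decides how long the buffer lives; with option 2 that decision is hidden inside the object, which is convenient but ties the buffer to one converter instance.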

A second, smaller design decision is how to deal with char[] and wchar[]. Since I need the actual Unicode code points, I have to convert each character to dchar before I can look up its upper or lower case. The easy way would be to call toUtf32, change the case, and then call the appropriate toUtfXX to convert back to the desired encoding.

The harder way would be to do it one character at a time. This would reduce memory usage (the time complexity is the same), but the downside is that I would have to copy and paste large portions of the toUtfXX code.
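The two approaches can be compared in a small Python sketch (not D, and not the Tango code). Both function names are hypothetical. The standard library's incremental UTF-8 decoder plays the role of the copied toUtfXX decoding logic, and `str.upper()` again stands in for the mapping table.

```python
import codecs


def upper_via_utf32(data: bytes) -> bytes:
    """Easy way: decode everything, change case, encode everything."""
    return data.decode("utf-8").upper().encode("utf-8")


def upper_streaming(data: bytes) -> bytes:
    """Harder way: decode one code point at a time, map it, re-encode it."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = bytearray()
    for i in range(len(data)):
        # decode() returns "" until the bytes of one code point are complete
        ch = decoder.decode(data[i:i + 1])
        if ch:
            out += ch.upper().encode("utf-8")
    decoder.decode(b"", final=True)  # raises if the input ends mid-sequence
    return bytes(out)
```

The streaming version never holds a full decoded copy of the input, only the current code point and the output. Mapping one code point at a time is enough here because uppercasing does not look at neighbouring characters; a context-sensitive rule such as final sigma in lowercasing would need more care.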

Since I want my case conversion to fit in with the rest of the code, some input would be appreciated.




Posted: 09/18/07 05:46:02

Changing the case of a single dchar can result in a dchar[]? Huh, that never struck me before :)

Posted: 09/18/07 06:45:07 -- Modified: 09/18/07 06:55:01 by ptriller -- Modified 2 Times

Yep, it can. That's the way of Unicode. There is a "simpleCaseMapping" which won't change the length of the string, but with full case mapping strange things can happen.
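A quick way to see the effect is Python 3, whose `str.upper()` applies the full case mappings. The helper name `upper_lengths` is made up for this illustration.

```python
def upper_lengths(chars):
    """Map each input code point to the length of its full uppercase form."""
    return {ch: len(ch.upper()) for ch in chars}


samples = [
    "a",       # ordinary letter: one code point in, one out
    "\u00df",  # LATIN SMALL LETTER SHARP S
    "\ufb01",  # LATIN SMALL LIGATURE FI
    "\u0390",  # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
]

for ch, n in upper_lengths(samples).items():
    print(f"U+{ord(ch):04X} uppercases to {n} code point(s)")
```

Sharp s becomes "SS", the fi ligature becomes "FI", and the Greek letter expands to three code points, so a single dchar really can turn into a dchar[].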