Use of tango.text.Util.indexOf
Posted: 03/05/08 22:13:29 Modified: 03/05/08 22:15:02
Hi,
I am switching from Phobos to Tango for various reasons (the first being its XML-related functions, which Phobos lacks in its 1.0 version, even if I don't really care about alpha/beta status for my project), and I'm writing some basic String methods that map onto the ones found in Tango.
I have the following:
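    module lang.String;

    import TangoUtil = tango.text.Util;
    import TangoLayout = tango.text.convert.Layout;

    // Like Tango's indexOf, but the search starts at `offset`.
    public uint indexOf(T, U = uint)(T[] str, T c, U offset = 0)
    {
        // Search only the tail after `offset` (so pass the remaining length),
        // then shift the result back to an index into `str`.
        return offset + TangoUtil.indexOf(str.ptr + offset, c, str.length - offset);
    }

In fact, it is just like Tango's indexOf function, except that it allows me to pass an additional offset, which Tango's version does not have.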
My problem is not with the call itself, nor with whether such a wrapper is worthwhile when Tango already seems to handle this well (if I import the module tango.text.Util, I suppose I would be able to write "foobar".indexOf('a') or "foobar"[offset..$].indexOf('a')).
The problem is: if T is char, then the string is in UTF-8.
My question is simple:
    assert("é"d.indexOf(cast(char)'é') == 0);

(The cast is mandatory, because the D parser will otherwise convert the literal to a wchar. Honestly, I don't know why we should have to fight with three kinds of char...)
How will this work?
- If é is a char, then it takes only 8 bits, but in UTF-8 'é' is encoded as the two bytes 0xC3 0xA9.
- If the lookup works like this:

    for (int i = 0; i < length; ++i)
    {
        if (array[i] == c)
            return i;
    }
    return length;

  then there is no way it can work, because the é character takes two bytes, so the match would have to be made against two bytes.
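To make the concern concrete, here is a minimal sketch that uses only built-in language features rather than Tango. The helper name indexOfCodePoint is hypothetical (it is not part of Tango or Phobos), and the sketch assumes the source file is saved as UTF-8. It relies on the fact that a foreach over a char string with a dchar loop variable decodes the UTF-8 code units into whole code points.

```d
// Hypothetical helper, not part of Tango or Phobos.
// Returns the code-unit index of the first occurrence of code point `c`
// in the UTF-8 string `s`, or s.length if it is not found.
size_t indexOfCodePoint(T)(T[] s, dchar c)
{
    // With a dchar loop variable, foreach decodes UTF-8 on the fly;
    // `i` is the index of the first code unit of each code point.
    foreach (i, dchar ch; s)
    {
        if (ch == c)
            return i;
    }
    return s.length;
}

void main()
{
    // "é" is one code point but two UTF-8 code units: 0xC3 0xA9.
    assert("é".length == 2);
    assert("é"[0] == 0xC3 && "é"[1] == 0xA9);

    // Comparing whole code points finds the character.
    assert(indexOfCodePoint("é", 'é') == 0);
    assert(indexOfCodePoint("café", 'é') == 3);

    // Not found: the length is returned, mirroring the loop above.
    assert(indexOfCodePoint("abc", 'é') == 3);
}
```

Note that the returned index counts code units (bytes), not characters, so it can be used directly for slicing the original string.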
Is there a class, or something better, for dealing with UTF-8-compatible operations?
[edit] Or am I forced to use wchar?