Utf String Iterator
Posted: 07/06/08 12:18:30 Modified: 02/09/09 15:53:19

I see many code samples that use the following iteration over a string:
```d
for (int i = 0; i < str.length; ++i) {
    Stdout(str[i]); // just an example
}
```

This is wrong, since a char[] may contain multibyte characters: str[i] is a single UTF-8 code unit (one byte), not necessarily a whole character.
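To make the problem concrete, here is a small sketch. It is written against modern D2 Phobos rather than the D1/Tango of this post, and `decodeAll` is a hypothetical helper introduced only for illustration. It shows that indexing a UTF-8 string walks bytes, while `foreach` with a `dchar` loop variable decodes whole characters:

```d
import std.stdio : writeln;

// Hypothetical helper: collects the characters (code points) of a UTF-8 string.
dchar[] decodeAll(string s)
{
    dchar[] result;
    foreach (dchar c; s)   // a dchar loop variable makes foreach decode UTF-8
        result ~= c;
    return result;
}

void main()
{
    string s = "Привет";                  // 6 Cyrillic letters, 2 bytes each in UTF-8
    assert(s.length == 12);               // .length counts code units (bytes)
    assert(s[0] == 0xD0 && s[1] == 0x9F); // s[0] and s[1] are the two halves of 'П'
    assert(decodeAll(s).length == 6);     // but there are only six characters
    writeln(decodeAll(s).length, " characters in ", s.length, " bytes");
}
```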
foreach (dchar c; str) {} works great, but:
a) it doesn't allow iterating over two strings in parallel;
b) it doesn't let you keep track of the current position. That is, you can't tell
- the position (offset) of the current character c in the source string,
- how much space it occupies in the source UTF string (1, 2, 3 or 4 bytes in UTF-8),
- how many characters are left in the string;
c) once you break out of the loop, you can't continue where you left off.
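All three quantities in b) can be derived once you track the byte offset yourself, and the same explicit offset is what makes stopping and resuming possible. A minimal sketch, assuming modern D2 Phobos (`std.utf.decode`, `std.utf.stride`, `std.range.walkLength`); `widthAt` and `charsLeft` are hypothetical helper names, not part of any library:

```d
import std.range : walkLength;
import std.utf : decode, stride;

// Hypothetical helpers for a UTF-8 string and a byte offset that lies on a
// character boundary.
size_t widthAt(string s, size_t offset)   { return stride(s, offset); }          // 1..4 bytes
size_t charsLeft(string s, size_t offset) { return s[offset .. $].walkLength; } // counts code points

void main()
{
    string s = "aЖ€";            // 1 + 2 + 3 bytes in UTF-8
    size_t offset = 0;            // the position lives in our variable, not in foreach

    dchar c = decode(s, offset);  // decodes 'a' and advances offset to 1
    assert(c == 'a' && offset == 1);

    // Stop here (the equivalent of "break"), then inspect the next character.
    assert(widthAt(s, offset) == 2);   // 'Ж' (U+0416) takes 2 bytes
    assert(charsLeft(s, offset) == 2); // 'Ж' and '€' remain

    // Resume exactly where we stopped.
    c = decode(s, offset);             // decodes 'Ж' and advances offset to 3
    assert(c == 'Ж' && offset == 3);
    assert(widthAt(s, offset) == 3);   // '€' (U+20AC) takes 3 bytes
}
```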
As an example, here is a simple test case: given two strings, cut off the leading characters that match in both, e.g. "Hello, there" and "Hello, World!" -> "there" and "World!", while preserving UTF correctness.
Here is my solution:
```d
import std.utf;
import std.stdio;

struct UtfStringIterator(CharType)
{
    public CharType[] str;    // the string
    public size_t offset;     // offset of the current character
    public size_t nextOffset; // offset of the next character;
                              // the current character's length is nextOffset - offset
    public dchar value;       // the current character

    static UtfStringIterator opCall(CharType[] str)
    {
        UtfStringIterator it;
        it.str = str;
        it.offset = 0;
        it.nextOffset = 0;
        if (it.isValid())     // decode() would fail on an empty string
            it.value = decode(str, it.nextOffset);
        return it;
    }

    bool isValid()
    {
        return offset < str.length;
    }

    void moveNext()
    {
        offset = nextOffset;
        if (isValid())
            value = decode(str, nextOffset);
    }

    int opApply(int delegate(ref dchar d) dg)
    {
        while (isValid())
        {
            int result = dg(value);
            if (result != 0)
                return result;
            moveNext();
        }
        return 0;
    }
}

unittest
{
    auto iter = UtfStringIterator!(char)("Hello, World!");
    while (iter.isValid())
    {
        writef(iter.value);
        iter.moveNext();
    }
    writefln();

    auto iter2 = UtfStringIterator!(wchar)("Hello, World!"w);
    foreach (dchar c; iter2)
        writef(c);
    writefln();
}

void main()
{
    string s1 = "Привет, Страна!";
    string s2 = "Привет, Мир!";

    // Advance both iterators while their current characters match;
    // stop at the end of the shorter string.
    auto i1 = UtfStringIterator!(char)(s1);
    auto i2 = UtfStringIterator!(char)(s2);
    while (i1.isValid() && i2.isValid() && i1.value == i2.value)
    {
        i1.moveNext();
        i2.moveNext();
    }
    assert(s1[i1.offset .. $] == "Страна!");
    assert(s2[i2.offset .. $] == "Мир!");

    // For comparison, the byte-wise loop from the top of this post cuts in the
    // middle of a character: 'С' (D0 A1) and 'М' (D0 9C) share their first byte.
    size_t i = 0;
    while (i < s1.length && i < s2.length && s1[i] == s2[i])
        ++i;
    assert(s1[i .. $] != "Страна!");
}
```

I would be happy to see something like this in Tango.
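For comparison, in present-day D2 Phobos the test case above can be handled by `std.algorithm.searching.commonPrefix`, which, for two narrow strings, returns a slice of its first argument that ends on a code point boundary. A sketch under that assumption; `cutCommonPrefix` is a hypothetical helper, not a library function:

```d
import std.algorithm.searching : commonPrefix;

// Hypothetical helper: drops the leading characters shared by two UTF-8 strings.
string[2] cutCommonPrefix(string a, string b)
{
    // For two narrow strings, commonPrefix returns a slice of `a` that ends on a
    // code point boundary. The shared prefix consists of identical code units,
    // so its byte length n is the same in both strings.
    immutable n = commonPrefix(a, b).length;
    return [a[n .. $], b[n .. $]];
}

void main()
{
    auto r = cutCommonPrefix("Привет, Страна!", "Привет, Мир!");
    assert(r[0] == "Страна!" && r[1] == "Мир!");

    r = cutCommonPrefix("Hello, there", "Hello, World!");
    assert(r[0] == "there" && r[1] == "World!");
}
```

Unlike the iterator, this only answers the "common prefix" question; the iterator remains the more general tool when you need to stop, inspect and resume.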