Ticket #649 (closed defect: fixed)

Opened 17 years ago

Last modified 17 years ago

toUtf8z bug and performance?

Reported by: keinfarbton Assigned to: kris
Priority: major Milestone: 0.99.2 RC5
Component: Tango Version: 0.99.1 RC4 Keep
Keywords: Cc: larsivi, sean

Description

The current implementation of toUtf8z from tango.stdc.stringz:

char* toUtf8z (char[] s) {
  if (s.ptr)
    s ~= '\0';
  return s.ptr;
}
  1. That means toUtf8z( null ) returns null. I think it should return a valid pointer to a null character. The function is called to ensure the string can be used correctly from C code, but C code does not necessarily accept a null pointer. If this is intended behaviour, I think we should have two variants: one which can return null, and one which always returns a valid ptr.
  2. It creates heap activity for every non-zero-length string. We could easily check whether s is already terminated and, if so, do nothing. The following version still ensures null termination, but does not always allocate and copy the entire string:
private const char[] nullChar = "\0";
char* toUtf8z (char[] s) {
  if (s.length == 0)
    return nullChar.ptr;
  if (s[$-1] == '\0')
    return s.ptr;
  s ~= '\0';
  return s.ptr;
}

Change History

09/28/07 18:29:19 changed by kris

  • status changed from new to assigned.

Returning null (for a null array) is intended, but checking for an existing zero-terminator could certainly be added. Thanks!

09/28/07 18:36:50 changed by keinfarbton

What about a second version that always returns a valid ptr?

09/28/07 18:43:23 changed by kris

I'd rather not. A null input is a zero array pointer (and a zero length) ... that means a zero pointer is returned.

09/28/07 19:06:15 changed by keinfarbton

I had this case:

char[] res;
foreach( str; getRecords() ){
  res ~= str;
}
cfunction( toUtf8z( res ));

The cfunction segfaults for a NULL ptr. I think in the general case it is safe to return a valid ptr to a null character, and there should be an explicit variant to get a null ptr for a null D array.

toUtf8z vs. toUtf8zNull

09/29/07 12:06:04 changed by keinfarbton

So how would you write this code to work properly?
res would need to be initialized to a valid ptr and length zero.

char[] res = nullString; // can we have this in tango?
foreach( str; getRecords() ){
  res ~= str;
}
cfunction( toUtf8z( res ));

Now it would work, but it means the initialization needs to know about the later use in toUtf8z. Or I can do this:

char[] res;
foreach( str; getRecords() ){
  res ~= str;
}
char* ptr = toUtf8z( res );
cfunction( ptr ? ptr : nullString );

Extra variable declaration, yuck!

Or I need a workaround function in my app.

I think there should be no difference between handling a null input and a zero-length array input, because D initializes arrays to null.
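
For illustration, the "workaround function in my app" mentioned above could be as small as the sketch below. The names emptyZ and orEmpty are hypothetical and not part of Tango. The sketch assumes the C callee only reads the string, since the fallback buffer is shared.

```d
// Hypothetical app-side helper; neither name exists in Tango.
// A one-byte buffer holding only the terminator, used as the
// fallback target. Assumes the C side never writes through it.
char[1] emptyZ = '\0';

// Returns p unchanged when it is non-null, otherwise a pointer
// to an empty, zero-terminated string.
char* orEmpty (char* p) {
  return p ? p : emptyZ.ptr;
}
```

With such a helper the call site needs no extra variable: cfunction( orEmpty( toUtf8z( res )));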

09/29/07 17:34:37 changed by kris

Sorry Frank, this is application-specific behaviour. What if cfunction() expected and/or could deal with null as valid input? Other folks would (rightfully) complain that a null input should return a null output.

Besides, your example can be written without extra var decls:

char[] res;
foreach( str; getRecords() ){
  res ~= str;
}
cfunction(res.length ? toUtf8z(res) : "");

09/29/07 17:45:44 changed by kris

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [2586]) fixes #649

added short-circuit to avoid appending multiple nulls
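
The contents of changeset [2586] are not shown in this ticket. As a rough sketch only, a short-circuit that keeps the null-in, null-out behaviour discussed above might look like this. The name toUtf8zSketch is hypothetical, and the code assumes a current D compiler.

```d
// Hypothetical sketch, not the code from changeset [2586].
char* toUtf8zSketch (char[] s) {
  if (s.ptr is null)
    return null;                  // null array in, null pointer out
  if (s.length && s[$-1] == '\0')
    return s.ptr;                 // already terminated: no allocation
  s ~= '\0';                      // append terminator; may reallocate
  return s.ptr;
}
```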

09/30/07 15:23:08 changed by keinfarbton

So there is one last question unanswered ...

Why not add a variant that always returns a valid ptr? char* toUtf8zPtr(char[] str)

I will stop bugging you with that, from now on. :)

09/30/07 16:54:15 changed by kris

  • cc set to larsivi, sean.

toUtf8zNeverNullPtr!(char[])(str) ?

:)

09/30/07 17:42:13 changed by larsivi

Is this really useful? If the answer is yes, I suppose we may include it, but I'm currently reluctant.