
Ticket #353 (closed enhancement: fixed)

Opened 13 years ago

Last modified 12 years ago

Non-ASCII appended to Cout.buffer isn't displayed in the Windows console

Reported by: Deewiant Assigned to: kris
Priority: normal Milestone:
Component: IO Version: trunk
Keywords: Cc: deewiant@gmail.com, larsivi, sean

Description

import tango.io.Console;
import tango.stdc.stdio;

void main() {
	fputs("C: a-\xe4-b\n", stdout);
	fflush(stdout);
	Cout.buffer.append("D: a-\xe4-b\r\n");
	Cout.buffer.flush();
}

Expected output:

C: a-X-b
D: a-X-b

Where X is some special character whose precise appearance, I think, depends on locale settings. On my computer it's õ. Anyway, what I get in cmd.exe is:

C: a-X-b
D: a--b

Redirecting the output to a file, or using one of Cygwin's terminals, I get the correct a-X-b in both cases. I find it very strange that the output doesn't show up in Tango, and I'm wondering why.

I can just use the C functions, of course, so this isn't that big a problem.

Is there some other method of writing directly to stdout which would work?

Attachments

RawCoutFilter.d (2.1 kB) - added by Deewiant on 07/11/07 17:40:03.
RawCoutFilter.2.d (1.9 kB) - added by Deewiant on 08/30/07 19:39:00.

Change History

03/24/07 12:26:38 changed by Deewiant

  • owner changed from sean to kris.
  • component changed from Core Functionality to IO.

D'oh, forgot to set the component.

03/24/07 14:34:12 changed by larsivi

  • milestone set to 0.97 RC 1.

03/25/07 16:24:33 changed by kris

  • status changed from new to assigned.

odd ...

is this a UTF8 char?

03/25/07 16:33:21 changed by Deewiant

No, it's not. I'm trying to bypass any UTF translation or screening and just output "raw" characters.

03/25/07 16:44:30 changed by kris

Ah right;

Tango console is UTF8 only; it doesn't support "codepage" characters. This is partly a cross-platform issue, since *nix is utf8 also. Not exactly sure what happens to your char, but under win32 it will be sent through the Win32 utf8toUtf16 converter for display (Win32 console "raw chars" are utf16; you have to go through a different set of converters for codepage support).
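A small sketch may help show why the byte never reaches the screen. It only checks whether a byte sequence has the structure of UTF-8; the function name is hypothetical, and the thread does not say what the Win32 converter actually does with malformed input. Under this check, 0xe4 looks like the lead byte of a three-byte sequence, but the following bytes are not continuation bytes, so the string from the report is not well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the n bytes at s have the structure of UTF-8, else 0.
 * Simplified sketch: checks lead and continuation bytes only, not
 * overlong forms or surrogates. The name is hypothetical. */
int utf8_structurally_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char c = s[i];
        size_t follow;                        /* continuation bytes expected */
        if (c < 0x80)                follow = 0;
        else if ((c & 0xE0) == 0xC0) follow = 1;
        else if ((c & 0xF0) == 0xE0) follow = 2;
        else if ((c & 0xF8) == 0xF0) follow = 3;
        else return 0;                        /* stray continuation or bad lead */
        if (follow > n - i - 1) return 0;     /* sequence is truncated */
        for (size_t k = 1; k <= follow; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += follow + 1;
    }
    return 1;
}
```

Called on the five bytes of "a-\xe4-b" this returns 0, while the UTF-8 form "a-\xc3\xa4-b" passes.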

We're not quite sure what to do (officially) about codepage support at this time, but it may be added as an optional layer above the console

03/25/07 16:56:30 changed by Deewiant

Darn. I thought going straight to Cout.buffer would have done the job.

It's not really codepage support I want: what I want is a way to output a given ubyte, and let the user worry about whether it's in the right encoding, whether it displays as he expected, or not. And the behaviour should be the same regardless of whether we're on *nix or Windows.

Like I noted, the C library, with (f)printf and (f)puts, allows me to do this, but I'd prefer a (relatively simple) way of doing it in Tango.

04/05/07 01:19:30 changed by kris

  • status changed from assigned to closed.
  • resolution set to wontfix.

There's a new CodePage module in tango.sys which might be of help on Windows systems? Basically, you could use it to convert code-page characters into utf8. Hope that might help

04/05/07 07:24:34 changed by Deewiant

I don't want to convert to UTF-8, I need to output the characters as codepage characters. For instance, the following code, when redirected to a file, should not have the file contain the bytes 0xc3 0xa4 (U+00e4 in UTF-8), it should contain the byte 0xe4 and nothing more.

ubyte ch = 0xe4;
Cout.putChar(ch);

The reason is that there's no way of knowing for certain what codepage the character is in, and so there's no way of reliably translating it to UTF-8. So I wish to just output it and let the user worry.
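The two byte sequences mentioned above can be checked with a short sketch. It encodes a single code point below U+0800 as UTF-8; the helper name is hypothetical and the function deliberately covers only the one- and two-byte forms. For U+00E4 it produces 0xc3 0xa4, which is exactly the pair the raw-output request wants to avoid in favour of the single byte 0xe4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point below U+0800 as UTF-8.
 * Returns the number of bytes written to out (1 or 2).
 * Hypothetical helper for illustration only. */
size_t utf8_encode_small(uint32_t cp, unsigned char out[2])
{
    assert(cp < 0x800);                            /* only 1- and 2-byte forms */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;                /* ASCII passes through */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* 10xxxxxx */
    return 2;
}
```

Going the other way is the hard part: a lone 0xe4 byte carries no information about which codepage it came from, so there is no single code point to feed into such an encoder.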

04/05/07 08:25:46 changed by kris

I'm sorry, Deewiant

emitting arbitrary content to a file is fully supported; but not the console. For example, the console output is converted to utf16 for display on Win32 (from utf8), so there's no way (in Tango) to emit console content that cannot be represented as utf8 e.g. tango.io.Console is intentionally exposed as a pure utf8 device. This enables it to operate identically across supported platforms.

Emitting to a file is straightforward though; and if you need to hookup the equivalent of Stdout to the file output, tie an instance of tango.io.Print to the buffer representing the file. I'd imagine you'd be using buffer.append, or conduit.write directly though?

04/05/07 08:33:39 changed by Deewiant

I guess I'll be using the C standard library then, since I need to be able to output to stdout and stderr. Oh well.

04/05/07 10:33:38 changed by larsivi

If there is a clean way to solve this, we would really like to, but the Windows APIs aren't helping. If I understand your wish correctly, it does work as you want on Linux. The problem on Windows stems from it using wchars for unicode output, and we don't want to move away from unicode support.

If you have a suggestion on how this can be fixed in a transparent, portable and easy way, feel free :)

04/05/07 13:45:05 changed by Deewiant

Dammit, should have previewed. Apparently two vertical bars are translated to a <td> tag. Sorry about that.

04/05/07 13:55:32 changed by larsivi

Deewiant; I deleted your comment and reposted it below (and wow, the td broke the page!):

Here's a C program which does the trick, writing to stdout using only the Windows API:

#include <stdio.h>
#include <windows.h>

int main(void) {
    HANDLE hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
    /* 0xe4 is a raw codepage byte, deliberately not UTF-8 */
    const char* string = "a-\xe4-b\r\n";
    DWORD len = 7;
    DWORD writtenLen = 0;
    if (!WriteFile(hStdout, string, len, &writtenLen, NULL))
        printf("Oops: error code %lu\n", (unsigned long)GetLastError());
    if (len != writtenLen)
        printf("Oops: didn't write everything\n");
    return 0;
}

That is, just use WriteFile (or WriteConsoleA, but then you have to check for redirection and use WriteFile anyway) instead of WriteConsoleW. Taking a quick look at the source of tango.io.Console, it seems that everything goes through ConsoleConduit.writer(void[]). Perhaps a user-settable flag "unicodeTranslation" on the Console.Output classes (which would default to true) would do the trick: there's already a check for redirection, just change the "if (redirect)" to "if (redirect || !unicodeTranslation)", since DeviceConduit.writer appears to do the right thing already. Which gives me a thought: a problem with the current approach is that you get different output when redirecting and when not redirecting, since the WriteFile case doesn't translate the output. I can't think of an example (other than my own situation, of course ;-)), but I'm sure this will be a problem for someone unless there's an option to do something about it: either my suggestion of forcing no translation, or a way of forcing translation even when redirecting - ideally, both. The library user can, of course, check for redirection himself and then transform strings if necessary, but that undermines the whole system in place.
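The proposed condition can be sketched as a small decision function. This is not Tango's code: the enum, the function name, and the parameter names are hypothetical, and only the boolean expression is taken from the comment above. It shows that the raw path is chosen whenever output is redirected or translation has been switched off, and that the default (translation on, no redirection) behaves as before.

```c
#include <stdbool.h>

/* Which output routine a console writer would pick (hypothetical names). */
enum write_path {
    PATH_RAW_BYTES,      /* bytes written unchanged, as WriteFile does */
    PATH_UTF16_CONSOLE   /* UTF-8 converted to UTF-16 for WriteConsoleW */
};

/* Mirrors the suggested test "if (redirect || !unicodeTranslation)". */
enum write_path choose_write_path(bool redirected, bool unicode_translation)
{
    if (redirected || !unicode_translation)
        return PATH_RAW_BYTES;
    return PATH_UTF16_CONSOLE;
}
```

With the flag defaulting to true, only the non-redirected case with the flag cleared changes behaviour, which is the point of the suggestion.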

04/05/07 14:04:28 changed by Deewiant

Thanks, though it's a bit harder to read in one paragraph.

04/05/07 14:06:37 changed by larsivi

Sorry about that - was a bit triggerhappy and forgot how it looked - was able to reindent the source though ;)

04/05/07 17:28:47 changed by kris

Tango is a unicode library. Using WriteFile() drops support for unicode, so it cannot be used in place of WriteConsole.

Redirection is not a problem with Tango, since on all platforms the console is specified as being utf8; it does become a serious problem without that stipulation, in that there would be little or no implication on what the encoding might be when redirecting.

If you choose to put something other than utf8 into a char[], you are then breaking the implicit usage of the D char[] (utf conversions will fail). Yes, the example just happens to work on Linux; and yes, there are no explicit checks to make sure you're not putting non-utf8 into a char[] (for input or output). That does not validate what you are doing, which seems to be an edge condition at best?

04/05/07 17:49:27 changed by kris

If you really need to bypass ConsoleConduit, then you could replace the conduit used by Cout by installing your own at runtime. To do this, create a new DeviceConduit with an appropriate instance of FileDevice (using a win32 stdout Handle).

Then, replace the Cout conduit like so: Cout.buffer.setConduit(myConduit);

This is not recommended, or even supported per se, but it will disable unicode translation on the Win32 console output.

04/05/07 18:05:14 changed by Deewiant

It doesn't matter whether char[] or not, the same thing occurs, since Buffer.append takes void[], not char[]:

import tango.io.Console;

void main() {
	ubyte[] x = cast(ubyte[])"a-\xe4-b\r\n";
	Cout.buffer.append(x);
}

Might be worth changing these void[]s into char[]s if that's all they're meant to accept.

I didn't mean using WriteFile in place of, but as an alternative to, WriteConsole. What I'm saying is that adding a parameter which toggles whether the console is considered as being UTF-8 or "an arbitrary encoding" would be handy, and not very difficult, since (as far as I can see) the only change (in addition to adding the boolean) would be in the if statement in ConsoleConduit.writer (and, for completeness's sake, reader). That way, by default, everything would be as it is now, but one would have the option of bypassing Unicode.

This is turning into a question of the Tango philosophy: if "Tango is a Unicode library" through and through, there's no reasonable way of adding what I want without violating that principle.

(BTW, I didn't know it works on Linux. Cool that it does.)

-- (answering your newer post)

I didn't think of replacing the conduit. That's handy, and preferable to using the C library. I'll do that. Of course, I'd prefer if I could just set a boolean and leave it at that. <g>

How isn't it supported? Can I rely on it to work in future versions? Do I need to do the same thing on Linux, or can I rely on it to keep working as it does now with the default conduit?

04/05/07 18:21:31 changed by kris

There would be no toggle made available, since we'd then be providing a special-case (the avoidance of which is a Tango philosophy) and it would be for a condition that we just cannot condone as being entirely legitimate.

As for Buffer accepting a void[], you are already going under the covers there: Cout accepts char[] only. There's no points to be scored there :)

---

You should be doing the same thing on linux, since there is no guarantee the linux ConsoleConduit will not change in the future. The ability to replace the conduit will likely remain, but the default behaviour of the console is for us to change as necessary :)

04/05/07 18:39:10 changed by kris

There's also a redirected() method in the Console now, which you might find useful for some things

04/05/07 18:53:59 changed by Deewiant

Great, thanks. I can work with this.

04/21/07 05:33:18 changed by kris

deewiant ...

I just changed the Conduit ctor signature for DeviceConduit, and it may affect you? Basically, I got rid of the FileDevice aggregate, and replaced the ctor(FileDevice) with ctor(Access, Handle) instead.

Hope that doesn't cause problems for you

04/21/07 07:03:57 changed by Deewiant

Nah, I'm good.

06/07/07 18:05:23 changed by Deewiant

Changeset 2257 messes things up for me. The following no longer works:

Stdout.buffer.setConduit(new RawCoutConduit);

My current hack is to do:

(cast(Buffer)Stdout.stream).setConduit(new RawCoutConduit);

I haven't tried this, but another option might be to reinitialize Cout and Stdout completely with:

auto c = new RawCoutConduit;
Cout = new typeof(Cout)(c, c.redirected);
Stdout = new typeof(Stdout)(Stdout.layout, Cout.stream);

However, I'm getting more uncomfortable with these solutions, as I'm clearly fighting the library. Re-initializing provided globals can't be good, and with D 2.0 I wouldn't be surprised to find Cout/Stdout and family to become final/const/invariant, blowing me completely out of the water.

A setConduit method on the Console struct, just routing the call to the Buffer, would suffice, and I think it's the best solution. Alternatively, some method of getting the Buffer as a Buffer instead of an OutputStream/InputStream, but the whole point of Changeset 2257 was to avoid that.

Or is there a simpler way of bypassing UTF translation that I'm unaware of?

(in reply to: ↑ 25 ) 06/07/07 18:20:40 changed by kris

ach ... sorry dude :(

Will give it some more thought

06/07/07 18:33:07 changed by Deewiant

Thanks for that and the quick reply.

I still think that offering something like my own RawCoutConduit in the library itself would be the truly best solution. ;-)

06/08/07 06:13:25 changed by kris

Part of that update introduced a strengthened version of Conduit filters. These are attached to either the input or output of a conduit and intercept each read/write operation. Attaching a filter is performed via a conduit function, and the conduit can be accessed through Cout.stream.conduit.

Filters are chained together, with the last one added being the first invoked on read/write, and each filter is expected to invoke the next one in the chain. A filter can thus modulate content as it flows through the conduit, or it can 'hijack' it (simply by not invoking the next filter in the chain).
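The chaining rule described here can be illustrated with a minimal sketch. It is not the Tango filter API: the struct, the function names, and the byte counter are all hypothetical, and it only models the two behaviours named above, passing data on to the next filter or hijacking it.

```c
#include <stddef.h>

struct filter;
typedef size_t (*write_fn)(struct filter *self, const void *data, size_t len);

/* One link in an output filter chain (hypothetical layout). */
struct filter {
    write_fn       write;
    struct filter *next;   /* next filter in the chain, or NULL at the end */
    size_t         seen;   /* bytes that reached this filter, for illustration */
};

/* Pass-through: note the bytes, then invoke the next filter. */
size_t passthrough_write(struct filter *self, const void *data, size_t len)
{
    self->seen += len;
    return self->next ? self->next->write(self->next, data, len) : len;
}

/* Hijack: consume the bytes and never invoke the next filter. */
size_t hijack_write(struct filter *self, const void *data, size_t len)
{
    (void)data;
    self->seen += len;
    return len;
}

/* Attach f in front of the current head, so the newest filter runs first.
 * Returns the new head of the chain. */
struct filter *attach(struct filter *head, struct filter *f, write_fn fn)
{
    f->write = fn;
    f->next  = head;
    f->seen  = 0;
    return f;
}
```

A write always starts at the head returned by the most recent attach call. Once a hijacking filter sits at the head, the filters behind it stop seeing any data, which is how a raw-output filter could bypass the translating conduit underneath.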

I suspect your 'raw output' handler could perhaps act as an output filter? Either by converting the output to utf8 for the conduit to consume, or by hijacking the output entirely? Should be almost trivial to convert the code to an OutputStream, which is what an output filter is (declared in tango.io.model.IConduit)

06/08/07 15:28:51 changed by Deewiant

Can you give me some example code demonstrating these new filters? Something simply passing the output through and doing nothing would probably suffice for me to figure it out.

I tried just:

Stdout.stream.conduit.attach(new RawCoutStream(false));

But the filter seemed to be completely ignored, none of its functions were called.

07/11/07 17:39:16 changed by Deewiant

  • status changed from closed to reopened.
  • resolution deleted.
  • version set to trunk.
  • type changed from defect to enhancement.
  • milestone deleted.

Reopening as an enhancement, and attaching a capturing filter which does raw output to the console both in Windows and on Posix systems.

07/11/07 17:40:03 changed by Deewiant

  • attachment RawCoutFilter.d added.

08/17/07 17:32:34 changed by kris

Ach. I'm afraid this is gonna have to change one last time :(

The console has a divert() method, which installs a replacement conduit and manages other attributes. Thus, we'll need to revert your filter to a conduit, as it was before. I'll do that, if you like, and attach the results?

08/17/07 17:40:32 changed by Deewiant

Meh. Why the change this time? What's the need for divert()? You mentioned "hacking the filter-chain" in changeset 2490: I actually thought capturing filters were a pretty good idea.

Go ahead and do the conversion, you'll spare me the trouble. I don't have the original conduit version any more, anyway. (But then, I suppose it wouldn't compile any more, regardless.)

08/17/07 17:51:28 changed by kris

Capturing the filter chain is a fine approach, but the way in which the console supported that was all wrong. It had to rely on a 'notification' from the conduit that things had changed, which was both too brittle and limiting.

With the change you get to switch the entire conduit, along with all filters, and manipulate the redirection state (which was missing before). Alternatively, you can add your filter to the existing conduit, and then reset that conduit back into the console.

Thus you don't have to revert the filter, per se; just the way in which it is attached. The choice is yours ;)

08/17/07 17:57:27 changed by Deewiant

Alright, cool. That makes sense.

08/28/07 06:37:45 changed by kris

  • status changed from reopened to closed.
  • resolution set to fixed.

Don't really wish to place the filter into the Tango core, per se, since we want to focus on UTF instead.

We do need a good place for Tango extensions though ...

08/28/07 06:50:01 changed by Deewiant

I just think that as D is a systems programming language, the library shouldn't limit us to too high levels of abstraction. I mean, we have inline assembly statements, yet we have to call the OS functions ourselves if we want to output encoding-independent bytes? It's not that rare a use case, I think it's better for a program to do that to unknown-encoding user input than to complain about how it's not UTF.

Some sort of standard Tango extensions repository would be a good idea in any case.

08/28/07 07:33:23 changed by kris

  • cc changed from deewiant@gmail.com to deewiant@gmail.com, larsivi, sean.

Good and valid points.

Looking through the tickets I see a number of really useful facilities that don't necessarily fit perfectly for one reason or other. What's missing right now is a staging area where others can easily get their hands on such things when they're not yet included as part of Tango. Some kind of polling system might be useful also, so that folks could indicate what's important to them?

Anyway, I'll talk with Larsivi and Sean about setting up a specific repo for this purpose.

08/30/07 19:15:18 changed by kris

Can you attach the latest filter, please, Deewiant?

08/30/07 19:38:17 changed by Deewiant

I haven't made any significant changes since your posts on 2007-08-17, I'm afraid. The currently attached one is for all practical purposes the latest version. I'll attach the one I've got anyway, but it's almost exactly the same, and thus won't compile against the latest SVN of Tango.

If you need it, I can probably turn it into a conduit tomorrow. Of course, you have the most understanding of the IO system and how it "should be done", so it'd be best if you did it. There's not much code: it's mostly just ConsoleConduit's main output method converted to use WriteConsoleA.

One thing to keep in mind is Ticket #542: the limit for this one should be exactly twice what ConsoleConduit uses (since wchar.sizeof == 2 * char.sizeof).
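Ticket #542 is not quoted in this thread, so the following is only a sketch under the assumption that it concerns a cap on how much data a single console write call may carry. If such a cap is expressed in bytes, a writer passing one-byte chars can send twice as many units per call as one passing two-byte wchars, which matches the factor of two mentioned above. The function names, the callback type, and the limit parameter are hypothetical.

```c
#include <stddef.h>

/* Callback that receives one chunk; returns how many bytes it accepted. */
typedef size_t (*sink_fn)(const unsigned char *chunk, size_t len, void *ctx);

/* Write len bytes through sink in pieces of at most limit bytes.
 * Returns the total number of bytes the sink accepted. */
size_t write_chunked(const unsigned char *data, size_t len, size_t limit,
                     sink_fn sink, void *ctx)
{
    size_t done = 0;
    if (limit == 0)
        return 0;                      /* nothing can be written */
    while (done < len) {
        size_t n = (len - done < limit) ? len - done : limit;
        size_t wrote = sink(data + done, n, ctx);
        done += wrote;
        if (wrote < n)
            break;                     /* short write: report progress so far */
    }
    return done;
}

/* Example sink: counts calls and accepts everything.
 * ctx must point to a size_t call counter. */
size_t counting_sink(const unsigned char *chunk, size_t len, void *ctx)
{
    (void)chunk;
    ++*(size_t *)ctx;
    return len;
}
```

On a real console the sink would wrap the platform write call; the counting sink here only makes the chunk boundaries observable.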

08/30/07 19:39:00 changed by Deewiant

  • attachment RawCoutFilter.2.d added.

08/30/07 19:58:36 changed by kris

thanks!

Like we'd discussed, you could leave the filter as is but change the method of attachment? e.g.

import tango.io.Console;

auto filter = new MyConsoleFilter(Cout.stream);
Cout.divert (filter.conduit, Cout.redirected);

?

08/30/07 20:31:04 changed by kris

I've copied this over to tango.scrapple for now, so it's there for others to use. Hope that's ok, Deewiant?

08/31/07 08:24:18 changed by Deewiant

Yeah, that's fine by me.