Tokenizer / buffer.next method
Posted: 08/16/07 15:53:15
Hi guys,
I'm currently writing a tokenizer/lexer for a programming language. For this I use the next() method of my FileConduit buffer, which calls a user-defined parse function. That function parses the input and either asks for more data by returning IConduit.Eof or returns the number of chars actually used for the token.
My first question: is this the intended way, or is there a better option in Tango?
If this is supposed to work that way, I have a problem. In a tokenizer you often decide whether the characters read so far form your token by looking at the next one. But what do you do if the next one is not in the buffer? You ask for more by returning IConduit.Eof, and on the next run of your function you hopefully get more data to decide. But what if the end of the file has been reached? The buffer will just call skip(readable) and be happy, although the remaining input might have been a valid token.
Example:
The buffer has space for 1024 bytes, currently holds only "+", and there is no further data in the attached conduit.
My parse function, which I pass to next(), does the following. It normally works like a charm and, by the way, is very elegant thanks to this architecture!
    if (data[0] == '+') {
        // now we have to decide with the next char
        if (data.length <= 1) {
            // call for more data (*)
            return IConduit.Eof;
        }
        if (data[1] == '=') {
            return 2; // token is +=
        } else {
            return 1; // token is +
        }
    }

Is there a clean way to find out whether the buffer has reached Eof? If this were possible, I would be really happy, as my tokenizer code looks beautiful and I don't want to change the architecture too much ;)
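To make the question concrete, here is a minimal sketch of the kind of check I have in mind. Everything in it is an assumption for illustration: the `atEof` parameter, the name `scanPlus` and the `NeedMore` constant (a stand-in for IConduit.Eof) are hypothetical and not part of Tango, and only the "+" / "+=" case is handled.

```d
// Hypothetical sketch, not Tango's API: the scanner receives an extra
// atEof flag telling it that the conduit cannot deliver more data.
enum size_t NeedMore = size_t.max; // stand-in for IConduit.Eof

// Returns the number of chars consumed, or NeedMore to ask for more input.
size_t scanPlus(const(char)[] data, bool atEof)
{
    if (data.length == 0 || data[0] != '+')
        return 0; // other tokens are outside this sketch
    if (data.length == 1)
    {
        // Only ask for more when more can actually arrive;
        // at end of file a lone "+" is a complete token.
        return atEof ? 1 : NeedMore;
    }
    return data[1] == '=' ? 2 : 1;
}

unittest
{
    assert(scanPlus("+", false) == NeedMore); // more input may follow
    assert(scanPlus("+", true) == 1);         // end of file: "+" is the token
    assert(scanPlus("+=", false) == 2);
    assert(scanPlus("+1", false) == 1);
}
```

With such a flag the parse function itself would stay unchanged apart from the one branch marked (*).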
PS: In "normal" lexers this problem does not exist, because the input always ends with a '\0' character, which is different from '=' and thus validates the '+' token.
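For comparison, the sentinel approach could be sketched roughly as follows. It assumes the whole source sits in memory and ends with '\0'; `plusTokenLength` is a hypothetical name used only for this illustration.

```d
// Sketch only: lexing over a source that ends with a '\0' sentinel.
// plusTokenLength is a hypothetical helper, not part of Tango.
size_t plusTokenLength(const(char)[] src, size_t pos)
{
    // Precondition: src ends with '\0', so src[pos + 1] exists
    // whenever src[pos] is a real (non-sentinel) character.
    assert(src.length > 0 && src[$ - 1] == '\0');
    if (src[pos] != '+')
        return 0; // other tokens are outside this sketch
    // The sentinel keeps this read in bounds, and '\0' != '='.
    return src[pos + 1] == '=' ? 2 : 1;
}

unittest
{
    assert(plusTokenLength("+\0", 0) == 1);  // "+" at end of input
    assert(plusTokenLength("+=\0", 0) == 2);
    assert(plusTokenLength("a+\0", 1) == 1);
}
```

The lookahead never needs an end-of-input check here, which is exactly what is missing when the data arrives in buffer-sized pieces.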