Matching /* .. */ and repl. #include with import

 
Marenz



Joined: 13 Mar 2009
Posts: 4

Posted: Sun Feb 07, 2010 4:58 pm    Post subject: Matching /* .. */ and repl. #include with import

Hello there,

I am fascinated by tlm and eager to learn how to work with it.
As an exercise, I have a small C file that I want to convert to D.

Now, my plan was to do that line by line:
run the file through my rules, get the first line that is wrong and
create a rule so it is right.

Now my first line was a multi-line comment. Although I don't need to convert or remove comments, I just tried it:

Code:

 -out <- eof -;
 - '/*' repeat anything '*/' <- eof - 'comment!';


As you can see, I output 'comment!' just to see when and where the replacement would happen.
It kind of worked, but it already matched on '/*' alone, so it was not quite what I wanted.

I couldn't figure out a way to do it, so I looked for more examples and finally found this in the j2d converter:

Code:

- '/*' comment '*/' <- eof -;
 '*/' <- comment '*/';
 - anything <- comment -;
 eof <- comment '*/' eof ;


and guess what: It works. of course.
But why? And why did mine not work? Could you explain to me in detail what exactly happens here?


Why is
Code:
- anything <- comment -;

not always matched?
The "-" says 'try this rule, no matter what input symbol'.
Then it says, 'match anything'.
Having comment written on the right side..
does it mean, 'try this rule only, if you try to match the symbol comment, and replace it with the empty string'?


Another thing I am trying to do is replace #include <file.h> with import tango.stdc.file;
This is my attempt:

Code:

 '#include <' :Import '.h>' <- eof - 'import tango.stdc.' Import ';\n';


but it has no effect :/

This is my second attempt, copying what I saw in the comment rules:

Code:
'#include <' headerfile :A '.h>' <- eof - 'import tango.stdc.' A ';\n';
 '.h>'                                           <- headerfile '.h>';
 eof                                             <- headerfile '.h>' eof;
 - anything %                                  <- headerfile -   ;


It replaces any #include with
Code:
import tango.stdc.headerfile;

I don't know why though.

if I write
Code:
 - anything %; (A)                         <- headerfile -   ;

like in the example at http://languagemachine.sourceforge.net/lexicalbuffer.html lm segfaults.

--Marenz
mp4



Joined: 22 Jun 2007
Posts: 19

Posted: Sat Feb 27, 2010 7:29 am    Post subject: about comments

Code:
-out <- eof -;
 - '/*' repeat anything '*/' <- eof - 'comment!';


This does not work, because it is the same as

Code:
-out <- eof -;
 - '/*' repeat { anything '*/' } <- eof - 'comment!';


Since repeat can match zero times, the rule can match on /* alone.
Also, each repetition is anything followed by */, which is not what was intended.

Code:
-out <- eof -;
 - '/*' repeat { anything } '*/'  <- eof - 'comment!';


does not work either: anything is greedy and eats up all the characters, so */ can never be matched and the second rule fails.

Because of the greediness of anything, another solution is needed.
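
The same greedy versus "stop at the first closing delimiter" distinction shows up in ordinary regular expressions. The sketch below is only an analogy in Python, not lm code, and the sample text is made up. Note one difference: Python's regex engine backtracks, so the greedy pattern still matches (it just matches too much), whereas according to the explanation above lm's greedy anything never lets */ match at all.

```python
import re

# Made-up sample input with two comments.
text = "int a; /* first */ int b; /* second */ int c;"

# Greedy: '.*' runs on to the LAST '*/', swallowing the code in between.
greedy = re.sub(r"/\*.*\*/", "comment!", text, flags=re.DOTALL)

# Non-greedy: '.*?' stops at the FIRST '*/' after each '/*'.
lazy = re.sub(r"/\*.*?\*/", "comment!", text, flags=re.DOTALL)

print(greedy)  # int a; comment! int c;
print(lazy)    # int a; comment! int b; comment! int c;
```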

So one may wonder how the solution works.


Code:
- '/*' comment '*/' <- eof -;
 '*/' <- comment '*/';
 - anything <- comment -;
 eof <- comment '*/' eof ;

lm tries to reach eof. There is no eof in the queue.
So can it get eof?
eof can be reached through rule 1.

So it tries rule 1.
It matches /*. Then it needs comment, but there is no comment in the input.
So how does it get there?
It finds rule 3 as a way to get there.
So it applies rule 3 repeatedly until it finds */ in the stream.
When it finds */ in the stream, it gets comment followed by */.
That is nice, because now rule 1 can continue, and it does.

These rules are not completely correct.
For example, this input is not handled: "gggg/*comment*/".
Rule 4 is used if the input contains an unclosed comment,
e.g. /* followed by end of file.
In that case, rule 4 "autocompletes" the /* so that it becomes a full comment /**/.
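
For readers who think more easily in procedural terms, here is a rough Python sketch of the same idea: check for the closing delimiter first, otherwise consume exactly one character, and let end of input close an unterminated comment. This is only an analogy and not how lm works internally; skip_comment and strip_comments are hypothetical names. The outer loop also copies text outside comments, which the four rules as posted do not do.

```python
def skip_comment(text, start):
    """Return the index just past the comment opening at text[start].

    Hypothetical helper for illustration; not part of lm.
    """
    assert text.startswith("/*", start)
    i = start + 2
    while i < len(text):
        # Comparable to rule 2: the closing delimiter is checked first.
        if text.startswith("*/", i):
            return i + 2
        # Comparable to rule 3: otherwise consume exactly one character.
        i += 1
    # Comparable to rule 4: end of input closes an unterminated comment.
    return len(text)


def strip_comments(text):
    """Copy text, dropping every /* ... */ comment."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("/*", i):
            i = skip_comment(text, i)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_comments("gggg/*comment*/"))  # gggg
```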

Quote:
Why is Code:
- anything <- comment -;


not always matched?
The "-" says 'try this rule, no matter what input symbol'.
Then it says, 'match anything'.
Having comment written on the right side..
does it mean, 'try this rule only, if you try to match the symbol comment, and replace it with the empty string'?

It does not look that way.
It seems to mean: try this rule when a comment is needed and neither rule 2 nor rule 4 produces one.
mp4



Joined: 22 Jun 2007
Posts: 19

Posted: Tue Mar 02, 2010 7:45 am    Post subject:

Code:
'#include <' headerfile :A '.h>' <- eof - 'import tango.stdc.' A ';\n';
 '.h>'                                           <- headerfile '.h>';
 eof                                             <- headerfile '.h>' eof;
 - anything %                                  <- headerfile -   ;


The question is why this produces import tango.stdc.headerfile.

If the input is #include <test.h>

rule 1 reads up to and including <. Then it expects headerfile. There is no headerfile symbol in the stream. Rule 4 says it can help find headerfile.

So rule 4 is applied (% has no effect here: it collects into a temporary buffer, which rule 4 never uses)
until rule 2 matches .h>. But then headerfile '.h>' is in the stream...

That does not match headerfile :A in rule 1, so rule 1 fails:
rule 1 expects :A, and nothing produces it.
I wonder how it produced tango.stdc.headerfile for you.

Here is a working version:

Code:
 '#include <' headerfile :A '.h>' <- eof - translation "import tango.stdc." translation A translation ";\n" ;

 - {repeat {.[^.]} %} toSym :Filename         <- headerfile :Filename   ;
 -translation out <- eof -;


Rules 1 and 2 just produce the translation. Rule 3 prints it.

:A is produced by rule 2 as :Filename.
Rule 2 reads the filename, which I assume does not contain "." (it is really the filename without its extension). .[^.] means: read a character that is not ".".
% puts the filename that was read into a temporary buffer.
That buffer is the argument of the toSym "function", which produces a symbol out of it. The symbol is stored in the Filename variable.
I also changed the single quotes on the right-hand side into double quotes.
I think double quotes specify a symbol.
Why produce symbols? Because they stay in one piece.
If I used 'import tango.stdc', it would be broken up into i m p o r t ... etc. (that is, into characters, which I do not want because it makes processing difficult for rule 3).
So when rule 3 runs at the end, it prints import tango.stdc. in one piece instead of printing only the first character.
translation is a symbol; its purpose is to separate the translation from the source in the stream so that the two are not mixed up. This way the translation is not reanalysed by rule 1 and rule 2.
Marenz



Joined: 13 Mar 2009
Posts: 4

Posted: Fri Mar 05, 2010 5:53 pm    Post subject:

First of all: Thanks for your time and your answers, I appreciate it, especially because every now and then, reading the code makes my head explode. It's not that easy to get used to that thinking.

Alright I've been reading and playing around with lm again.

Apparently I even found a solution to my second problem after I posted, but forgot about it. It is:

Code:

 '#include <' var A; headerfile '.h>'   <- eof - 'import tango.stdc.' A ';';
matches only when only .h> is left in the input; this is tried first, but fails until the whole filename has been matched by the last rule
 '.h>'                   <- headerfile '.h>';
matches one character but does not advance in the rule matching, so it is still searching for the symbol headerfile
 - (A)                      <- headerfile -  ;




So, it uses the (Variable) approach rather than the % one. I have no idea which is better. One difference is that mine allows dots and similar characters in the filename.

While thinking about what I need to convert C++ to D, I remembered that C++ likes to separate class declarations and definitions. Luckily I found the builtin function "include" and I even got it to work. (Man, that was a happy moment for me.)

Currently, I am thinking about how to convert this:
Code:

#include "Somefolder/SomeFile"  <-  import Somefolder.SomeFile;


including several subfolders and filenames with and without postfix, but always using the basefilename for the import.

Right now I have only a rough idea of how I could do this; my attempt so far is

Code:

 
 '#include "' var A; importpath '"'    <- eof - "import " importpath ';' ;
I don't make use of the var P yet. Not sure how. or if at all?
 - var P; path filename                  <- importpath ;
I think here I need to add a dot somehow to the variable for the import path... but how?
 -  repeat { folder '/' }                  <- path;
 - '/'                           <- folder '/' ;
 - (F)                           <- folder -  ;
 - '"'                           <- filename '"' ;
 - (N)                           <- filename - ;
 



Note that this is just some thinking; I haven't tried this yet and I doubt that it works. I wrote it to show how I think it could be done.
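
To pin down the behaviour being aimed for (any number of subfolders, an optional extension, always the base filename), here is a small Python sketch of the target transformation. It is not lm code and says nothing about how the rules should be written; include_to_import and INCLUDE_RE are hypothetical names, and paths such as "../x.h" are not handled.

```python
import re
from pathlib import PurePosixPath

# Matches a whole line of the form: #include "some/path"
INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')


def include_to_import(line):
    """Turn '#include "a/b/File.h"' into 'import a.b.File;'.

    Lines that are not quoted includes are returned unchanged.
    Hypothetical helper for illustration only.
    """
    m = INCLUDE_RE.fullmatch(line.strip())
    if m is None:
        return line
    path = PurePosixPath(m.group(1))
    # Folder names become package parts; stem drops one extension if present.
    parts = list(path.parent.parts) + [path.stem]
    return "import " + ".".join(parts) + ";"


print(include_to_import('#include "Somefolder/SomeFile"'))
# import Somefolder.SomeFile;
```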
mp4



Joined: 22 Jun 2007
Posts: 19

Posted: Sat Mar 06, 2010 5:44 am    Post subject: next

Quote:
So, it uses the (Variable) approach rather than the % one. I have no idea which is better. One difference is that mine allows dots and similar characters in the filename.


I think the (Variable) approach is better, as it is more general and flexible.
It reads up to the separator (.h>). I see no way of doing that with %.
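
As an illustration of what "reading up to the separator" buys you, here is a Python sketch (not lm code; read_until and convert_include are hypothetical names). Because the capture ends at the first occurrence of the separator .h> and not at the first dot, a name containing dots is kept whole.

```python
def read_until(text, start, separator):
    """Return (captured, next_index), or None if separator never appears."""
    end = text.find(separator, start)
    if end == -1:
        return None
    return text[start:end], end + len(separator)


def convert_include(line):
    """Turn '#include <name.h>' into 'import tango.stdc.name;'.

    Anything else is returned unchanged. Illustration only.
    """
    prefix = "#include <"
    if not line.startswith(prefix):
        return line
    result = read_until(line, len(prefix), ".h>")
    if result is None:
        return line
    name, _ = result
    return "import tango.stdc." + name + ";"


print(convert_include("#include <stdio.h>"))    # import tango.stdc.stdio;
print(convert_include("#include <foo.bar.h>"))  # import tango.stdc.foo.bar;
```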

I found/tried another way of printing the translation:

Code:
 '#include <' var A; headerfile '.h>'   <- eof - t "import tango.stdc."  A  ";" te; // t is start, te is the end of translation

 '.h>'                   <- headerfile '.h>';
 
 - (A)                      <- headerfile -  ;
 
 - t targs <- eof -;
  -out <- targs -;
  te <- targs;


How do you print your translations?
Marenz



Joined: 13 Mar 2009
Posts: 4

Posted: Sat Mar 06, 2010 5:48 am    Post subject:

ah right, the printing. I have this at the start

Code:
 - out <- eof -;
All times are GMT - 6 Hours