The Developer's Library for D

TimeStamp.parse

Moderators: kris

Posted: 03/17/09 14:19:31 Modified: 03/20/09 10:35:13

Hey,

I've got a little problem with the TimeStamp.parse and TimeSpan
functions. I used the following code to calculate the difference
between two times:

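import TimeStamp = tango.text.convert.TimeStamp;

// Returns the number of seconds between two date strings such as
// "Wed Jun 11 17:22:07 2008". TimeSpan.seconds() yields a long,
// so the result is returned as long rather than truncated to int.
long getTimeDifference(char[] first_timestamp, char[] last_timestamp)
{
    auto t_in   = TimeStamp.parse(first_timestamp.dup);
    auto t_out  = TimeStamp.parse(last_timestamp.dup);
    auto t      = t_out.span() - t_in.span();

    return t.seconds();
}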

I run through thousands of log file lines, and on some of these
lines the TimeStamp parser returns wrong values. Here is some output
(the first two numbers are the converted times in seconds, followed by
the original strings):

Output

t_in (s)      t_out (s)     first_timestamp              last_timestamp
63348801727   63348801727   'Wed Jun 11 17:22:07 2008' - 'Wed Jun 11 17:22:07 2008'
63348800663   63348800663   'Wed Jun 11 17:04:23 2008' - 'Wed Jun 11 17:04:23 2008'
315537897599  315537897599  'Wed Jun 11 17:00:32 2008' - 'Wed Jun 11 17:00:32 2008'
63348801826   63348801847   'Wed Jun 11 17:23:46 2008' - 'Wed Jun 11 17:24:07 2008'
63348802335   63348802335   'Wed Jun 11 17:32:15 2008' - 'Wed Jun 11 17:32:15 2008'
63348803866   63348803866   'Wed Jun 11 17:57:46 2008' - 'Wed Jun 11 17:57:46 2008'
63348804450   63348804507   'Wed Jun 11 18:07:30 2008' - 'Wed Jun 11 18:08:27 2008'
315537897599  63348805511   'Wed Jun 11 18:25:00 2008' - 'Wed Jun 11 18:25:11 2008'

What could cause the parser to produce these strange values?
I have added .dup almost everywhere a char[] is involved, to be sure that
no reference is shared. Any ideas?

/L


Posted: 03/17/09 20:46:09

What generated the log files and, specifically, the 315537897599 values?

Posted: 03/18/09 11:23:51

I did the following:

    auto ts=TimeStamp.parse("Wed Jun 11 17:00:32 2008");
    Stdout("timestamp:")(ts.span.ticks()).newline;
    ts=TimeStamp.parse("Wed Jun 11 17:04:23 2008");
    Stdout("timestamp:")(ts.span.ticks()).newline;

and got the following result

timestamp:633488004320000000
timestamp:633488006630000000

which seems correct...
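The value above can be cross-checked by hand. The sketch below is plain D with Phobos (so it builds without Tango) and assumes that Tango's Time counts 100 ns ticks from 1 January 0001 in the proleptic Gregorian calendar; the numbers in this thread are consistent with that assumption. The helper names are hypothetical.

```d
import std.stdio;

// Assumption: Tango's Time counts 100 ns ticks from 0001-01-01
// (proleptic Gregorian), the same convention as .NET's DateTime.
enum long TicksPerSecond = 10_000_000;

bool isLeap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Whole days from 0001-01-01 up to (not including) y-m-d.
long daysSinceEpoch(int y, int m, int d)
{
    static immutable int[12] monthDays =
        [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
    long py = y - 1;
    // Days in all full years before y, including Gregorian leap days.
    long days = py * 365 + py / 4 - py / 100 + py / 400;
    // Days in the full months before m, with Feb 29 in leap years.
    foreach (i; 0 .. m - 1)
        days += monthDays[i] + ((i == 1 && isLeap(y)) ? 1 : 0);
    return days + d - 1;
}

long secondsSinceEpoch(int y, int m, int d, int hh, int mm, int ss)
{
    return daysSinceEpoch(y, m, d) * 86_400 + hh * 3_600 + mm * 60 + ss;
}

void main()
{
    long secs = secondsSinceEpoch(2008, 6, 11, 17, 0, 32);
    writeln(secs);                   // 63348800432
    writeln(secs * TicksPerSecond);  // 633488004320000000
}
```

Computed the same way, 17:22:07 on that day gives 63348801727 and 18:25:11 gives 63348805511, matching the values the parser returns for the lines it handles correctly.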

Posted: 03/18/09 12:42:25

Works for me:

import TimeStamp = tango.text.convert.TimeStamp;
import tango.io.Stdout;

void main()
{
    auto ts1 = [
        "Wed Jun 11 17:22:07 2008", "Wed Jun 11 17:22:07 2008",
        "Wed Jun 11 17:04:23 2008", "Wed Jun 11 17:04:23 2008",
        "Wed Jun 11 17:00:32 2008", "Wed Jun 11 17:00:32 2008",
        "Wed Jun 11 17:23:46 2008", "Wed Jun 11 17:24:07 2008",
        "Wed Jun 11 17:32:15 2008", "Wed Jun 11 17:32:15 2008",
        "Wed Jun 11 17:57:46 2008", "Wed Jun 11 17:57:46 2008",
        "Wed Jun 11 18:07:30 2008", "Wed Jun 11 18:08:27 2008",
        "Wed Jun 11 18:25:00 2008", "Wed Jun 11 18:25:11 2008",
        ];

    foreach(ts; ts1)
    {
        Stdout.formatln("{} = {}", ts, TimeStamp.parse(ts).span.seconds);
    }
}

output:

[steves@localhost testing]$ ./testtimestamp
Wed Jun 11 17:22:07 2008 = 63348801727
Wed Jun 11 17:22:07 2008 = 63348801727
Wed Jun 11 17:04:23 2008 = 63348800663
Wed Jun 11 17:04:23 2008 = 63348800663
Wed Jun 11 17:00:32 2008 = 63348800432
Wed Jun 11 17:00:32 2008 = 63348800432
Wed Jun 11 17:23:46 2008 = 63348801826
Wed Jun 11 17:24:07 2008 = 63348801847
Wed Jun 11 17:32:15 2008 = 63348802335
Wed Jun 11 17:32:15 2008 = 63348802335
Wed Jun 11 17:57:46 2008 = 63348803866
Wed Jun 11 17:57:46 2008 = 63348803866
Wed Jun 11 18:07:30 2008 = 63348804450
Wed Jun 11 18:08:27 2008 = 63348804507
Wed Jun 11 18:25:00 2008 = 63348805500
Wed Jun 11 18:25:11 2008 = 63348805511

dmd version is 1.038
tango is trunk revision 4396 (not that old)
OS is Linux

Can you divulge the details of your environment? There were some pretty obscure bugs in some dmd compilers that caused issues with timestamps if I recall correctly.

Also, FYI, you do not need to convert to TimeSpan to subtract two Times, you should be able to change your line to:

auto t = t_out - t_in; // type of t is TimeSpan

That is the point of having two types, to define what arithmetic you can and can't do and what types result :)
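To illustrate the point about having two types, here is a minimal D sketch with hypothetical `Instant` and `Span` structs standing in for Tango's `Time` and `TimeSpan` (this is not Tango's actual definition). Subtracting two instants yields a span, an instant plus or minus a span yields an instant, and adding two instants simply does not compile.

```d
import std.stdio;

// Hypothetical stand-ins for Time and TimeSpan; only the idea that the
// two types restrict which arithmetic is allowed comes from the thread.
struct Span
{
    long ticks;
}

struct Instant
{
    long ticks;

    // Instant - Instant -> Span
    Span opBinary(string op)(Instant rhs) const if (op == "-")
    {
        return Span(ticks - rhs.ticks);
    }

    // Instant + Span and Instant - Span -> Instant
    Instant opBinary(string op)(Span rhs) const if (op == "+" || op == "-")
    {
        return Instant(mixin("ticks " ~ op ~ " rhs.ticks"));
    }
}

void main()
{
    auto t_in  = Instant(100);
    auto t_out = Instant(350);
    Span t = t_out - t_in;       // difference of two instants is a span
    Instant later = t_out + t;   // an instant plus a span is an instant
    writeln(t.ticks, " ", later.ticks);  // 250 600
    // t_in + t_out does not compile: adding two instants has no meaning.
}
```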

Posted: 03/19/09 10:31:31 -- Modified: 03/19/09 17:24:50 by
lars_kirchhoff -- Modified 2 Times

Thanks for the replies. I'm sorry I wasn't very clear in my previous post.

Parsing works fine on single values. When I feed the timestamps that produced the wrong results into a program that converts just one date stamp, no error appears.

I first thought the problem was related to wrong encodings in the timestamps, and therefore ran the test on single date stamps. Everything seems fine when they are converted one at a time, but when I run the parser in a loop over a lot of date stamps, the error occurs.

kris wrote:

What generated the log files and, specifically, the 315537897599 values?

The 315537897599 was produced by:

    auto t_in   = TimeStamp.parse(first_timestamp.dup);
    t_in.span().seconds;

schveiguy wrote:

Also, FYI, you do not need to convert to TimeSpan to subtract two Times, you should be able to change your line to:

      auto t = t_out - t_in; // type of t is TimeSpan

Thanks, that is what I did in the first place, but while trying to understand what happens I took it apart.

Environment: OS is openSUSE 11.3
DMD is 1.033
Tango: I'm not quite sure which version right now.

I'll try to update dmd first to see if this changes anything.

Posted: 03/19/09 14:57:14 -- Modified: 03/20/09 14:08:53 by
lars_kirchhoff -- Modified 2 Times

OK, it's not a TimeStamp.parse problem. I've just created a file that only
contains the dates and parsed every line. No error occurred.

But as soon as I use the tango.text.Util functions (split, substitute) to extract the
date from the log file, I have problems with the TimeStamp.parse function. This
is the code that extracts the date from the log file:

    char[][] line_tokens = split(line, "]");    
    char[] timestamp = substitute(line_tokens[0], "[", "");
    Time t = TimeStamp.parse(timestamp);
    Stdout.formatln("{}", t.span.seconds);

Using dup doesn't make a difference.

I updated to dmd-1.039 and the recent Tango trunk (4417).

/L

Posted: 03/19/09 19:07:54

It's nearly impossible to help unless you show us what the input data actually looks like. Can you provide a failing example, similar to the above, that has 'line' replaced with a text string instead?

Posted: 03/20/09 10:33:09 -- Modified: 03/20/09 10:33:48 by
lars_kirchhoff

OK, here is the test file. It's an extract from the log file I would like to analyze.

And here is the code that produces the wrong results:

module      timeparse;

private     import      tango.io.Stdout, tango.io.device.File, tango.io.stream.Lines;
private     import      TimeStamp = tango.text.convert.TimeStamp,
                        tango.text.Util : containsPattern, trim, substitute, split;
private     import      tango.time.Time;

void main(char[][] args) 
{ 
    int line_no=1;    
    File fi = new File(args[1]);
        
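    foreach (line; new Lines!(char)(fi))
    {
        // The date is wrapped in brackets: take everything before the first ']'
        char[][] line_token = split(line, "]");
        // and strip the leading '[' so only the date string remains.
        char[] timestamp = substitute(line_token[0], "[", "");
        Time t = TimeStamp.parse(timestamp);
        Stdout.formatln("{} {}", line_no, t.span.seconds);
        line_no++;
    }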
}

After some further testing I think it has to do with the encoding within
the log file. There are various strings at the end of each line. If I remove
them and run the code above, everything works fine.

I'm not sure what kind of character could have such an impact on the text
util functions.

/L

Posted: 03/20/09 20:52:03 -- Modified: 03/20/09 21:31:38 by
schveiguy -- Modified 2 Times

Your file is double-compressed, which confused me :) You have to decompress test.log from test.log.gz, then rename test.log to test.log.gz and decompress again. Looking at it now...

Update: I'm seeing errors as well. They seem to differ from run to run, which is not very encouraging; it sounds like memory corruption.

Found the problem. It's in tango/text/convert/TimeStamp.

What is happening is that the module uses pointers everywhere, so there is no regard for the end of the array. So if the date string happens to be followed by garbage data that is a valid numeric character, that character is counted as part of the year.

For example, you have the string: "Wed Jun 11 17:22:07 2008"

But if we look at contiguous memory, the string plus the garbage after it looks like:

    Wed Jun 11 17:22:07 20085@x4721bgy

Then the module parses a year of 20085.

This explains the wildly large values and the randomness of the error occurrence. Tango times are only valid until the year 10000; the representation technically stops somewhere above that, but it is certainly not valid at 20000, so you get randomly large (or small) values.
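The odd 315537897599 value from the first post fits this picture. Assuming Tango's Time follows the same convention as .NET's DateTime (100 ns ticks from 1 January 0001, with a maximum of 31 December 9999 23:59:59.9999999), that number is exactly the maximum representable time in whole seconds, which suggests that in those cases the out-of-range year ended up at the upper limit. A small D sketch of the arithmetic:

```d
import std.stdio;

// Assumption: Tango's Time mirrors .NET's DateTime, whose maximum is
// 9999-12-31 23:59:59.9999999, i.e. one tick before 10000-01-01.
enum long TicksPerSecond = 10_000_000;
enum long TicksPerDay = TicksPerSecond * 86_400;
enum long DaysPer400Years = 146_097;

void main()
{
    // 25 Gregorian 400-year cycles from 0001-01-01 end at 10001-01-01;
    // subtract the 366 days of leap year 10000 to land on 10000-01-01.
    enum long daysToYear10000 = DaysPer400Years * 25 - 366;
    enum long maxTicks = daysToYear10000 * TicksPerDay - 1;
    writeln(maxTicks);                   // 3155378975999999999
    writeln(maxTicks / TicksPerSecond);  // 315537897599
}
```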

So TimeStamp needs some attention. I'd recommend passing a pointer to the end of the string along with the current position, to avoid reading garbage data. If you could submit a bug report, that would be awesome. It would be good to include a program that generates test data, since you probably need a lot of data to force the garbage to appear; linking that huge file to the bug probably won't work, and I'm sure you don't want to keep the link alive forever ;)

Thanks!
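A sketch of the failure mode and of the end-pointer fix, in D. `parseDigitsUnbounded` and `parseDigitsBounded` are hypothetical helpers written for illustration, not Tango's TimeStamp code; the longer string literal plays the role of whatever memory happens to follow the date slice.

```d
import std.stdio;

// Hypothetical scanner: reads digits until a non-digit and has no idea
// where the string ends (the failure mode described above).
int parseDigitsUnbounded(const(char)* p)
{
    int value = 0;
    while (*p >= '0' && *p <= '9')
    {
        value = value * 10 + (*p - '0');
        ++p;
    }
    return value;
}

// Same loop, but it also stops at `end`, one past the last valid char.
int parseDigitsBounded(const(char)* p, const(char)* end)
{
    int value = 0;
    while (p < end && *p >= '0' && *p <= '9')
    {
        value = value * 10 + (*p - '0');
        ++p;
    }
    return value;
}

void main()
{
    // The slice holds the real date; the bytes after it stand in for
    // the garbage that follows it in memory.
    string memory = "Wed Jun 11 17:22:07 20085@x4721bgy";
    string date = memory[0 .. 24];       // "Wed Jun 11 17:22:07 2008"
    const(char)* year = date.ptr + 20;   // start of "2008"

    writeln(parseDigitsUnbounded(year));                        // 20085
    writeln(parseDigitsBounded(year, date.ptr + date.length));  // 2008
}
```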

Posted: 03/20/09 21:34:00

Oh, and as a temporary fix, append " " to the end of your date string.
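Why the workaround helps, sketched in D with a hypothetical unbounded digit scanner (not Tango's code): concatenation allocates a new array, so the appended space is a real non-digit inside the string's own memory, and the scan stops there no matter what follows the allocation.

```d
import std.stdio;

// Hypothetical scanner that, like the buggy parser, ignores the slice end.
int parseDigitsUnbounded(const(char)* p)
{
    int value = 0;
    while (*p >= '0' && *p <= '9')
    {
        value = value * 10 + (*p - '0');
        ++p;
    }
    return value;
}

void main()
{
    string memory = "Wed Jun 11 17:22:07 20085@x4721bgy";
    string date = memory[0 .. 24];   // followed by the stray '5'

    // `~` builds a new array whose last char is a space terminator.
    string padded = date ~ " ";
    writeln(parseDigitsUnbounded(date.ptr + 20));    // 20085
    writeln(parseDigitsUnbounded(padded.ptr + 20));  // 2008
}
```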

Posted: 03/21/09 11:46:04

Thanks so much, the workaround does work. I will submit a bug report and try to create a small program that generates a file with data that triggers the garbage reads.

/L

Posted: 03/23/09 11:41:55 -- Modified: 03/23/09 11:43:45 by
lars_kirchhoff

I filed a bug report: http://www.dsource.org/projects/tango/ticket/1546

But I could not create a small program that generates
a test file leading to the wrong results. I'll leave the file
up for a while.

/L