#335938 - 09/08/2010 20:11
Math using time as variables in Perl
|
carpal tunnel
Registered: 20/12/1999
Posts: 31600
Loc: Seattle, WA
|
A coworker is working on a Perl script that validates some pieces of collected data. The nature of the data isn't important, but we're having trouble wrapping our heads around the math needed to validate this elegantly.
We have a set of timestamps within some files we're creating. These timestamps are a kind of "marker" within the file, indicating the start point of a certain kind of data activity. These markers can be created at any time (at very random times) depending on conditions. But we also added code to make sure that, at the very least, markers get created at fifteen-minute intervals: four times per hour, at the :00, :15, :30, and :45 points in the hour, based on the time of day.
So a file might have timestamps like this:
- File start time (at <hour>:07)
- Marker
- Marker
- Marker
- Marker (at <hour>:15)
- Marker
- Marker
- File end time (at <hour>:24)
Or the file might look like this:
- File start time (at <hour>:22)
- Marker (at <hour>:30)
- Marker (at <hour>:45)
- Marker (at <hour+1>:00)
- File end time (at <hour+1>:07)
Or the file might look like this:
- File start time (at <hour>:32)
- Marker
- File end time (at <hour>:34)
The length of the file is variable, as are the times of all the markers and the file's start and end times. But there should at least be markers at the :00, :15, :30, and :45 points within each hour.
We're creating a script to validate that the 15-minute interval bits actually got done in our code in all possible cases. We need to run this script against a huge set of data files of varying length that were previously collected.
All timestamps are already converted to Epoch time (number of seconds since the Epoch) so theoretically we should be able to validate this with integer math. We're just trying to figure out how to do it without resorting to brute force. We're looking for an elegant math-based solution for this. Given a list like the ones above, how do we elegantly validate that the list does not contain a gap at any of the quarter hour points?
Any ideas?
#335943 - 09/08/2010 23:13
Re: Math using time as variables in Perl
[Re: tfabris]
|
pooh-bah
Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
|
I don't think you can. Unless the data is massively large, surely brute-forcing is the way to go.
1) Read the first time stamp.
2) Calculate the next 15-minute boundary time stamp.
3) Search for it from there. If a later time stamp is found first, the file is borked. Success if the end of the file is reached.
4) Calculate the next boundary time stamp (add 15 minutes) and go to 3.
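A minimal Python sketch of the four steps (Python only for illustration; the integer math carries over to Perl unchanged). It assumes a file's timestamps are already a list of epoch-second integers in file order, with the file start time first and the file end time last; `next_boundary` and `has_all_quarter_hours` are hypothetical names.

```python
QUARTER_HOUR = 900  # seconds in 15 minutes


def next_boundary(t: int) -> int:
    """Step 2: the first quarter-hour boundary strictly after epoch time t."""
    return (t // QUARTER_HOUR + 1) * QUARTER_HOUR


def has_all_quarter_hours(stamps: list[int]) -> bool:
    """Steps 1-4 over one file's timestamps, in file order.

    Assumes stamps[0] is the file start time and stamps[-1] the file end time.
    """
    if not stamps:
        return True
    boundary = next_boundary(stamps[0])  # steps 1 and 2
    for t in stamps[1:]:                 # step 3: scan forward from there
        if t == boundary:
            boundary += QUARTER_HOUR     # step 4: found it, add 15 minutes
        elif t > boundary:
            return False                 # a later stamp came first: file is borked
    return True                          # reached the end of the file
```

Because the file end time is scanned like any other stamp, a boundary missing just before the end of the file (say, no :00 marker in a file ending at :07) also trips the "later time stamp found first" test.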
Alternatively, filter out any markers that are not exactly on a 15-minute boundary first and then do the same run-through. You need to be careful in case there are non-regular markers exactly on a 15-minute boundary, since you won't filter them out (unless you can identify them and filter them out too). Then it's just a case of making sure each entry is 15 minutes after the previous one.
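A sketch of the filter-first alternative, under the same assumptions (epoch seconds in time order, start time first, end time last; the function name is hypothetical). Collapsing the boundary-aligned stamps into a set is one assumed way to cope with an irregular marker that lands exactly on a boundary. The first and last boundaries the file must contain come from integer division, so a missing first or last marker is caught as well as a gap in the middle.

```python
QUARTER_HOUR = 900  # seconds in 15 minutes


def has_all_quarter_hours_filtered(stamps: list[int]) -> bool:
    """Filter to boundary-aligned stamps, then check they are 900 s apart.

    Assumes stamps are in time order, stamps[0] is the file start time and
    stamps[-1] the file end time.
    """
    if len(stamps) < 2:
        return True
    start, end = stamps[0], stamps[-1]
    # Keep stamps exactly on a quarter hour; the set collapses an irregular
    # marker that happens to land on the same boundary as a regular one.
    aligned = sorted(
        {t for t in stamps[1:] if start < t <= end and t % QUARTER_HOUR == 0}
    )
    # First boundary strictly after the start, last one at or before the end.
    first = (start // QUARTER_HOUR + 1) * QUARTER_HOUR
    last = (end // QUARTER_HOUR) * QUARTER_HOUR
    if first > last:
        return True  # the file does not span any quarter hour
    if not aligned or aligned[0] != first or aligned[-1] != last:
        return False
    # "Each entry is 15 minutes after the previous one."
    return all(b - a == QUARTER_HOUR for a, b in zip(aligned, aligned[1:]))
```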
Edited by Shonky (09/08/2010 23:14) Edit Reason: Changed success detection in step 3
_________________________
Christian #40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)
#335944 - 10/08/2010 00:00
Re: Math using time as variables in Perl
[Re: Shonky]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
If you're looking for an elegant solution using Perl, you're doing it wrong. If I'm understanding you right, I think you just want to do something like this:
1. Grab the first timestamp in the file; let's call it "t0".
2. The first quarter-hour timestamp (t1) is going to be: t1 = t0 + 900 - (t0 % 900)
3. Then you just loop, looking for t1 + n*900, until it fails or you hit EOF.
That's what you mean by a "math based" solution, right?
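The step 2 arithmetic written out in Python, with hypothetical function names. The worked value is 2010-08-09 20:07:00 UTC. Epoch quarter hours line up with local-time quarter hours as long as the local UTC offset is a whole number of quarter hours, which holds for Seattle's -8/-7 hour offsets.

```python
QUARTER_HOUR = 900  # 15 minutes, in seconds


def first_quarter_hour(t0: int) -> int:
    """Step 2: t1 = t0 + 900 - (t0 % 900).

    If t0 already sits exactly on a quarter hour, this returns t0 + 900,
    so the start time itself is not treated as a required marker.
    """
    return t0 + QUARTER_HOUR - (t0 % QUARTER_HOUR)


def expected_markers(t0: int, t_end: int) -> list[int]:
    """The sequence t1 + n*900 that step 3 looks for, up to the file end time."""
    t1 = first_quarter_hour(t0)
    return list(range(t1, t_end + 1, QUARTER_HOUR))


t0 = 1281384420  # 2010-08-09 20:07:00 UTC; t0 % 900 == 420 (7 minutes)
print(first_quarter_hour(t0))  # 1281384900, i.e. 20:15:00 UTC
```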
#335952 - 10/08/2010 13:03
Re: Math using time as variables in Perl
[Re: tonyc]
|
old hand
Registered: 09/01/2002
Posts: 702
Loc: Tacoma,WA
|
Grab the first timestamp, then grab all the markers up to the end timestamp and put them in a hash list keyed by date. Now you've got a nice list of easily accessed timestamps for a time period, and you also know the starting and ending points of that list. Next, take the start hour (i.e., strip off the minutes) of the first timestamp and assign it to an incrementing date variable that you use to loop. Loop while the incrementing date variable is < the end date, incrementing the date by 15 minutes each time. At each iteration, look for the date in the hash list. If you find it, you are good; if you don't find it, there is a problem. It should be very fast this way because of the use of a hash list. I have no idea if Perl can do a hash list as easily as OO languages, though (Java, .NET, etc.).
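A sketch of this lookup approach, with a Python set playing the role of the hash (Perl has hashes built in). It assumes the same list of epoch seconds with the start time first and the end time last; the function name is hypothetical. One labeled change: the loop starts at the first quarter hour after the file start rather than the top of the hour, because starting at :00 would look for boundaries that fall before the file began.

```python
QUARTER_HOUR = 900  # seconds in 15 minutes


def has_all_quarter_hours_lookup(stamps: list[int]) -> bool:
    """Hash-lookup approach; assumes stamps[0] is the start, stamps[-1] the end."""
    if not stamps:
        return True
    start, end = stamps[0], stamps[-1]
    seen = set(stamps)  # O(N) extra space for the lookup table
    # Assumed change: begin at the first quarter hour after the file start
    # instead of the top of the hour, so boundaries before the file began
    # are not reported as missing.
    t = (start // QUARTER_HOUR + 1) * QUARTER_HOUR
    while t < end:  # loop while the date variable < end date
        if t not in seen:
            return False  # no marker at this quarter hour
        t += QUARTER_HOUR
    return True
```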
#335953 - 10/08/2010 13:08
Re: Math using time as variables in Perl
[Re: siberia37]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
Quote: put them in a hash list keyed by date
That takes O(N) space, plus it can't check for out-of-order timestamps. Shonky/tonyc's algorithm is the way to go.
Peter
#335958 - 10/08/2010 14:05
Re: Math using time as variables in Perl
[Re: tonyc]
|
carpal tunnel
Registered: 20/12/1999
Posts: 31600
Loc: Seattle, WA
|
Quote: That's what you mean by a "math based" solution, right?
No, that's what I consider a brute-force solution. That sort of thing would work, though, and your and Shonky's solutions are viable if we don't have a math solution. By math solution, I'm talking about something that simply needs a math equation to solve the problem. Something along the lines of "IF (T1-T0) MOD 900" or something like that. Something that doesn't require iterating through a list for each time stamp. I don't know if that's possible.
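The closest thing to an "(T1-T0) MOD 900" style test is a per-pair check on adjacent timestamps. A minimal Python sketch with hypothetical names, assuming epoch seconds in time order: b // 900 - a // 900 counts the quarter-hour boundaries in (a, b], and if exactly one is crossed it has to be b itself. It is still applied once per adjacent pair, so it makes the same single pass over the list as the Shonky/tonyc loop; it only replaces the running "next boundary" variable with integer division.

```python
QUARTER_HOUR = 900  # 15 minutes, in seconds


def pair_ok(a: int, b: int) -> bool:
    """Integer test for consecutive timestamps a <= b.

    b // 900 - a // 900 counts the quarter-hour boundaries in (a, b].
    0 crossed: nothing to check.  1 crossed: it must be b itself.
    2 or more: at least one boundary fell strictly between a and b.
    """
    crossed = b // QUARTER_HOUR - a // QUARTER_HOUR
    return crossed == 0 or (crossed == 1 and b % QUARTER_HOUR == 0)


def has_all_quarter_hours_pairwise(stamps: list[int]) -> bool:
    """Apply pair_ok to every adjacent pair: still one pass over the list."""
    return all(pair_ok(a, b) for a, b in zip(stamps, stamps[1:]))
```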
#335960 - 10/08/2010 14:20
Re: Math using time as variables in Perl
[Re: tfabris]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
Quote: By math solution, I'm talking about something that simply needs a math equation to solve the problem. Something along the lines of "IF (T1-T0) MOD 900" or something like that. Something that doesn't require iterating through a list for each time stamp. I don't know if that's possible.
Unless I'm misunderstanding either your problem or their solution, the Shonky/tonyc algorithm makes just one pass through the file and uses only scalars (not arrays or lists) as local variables. You aren't going to improve on that.
Peter
#335962 - 10/08/2010 14:52
Re: Math using time as variables in Perl
[Re: tfabris]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
Quote: Something that doesn't require iterating through a list for each time stamp.
I have the feeling you don't know what you're asking for. I'm not poking fun; I've been in the same situation of feeling like there's a better solution but not even really knowing how to ask for it. Ultimately, you have a list of numbers. You have to check each one. There's no way around dealing with a list, because that's your input. You can avoid dealing with additional lists, and that's what Shonky and Tony have suggested. There are potentially other ways to decorate it, but they're all going to be basically the same.
_________________________
Bitt Faulk
#335966 - 10/08/2010 15:30
Re: Math using time as variables in Perl
[Re: wfaulk]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
I have the feeling that you're looking for a more stateless solution. Let me redecorate the existing solution in that style:
// entry action; can start at any point in list
test(initial_stamp)
// recursive transit of list
// will either error, or exit due to EndOfList
function test(current)
    next := nextmarker(current)
    if next - current > FifteenMinutes then
        error
    else
        test(next)
    endif
endfunc
// get next 15m marker
// returns marker or exits due to EndOfList
function nextmarker(stamp)
    check_stamp := stamp.next
    if check_stamp == EndOfList then
        exit
    endif
    if check_stamp MOD FifteenMinutes <> 0 then
        return nextmarker(check_stamp)
    else
        return check_stamp
    endif
endfunc
TonyC got after me about wasting RAM on the recursion, so:
function test(current)
    next := nextmarker(current)
    while next - current <= FifteenMinutes do
        current := next
        next := nextmarker(current)
    done
    error
endfunc
Edited by wfaulk (10/08/2010 16:02) Edit Reason: iterative solution
_________________________
Bitt Faulk
#335967 - 10/08/2010 15:45
Re: Math using time as variables in Perl
[Re: wfaulk]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
Here's a slightly different decoration:
// find first marker stamp
marker := nextmarker(initial_stamp)
// make sure there's no missing marker between the onset and the first marker
if (marker - initial_stamp) > FifteenMinutes then
    error
endif
// test starting at first marker
test(marker, marker DIV FifteenMinutes)
// recursively test for each marker by counting units of FifteenMinutes
// stops either with error or EndOfList
function test(marker, expected_quotient)
    assert(marker MOD FifteenMinutes == 0)
    if (marker DIV FifteenMinutes) != expected_quotient then
        error
    endif
    test(nextmarker(marker), expected_quotient + 1)
endfunc
Again, an iterative version:
function test(marker, expected_quotient)
    while (marker DIV FifteenMinutes) == expected_quotient do
        marker := nextmarker(marker)
        expected_quotient++
    done
    error
endfunc
Edited by wfaulk (10/08/2010 16:07) Edit Reason: iterative procedure
_________________________
Bitt Faulk
#335968 - 10/08/2010 16:08
Re: Math using time as variables in Perl
[Re: wfaulk]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
Ultimately, though, both of those solutions are likely to be less optimal than the initial decorations. I can't imagine an architecture where (MOD 900) costs less than (+ 900).
_________________________
Bitt Faulk
#335976 - 10/08/2010 18:12
Re: Math using time as variables in Perl
[Re: wfaulk]
|
carpal tunnel
Registered: 20/12/1999
Posts: 31600
Loc: Seattle, WA
|
Quote: Ultimately, though, both of those solutions are likely to be less optimal than the initial decorations. I can't imagine an architecture where (MOD 900) costs less than (+ 900).
Thanks very much, Bitt, Peter, TonyC, Shonky, and everyone else. Your clarity in describing the issues and the tradeoffs is extremely helpful.