Friday, February 24, 2012

obtain first and last occurance of contiguous blocks of data

Hello everyone, I am in need of some assistance, and was hoping someone
could help me out.
Essentially my problem boils down to the following...
I have a table that contains a date variable and an indication whether the
system is on or off:
[system] [date] [isOn]
A 01 0
A 04 1
A 05 1
A 06 0
A 20 1
A 21 0
A 25 0
A 27 1
A 32 1
A 33 1
A 34 0
A 40 1
B 41 1
B 45 0
B 49 1
B 50 1
B 51 1
B 53 1
B 67 0
What I need to derive is a table that shows the following
[system] [date switched ON] [date switched OFF]
A 04 06
A 20 21
A 27 34
A 40 NULL
B 41 45
B 49 67
So each row contains three variable:
0. first variable: The system concerned
1. second variable: The date the system was switched ON: i.e. the first
member of an occurance a contiguous block of 1's
2. third variable: The date the system was switched OFF: i.e. the subsequent
first member of an occurance of a contiguous block of 0's
... then the next row contains the next member of an occurance of contig.
block of 1's, and subsequent first member of an occurance of a contiguous
block of 0's etc etc
A NULL is placed where for a given system, the date switched OFF is not
defined
I am thinking of utilizing a cursor to run through the list.
Would anyone have a non-sequencial way of looking at addressing the problem
using normal SQL?
Any help most appreciated!
Many thanks
wileyYour options are to use a complicated correlated subquery, or to use a
cursor. This is one of the very few cases where a cursor can provide better
performance than a set-based operation. Use a table variable and insert the
values into it as you walk through the table.
"wiley" <w.smith@.nospam.com> wrote in message
news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Thanks for the suggestion! I knew id be wasting my time trying to find a
non-sequential algorithm.
I was working on the cursor-based solution as soon as i sent my post. I
really took a step back and mapped out the sequence via a flowchart and used
simple goto statements in my solution. I tested and it seems to be working.
I know... I really should be use WHILE statements etc. but i had to hack
this fast for a DTS process algorithm i need to implement tomorrow. I'll
fashion it in terms of WHILE statements, comments, and decent variables etc
in time...
declare @.system char(1), @.currentsystem char(1)
declare @.date int, @.begin int, @.end int
declare @.systemon int, @.previous int
delete resulttable
declare system_cursor cursor for
select [system], [date], systemon
from contract1
order by 1, 2
open system_cursor
a:
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
b:
set @.currentsystem = @.system
if @.systemon <> 1
goto a
c:
set @.begin = @.date
d:
set @.previous = 1
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
if @.currentsystem <> @.system
begin
set @.end = NULL
insert into resulttable select @.currentsystem, @.begin, @.end
goto b
end
if (@.systemon = 1 and @.previous = 1)
goto d
set @.end = @.date
insert into resulttable select @.currentsystem, @.begin, @.end
e:
set @.previous = 0
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
if @.currentsystem <> @.system
goto b
if (@.systemon = 0 and @.previous = 0)
goto e
goto c
z:
close system_cursor
deallocate system_cursor
---
cheers, wiley
"Brian Selzer" <brian@.selzer-software.com> wrote in message
news:utWcAzQFGHA.3308@.TK2MSFTNGP10.phx.gbl...
> Your options are to use a complicated correlated subquery, or to use a
> cursor. This is one of the very few cases where a cursor can provide
> better performance than a set-based operation. Use a table variable and
> insert the values into it as you walk through the table.
> "wiley" <w.smith@.nospam.com> wrote in message
> news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||Why not doing such reports on the client side?
"wiley" <w.smith@.nospam.com> wrote in message
news:u7l0YmRFGHA.216@.TK2MSFTNGP15.phx.gbl...
> Thanks for the suggestion! I knew id be wasting my time trying to find a
> non-sequential algorithm.
> I was working on the cursor-based solution as soon as i sent my post. I
> really took a step back and mapped out the sequence via a flowchart and
> used simple goto statements in my solution. I tested and it seems to be
> working. I know... I really should be use WHILE statements etc. but i had
> to hack this fast for a DTS process algorithm i need to implement
> tomorrow. I'll fashion it in terms of WHILE statements, comments, and
> decent variables etc in time...
> --
> declare @.system char(1), @.currentsystem char(1)
> declare @.date int, @.begin int, @.end int
> declare @.systemon int, @.previous int
> delete resulttable
> declare system_cursor cursor for
> select [system], [date], systemon
> from contract1
> order by 1, 2
> open system_cursor
> a:
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> b:
> set @.currentsystem = @.system
> if @.systemon <> 1
> goto a
> c:
> set @.begin = @.date
> d:
> set @.previous = 1
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> if @.currentsystem <> @.system
> begin
> set @.end = NULL
> insert into resulttable select @.currentsystem, @.begin, @.end
> goto b
> end
> if (@.systemon = 1 and @.previous = 1)
> goto d
> set @.end = @.date
> insert into resulttable select @.currentsystem, @.begin, @.end
> e:
> set @.previous = 0
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> if @.currentsystem <> @.system
> goto b
> if (@.systemon = 0 and @.previous = 0)
> goto e
> goto c
> z:
> close system_cursor
> deallocate system_cursor
> ---
> cheers, wiley
>
> "Brian Selzer" <brian@.selzer-software.com> wrote in message
> news:utWcAzQFGHA.3308@.TK2MSFTNGP10.phx.gbl...
>|||I need this algorithm to pre-process data for an olap cube. Hence the DTS
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:%23X2pbuRFGHA.532@.TK2MSFTNGP15.phx.gbl...
> Why not doing such reports on the client side?
>
>
> "wiley" <w.smith@.nospam.com> wrote in message
> news:u7l0YmRFGHA.216@.TK2MSFTNGP15.phx.gbl...
>|||Wiley,
You may want to give the following query a whirl before you opt for cursors:
select system,
date as [date switched ON],
(select top 1 date
from t1 as sysoff
where sysoff.system = syson.system
and sysoff.isOn = 0
and sysoff.date > syson.date
order by date) as [date switched OFF]
from t1 as syson
where isOn = 1
and coalesce(
(select top 1 isOn
from t1 as prev
where prev.system = syson.system
and prev.date < syson.date
order by date desc), 0) <> 1;
Though I'd definitely compare its performance to cursor code, as the cursor
code may end up being faster in this case.
BG, SQL Server MVP
www.SolidQualityLearning.com
www.insidetsql.com
"wiley" <w.smith@.nospam.com> wrote in message
news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Wiley,
try the following:
SET NOCOUNT ON;
SET ANSI_NULLS ON;
USE Your_DB;
IF EXISTS(SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name='SwitchHistory') DROP TABLE SwitchHistory;
CREATE TABLE SwitchHistory(
system_id CHAR(1) NOT NULL,
switch_date INTEGER NOT NULL CHECK(switch_date > 0),
system_state INTEGER NOT NULL CHECK(system_state IN (0, 1)),
PRIMARY KEY(system_id, switch_date));
INSERT INTO switchHistory
SELECT 'A', 1, 0 UNION ALL
SELECT 'A', 4, 1 UNION ALL
SELECT 'A', 5, 1 UNION ALL
SELECT 'A', 6, 0 UNION ALL
SELECT 'A', 20, 1 UNION ALL
SELECT 'A', 21, 0 UNION ALL
SELECT 'A', 25, 0 UNION ALL
SELECT 'A', 27, 1 UNION ALL
SELECT 'A', 32, 1 UNION ALL
SELECT 'A', 33, 1 UNION ALL
SELECT 'A', 34, 0 UNION ALL
SELECT 'A', 40, 1 UNION ALL
SELECT 'B', 41, 1 UNION ALL
SELECT 'B', 45, 0 UNION ALL
SELECT 'B', 49, 1 UNION ALL
SELECT 'B', 50, 1 UNION ALL
SELECT 'B', 51, 1 UNION ALL
SELECT 'B', 53, 1 UNION ALL
SELECT 'B', 67, 0;
SELECT H2.system_id,
MIN(H1.switch_date) AS switch_to_on,
MIN(H6.switch_date) AS switch_to_off
FROM switchHistory AS H1
INNER JOIN switchHistory AS H2
ON H2.system_id = H1.system_id
AND H2.switch_date >= H1.switch_date
AND H2.system_state = H1.system_state
LEFT OUTER JOIN switchHistory AS H6
ON H6.system_id = H2.system_id
AND H6.switch_date > H2.switch_date
AND H6.system_state = 0
WHERE H1.system_state = 1
AND NOT EXISTS(SELECT *
FROM switchHistory AS H3
WHERE H3.system_id = H2.system_id
AND H3.switch_date
BETWEEN H1.switch_date AND H2.switch_date
AND H3.system_state = 0)
AND NOT EXISTS(SELECT *
FROM switchHistory AS H4
WHERE H4.system_id = H2.system_id
AND H4.switch_date = (SELECT MIN(H5.switch_date)
FROM switchHistory AS H5
WHERE H5.switch_date >
H2.switch_date)
AND H4.system_state = 1)
GROUP BY H2.system_id, H2.switch_date;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)
"wiley" <w.smith@.nospam.com>: news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Itzik,
Thank you so much for the algorithm you put forward. I have tested it and it
performs far better as compared to my cursor implementation. On a set of
company data containing in excess of 1.5 million rows, your algorithm takes
approx. 1.5 min, whereas mine takes an average of 45 min. My algorithm
doesnt have a proper exiting feature based on instance where the last row of
data read contains isOn = 1 (error on my part). Yours however picks up on
this which is great!
Thanks so much again! much appreciated!
cheers
wiley
"Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQualityLearning.com> wrote in message
news:ey9Y3ySFGHA.2708@.TK2MSFTNGP11.phx.gbl...
> Wiley,
> You may want to give the following query a whirl before you opt for
> cursors:
> select system,
> date as [date switched ON],
> (select top 1 date
> from t1 as sysoff
> where sysoff.system = syson.system
> and sysoff.isOn = 0
> and sysoff.date > syson.date
> order by date) as [date switched OFF]
> from t1 as syson
> where isOn = 1
> and coalesce(
> (select top 1 isOn
> from t1 as prev
> where prev.system = syson.system
> and prev.date < syson.date
> order by date desc), 0) <> 1;
> Though I'd definitely compare its performance to cursor code, as the
> cursor code may end up being faster in this case.
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
> www.insidetsql.com
>
> "wiley" <w.smith@.nospam.com> wrote in message
> news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||I modified my query:
SET STATISTICS IO ON;
SELECT H1.system_id, H1.switch_date AS switch_to_on,
MIN(H4.switch_date) AS switch_to_off
FROM SwitchHistory AS H1
LEFT OUTER JOIN SwitchHistory AS H4
ON H4.system_state = 0
AND H4.system_id = H1.system_id
AND H4.switch_date > H1.switch_date
WHERE H1.system_state = 1
AND NOT EXISTS(SELECT *
FROM SwitchHistory AS H2
WHERE H2.system_state = 1 AND H2.system_id = H1.system_id
AND H2.switch_date = (SELECT MAX(H3.switch_date)
FROM SwitchHistory AS H3
WHERE H3.system_id =
H1.system_id
AND H3.switch_date <
H1.switch_date))
GROUP BY H1.system_id, H1.switch_date;
SET STATISTICS IO OFF;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)
"Andrey Odegov" <avodeGOV@.yandex.ru> wrote in message
news:%23$1mHbVFGHA.208@.tk2msftngp13.phx.gbl...
> Wiley,
> try the following:
> SET NOCOUNT ON;
> SET ANSI_NULLS ON;
> USE Your_DB;
> IF EXISTS(SELECT *
> FROM INFORMATION_SCHEMA.TABLES
> WHERE table_name='SwitchHistory') DROP TABLE SwitchHistory;
> CREATE TABLE SwitchHistory(
> system_id CHAR(1) NOT NULL,
> switch_date INTEGER NOT NULL CHECK(switch_date > 0),
> system_state INTEGER NOT NULL CHECK(system_state IN (0, 1)),
> PRIMARY KEY(system_id, switch_date));
> INSERT INTO switchHistory
> SELECT 'A', 1, 0 UNION ALL
> SELECT 'A', 4, 1 UNION ALL
> SELECT 'A', 5, 1 UNION ALL
> SELECT 'A', 6, 0 UNION ALL
> SELECT 'A', 20, 1 UNION ALL
> SELECT 'A', 21, 0 UNION ALL
> SELECT 'A', 25, 0 UNION ALL
> SELECT 'A', 27, 1 UNION ALL
> SELECT 'A', 32, 1 UNION ALL
> SELECT 'A', 33, 1 UNION ALL
> SELECT 'A', 34, 0 UNION ALL
> SELECT 'A', 40, 1 UNION ALL
> SELECT 'B', 41, 1 UNION ALL
> SELECT 'B', 45, 0 UNION ALL
> SELECT 'B', 49, 1 UNION ALL
> SELECT 'B', 50, 1 UNION ALL
> SELECT 'B', 51, 1 UNION ALL
> SELECT 'B', 53, 1 UNION ALL
> SELECT 'B', 67, 0;
> SELECT H2.system_id,
> MIN(H1.switch_date) AS switch_to_on,
> MIN(H6.switch_date) AS switch_to_off
> FROM switchHistory AS H1
> INNER JOIN switchHistory AS H2
> ON H2.system_id = H1.system_id
> AND H2.switch_date >= H1.switch_date
> AND H2.system_state = H1.system_state
> LEFT OUTER JOIN switchHistory AS H6
> ON H6.system_id = H2.system_id
> AND H6.switch_date > H2.switch_date
> AND H6.system_state = 0
> WHERE H1.system_state = 1
> AND NOT EXISTS(SELECT *
> FROM switchHistory AS H3
> WHERE H3.system_id = H2.system_id
> AND H3.switch_date
> BETWEEN H1.switch_date AND H2.switch_date
> AND H3.system_state = 0)
> AND NOT EXISTS(SELECT *
> FROM switchHistory AS H4
> WHERE H4.system_id = H2.system_id
> AND H4.switch_date = (SELECT MIN(H5.switch_date)
> FROM switchHistory AS H5
> WHERE H5.switch_date >
> H2.switch_date)
> AND H4.system_state = 1)
> GROUP BY H2.system_id, H2.switch_date;
> --
> Andrey Odegov
> avodeGOV@.yandex.ru
> (remove GOV to respond)
> "wiley" <w.smith@.nospam.com>: news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||Thanks for the query Andrey!
I will be testing it tomorrow. Many thanks!
cheers, wiley
"Andrey Odegov" <avodeGOV@.mail.ru> wrote in message
news:OoqSNpeFGHA.1124@.TK2MSFTNGP10.phx.gbl...
>I modified my query:
> SET STATISTICS IO ON;
> SELECT H1.system_id, H1.switch_date AS switch_to_on,
> MIN(H4.switch_date) AS switch_to_off
> FROM SwitchHistory AS H1
> LEFT OUTER JOIN SwitchHistory AS H4
> ON H4.system_state = 0
> AND H4.system_id = H1.system_id
> AND H4.switch_date > H1.switch_date
> WHERE H1.system_state = 1
> AND NOT EXISTS(SELECT *
> FROM SwitchHistory AS H2
> WHERE H2.system_state = 1 AND H2.system_id =
> H1.system_id
> AND H2.switch_date = (SELECT MAX(H3.switch_date)
> FROM SwitchHistory AS H3
> WHERE H3.system_id =
> H1.system_id
> AND H3.switch_date <
> H1.switch_date))
> GROUP BY H1.system_id, H1.switch_date;
> SET STATISTICS IO OFF;
> --
> Andrey Odegov
> avodeGOV@.yandex.ru
> (remove GOV to respond)
> "Andrey Odegov" <avodeGOV@.yandex.ru> wrote in message
> news:%23$1mHbVFGHA.208@.tk2msftngp13.phx.gbl...
>

No comments:

Post a Comment