Showing posts with label someonecould. Show all posts
Showing posts with label someonecould. Show all posts

Friday, February 24, 2012

obtain first and last occurance of contiguous blocks of data

Hello everyone, I am in need of some assistance, and was hoping someone
could help me out.
Essentially my problem boils down to the following...
I have a table that contains a date variable and an indication whether the
system is on or off:
[system] [date] [isOn]
A 01 0
A 04 1
A 05 1
A 06 0
A 20 1
A 21 0
A 25 0
A 27 1
A 32 1
A 33 1
A 34 0
A 40 1
B 41 1
B 45 0
B 49 1
B 50 1
B 51 1
B 53 1
B 67 0
What I need to derive is a table that shows the following
[system] [date switched ON] [date switched OFF]
A 04 06
A 20 21
A 27 34
A 40 NULL
B 41 45
B 49 67
So each row contains three variable:
0. first variable: The system concerned
1. second variable: The date the system was switched ON: i.e. the first
member of an occurance a contiguous block of 1's
2. third variable: The date the system was switched OFF: i.e. the subsequent
first member of an occurance of a contiguous block of 0's
... then the next row contains the next member of an occurance of contig.
block of 1's, and subsequent first member of an occurance of a contiguous
block of 0's etc etc
A NULL is placed where for a given system, the date switched OFF is not
defined
I am thinking of utilizing a cursor to run through the list.
Would anyone have a non-sequencial way of looking at addressing the problem
using normal SQL?
Any help most appreciated!
Many thanks
wileyYour options are to use a complicated correlated subquery, or to use a
cursor. This is one of the very few cases where a cursor can provide better
performance than a set-based operation. Use a table variable and insert the
values into it as you walk through the table.
"wiley" <w.smith@.nospam.com> wrote in message
news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Thanks for the suggestion! I knew id be wasting my time trying to find a
non-sequential algorithm.
I was working on the cursor-based solution as soon as i sent my post. I
really took a step back and mapped out the sequence via a flowchart and used
simple goto statements in my solution. I tested and it seems to be working.
I know... I really should be use WHILE statements etc. but i had to hack
this fast for a DTS process algorithm i need to implement tomorrow. I'll
fashion it in terms of WHILE statements, comments, and decent variables etc
in time...
declare @.system char(1), @.currentsystem char(1)
declare @.date int, @.begin int, @.end int
declare @.systemon int, @.previous int
delete resulttable
declare system_cursor cursor for
select [system], [date], systemon
from contract1
order by 1, 2
open system_cursor
a:
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
b:
set @.currentsystem = @.system
if @.systemon <> 1
goto a
c:
set @.begin = @.date
d:
set @.previous = 1
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
if @.currentsystem <> @.system
begin
set @.end = NULL
insert into resulttable select @.currentsystem, @.begin, @.end
goto b
end
if (@.systemon = 1 and @.previous = 1)
goto d
set @.end = @.date
insert into resulttable select @.currentsystem, @.begin, @.end
e:
set @.previous = 0
fetch next from system_cursor into @.system, @.date, @.systemon
if @.@.FETCH_STATUS = -1
goto z
if @.currentsystem <> @.system
goto b
if (@.systemon = 0 and @.previous = 0)
goto e
goto c
z:
close system_cursor
deallocate system_cursor
---
cheers, wiley
"Brian Selzer" <brian@.selzer-software.com> wrote in message
news:utWcAzQFGHA.3308@.TK2MSFTNGP10.phx.gbl...
> Your options are to use a complicated correlated subquery, or to use a
> cursor. This is one of the very few cases where a cursor can provide
> better performance than a set-based operation. Use a table variable and
> insert the values into it as you walk through the table.
> "wiley" <w.smith@.nospam.com> wrote in message
> news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||Why not doing such reports on the client side?
"wiley" <w.smith@.nospam.com> wrote in message
news:u7l0YmRFGHA.216@.TK2MSFTNGP15.phx.gbl...
> Thanks for the suggestion! I knew id be wasting my time trying to find a
> non-sequential algorithm.
> I was working on the cursor-based solution as soon as i sent my post. I
> really took a step back and mapped out the sequence via a flowchart and
> used simple goto statements in my solution. I tested and it seems to be
> working. I know... I really should be use WHILE statements etc. but i had
> to hack this fast for a DTS process algorithm i need to implement
> tomorrow. I'll fashion it in terms of WHILE statements, comments, and
> decent variables etc in time...
> --
> declare @.system char(1), @.currentsystem char(1)
> declare @.date int, @.begin int, @.end int
> declare @.systemon int, @.previous int
> delete resulttable
> declare system_cursor cursor for
> select [system], [date], systemon
> from contract1
> order by 1, 2
> open system_cursor
> a:
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> b:
> set @.currentsystem = @.system
> if @.systemon <> 1
> goto a
> c:
> set @.begin = @.date
> d:
> set @.previous = 1
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> if @.currentsystem <> @.system
> begin
> set @.end = NULL
> insert into resulttable select @.currentsystem, @.begin, @.end
> goto b
> end
> if (@.systemon = 1 and @.previous = 1)
> goto d
> set @.end = @.date
> insert into resulttable select @.currentsystem, @.begin, @.end
> e:
> set @.previous = 0
> fetch next from system_cursor into @.system, @.date, @.systemon
> if @.@.FETCH_STATUS = -1
> goto z
> if @.currentsystem <> @.system
> goto b
> if (@.systemon = 0 and @.previous = 0)
> goto e
> goto c
> z:
> close system_cursor
> deallocate system_cursor
> ---
> cheers, wiley
>
> "Brian Selzer" <brian@.selzer-software.com> wrote in message
> news:utWcAzQFGHA.3308@.TK2MSFTNGP10.phx.gbl...
>|||I need this algorithm to pre-process data for an olap cube. Hence the DTS
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:%23X2pbuRFGHA.532@.TK2MSFTNGP15.phx.gbl...
> Why not doing such reports on the client side?
>
>
> "wiley" <w.smith@.nospam.com> wrote in message
> news:u7l0YmRFGHA.216@.TK2MSFTNGP15.phx.gbl...
>|||Wiley,
You may want to give the following query a whirl before you opt for cursors:
select system,
date as [date switched ON],
(select top 1 date
from t1 as sysoff
where sysoff.system = syson.system
and sysoff.isOn = 0
and sysoff.date > syson.date
order by date) as [date switched OFF]
from t1 as syson
where isOn = 1
and coalesce(
(select top 1 isOn
from t1 as prev
where prev.system = syson.system
and prev.date < syson.date
order by date desc), 0) <> 1;
Though I'd definitely compare its performance to cursor code, as the cursor
code may end up being faster in this case.
BG, SQL Server MVP
www.SolidQualityLearning.com
www.insidetsql.com
"wiley" <w.smith@.nospam.com> wrote in message
news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Wiley,
try the following:
SET NOCOUNT ON;
SET ANSI_NULLS ON;
USE Your_DB;
IF EXISTS(SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name='SwitchHistory') DROP TABLE SwitchHistory;
CREATE TABLE SwitchHistory(
system_id CHAR(1) NOT NULL,
switch_date INTEGER NOT NULL CHECK(switch_date > 0),
system_state INTEGER NOT NULL CHECK(system_state IN (0, 1)),
PRIMARY KEY(system_id, switch_date));
INSERT INTO switchHistory
SELECT 'A', 1, 0 UNION ALL
SELECT 'A', 4, 1 UNION ALL
SELECT 'A', 5, 1 UNION ALL
SELECT 'A', 6, 0 UNION ALL
SELECT 'A', 20, 1 UNION ALL
SELECT 'A', 21, 0 UNION ALL
SELECT 'A', 25, 0 UNION ALL
SELECT 'A', 27, 1 UNION ALL
SELECT 'A', 32, 1 UNION ALL
SELECT 'A', 33, 1 UNION ALL
SELECT 'A', 34, 0 UNION ALL
SELECT 'A', 40, 1 UNION ALL
SELECT 'B', 41, 1 UNION ALL
SELECT 'B', 45, 0 UNION ALL
SELECT 'B', 49, 1 UNION ALL
SELECT 'B', 50, 1 UNION ALL
SELECT 'B', 51, 1 UNION ALL
SELECT 'B', 53, 1 UNION ALL
SELECT 'B', 67, 0;
SELECT H2.system_id,
MIN(H1.switch_date) AS switch_to_on,
MIN(H6.switch_date) AS switch_to_off
FROM switchHistory AS H1
INNER JOIN switchHistory AS H2
ON H2.system_id = H1.system_id
AND H2.switch_date >= H1.switch_date
AND H2.system_state = H1.system_state
LEFT OUTER JOIN switchHistory AS H6
ON H6.system_id = H2.system_id
AND H6.switch_date > H2.switch_date
AND H6.system_state = 0
WHERE H1.system_state = 1
AND NOT EXISTS(SELECT *
FROM switchHistory AS H3
WHERE H3.system_id = H2.system_id
AND H3.switch_date
BETWEEN H1.switch_date AND H2.switch_date
AND H3.system_state = 0)
AND NOT EXISTS(SELECT *
FROM switchHistory AS H4
WHERE H4.system_id = H2.system_id
AND H4.switch_date = (SELECT MIN(H5.switch_date)
FROM switchHistory AS H5
WHERE H5.switch_date >
H2.switch_date)
AND H4.system_state = 1)
GROUP BY H2.system_id, H2.switch_date;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)
"wiley" <w.smith@.nospam.com>: news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
> Hello everyone, I am in need of some assistance, and was hoping someone
> could help me out.
> Essentially my problem boils down to the following...
> I have a table that contains a date variable and an indication whether the
> system is on or off:
> [system] [date] [isOn]
> A 01 0
> A 04 1
> A 05 1
> A 06 0
> A 20 1
> A 21 0
> A 25 0
> A 27 1
> A 32 1
> A 33 1
> A 34 0
> A 40 1
> B 41 1
> B 45 0
> B 49 1
> B 50 1
> B 51 1
> B 53 1
> B 67 0
>
> What I need to derive is a table that shows the following
> [system] [date switched ON] [date switched OFF]
> A 04 06
> A 20 21
> A 27 34
> A 40 NULL
> B 41 45
> B 49 67
> So each row contains three variable:
> 0. first variable: The system concerned
> 1. second variable: The date the system was switched ON: i.e. the first
> member of an occurance a contiguous block of 1's
> 2. third variable: The date the system was switched OFF: i.e. the
> subsequent first member of an occurance of a contiguous block of 0's
> ... then the next row contains the next member of an occurance of contig.
> block of 1's, and subsequent first member of an occurance of a contiguous
> block of 0's etc etc
> A NULL is placed where for a given system, the date switched OFF is not
> defined
> I am thinking of utilizing a cursor to run through the list.
> Would anyone have a non-sequencial way of looking at addressing the
> problem using normal SQL?
> Any help most appreciated!
> Many thanks
> wiley
>|||Itzik,
Thank you so much for the algorithm you put forward. I have tested it and it
performs far better as compared to my cursor implementation. On a set of
company data containing in excess of 1.5 million rows, your algorithm takes
approx. 1.5 min, whereas mine takes an average of 45 min. My algorithm
doesnt have a proper exiting feature based on instance where the last row of
data read contains isOn = 1 (error on my part). Yours however picks up on
this which is great!
Thanks so much again! much appreciated!
cheers
wiley
"Itzik Ben-Gan" <itzik@.REMOVETHIS.SolidQualityLearning.com> wrote in message
news:ey9Y3ySFGHA.2708@.TK2MSFTNGP11.phx.gbl...
> Wiley,
> You may want to give the following query a whirl before you opt for
> cursors:
> select system,
> date as [date switched ON],
> (select top 1 date
> from t1 as sysoff
> where sysoff.system = syson.system
> and sysoff.isOn = 0
> and sysoff.date > syson.date
> order by date) as [date switched OFF]
> from t1 as syson
> where isOn = 1
> and coalesce(
> (select top 1 isOn
> from t1 as prev
> where prev.system = syson.system
> and prev.date < syson.date
> order by date desc), 0) <> 1;
> Though I'd definitely compare its performance to cursor code, as the
> cursor code may end up being faster in this case.
> --
> BG, SQL Server MVP
> www.SolidQualityLearning.com
> www.insidetsql.com
>
> "wiley" <w.smith@.nospam.com> wrote in message
> news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||I modified my query:
SET STATISTICS IO ON;
SELECT H1.system_id, H1.switch_date AS switch_to_on,
MIN(H4.switch_date) AS switch_to_off
FROM SwitchHistory AS H1
LEFT OUTER JOIN SwitchHistory AS H4
ON H4.system_state = 0
AND H4.system_id = H1.system_id
AND H4.switch_date > H1.switch_date
WHERE H1.system_state = 1
AND NOT EXISTS(SELECT *
FROM SwitchHistory AS H2
WHERE H2.system_state = 1 AND H2.system_id = H1.system_id
AND H2.switch_date = (SELECT MAX(H3.switch_date)
FROM SwitchHistory AS H3
WHERE H3.system_id =
H1.system_id
AND H3.switch_date <
H1.switch_date))
GROUP BY H1.system_id, H1.switch_date;
SET STATISTICS IO OFF;
Andrey Odegov
avodeGOV@.yandex.ru
(remove GOV to respond)
"Andrey Odegov" <avodeGOV@.yandex.ru> wrote in message
news:%23$1mHbVFGHA.208@.tk2msftngp13.phx.gbl...
> Wiley,
> try the following:
> SET NOCOUNT ON;
> SET ANSI_NULLS ON;
> USE Your_DB;
> IF EXISTS(SELECT *
> FROM INFORMATION_SCHEMA.TABLES
> WHERE table_name='SwitchHistory') DROP TABLE SwitchHistory;
> CREATE TABLE SwitchHistory(
> system_id CHAR(1) NOT NULL,
> switch_date INTEGER NOT NULL CHECK(switch_date > 0),
> system_state INTEGER NOT NULL CHECK(system_state IN (0, 1)),
> PRIMARY KEY(system_id, switch_date));
> INSERT INTO switchHistory
> SELECT 'A', 1, 0 UNION ALL
> SELECT 'A', 4, 1 UNION ALL
> SELECT 'A', 5, 1 UNION ALL
> SELECT 'A', 6, 0 UNION ALL
> SELECT 'A', 20, 1 UNION ALL
> SELECT 'A', 21, 0 UNION ALL
> SELECT 'A', 25, 0 UNION ALL
> SELECT 'A', 27, 1 UNION ALL
> SELECT 'A', 32, 1 UNION ALL
> SELECT 'A', 33, 1 UNION ALL
> SELECT 'A', 34, 0 UNION ALL
> SELECT 'A', 40, 1 UNION ALL
> SELECT 'B', 41, 1 UNION ALL
> SELECT 'B', 45, 0 UNION ALL
> SELECT 'B', 49, 1 UNION ALL
> SELECT 'B', 50, 1 UNION ALL
> SELECT 'B', 51, 1 UNION ALL
> SELECT 'B', 53, 1 UNION ALL
> SELECT 'B', 67, 0;
> SELECT H2.system_id,
> MIN(H1.switch_date) AS switch_to_on,
> MIN(H6.switch_date) AS switch_to_off
> FROM switchHistory AS H1
> INNER JOIN switchHistory AS H2
> ON H2.system_id = H1.system_id
> AND H2.switch_date >= H1.switch_date
> AND H2.system_state = H1.system_state
> LEFT OUTER JOIN switchHistory AS H6
> ON H6.system_id = H2.system_id
> AND H6.switch_date > H2.switch_date
> AND H6.system_state = 0
> WHERE H1.system_state = 1
> AND NOT EXISTS(SELECT *
> FROM switchHistory AS H3
> WHERE H3.system_id = H2.system_id
> AND H3.switch_date
> BETWEEN H1.switch_date AND H2.switch_date
> AND H3.system_state = 0)
> AND NOT EXISTS(SELECT *
> FROM switchHistory AS H4
> WHERE H4.system_id = H2.system_id
> AND H4.switch_date = (SELECT MIN(H5.switch_date)
> FROM switchHistory AS H5
> WHERE H5.switch_date >
> H2.switch_date)
> AND H4.system_state = 1)
> GROUP BY H2.system_id, H2.switch_date;
> --
> Andrey Odegov
> avodeGOV@.yandex.ru
> (remove GOV to respond)
> "wiley" <w.smith@.nospam.com>: news:O2uIWVQFGHA.376@.TK2MSFTNGP12.phx.gbl...
>|||Thanks for the query Andrey!
I will be testing it tomorrow. Many thanks!
cheers, wiley
"Andrey Odegov" <avodeGOV@.mail.ru> wrote in message
news:OoqSNpeFGHA.1124@.TK2MSFTNGP10.phx.gbl...
>I modified my query:
> SET STATISTICS IO ON;
> SELECT H1.system_id, H1.switch_date AS switch_to_on,
> MIN(H4.switch_date) AS switch_to_off
> FROM SwitchHistory AS H1
> LEFT OUTER JOIN SwitchHistory AS H4
> ON H4.system_state = 0
> AND H4.system_id = H1.system_id
> AND H4.switch_date > H1.switch_date
> WHERE H1.system_state = 1
> AND NOT EXISTS(SELECT *
> FROM SwitchHistory AS H2
> WHERE H2.system_state = 1 AND H2.system_id =
> H1.system_id
> AND H2.switch_date = (SELECT MAX(H3.switch_date)
> FROM SwitchHistory AS H3
> WHERE H3.system_id =
> H1.system_id
> AND H3.switch_date <
> H1.switch_date))
> GROUP BY H1.system_id, H1.switch_date;
> SET STATISTICS IO OFF;
> --
> Andrey Odegov
> avodeGOV@.yandex.ru
> (remove GOV to respond)
> "Andrey Odegov" <avodeGOV@.yandex.ru> wrote in message
> news:%23$1mHbVFGHA.208@.tk2msftngp13.phx.gbl...
>