Get the count of statuses by date but only count continuous rows

Get the count of statuses by date but only count continuous rows



I have this data:


ID Name Status Date
1 Machine1 Active 2018-01-01
2 Machine2 Fault 2018-01-01
3 Machine3 Active 2018-01-01
4 Machine1 Fault 2018-01-02
5 Machine2 Active 2018-01-02
6 Machine3 Active 2018-01-02
7 Machine2 Active 2018-01-03
8 Machine1 Fault 2018-01-03
9 Machine2 Active 2018-01-04
10 Machine1 Fault 2018-01-04
11 Machine3 Active 2018-01-06



INPUT



and i want these data in output



EXPECTED OUTPUT


Name Last Status Count
Machine1 Fault 3
Machine2 Active 3
Machine3 Active 1 Because Date is not Continuous



*Count : Last number of status in continuous history






What is your code and what error are you getting?

– Tab Alleman
Sep 11 '18 at 15:22






Welcome to SO. If I understood your proeblem correctly, please improve/clarify your question by adding this wording: "Last Status" is the most recently given status of a particular machine "Count" is the number of distinct consecutive days this status has been reported up to the most recent date. If there is a gap in the dates (no record for a particular date/machine/status), the counter is reset

– Caius Jard
Sep 11 '18 at 15:40






You're sample data doesn't match the expected output: Machine2 is Active for 3 days from 2 Jan to 4 Jan Machine3 is Active for 2 days from 1 Jan to 2 Jan

– Mazhar
Sep 11 '18 at 15:48


Machine2 is Active for 3 days from 2 Jan to 4 Jan


Machine3 is Active for 2 days from 1 Jan to 2 Jan






@Mazhar I agree with the error in machine 2, it has been active for the last 3 days, but Machine 3 has been most recently active, and this active is 1 day long report only, so that is correct.

– Caius Jard
Sep 11 '18 at 15:52







I wrote a query which I thought would solve the problem, then saw your requirement about continuous history. If a date is missing, why wouldn't the space between non continuous dates not have the same status? I think it would be OK to assume this.

– Tim Biegeleisen
Sep 11 '18 at 15:53




2 Answers
2



I believe it is as simple as this:


WITH cte1 AS (
SELECT
Name,
Status,
DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY Name, Status ORDER BY Date DESC) - 1, Date) AS GroupingDate
FROM testdata
), cte2 AS (
SELECT
Name,
Status,
RANK() OVER (PARTITION BY Name ORDER BY GroupingDate DESC) AS GroupingNumber
FROM cte1
)
SELECT Name, Status AS LastStatus, COUNT(*) AS LastStatusCount
FROM cte2
WHERE GroupingNumber = 1
GROUP BY Name, Status
ORDER BY Name



Result and DBFiddle:


| Name | LastStatus | LastStatusCount |
|----------|------------|-----------------|
| Machine1 | Fault | 3 |
| Machine2 | Active | 3 |
| Machine3 | Active | 1 |



In order to understand how this works, look at the intermediate values generated by CTE:


| Name | Status | Date | RowNumber | GroupingDate | GroupingNumber |
|----------|--------|---------------------|-----------|---------------------|----------------|
| Machine1 | Fault | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Active | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine2 | Active | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Fault | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine3 | Active | 06/01/2018 00:00:00 | 1 | 06/01/2018 00:00:00 | 1 |
| Machine3 | Active | 02/01/2018 00:00:00 | 2 | 03/01/2018 00:00:00 | 2 |
| Machine3 | Active | 01/01/2018 00:00:00 | 3 | 03/01/2018 00:00:00 | 2 |



The trick here is that if two numbers are contiguous then subtracting contiguous numbers from them will result in same value. E.g. if we have 5, 6, 8, 9 then subtracting 1, 2, 3, 4 in that order will produce 4, 4, 5, 5.


5, 6, 8, 9


1, 2, 3, 4


4, 4, 5, 5






Neat trick; here Salman establishes an incrementing counter that is added to a decrementing date. For consecutive dates, this naturally produces a fixed output date (3rd jan + 0 = 3rd jan, 2nd jan + 1 = 3rd jan, 1st jan + 2 = 3rd jan), then the data is ranked by this. Rank produces the same "winning position" for equal values so all of the consecutive dates that became the same output date are all ranked first. These are then grouped and counted

– Caius Jard
Sep 12 '18 at 9:09



I think this will work, though SQLFiddle is having a fit at the moment, so I can't test:


SELECT [Name], [Status], ct as [Count]
FROM (
SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x
) y
WHERE
rn = 1



It first off uses LAG to look at the current row and the previous row (grouping the data into machine name and status, ordering the data by date) and if the current date is 1 day different to the previous date it records a 1 else a 0



These 1s and zeroes are summed in a running total fashion, resetting when the machine name or status changes (the partition of the sum() over() )



Also we want to consider the data just in terms of the machine name, and we only want the latest record from each machine, so we partition by the machine name, and count in order of date descending, then just pick (with the where clause) the rows that are numbered 1 for each machine



It actually makes a lot more sense if you run the queries separately, like this



Calculate the "is the current report consecutive with the previous report, for the given status and machine" 1 = yes, 0 = no:


SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t



Calculate the "what is the running total of the current block of consecutive reports":


SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x



Then of course, the whole thing but without the where clause, so you can see the data we're discarding:


SELECT [Name], [Status], ct as [Count]
FROM (
SELECT
[name],
[status],
[date],
1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
row_number() over(partition by [name] order by [date] desc) rn
FROM
(
SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
FROM t
) x
) y



Fiddle finally woke up:



http://www.sqlfiddle.com/#!18/77dae/2



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you agree to our terms of service, privacy policy and cookie policy

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)