Get the count of statuses by date but only count continuous rows

I have this data:

ID Name Status Date 1 Machine1 Active 2018-01-01 2 Machine2 Fault 2018-01-01 3 Machine3 Active 2018-01-01 4 Machine1 Fault 2018-01-02 5 Machine2 Active 2018-01-02 6 Machine3 Active 2018-01-02 7 Machine2 Active 2018-01-03 8 Machine1 Fault 2018-01-03 9 Machine2 Active 2018-01-04 10 Machine1 Fault 2018-01-04 11 Machine3 Active 2018-01-06

INPUT

and i want these data in output

EXPECTED OUTPUT

Name Last Status Count Machine1 Fault 3 Machine2 Active 3 Machine3 Active 1 Because Date is not Continuous

*Count : Last number of status in continuous history

What is your code and what error are you getting?

– Tab Alleman
Sep 11 '18 at 15:22

Welcome to SO. If I understood your proeblem correctly, please improve/clarify your question by adding this wording: "Last Status" is the most recently given status of a particular machine "Count" is the number of distinct consecutive days this status has been reported up to the most recent date. If there is a gap in the dates (no record for a particular date/machine/status), the counter is reset

– Caius Jard
Sep 11 '18 at 15:40

You're sample data doesn't match the expected output: Machine2 is Active for 3 days from 2 Jan to 4 Jan Machine3 is Active for 2 days from 1 Jan to 2 Jan

– Mazhar
Sep 11 '18 at 15:48

Machine2 is Active for 3 days from 2 Jan to 4 Jan

Machine3 is Active for 2 days from 1 Jan to 2 Jan

@Mazhar I agree with the error in machine 2, it has been active for the last 3 days, but Machine 3 has been most recently active, and this active is 1 day long report only, so that is correct.

– Caius Jard
Sep 11 '18 at 15:52

I wrote a query which I thought would solve the problem, then saw your requirement about continuous history. If a date is missing, why wouldn't the space between non continuous dates not have the same status? I think it would be OK to assume this.

– Tim Biegeleisen
Sep 11 '18 at 15:53

2 Answers
2

I believe it is as simple as this:

WITH cte1 AS ( SELECT Name, Status, DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY Name, Status ORDER BY Date DESC) - 1, Date) AS GroupingDate FROM testdata ), cte2 AS ( SELECT Name, Status, RANK() OVER (PARTITION BY Name ORDER BY GroupingDate DESC) AS GroupingNumber FROM cte1 ) SELECT Name, Status AS LastStatus, COUNT(*) AS LastStatusCount FROM cte2 WHERE GroupingNumber = 1 GROUP BY Name, Status ORDER BY Name

Result and DBFiddle:

| Name | LastStatus | LastStatusCount | |----------|------------|-----------------| | Machine1 | Fault | 3 | | Machine2 | Active | 3 | | Machine3 | Active | 1 |

In order to understand how this works, look at the intermediate values generated by CTE:

| Name | Status | Date | RowNumber | GroupingDate | GroupingNumber | |----------|--------|---------------------|-----------|---------------------|----------------| | Machine1 | Fault | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 | | Machine1 | Fault | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 | | Machine1 | Fault | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 | | Machine1 | Active | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 | | Machine2 | Active | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 | | Machine2 | Active | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 | | Machine2 | Active | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 | | Machine2 | Fault | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 | | Machine3 | Active | 06/01/2018 00:00:00 | 1 | 06/01/2018 00:00:00 | 1 | | Machine3 | Active | 02/01/2018 00:00:00 | 2 | 03/01/2018 00:00:00 | 2 | | Machine3 | Active | 01/01/2018 00:00:00 | 3 | 03/01/2018 00:00:00 | 2 |

The trick here is that if two numbers are contiguous then subtracting contiguous numbers from them will result in same value. E.g. if we have 5, 6, 8, 9 then subtracting 1, 2, 3, 4 in that order will produce 4, 4, 5, 5.

5, 6, 8, 9

1, 2, 3, 4

4, 4, 5, 5

Neat trick; here Salman establishes an incrementing counter that is added to a decrementing date. For consecutive dates, this naturally produces a fixed output date (3rd jan + 0 = 3rd jan, 2nd jan + 1 = 3rd jan, 1st jan + 2 = 3rd jan), then the data is ranked by this. Rank produces the same "winning position" for equal values so all of the consecutive dates that became the same output date are all ranked first. These are then grouped and counted

– Caius Jard
Sep 12 '18 at 9:09

I think this will work, though SQLFiddle is having a fit at the moment, so I can't test:

SELECT [Name], [Status], ct as [Count] FROM ( SELECT [name], [status], [date], 1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct, row_number() over(partition by [name] order by [date] desc) rn FROM ( SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp FROM t ) x ) y WHERE rn = 1

It first off uses LAG to look at the current row and the previous row (grouping the data into machine name and status, ordering the data by date) and if the current date is 1 day different to the previous date it records a 1 else a 0

These 1s and zeroes are summed in a running total fashion, resetting when the machine name or status changes (the partition of the sum() over() )

Also we want to consider the data just in terms of the machine name, and we only want the latest record from each machine, so we partition by the machine name, and count in order of date descending, then just pick (with the where clause) the rows that are numbered 1 for each machine

It actually makes a lot more sense if you run the queries separately, like this

Calculate the "is the current report consecutive with the previous report, for the given status and machine" 1 = yes, 0 = no:

SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp FROM t

Calculate the "what is the running total of the current block of consecutive reports":

SELECT [name], [status], [date], 1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct, row_number() over(partition by [name] order by [date] desc) rn FROM ( SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp FROM t ) x

Then of course, the whole thing but without the where clause, so you can see the data we're discarding:

Fiddle finally woke up:

http://www.sqlfiddle.com/#!18/77dae/2

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service, privacy policy and cookie policy

搜尋此網誌

Dfyjkt