SQL Server : creating new table to break up description field into a new organized table
SQL Server : creating new table to break up description field into a new organized table
In our SQL Server database I have a description
column in dbo.tickets
table. This data is a mess and I am required to pull data from it even though its not really a string it has **
separating some of the text.
description
dbo.tickets
**
Is it possible to pull this description data out creating a new table maybe tying it with the ID and joining the new table to see the data better displayed and to actually use it since someone had the bright idea of dumping it all into the description
.
description
This is what one tickets description looks like:
**You are receiving a Documented Verbal Warning ECR.**
**ECR Category:** $100 or more, Over/Short
**Date of ECR Incident:** 2018-04-12
**Date of ECR Discovery:** 2018-04-12
**Location of ECR Incident:** Palma Sola
**Overage or Shortage:** Shortage
**$ Amount Over/Short:** 138.92
**Checkup/Balance Person:** Jennifer Brierton
**This is considered a serious event and has been reviewed by the CSAD Assistant Director and/or Director for appropriate action.**
**This is being considered your 1st Cash Handling Event.**
You must always strive to ensure a structured procedure is followed when counting, collecting, refunding, and depositing cash and/or checks. Cash handling is a critical part of our job; it is one of our Six Standards of Excellence and is taken into consideration during your Annual Performance Evaluation.
During the counseling session to review this ECR in the Manager’s office, you will be provided the opportunity to make comments regarding this ECR. If you do not have any comments during the counseling session, you must respond stating, “I do not have any commentsâ€.
Any comments made by the Associate/Manager will be reviewed by the CSAD Director and an email will be sent to the Associate and Manager.
**Detailed description of incident**: On 04/12/2018 Michelle processed a title and registration transfer. The total amount of the transaction was $154.35. Michelle in error, ran the debit card for $15.43, therefore creating a shortage in the amount of $138.92. The customer was contacted the same day and returned to pay the shortage.
**Actions you can take to avoid this incident in the future:** Stay focused. Maintain a solid routine. Spend a few extra seconds to double check while making change, counting back change, collecting, and depositing monies. Refer to CCARS-05 (Tips & Hints for Proper Cash Handling).
Per CCARS-11 it states the following: Enter the amount of the debit card transaction in the DEBIT CARD field **directly from the debit card receipt for the transaction.** CCARS and the debit machine do not "talk" over any type of network; this will help ensure the accuracy of the transaction.
**BEST PRACTICE!** Hit the "Debit Card` button on the CCARS screen and (1) the system will display a message reminder to enter the amount from the debit card receipt and (2) automatically place the cursor in the debit field.
By following these steps, they serve as a safety net to ensure the correct amount is collected and entered in CCARS.
I need to figure out how to pull out data like ECR Category, Date of ECR Discovery, Location of ECR Incident, there name... Is there any way to break this up into a new table? To pull the data easier from. (ALL DESCRIPTIONS PRETTY MUCH MATCH THIS FORMAT IT APPEARS.)
Any help with this would be greatly appreciated! I am unsure where to even begin.
What I've tried
select
u.Id, u.Name, t.submitter_id, t.description
from
Users u
join
Tickets t on t.submitter_id = u.Id
where
t.created_at between '2017-11-01' and '2018-08-23'
and ',' + t.tags + ',' like '%,' + 'ecr_administered' + ',%'
(From here) can't figure out how to break up the description so I can pull ECR category, date of ECR category and the persons name and put it into a joining table.
Well, as stated above, but... you might try splitting it to table by new line character and then work on each row. Depending on how pretty much they are the same, you can make assumptions like 3rd rows contain ECR Category.
– Pm Duda
Aug 23 at 21:27
"T-SQL split string". Lots of examples at SO.
– Ivan Starostin
Aug 23 at 21:37
2 Answers
2
All you need to do here, is to find index of each of the field in the header:
**ECR Category:** $100 or more, Over/Short
**Date of ECR Incident:** 2018-04-12
**Date of ECR Discovery:** 2018-04-12
**Location of ECR Incident:** Palma Sola
**Overage or Shortage:** Shortage
**$ Amount Over/Short:** 138.92
**Checkup/Balance Person:** Jennifer Brierton
so you can use substring
function.
substring
In order to gain some performance we can cut out the irrelevant part in such way:
select substring(description, 1, charindex(char(10), description, charindex('**Checkup/Balance Person', description)) ) dscr
from tickets
Simply we look for char(10)
, which is newline character in SQL after last field in header, so we cut out all unnecessary information. I would store it in some temporary table or table variable. Now we will have column with text around 400 chars, which isn't so bad.
char(10)
In below query I assumed #tempTable
with dscr
column :)
#tempTable
dscr
To get all data you need it's just proper use of charindex
together with substring
:
charindex
substring
select substring(dscr, ectCatStart, charindex(char(10), dscr, ectCatStart) - ectCatStart - 1),
substring(dscr, dateOfEcr, charindex(char(10), dscr, dateOfEcr) - dateOfEcr - 1),
substring(dscr, dateOfDiscovery, charindex(char(10), dscr, dateOfDiscovery) - dateOfDiscovery - 1),
substring(dscr, location, charindex(char(10), dscr, location) - location - 1),
substring(dscr, overShort, charindex(char(10), dscr, overShort) - overShort - 1),
substring(dscr, amount, charindex(char(10), dscr, amount) - amount - 1),
substring(dscr, person, charindex(char(10), dscr, person) - person - 1)
from (
select charindex('**ECR Category:**', dscr) + len('**ECR Category:**') + 1 ectCatStart,
charindex('**Date of ECR Incident:**', dscr) + len('**Date of ECR Incident:**') + 1 dateOfEcr,
charindex('**Date of ECR Discovery:**', dscr) + len('**Date of ECR Discovery:**') + 1 dateOfDiscovery,
charindex('**Location of ECR Incident:**', dscr) + len('**Location of ECR Incident:**') + 1 location,
charindex('**Overage or Shortage:**', dscr) + len('**Overage or Shortage:**') + 1 overShort,
charindex('**$ Amount Over/Short:**', dscr) + len('**$ Amount Over/Short:**') + 1 amount,
charindex('**Checkup/Balance Person:**', dscr) + len('**Checkup/Balance Person:**') + 1 person,
dscr
from #tempTable
) a
@DavidBrierton And how is it? I checked with sample data and it worked pretty well. In order to get good performance, try to separate out this parsing apart from joining and other operations.
– Michał Turczyn
Aug 27 at 18:41
@DavidBrierton It is temporary table, that I advised you to use.
– Michał Turczyn
Aug 27 at 18:43
@DavidBrierton Yes, just to store relevant part of data.
– Michał Turczyn
Aug 27 at 18:43
You can do this by converting your string to xml, then using xpath to get the fields you want.
You need to be careful that your string ends up valid xml, so may need some additional replace's. You can see I have got one for an embedded "&".
This example shows how to do it, using the first 5 lines of your description, extracting the first two fields:
declare @tickets table (x varchar(1000))
insert @tickets values ('**You are receiving a Documented Verbal Warning ECR.**
**ECR Category:** $100 or more, Over/Short
**Date of ECR Incident:** 2018-04-12
**Date of ECR Discovery:** 2018-04-12
**Location of ECR Incident:** Palma Sola')
;with x as (
select convert(xml,'<row>'+replace(replace(x,'**','</row><row>'),'&','&')+'</row>') x
from @tickets
)
select
x.value('(/row[. >> (/row[. = "ECR Category:"])[1]])[1]','varchar(100)') as ECRCategory,
x.value('(/row[. >> (/row[. = "Date of ECR Incident:"])[1]])[1]','varchar(100)') as ECRDate
from x
EDIT: The above gives a complete example. To use your original query, do this:
;with t as (
select submitter_id, description, convert(xml,'<row>'+replace(replace(tags,'**','</row><row>'),'&','&')+'</row>') x
from Tickets
where created_at between '2017-11-01' and '2018-08-23'
)
select u.Id, u.Name, t.submitter_id, t.description,
x.value('(/row[. >> (/row[. = "ECR Category:"])[1]])[1]','varchar(100)') as ECRCategory,
x.value('(/row[. >> (/row[. = "Date of ECR Incident:"])[1]])[1]','varchar(100)') as ECRDate
from Users u
join t on t.submitter_id = u.Id
Sorry @DavidBrierton, I copied it from when I used @t to hold the string and neglected to update it. Have edited now. In my example I have created a temp table variable to hold the text so it fits into a nice simple example. I will add another query at the end that uses your tables.
– TomC
Aug 25 at 0:51
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
IMHO doing this with T-SQL would be an overkill. Instead use the programming language you know to do the parsing.
– Cetin Basoz
Aug 23 at 21:22