Match on interval with wildcard
I have two values, for example:
from = XY05*
to = XY55*
Then I have a tbl_df that contains a lot of strings:
Codes = ["XY05A", "XY56", "XY555", "AT003", "XY55AB", "XY35QA",
"GA003A", "XY36", "XY100", "XY03",...]
I want to use my from and to variables to see which of these strings are in my Codes variable.
From the example, I wish to have a match on:
"XY05A"
"XY555"
"XY36"
"XY55AB"
"XY35QA"
These all fall between XY05* and XY55*. The * just means that I do not care what comes after.
Hope this makes sense.
Why do you want a match on XY555, but not on XY100? – Wimpel
Sep 5 '18 at 10:12
@missuse Yeah. So I want to match everything in the interval 05-55 that contains XY.
– KaZyKa
Sep 5 '18 at 10:13
@Wimpel Because I have the wildcard after XY55*, I can also match XY55A. I have changed the example to show that I also want to match letters as well as numbers after the *.
– KaZyKa
Sep 5 '18 at 10:14
You can always test whether a string is >= XY05 and < XY56. – Salman A
Sep 5 '18 at 11:11
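Salman A's idea can be sketched in R with the ordinary comparison operators, which work element-wise on character vectors. This is a minimal sketch that assumes the codes are held in a plain character vector called Codes built from the question's example (for a tbl_df you would pass the relevant column instead). R compares strings using the current locale's collation, so it is worth checking the result in your own locale. For the codes below, which contain only uppercase letters and digits, the order should agree with a plain character-by-character comparison. This approach also keeps XY100, because "XY1" sorts between "XY05" and "XY56". That is exactly the case Wimpel asked about.

```r
# Codes from the question (its trailing "..." is left out)
Codes <- c("XY05A", "XY56", "XY555", "AT003", "XY55AB", "XY35QA",
           "GA003A", "XY36", "XY100", "XY03")

# Keep strings that sort at or after "XY05" and strictly before "XY56".
# Assumption: the locale's collation orders these uppercase/digit strings
# the same way a plain character-by-character comparison would.
in_range <- Codes >= "XY05" & Codes < "XY56"
matched <- Codes[in_range]
print(matched)
```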
2 Answers
Try this pattern: XY(0[5-9]|[1-4]\d|5[0-5]).*
(0[5-9]|[1-4]\d|5[0-5]) will match any number between 05 and 55, and the trailing .* matches any number of any characters after it.
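Here is a minimal sketch of applying the answer's pattern in R with grepl(). It again assumes the codes are in a plain character vector named Codes, built from the question's example. Inside an R string literal, the backslash in \d has to be doubled. The trailing .* accepts anything, so XY100 is kept as well: its first two digits, 10, fall inside 05-55. The pattern is also not anchored, so prepend ^ if XY must start the string.

```r
# Codes from the question (its trailing "..." is left out)
Codes <- c("XY05A", "XY56", "XY555", "AT003", "XY55AB", "XY35QA",
           "GA003A", "XY36", "XY100", "XY03")

# The answer's pattern; \d is written as \\d inside an R string.
# 0[5-9] covers 05-09, [1-4]\d covers 10-49, 5[0-5] covers 50-55.
pattern <- "XY(0[5-9]|[1-4]\\d|5[0-5]).*"

# grepl() returns TRUE for each element that contains a match
matched <- Codes[grepl(pattern, Codes)]
print(matched)
```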
It works. What would I need to change if my from and to change? For example: from = weq23*, to = weq24*
– KaZyKa
Sep 5 '18 at 10:16
@KaZyKa weq replaces XY, (0[5-9]|[1-4]\d|5[0-5])$ becomes 2[34]$, and 05.|55. becomes 23.|24., so the pattern would become weq(2[34]$|23.|24.), which can be simplified to weq2[34]; but this case is special, since there are no numbers between 23 and 24 :) – Michał Turczyn
Sep 5 '18 at 10:21
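You do not have to rewrite the alternation by hand for every new from/to pair, because the pattern can be generated. The sketch below uses a hypothetical helper, interval_pattern(). It simply lists every zero-padded number in the interval (05|06|...|55) instead of compressing them into character classes as the answer does. Both forms describe the same set of two-digit numbers. The sketch assumes the prefix contains no regex metacharacters and that width is at least the number of digits in to.

```r
# Hypothetical helper: regex for `prefix` followed by any `width`-digit
# number from `from` to `to`, with anything allowed afterwards (the *).
interval_pattern <- function(prefix, from, to, width = 2) {
  # Zero-pad each number in the interval, e.g. 5 -> "05"
  numbers <- sprintf(paste0("%0", width, "d"), from:to)
  # "^prefix(05|06|...|55)": prefix must start the string; no "$" at the
  # end, so any characters may follow
  paste0("^", prefix, "(", paste(numbers, collapse = "|"), ")")
}

Codes <- c("XY05A", "XY56", "XY555", "AT003", "XY55AB", "XY35QA",
           "GA003A", "XY36", "XY100", "XY03")

xy_matches <- Codes[grepl(interval_pattern("XY", 5, 55), Codes)]
print(xy_matches)

weq_pattern <- interval_pattern("weq", 23, 24)
weq_hits <- grepl(weq_pattern, c("weq23", "weq24AB", "weq25", "weq2"))
print(weq_pattern)
print(weq_hits)
```

Listing every number keeps the construction simple and easy to verify. The cost is a longer regex when the range is wide.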
@KaZyKa You should accept the answer and optionally upvote, if it solved your problem.
– Michał Turczyn
Sep 5 '18 at 10:22
@Michał Turczyn I just played with your regex. It doesn't work for XY55AB, but it does for XY55A. After XY55*, I should be able to match the string no matter whether it has one extra letter or five. So I must say that it did not solve my problem :(
– KaZyKa
Sep 5 '18 at 10:24
@KaZyKa Try the update.
– Michał Turczyn
Sep 5 '18 at 10:25
Can you have from and to as integers instead of the full strings? That way you can extract the integers from the Codes vector and compare them directly to from and to:
from <- 5
to <- 55
pattern <- "XY([0-9]+).*"
# use regex to extract the integer part of each string
# (non-matching strings become NA with a coercion warning; the grepl() below drops them)
Codes_int <- as.integer( sub( pattern, "\\1", Codes ) )
# return only the `Codes` where the integer is in range
Codes[ Codes_int >= from & Codes_int <= to & grepl( pattern, Codes ) ]
But what if the XY codes change to WEQ, for example? Can I make this a variable in your
Codes_int <- as.integer( sub( "(asvariable)([0-9]+).*", "\\1", Codes ) )
? – KaZyKa
Sep 5 '18 at 10:26
Then you change the code to match what you need. See my edit; you can easily move that variable outside if that's easier for you.
– rosscova
Sep 5 '18 at 10:30
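One way to act on rosscova's suggestion is to build the pattern with paste0() from a prefix variable. This sketch rests on a few assumptions: prefix is a hypothetical variable name, the prefix contains no regex metacharacters, and the digits immediately follow it. The integers are only extracted where the pattern matched, which avoids the NA coercion warning from as.integer(). Because this compares the whole run of digits as one number, XY100 (100) and XY555 (555) are both excluded. XY555 is dropped even though the question lists it as a wanted match.

```r
Codes <- c("XY05A", "XY56", "XY555", "AT003", "XY55AB", "XY35QA",
           "GA003A", "XY36", "XY100", "XY03")

prefix <- "XY"  # hypothetical variable: set to "WEQ" etc. (no regex metacharacters)
from <- 5
to <- 55

# Capture the run of digits right after the prefix; ignore the rest
pattern <- paste0("^", prefix, "([0-9]+).*")
has_prefix <- grepl(pattern, Codes)

# Extract integers only where the pattern matched, so sub() never hands
# an unmatched string to as.integer()
Codes_int <- rep(NA_integer_, length(Codes))
Codes_int[has_prefix] <- as.integer(sub(pattern, "\\1", Codes[has_prefix]))

# has_prefix is FALSE wherever Codes_int is NA, and FALSE & NA is FALSE in R
result <- Codes[has_prefix & Codes_int >= from & Codes_int <= to]
print(result)
```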
…and XY36 is matched since 36 is between 5 and 55? – missuse
Sep 5 '18 at 10:09