Regex to find name in sentence
I have some sentences like:
1:
"RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held
ball is correctly called."
2:
"Nurkic (POR) maintains legal
guarding position and makes incidental contact with Wall (WAS) that
does not affect his driving shot attempt."
I need to use a Python regex to find the names "Oubre Jr." and "Nurkic" in sentence 1, and "Nurkic" and "Wall" in sentence 2.
p = r'\s*(\w+?)\s[(]'
With this pattern I can find ['Nurkic', 'Wall'] in sentence 2, but in sentence 1 I only get ['Nurkic'] and miss "Oubre Jr."
Can anyone help me?
python regex
edited Nov 9 at 6:39
Alex
asked Nov 9 at 6:31
whichen
3 Answers
accepted
You can use the following regex:
(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()
|-----Main Pattern-----|
Details:
(?:) - Creates a non-capturing group
[A-Z] - Matches 1 uppercase letter
[a-z] - Matches 1 lowercase letter
[\s.a-z]* - Matches whitespace, periods ('.') or lowercase letters 0+ times
(?=\s\() - Matches the main pattern only if it is followed by the string ' ('
import re

# "text" instead of "str", so the built-in str type is not shadowed
text = '''RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called.
Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt.'''
res = re.findall(r'(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()', text)
print(res)  # ['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall']
Demo: https://repl.it/@RahulVerma8/OvalRequiredAdvance?language=python3
Match: https://regex101.com/r/OsLTrY/1
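As a quick check, here is a minimal sketch that applies the pattern to each of the two sentences from the question separately. The helper name find_names and the constant NAME_BEFORE_PAREN are illustrative only and do not come from the answer.

```python
import re

# The answer's pattern: one or more capitalized chunks, matched only when
# the next characters are whitespace and an opening parenthesis.
NAME_BEFORE_PAREN = re.compile(r"(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()")


def find_names(sentence):
    """Return every capitalized name that is directly followed by ' ('."""
    return NAME_BEFORE_PAREN.findall(sentence)


sentence_1 = ("RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), "
              "and a held ball is correctly called.")
sentence_2 = ("Nurkic (POR) maintains legal guarding position and makes "
              "incidental contact with Wall (WAS) that does not affect "
              "his driving shot attempt.")

print(find_names(sentence_1))  # ['Oubre Jr.', 'Nurkic']
print(find_names(sentence_2))  # ['Nurkic', 'Wall']
```

The lookahead keeps the space and the parenthesis out of the match, so no trailing whitespace has to be stripped afterwards.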
edited Nov 9 at 8:05
answered Nov 9 at 6:54
rv7
Since I will also have sentences like "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball.", I first use re.sub(r'\'s*','',test_str) and then re.findall( r'([A-Z][a-z][\s.a-z]*)+(?=\s\()', test_str). Thank you very much.
– whichen
Nov 9 at 7:33
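A minimal sketch of the two-step idea from this comment, assuming the possessive is always written as 's. The helper name find_names, the \b after 's, and the non-capturing group (used so that findall returns the whole name rather than only its last chunk) are illustrative choices, not taken from the comment.

```python
import re

# Step 1: drop a possessive 's so that "Beal's (WAS)" becomes "Beal (WAS)".
POSSESSIVE = re.compile(r"'s\b")
# Step 2: the accepted answer's pattern, with a non-capturing group.
NAME_BEFORE_PAREN = re.compile(r"(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()")


def find_names(sentence):
    """Strip possessives, then collect the names that precede ' ('."""
    cleaned = POSSESSIVE.sub("", sentence)
    return NAME_BEFORE_PAREN.findall(cleaned)


sentence_5 = ("McCollum (POR) has a hand momentarily on Beal's (WAS) arm "
              "but does not affect his FOM off ball.")
print(find_names(sentence_5))  # ['McCollum', 'Beal']
```

Note that re.sub returns a new string, so its result has to be passed on to findall rather than discarded.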
Here is one approach:
import re

line = "RLB shows Oubre Jr (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
results = re.findall(r"([A-Z][\w']+(?: [JS][r][.]?)?)(?= \([A-Z]+\))", line)
print(results)
['Oubre Jr', 'Nurkic']
The above logic will attempt to match one name, beginning with a capital letter, which is possibly followed by either the suffix Jr. or Sr., which in turn is followed by a \([A-Z]+\) term such as (WAS).
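If the team code is wanted as well, the same idea can be sketched with a second capturing group in place of the lookahead. This is only an illustration under that assumption; the names NAME_AND_TEAM and names_with_teams are hypothetical.

```python
import re

# Group 1: a capitalized name with an optional " Jr"/" Sr" suffix (period optional).
# Group 2: the uppercase team code inside the parentheses.
NAME_AND_TEAM = re.compile(r"([A-Z][\w']+(?: [JS]r\.?)?) \(([A-Z]+)\)")


def names_with_teams(sentence):
    """Return a list of (name, team) tuples found in the sentence."""
    return NAME_AND_TEAM.findall(sentence)


line = ("RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and "
        "dislodges him, affecting his ability to grab the rebound.")
print(names_with_teams(line))  # [('Aminu', 'POR'), ('Porter Jr.', 'WAS')]
```

With two capturing groups, re.findall returns one tuple per match instead of a plain string.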
edited Nov 9 at 7:19
answered Nov 9 at 6:39
Tim Biegeleisen
seems the code coloring does not handle inline ' that well :/ nice solution
– Patrick Artner
Nov 9 at 7:10
Thanks for your quick reply, but with sentences like 3: "Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4: "RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5: "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." it will still miss "Oubre Jr", "Porter Jr" and "Beal". Could you help?
– whichen
Nov 9 at 7:18
Please don't keep adding additional requirements in comments on an individual answer. Just accept an answer and ask a new question if at this point you find that you need something other than what you originally asked for.
– tripleee
Nov 9 at 7:21
@tripleee Are you suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
– Tim Biegeleisen
Nov 9 at 7:25
If you really want to discuss this further, maybe post a question on Meta Stack Overflow
– tripleee
Nov 9 at 7:30
You need a pattern that you can match. For your sentences you could try to match whatever comes directly before (XXX) and include a list of possible "suffixes" as well. You would need to extract those from your sources.
import re

suffs = ["Jr."]  # append more suffixes to this list
# optional suffix (escaped so that "." stays literal), then an optional space
rsu = r"(?:" + "|".join(map(re.escape, suffs)) + r")? ?"
# combine with suffixes
regex = r"(\w+ " + rsu + r")\(\w{3}\)"
test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."
names = []
for match in re.finditer(regex, test_str):
    # group 1 ends with the space before "(", so strip it
    names.append(match.group(1).strip())
print(names)
Output:
['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall']
This should work as long as you do not have names with non-\w characters in them. If you need to adapt the regex, use https://regex101.com/r/pRr9ZU/1 as a starting point.
Explanation:
r"(?:" + "|".join(map(re.escape, suffs)) + r")? ?"
--> all items in the list suffs are escaped and strung together via | (OR) inside a non-capturing group (?:...), which is made optional and followed by an optional space.
r"(\w+ " + rsu + r")\(\w{3}\)"
--> the regex looks for word characters followed by a space and the optional suffs group we just built, followed by a literal ( then three word characters followed by another literal ).
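The suffix-list idea can also be wrapped in a small helper that builds the compiled pattern from any list of suffixes. This is a sketch only: the function name build_name_regex and the extra suffixes "Sr.", "III" and "II" are assumptions for illustration, and the space before "(" is kept outside the capturing group so that no stripping is needed.

```python
import re


def build_name_regex(suffixes):
    """Compile a pattern for 'Name[ Suffix] (XXX)' from a list of suffixes."""
    # Longer suffixes first, so "III" is tried before "II".
    ordered = sorted(suffixes, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # Group 1 is the name plus optional suffix; the space and the
    # three-character code in parentheses stay outside the group.
    return re.compile(r"(\w+(?: (?:" + alternation + r"))?) \(\w{3}\)")


name_regex = build_name_regex(["Jr.", "Sr.", "III", "II"])
text = ("RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), "
        "and a held ball is correctly called.")
print(name_regex.findall(text))  # ['Oubre Jr.', 'Nurkic']
```

re.escape matters here because an unescaped "." in "Jr." would match any character, not just a period.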
edited Nov 9 at 7:08
answered Nov 9 at 6:52
Patrick Artner