Regex to find name in sentence









0 votes












I have some sentences like these:

1: "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."

2: "Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

I need to use a Python regex to find the names: "Oubre Jr." and "Nurkic" in sentence 1, and "Nurkic" and "Wall" in sentence 2.

    p = r'\s*(\w+?)\s[(]'

With this pattern I get ['Nurkic', 'Wall'] for sentence 2, but for sentence 1 I only get ['Nurkic']; it misses "Oubre Jr."

Can anyone help?

















      python regex














      edited Nov 9 at 6:39 by Alex
      asked Nov 9 at 6:31 by whichen






















          3 Answers

















          1 vote, accepted










          You can use the following regex:

              (?:[A-Z][a-z][\s.a-z]*)+(?=\s\()
              |---- Main Pattern ----|

          Details:

          • (?:...) - creates a non-capturing group
          • [A-Z] - matches one uppercase letter
          • [a-z] - matches one lowercase letter
          • [\s.a-z]* - matches whitespace, periods ('.') or lowercase letters, zero or more times
          • (?=\s\() - lookahead: the main pattern matches only if it is followed by ' ('

              import re

              text = '''RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called.

              Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt.'''

              res = re.findall(r'(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()', text)

              print(res)



          Demo: https://repl.it/@RahulVerma8/OvalRequiredAdvance?language=python3



          Match: https://regex101.com/r/OsLTrY/1
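
The pattern breakdown in this answer can also be written out with re.VERBOSE, so that each piece carries its own comment. This is only a sketch of the same regex; the name NAME_BEFORE_TEAM and the sentences list are introduced here for illustration and are not part of the original answer.

```python
import re

# The pattern from the answer, spelled out with re.VERBOSE.
# NAME_BEFORE_TEAM is a hypothetical name chosen for this sketch.
NAME_BEFORE_TEAM = re.compile(
    r"""
    (?:                # non-capturing group, repeated for multi-part names
        [A-Z]          # one uppercase letter
        [a-z]          # one lowercase letter
        [\s.a-z]*      # then whitespace, periods or lowercase letters
    )+
    (?=\s\()           # lookahead: whitespace and an opening parenthesis must follow
    """,
    re.VERBOSE,
)

sentences = [
    "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called.",
    "Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt.",
]

for sentence in sentences:
    print(NAME_BEFORE_TEAM.findall(sentence))
```

For the two sample sentences this prints ['Oubre Jr.', 'Nurkic'] and ['Nurkic', 'Wall']. One caveat to keep in mind: because the character class also admits whitespace, a capitalized ordinary word earlier in the sentence, as in "Replay shows Oubre Jr. (WAS)", would be swallowed into the match.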




























          • Since I will also have sentences like "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball.", I first strip the possessive with re.sub(r'\'s*', '', test_str) and then call re.findall(r'(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()', test_str). Thank you very much.
            – whichen
            Nov 9 at 7:33
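
The two-step approach described in this comment might look roughly like the sketch below. It assumes the possessive is always written as 's directly after the name; extract_names is a hypothetical helper name, not something from the thread.

```python
import re

# Pattern from the accepted answer: capitalized name parts followed by " (".
NAME_PATTERN = re.compile(r"(?:[A-Z][a-z][\s.a-z]*)+(?=\s\()")


def extract_names(sentence):
    """Hypothetical helper: drop possessive 's, then collect names before a team code."""
    # Assumption: possessives are always written as "'s" right after the name.
    cleaned = re.sub(r"'s\b", "", sentence)
    return NAME_PATTERN.findall(cleaned)


test_str = "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball."
print(extract_names(test_str))
```

Stripping the possessive first keeps the name pattern itself unchanged, at the cost of modifying the text before it is searched.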

















          1 vote













          Here is one approach:

              import re

              line = "RLB shows Oubre Jr (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
              results = re.findall(r"([A-Z][\w']+(?: [JS]r\.?)?)(?= \([A-Z]+\))", line)
              print(results)

              ['Oubre Jr', 'Nurkic']

          The above logic attempts to match one name beginning with a capital letter, optionally followed by the suffix Jr. or Sr., which in turn must be followed by a parenthesized team code such as (WAS), matched by \([A-Z]+\).




























          • seems the code coloring does not handle inline ' that well :/ nice solution
            – Patrick Artner
            Nov 9 at 7:10










          • Thanks for your quick reply, but with sentences like these: 3: "Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4: "RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5: "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." it still misses "Oubre Jr", "Porter Jr" and "Beal". Could you help?
            – whichen
            Nov 9 at 7:18
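
For the suffix and possessive cases raised in this comment, one option is to leave the text untouched and let the lookahead tolerate an optional 's before the team code. The pattern below is a sketch and not taken from any of the answers; it assumes names consist of letters only, that suffixes are written "Jr." or "Sr.", and that team codes are uppercase letters in parentheses. PLAYER is a name made up for this example.

```python
import re

# Sketch (not from the thread): a capitalized word, an optional " Jr."/" Sr.",
# and a lookahead that accepts an optional possessive 's before " (TEAM)".
PLAYER = re.compile(r"\b[A-Z][A-Za-z]+(?: [JS]r\.)?(?=(?:'s)? \([A-Z]+\))")

sentences = [
    "Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding.",
    "RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound.",
    "McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball.",
]

for sentence in sentences:
    print(PLAYER.findall(sentence))
```

Because the possessive sits inside the lookahead, it is checked but never included in the returned name.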










          • Please don't add additional requirements in comments on an individual answer. Just accept an answer and ask a new question if at this point you find that you need something other than what you originally asked for.
            – tripleee
            Nov 9 at 7:21










          • @tripleee Are you suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
            – Tim Biegeleisen
            Nov 9 at 7:25










          • If you really want to discuss this further, maybe post a question on Meta Stack Overflow
            – tripleee
            Nov 9 at 7:30

















          0 votes













          You need a pattern that you can match. For your sentences you could try to match whatever comes before (XXX) and also allow a list of possible "suffixes"; you would need to extract those from your sources.

              import re

              suffs = ["Jr."]  # append more to list

              rsu = r"(?:" + "|".join(suffs) + ")? ?"

              # combine with suffixes
              regex = r"(\w+ " + rsu + r")\(\w{3}\)"

              test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

              matches = re.finditer(regex, test_str, re.MULTILINE)

              names = []
              for match in matches:
                  names.append(match.group(1))

              print(names)

          Output:

              ['Oubre Jr. ', 'Nurkic ', 'Nurkic ', 'Wall ']

          This should work as long as you do not have names with non-\w characters in them. The captured names keep their trailing space; call .strip() on them if that matters. If you need to adapt the regex, use https://regex101.com/r/pRr9ZU/1 as a starting point.

          Explanation:

          • r"(?:" + "|".join(suffs) + ")? ?" --> all items in the list suffs are joined with | (OR) inside a non-capturing group (?:...), which is made optional and is followed by an optional space.

          • r"(\w+ " + rsu + r")\(\w{3}\)" --> the regex looks for word characters followed by a space and the optional suffix group we just built, followed by a literal (, then three word characters, then a literal ).
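
A variant of the suffix-list idea from this answer is sketched below. It differs from the answer in two ways, both of which are assumptions on my part rather than the author's code: each suffix is passed through re.escape so that the "." in "Jr." is matched literally, and the space before the parenthesis is kept outside the captured group so the names come back without padding. The names suffixes, suffix_part and "Sr." are illustrative.

```python
import re

# Assumption: extend this list with whatever suffixes occur in your data.
suffixes = ["Jr.", "Sr."]

# Escape each suffix so "." is literal, then make " <suffix>" optional.
suffix_part = r"(?: (?:" + "|".join(re.escape(s) for s in suffixes) + r"))?"

# Capture the name (with optional suffix); the space and the three-character
# code in parentheses are matched but stay outside the captured group.
regex = re.compile(r"(\w+" + suffix_part + r") \(\w{3}\)")

test_str = (
    "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball "
    "is correctly called. Nurkic (POR) maintains legal guarding position and "
    "makes incidental contact with Wall (WAS) that does not affect his "
    "driving shot attempt."
)

# With exactly one capturing group, findall returns that group for each match.
names = regex.findall(test_str)
print(names)
```

On the sample text this prints ['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall'].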




























































            Accepted answer: answered Nov 9 at 6:54 by rv7, edited Nov 9 at 8:05































            Answer starting "Here is one approach": answered Nov 9 at 6:39 by Tim Biegeleisen, edited Nov 9 at 7:19











            • seems the code coloring does not handle inline ' that well :/ nice solution
              – Patrick Artner
              Nov 9 at 7:10










            • Thanks for your quick reply, but like sentences. 3:"Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4:"RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5:"McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." Still will miss "Oubre Jr" "Porter Jr" and "Beal". Could you help?
              – whichen
              Nov 9 at 7:18










            • Adding additional requirements in comments to an individual answer. Just accept an answer and ask a new question if at this point you find that you need something else than you originally asked for.
              – tripleee
              Nov 9 at 7:21










            • @tripleee Are suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
              – Tim Biegeleisen
              Nov 9 at 7:25










            • If you really want to discuss this further, maybe post a question on Meta Stack Overflow
              – tripleee
              Nov 9 at 7:30
















            • seems the code coloring does not handle inline ' that well :/ nice solution
              – Patrick Artner
              Nov 9 at 7:10










            • Thanks for your quick reply, but like sentences. 3:"Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4:"RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5:"McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." Still will miss "Oubre Jr" "Porter Jr" and "Beal". Could you help?
              – whichen
              Nov 9 at 7:18










            • Adding additional requirements in comments to an individual answer. Just accept an answer and ask a new question if at this point you find that you need something else than you originally asked for.
              – tripleee
              Nov 9 at 7:21










            • @tripleee Are suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
              – Tim Biegeleisen
              Nov 9 at 7:25










            • If you really want to discuss this further, maybe post a question on Meta Stack Overflow
              – tripleee
              Nov 9 at 7:30















            seems the code coloring does not handle inline ' that well :/ nice solution
            – Patrick Artner
            Nov 9 at 7:10




            seems the code coloring does not handle inline ' that well :/ nice solution
            – Patrick Artner
            Nov 9 at 7:10












            Thanks for your quick reply, but like sentences. 3:"Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4:"RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5:"McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." Still will miss "Oubre Jr" "Porter Jr" and "Beal". Could you help?
            – whichen
            Nov 9 at 7:18




            Thanks for your quick reply, but like sentences. 3:"Oubre Jr. (WAS) makes contact to Aminu's (POR) head during rebounding." 4:"RAR shows Aminu (POR) wraps Porter Jr. (WAS) around the waist and dislodges him, affecting his ability to grab the rebound." 5:"McCollum (POR) has a hand momentarily on Beal's (WAS) arm but does not affect his FOM off ball." Still will miss "Oubre Jr" "Porter Jr" and "Beal". Could you help?
            – whichen
            Nov 9 at 7:18












            Adding additional requirements in comments to an individual answer. Just accept an answer and ask a new question if at this point you find that you need something else than you originally asked for.
            – tripleee
            Nov 9 at 7:21




            Adding additional requirements in comments to an individual answer. Just accept an answer and ask a new question if at this point you find that you need something else than you originally asked for.
            – tripleee
            Nov 9 at 7:21












            @tripleee Are suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
            – Tim Biegeleisen
            Nov 9 at 7:25




            @tripleee Are suggesting that it's not polite to generate a hideous trail of comments which don't really add any value to the original question? :-P
            – Tim Biegeleisen
            Nov 9 at 7:25












            If you really want to discuss this further, maybe post a question on Meta Stack Overflow
            – tripleee
            Nov 9 at 7:30




            If you really want to discuss this further, maybe post a question on Meta Stack Overflow
            – tripleee
            Nov 9 at 7:30










            up vote
            0
            down vote













            You need a pattern that you can match - for your sentence you cou try to match things before (XXX) and include a list of possible "suffixes" to include as well - you would need to extract them from your sources



            import re

            suffs = ["Jr."] # append more to list

            rsu = r"(?:"+"|".join(suffs)+")? ?"

            # combine with suffixes
            regex = r"(w+ "+rsu+")(w3)"

            test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

            matches = re.finditer(regex, test_str, re.MULTILINE)

            names =
            for matchNum, match in enumerate(matches,1):
            for groupNum in range(0, len(match.groups())):
            names.extend(match.groups(groupNum))

            print(names)


            Output:



            ['Oubre Jr.', 'Nurkic ', 'Nurkic ', 'Wall ']


            This should work as long as you do not have Names with non-w in them. If you need to adapt the regex, use https://regex101.com/r/pRr9ZU/1 as starting point.




            Explanation:




            • r"(?:"+"|".join(suffs)+")? ?" --> all items in the list suffs are strung together via | (OR) as non grouping (?:...) and made optional followed by optional space.


            • r"(w+ "+rsu+")(w3)" --> the regex looks for any word characters followed by optional suffs group we just build, followed by literal ( then three word characters followed by another literal )





            share|improve this answer


























              up vote
              0
              down vote













              You need a pattern that you can match - for your sentence you cou try to match things before (XXX) and include a list of possible "suffixes" to include as well - you would need to extract them from your sources



              import re

              suffs = ["Jr."] # append more to list

              rsu = r"(?:"+"|".join(suffs)+")? ?"

              # combine with suffixes
              regex = r"(w+ "+rsu+")(w3)"

              test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

              matches = re.finditer(regex, test_str, re.MULTILINE)

              names =
              for matchNum, match in enumerate(matches,1):
              for groupNum in range(0, len(match.groups())):
              names.extend(match.groups(groupNum))

              print(names)


              Output:



              ['Oubre Jr.', 'Nurkic ', 'Nurkic ', 'Wall ']


              This should work as long as you do not have Names with non-w in them. If you need to adapt the regex, use https://regex101.com/r/pRr9ZU/1 as starting point.




              Explanation:




              • rsu --> all items in the list suffs are joined with | (OR) inside a non-capturing group (?:...); the group is optional and is followed by an optional space.


              • regex, i.e. (\w+ <rsu>)\(\w{3}\) --> the capture group matches one or more word characters, a space, and the optional suffix group built above; it must be followed by a literal (, exactly three word characters, and a literal ).
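The pieces explained above can also be written out with re.VERBOSE, which lets each part of the pattern carry its own comment. This is a sketch of a slight variant rather than the exact regex from the answer: it places the space before the suffix instead of after it, so the captured names need no stripping. The constant name NAME_BEFORE_TEAM is an assumption made for this example.

```python
import re

suffixes = ["Jr."]  # same idea as suffs: extend as needed
suffix_alt = "|".join(re.escape(s) for s in suffixes)

# In VERBOSE mode plain whitespace is ignored, so a literal space is written as [ ].
NAME_BEFORE_TEAM = re.compile(
    r"""
    (                       # group 1: the name
        \w+                 #   one or more word characters
        (?: [ ] (?:%s) )?   #   optionally a space and one of the suffixes
    )
    [ ]                     # a single space
    \( \w{3} \)             # three word characters inside literal parentheses
    """ % suffix_alt,
    re.VERBOSE,
)

sentence_1 = ("RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), "
              "and a held ball is correctly called.")
sentence_2 = ("Nurkic (POR) maintains legal guarding position and makes "
              "incidental contact with Wall (WAS) that does not affect his "
              "driving shot attempt.")

print(NAME_BEFORE_TEAM.findall(sentence_1))  # ['Oubre Jr.', 'Nurkic']
print(NAME_BEFORE_TEAM.findall(sentence_2))  # ['Nurkic', 'Wall']
```

Running the pattern on each sentence separately gives one list per sentence, which matches the two result sets asked for in the question.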









                edited Nov 9 at 7:08

























                answered Nov 9 at 6:52









                Patrick Artner



























