Match pattern in a file and print the matching word (not the whole line) in second column




I am trying to match the pattern "SHM" in a file containing like the below information and print the word matching the pattern.


LOCALZONE01     ASHM001002003VOL01
                BSHM001002003VOL02
                CSHM001002003VOL03
                DSHM001002003VOL03_DUP
                ESHM001002003VOL04
                FSHM001002003VOL05
                GSHM001002003VOL06_
                HSHM001002003VOL07



I have tried to use awk to print the 2nd column:


grep "SHM" filename.txt | awk -F" " '{print $2}'

ASHM001002003VOL01



If I try to print column 1, it’s giving me the below output:


LOCALZONE01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07



Below is my desired output. How can I get it?


ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07






Technically, I could answer your question with printf '%s\n' ASHM001002003VOL01 BSHM001002003VOL02 CSHM001002003VOL03 DSHM001002003VOL03_DUP ESHM001002003VOL04 FSHM001002003VOL05 GSHM001002003VOL06_ HSHM001002003VOL07, because you say “a file containing below information” and not “a file containing information like the below”; i.e., you make it sound like the data in the question is the data you want to process, and not just an example.  … (Cont’d)

– G-Man
Sep 14 '18 at 18:20








(Cont’d) …  I’m not really serious, but I say this to raise the point that you will rarely get answers that are better than your question, and many answer-writers focus on the example data in the question, and not the question as a whole.  Please edit your question to clarify what properties your input will and will not have, and give examples that illustrate the possibilities.  … (Cont’d)

– G-Man
Sep 14 '18 at 18:20






(Cont’d) …  For example, will the last word on every line contain SHM?  (If so, why even bother stating that as a factor in the question?)  Will the first line always have two words?  If so, will the first word always be LOCALZONE01?  Will it always be 11 characters long?  Will it always be followed by 5 blanks?  Will the second word always start at the 17th character position?  … (Cont’d)

– G-Man
Sep 14 '18 at 18:21








(Cont’d) …  Will every line after the first always have exactly one word?  Will it always start at the 17th character position?  Does the “second” (final) word have a maximum length? … … Are the whitespace characters spaces, tabs, or a combination? … … How do you define “word”?  For example, if your file has QSHM003.14159PIES, do you want to see that entire string, or just QSHM003 (because the . is a word separator, and so 14159PIES is a separate word)? … (Cont’d)

– G-Man
Sep 14 '18 at 18:21








(Cont’d) …  In particular, if any posted answer works for the sample data in your question, but not your real file, please edit your question to show the manner of input that the answer doesn’t handle.   (Do this in addition to leaving a comment on the answer.)   If an answer doesn’t work for your real data, that’s not a reason to ask a new question.

– G-Man
Sep 14 '18 at 18:21




6 Answers



If the output you want is always the last field on each line, try this:


awk '{ if ($NF ~ /SHM/) print $NF }' _input_file_



Hope this helps.
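A minimal sketch of the last-field approach, run against sample data modelled on the question. The helper name last_shm_field and the temporary file are assumptions made for illustration. Because awk splits on runs of blanks by default, $NF is the last word on a line whether or not the first column is filled in.

```shell
#!/bin/sh
# Hypothetical helper: print the last field of each line
# whose last field contains SHM.
last_shm_field() {
    awk '$NF ~ /SHM/ { print $NF }' "$1"
}

# Sample input modelled on the question; the temp file is illustrative.
sample=$(mktemp)
cat > "$sample" <<'EOF'
LOCALZONE01     ASHM001002003VOL01
                BSHM001002003VOL02
                CSHM001002003VOL03
EOF

last_shm_field "$sample"
rm -f "$sample"
```

This only helps when the wanted word really is the last one on the line; trailing words after it would be printed instead.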






More idiomatic awk is "condition {action}": awk '$NF ~ /SHM/ {print $NF}'

– glenn jackman
Sep 12 '18 at 20:01






If you have GNU grep available,


grep -Eo '[[:alnum:]_]*SHM[[:alnum:]_]*' < filename.txt



If not, you could ask awk to loop through the fields of each line, looking for SHM:


awk '{ for (i=1; i<=NF; i++) if ($i ~ /SHM/) print $i }' < filename.txt
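The two commands above define "word" differently, which matters for input such as the QSHM003.14159PIES example raised in the comments on the question. A small sketch of the difference follows; the helper names and the temporary file are assumptions for illustration.

```shell
#!/bin/sh
# Word = run of alphanumerics/underscores around SHM (needs a grep with -o).
shm_words_grep() {
    grep -o '[[:alnum:]_]*SHM[[:alnum:]_]*' "$1"
}

# Word = whitespace-delimited field containing SHM (any POSIX awk).
shm_words_awk() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /SHM/) print $i }' "$1"
}

# Two-line sample; the second line contains a "." inside the word.
sample=$(mktemp)
printf 'LOCALZONE01     ASHM001002003VOL01\n' >  "$sample"
printf '                QSHM003.14159PIES\n'  >> "$sample"

shm_words_grep "$sample"   # second line yields QSHM003
shm_words_awk  "$sample"   # second line yields QSHM003.14159PIES
rm -f "$sample"
```

Which behaviour is right depends on how the real data defines a word, so it is worth checking against a line that contains punctuation.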






I don't think that grep expression requires GNU grep, nor -E. (sorry, that was before I spotted the -o, but -E would not be needed, right?)

– Kusalananda
Sep 12 '18 at 17:40









Correct, @Kusalananda, as-is. I suppose it could/should be tightened up to require one or more alnums after the "SHM" text, in which case it would need -E for clarity, if not functionality, since "In GNU grep, there is no difference in available functionality between basic and extended syntaxes."

– Jeff Schaller
Sep 12 '18 at 17:44








@jeff I used the awk version. Thanks. I am redirecting the output to another file, where I use these hostnames to capture df output. Surprisingly, the redirected output seems to have an extra "^M" character. I tried to filter it out using sed, i.e. sed -e "s/^M//" filename, but it seems to be removing the letters starting with M as well. Any thoughts, please?

– xrkr
Sep 13 '18 at 2:27








Got it. I used sed 's/\r//g' to filter it out.

– xrkr
Sep 13 '18 at 2:53
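For context: the ^M that some tools display is a carriage return (a DOS/Windows line ending), not a caret followed by the letter M. In a sed regex, ^M means "an M at the start of the line", which is why leading M characters were being removed. The \r escape in sed is not guaranteed by POSIX, so here is a sketch that avoids it, assuming carriage returns are not wanted anywhere in the data. The helper name strip_cr is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete every carriage return read from stdin.
strip_cr() {
    tr -d '\r'
}

# A line with a DOS-style ending; the leading M must survive.
printf 'MSHM001002003VOL01\r\n' | strip_cr
```

The same filter can sit at the front of any of the pipelines in this thread, for example strip_cr < filename.txt | awk ...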








jeff, I want to print only the 2nd column in this case.

– xrkr
Sep 13 '18 at 3:00



Because the first column of your example data is empty from row 2 onward, you'll have to parse it as fixed-width columns. With GNU awk (FIELDWIDTHS is a gawk extension), you can do this:




$ awk 'BEGIN { FIELDWIDTHS = "16 40" } /SHM/ { print $2 }' filename.txt
ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07






I'd suggest a simplification: awk 'BEGIN { FIELDWIDTHS = "16 40" } /SHM/ { print $2 }'

– Jeff Schaller
Sep 12 '18 at 18:38








@JeffSchaller. Your solutions are always elegant! I will update the answer ;-)

– user88036
Sep 12 '18 at 18:40






I used to chain grep awk and sed together, until I learned awk a bit better!

– Jeff Schaller
Sep 12 '18 at 18:41






@Jeff Schaller I can use this, but it is not always certain that the field widths match the ones shown above; the first word can have more than 20 characters in some files. If I use the above one, I sometimes see output such as 04 SHM001002016VOL01

– xrkr
Sep 13 '18 at 2:51








@xrkr. The command above can handle files that contain fixed-width columns. If the column widths are not uniform, you would first need to convert the file to fixed-width columns, then pipe the output to the previous command, something like: awk -F, '{ printf("%20s %20s %15s\n", $1, $2, $3) }' FILE.txt | awk 'BEGIN { FIELDWIDTHS = "16 40" } /SHM/ { print $2 }'

– user88036
Sep 13 '18 at 10:37






You could do this as follows using Perl:




perl -lne 'print for /\w*SHM\w*/g' input-file.txt

perl -lane 'print for grep /SHM/, @F' input-file.txt # assuming SHM fields are alphanumeric



Or, with the sed editor in a POSIX-compatible manner, assuming every line contains at least one SHM:




sed -ne '
s/[[:alnum:]_]*SHM[[:alnum:]_]*/\
&\
/;s/.*\n\(.*\n\)/\1/;P;/\n$/!D
' input.txt



Output:


ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_


awk 'NR==1 {print $2}' filename && awk 'NR>1' filename | sed 's/[[:space:]]*//g'



Output:


ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07



That prints the second column of the first line, then prints the remaining lines with their whitespace removed, which gives the output you want.






You don't need to call awk multiple times: awk 'NR == 1 {print $2} NR > 1 {print $1}'. This can even be collapsed, but at the expense of readability: awk '{print $(NR == 1 ? 2 : 1)}'

– glenn jackman
Sep 12 '18 at 20:00









@glennjackman True, but it works just the same and it's roughly the same length with the removal of the whitespace.

– Nasir Riley
Sep 12 '18 at 20:04






I guess it's the principle of calling one program versus calling 3 to achieve the same result.

– glenn jackman
Sep 12 '18 at 20:09






@glennjackman I understand, but in his case, it really doesn't affect anything. If it were a script that was doing more work then I, too, would prefer your solution because it wouldn't call as many programs and it would be less tedious.

– Nasir Riley
Sep 12 '18 at 20:26






By the way, the downvote here doesn't make any sense. A downvote is needed when the answer just doesn't work. Mine does work. it may require more than some of the other answers but it gives the desired output and doesn't make things any worse.

– Nasir Riley
Sep 14 '18 at 21:23



awk '{ head=substr($0,1,16); mypat=substr($0,17,23); if (mypat ~ /SHM/) print mypat }' filename
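A sketch of how the substr() call behaves, assuming the layout described in the comments on the question: the wanted word starts at character position 17. substr($0, 17, 23) takes up to 23 characters from that position, so shorter words come back whole. The helper name shm_from_col17 and the generated sample file are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: take up to 23 characters starting at column 17
# and print them when they contain SHM.
shm_from_col17() {
    awk '{ word = substr($0, 17, 23); if (word ~ /SHM/) print word }' "$1"
}

# Build fixed-width sample lines: a 16-character first column
# (padded or empty), then the word.
sample=$(mktemp)
printf '%-16s%s\n' LOCALZONE01 ASHM001002003VOL01     >  "$sample"
printf '%-16s%s\n' ''          DSHM001002003VOL03_DUP >> "$sample"

shm_from_col17 "$sample"
rm -f "$sample"
```

Like the FIELDWIDTHS answer, this depends on the first column never being wider than 16 characters, and any trailing blanks within the 23-character window would be printed too.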


