Match pattern in a file and print the matching word (not the whole line) in second column
I am trying to match the pattern "SHM" in a file containing information like the below, and print the word matching the pattern.
LOCALZONE01     ASHM001002003VOL01
                BSHM001002003VOL02
                CSHM001002003VOL03
                DSHM001002003VOL03_DUP
                ESHM001002003VOL04
                FSHM001002003VOL05
                GSHM001002003VOL06_
                HSHM001002003VOL07
I have tried to use awk to print the 2nd column:
grep "SHM" filename.txt | awk -F" " '{print $2}'
ASHM001002003VOL01
If I try to print column 1, it’s giving me the below output:
LOCALZONE01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07
Below is my desired output. How can I get it?
ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07
(Cont’d) … I’m not really serious, but I say this to raise the point that you will rarely get answers that are better than your question, and many answer-writers focus on the example data in the question, and not the question as a whole. Please edit your question to clarify what properties your input will and will not have, and give examples that illustrate the possibilities. … (Cont’d)
– G-Man
Sep 14 '18 at 18:20
(Cont’d) … For example, will the last word on every line contain SHM? (If so, why even bother stating that as a factor in the question?) Will the first line always have two words? If so, will the first word always be LOCALZONE01? Will it always be 11 characters long? Will it always be followed by 5 blanks? Will the second word always start at the 17th character position? … (Cont’d)
– G-Man
Sep 14 '18 at 18:21
(Cont’d) … Will every line after the first always have exactly one word? Will it always start at the 17th character position? Does the “second” (final) word have a maximum length? … … Are the whitespace characters spaces, tabs, or a combination? … … How do you define “word”? For example, if your file has QSHM003.14159PIES, do you want to see that entire string, or just QSHM003 (because the . is a word separator, and so 14159PIES is a separate word)? … (Cont’d)
– G-Man
Sep 14 '18 at 18:21
(Cont’d) … In particular, if any posted answer works for the sample data in your question, but not your real file, please edit your question to show the manner of input that the answer doesn’t handle. (Do this in addition to leaving a comment on the answer.) If an answer doesn’t work for your real data, that’s not a reason to ask a new question.
– G-Man
Sep 14 '18 at 18:21
6 Answers
If the output you want is always the last field on the line, try this:
awk '{if ($NF ~ /SHM/) print $NF}' input_file
Hope this helps.
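As a rough illustration of why $NF works here while $2 does not: awk's default field splitting ignores leading blanks, so the indented lines have only one field. The sketch below assumes a two-line sample shaped like the question's data; the temporary file and the variable names are made up for the example.

```shell
# Hypothetical two-line sample: an 11-character name padded to 16 columns,
# then the SHM word; the second line has only blanks in the first 16 columns.
sample=$(mktemp)
{
  printf '%-16s%s\n' 'LOCALZONE01' 'ASHM001002003VOL01'
  printf '%-16s%s\n' '' 'BSHM001002003VOL02'
} > "$sample"

# Print the field count, the first field and the last field of every line.
field_report=$(awk '{ print NF, $1, $NF }' "$sample")
printf '%s\n' "$field_report"

# The last field is the SHM word no matter how many fields precede it.
last_fields=$(awk '$NF ~ /SHM/ { print $NF }' "$sample")
printf '%s\n' "$last_fields"

rm -f "$sample"
```

The first report line shows 2 fields and the second shows 1, which is why printing $1 returned the SHM word for every line except the first.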
More idiomatic awk is "condition {action}":
awk '$NF ~ /SHM/ {print $NF}'
– glenn jackman
Sep 12 '18 at 20:01
If you have GNU grep available,
grep -Eo '[[:alnum:]_]*SHM[[:alnum:]_]*' < filename.txt
If not, you could ask awk to loop through the fields of each line, looking for SHM:
awk '{ for(i=1;i<=NF;i++) if ($i ~ /SHM/) print $i }' < filename.txt
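A quick way to check that the two commands agree is to run both on a small sample. This is only a sketch: the sample file is a hypothetical stand-in shaped like the question's data, and it assumes a grep that supports -o (GNU, BSD and busybox grep do; POSIX does not require it).

```shell
# Hypothetical sample in the question's layout (16-column first field).
sample=$(mktemp)
{
  printf '%-16s%s\n' 'LOCALZONE01' 'ASHM001002003VOL01'
  printf '%-16s%s\n' '' 'BSHM001002003VOL02'
  printf '%-16s%s\n' '' 'DSHM001002003VOL03_DUP'
} > "$sample"

# grep -o prints only the part of each line that matched the pattern.
grep_out=$(grep -Eo '[[:alnum:]_]*SHM[[:alnum:]_]*' < "$sample")

# awk walks the whitespace-separated fields and prints those containing SHM.
awk_out=$(awk '{ for (i = 1; i <= NF; i++) if ($i ~ /SHM/) print $i }' < "$sample")

printf '%s\n' "$grep_out"
rm -f "$sample"
```

The two differ in what counts as a word: grep stops at any character outside letters, digits and underscore, while awk prints the whole whitespace-delimited field.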
I don't think that grep expression requires GNU grep, nor -E. (sorry, that was before I spotted the -o, but -E would not be needed, right?)
– Kusalananda
Sep 12 '18 at 17:40
Correct, @Kusalananda, as-is. I suppose it could/should be tightened up to require 1-or-more alnum's after the "SHM" text, in which case it would need -E for clarity, if not functionality, as "In GNU grep, there is no difference in available functionality between basic and extended syntaxes."
– Jeff Schaller
Sep 12 '18 at 17:44
@jeff I used the awk version. Thanks. I am redirecting the output to another file, where I am using these hostnames to capture df output. Surprisingly, the redirected output seems to have an extra character "^M". I tried to filter it out using sed, i.e. sed -e "s/^M//" filename, but it seems to be removing the letter M at the start of lines as well. Any thoughts, please?
– xrkr
Sep 13 '18 at 2:27
Got it. I used sed 's/\r//g' to filter it out.
– xrkr
Sep 13 '18 at 2:53
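For anyone hitting the same ^M problem: typed as two characters, ^M in a sed pattern means "an M at the start of the line", not a carriage return, which is why leading M's disappeared. Below is a minimal sketch of two ways to strip the carriage returns, assuming the file has DOS (CRLF) line endings. The file name hosts.txt and the host name beginning with M are made up for the example; \r inside a sed regex is a GNU extension, so the sed variant passes a literal CR instead.

```shell
# Hypothetical redirected output with DOS line endings.
workdir=$(mktemp -d)
printf 'ASHM001002003VOL01\r\nMSHM001002003VOL02\r\n' > "$workdir/hosts.txt"

# Option 1: tr deletes every carriage return in the stream.
stripped_tr=$(tr -d '\r' < "$workdir/hosts.txt")

# Option 2: sed removes a CR only at the end of each line.
# The CR is stored in a variable so the script does not rely on \r in sed.
cr=$(printf '\r')
stripped_sed=$(sed "s/${cr}\$//" "$workdir/hosts.txt")

# What the caret version does: it removes a leading letter M.
caret_result=$(printf 'MSHM001002003VOL02\n' | sed -e 's/^M//')

printf '%s\n' "$stripped_tr"
rm -rf "$workdir"
```

Both options leave the host names intact, while the caret version turns MSHM001002003VOL02 into SHM001002003VOL02.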
jeff, I want to print only the 2nd column in this case.
– xrkr
Sep 13 '18 at 3:00
Because the first column of your example data has no entries from row #2 onward, you'll have to parse it as fixed-width columns. You can do this:
$ awk 'BEGIN {FIELDWIDTHS = "16 40"} /SHM/ {print $2}'
ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07
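Note that FIELDWIDTHS is a gawk extension. With other awk implementations, a rough equivalent is to cut the line at a fixed column with substr. The sketch below assumes the same layout as above (second column starting at character 17); the sample file and variable names are hypothetical.

```shell
# Hypothetical sample with a 16-character first column.
sample=$(mktemp)
{
  printf '%-16s%s\n' 'LOCALZONE01' 'ASHM001002003VOL01'
  printf '%-16s%s\n' '' 'GSHM001002003VOL06_'
} > "$sample"

# Take everything from column 17 onward, then trim trailing blanks or a CR.
fixed_out=$(awk '/SHM/ {
  word = substr($0, 17)
  sub(/[ \t\r]+$/, "", word)
  print word
}' "$sample")

printf '%s\n' "$fixed_out"
rm -f "$sample"
```

Like the FIELDWIDTHS version, this only works while the first column really is 16 characters wide.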
I'd suggest a simplification:
awk 'BEGIN {FIELDWIDTHS = "16 40"} /SHM/ {print $2}'
– Jeff Schaller
Sep 12 '18 at 18:38
@JeffSchaller. Your solutions are always elegant! I will update the answer ;-)
– user88036
Sep 12 '18 at 18:40
I used to chain grep awk and sed together, until I learned awk a bit better!
– Jeff Schaller
Sep 12 '18 at 18:41
@Jeff Schaller I can use this, but it is not always certain that the field widths are as mentioned above; the first word can have more than 20 characters in some files. If I use the above, I sometimes see output such as
04 SHM001002016VOL01
– xrkr
Sep 13 '18 at 2:51
@xrkr. The command above handles files containing fixed-width columns. If the column widths are not consistent, you would first need to convert the file to fixed-width columns and then pipe the output to the previous command, something like:
awk -F, '{printf("%20s %20s %15s\n", $1, $2, $3)}' FILE.txt | awk 'BEGIN {FIELDWIDTHS = "16 40"} /SHM/ {print $2}'
– user88036
Sep 13 '18 at 10:37
You could do this as follows using Perl:
perl -lne 'print for /\w*SHM\w*/g' input-file.txt
perl -lane 'print for grep /SHM/, @F' input-file.txt # assuming SHM fields are alphanumeric
Or, with the sed editor in a POSIX-compatible manner, assuming every line has at least one SHM:
sed -ne '
   s/[[:alnum:]_]*SHM[[:alnum:]_]*/\
&\
/;s/.*\n\(.*\n\)/\1/;P;/\n$/!D
' input.txt
Output:
ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
awk 'NR==1 {print $2}' filename && awk 'NR>1' filename | sed 's/[[:space:]]*//g'
Output:
ASHM001002003VOL01
BSHM001002003VOL02
CSHM001002003VOL03
DSHM001002003VOL03_DUP
ESHM001002003VOL04
FSHM001002003VOL05
GSHM001002003VOL06_
HSHM001002003VOL07
That prints the second column of the first line, then the remaining lines, and strips the whitespace from those lines so the formatting matches the output you want.
You don't need to call awk multiple times: awk 'NR == 1 {print $2} NR > 1 {print $1}'. This can even be collapsed, but at the expense of readability: awk '{print $(NR == 1 ? 2 : 1)}'
– glenn jackman
Sep 12 '18 at 20:00
@glennjackman True, but it works just the same and it's roughly the same length with the removal of the whitespace.
– Nasir Riley
Sep 12 '18 at 20:04
I guess it's the principle of calling one program versus calling 3 to achieve the same result.
– glenn jackman
Sep 12 '18 at 20:09
@glennjackman I understand, but in his case, it really doesn't affect anything. If it were a script that was doing more work then I, too, would prefer your solution because it wouldn't call as many programs and it would be less tedious.
– Nasir Riley
Sep 12 '18 at 20:26
By the way, the downvote here doesn't make any sense. A downvote is needed when the answer just doesn't work. Mine does work. It may require more than some of the other answers, but it gives the desired output and doesn't make things any worse.
– Nasir Riley
Sep 14 '18 at 21:23
awk '{head=substr($0,1,16);mypat=substr($0,17,23);if (mypat~/SHM/) print mypat}' filename
Technically, I could answer your question with printf '%s\n' ASHM001002003VOL01 BSHM001002003VOL02 CSHM001002003VOL03 DSHM001002003VOL03_DUP ESHM001002003VOL04 FSHM001002003VOL05 GSHM001002003VOL06_ HSHM001002003VOL07, because you say “a file containing below information” and not “a file containing information like the below”; i.e., you make it sound like the data in the question is the data you want to process, and not just an example. … (Cont’d)
– G-Man
Sep 14 '18 at 18:20