sed - Addressing using two strings
I am picking up sed. I am having trouble understanding how line addressing in sed works when a pattern is used to specify the line address.
I have a sample text file named emp.lst with the following contents:
2233|a.k. shukla |g.m. |sales |12/12/52|6000
9876|jai sharma |director |production|12/03/50|7000
5678|sumit chakrobarty|d.g.m. |marketing |19/04/43|6000
2365|barun sengupta |director |personnel |11/05/47|7800
5423|n.k. gupta |chairman |admin |30/08/56|5400
1006|chanchal singhvi |director |sales |03/09/38|6700
6213|karuna ganguly |g.m. |accounts |05/06/62|6300
1265|s.n. dasgupta |manager |sales |12/09/63|5600
4290|jayant Choudhury |executive|production|07/09/50|6000
2476|anil aggarwal |manager |sales |01/05/59|5000
6521|lalit chowdury |director |marketing |26/09/45|8200
3212|shyam saksena |d.g.m. |accounts |12/12/55|6000
3564|sudhir Agarwal |executive|personnel |06/07/47|7500
2345|j.b. saxena |g.m. |marketing |12/03/45|8000
0110|v.k. agrawal |g.m. |marketing |31/12/40|9000
As I understand it, a line address can be specified either as line number(s) or as a pattern to match, given as text or a regular expression.
I understand how sed -n '1p' emp.lst and sed -n '1,2p' emp.lst print line 1 and lines 1 and 2 respectively, without echoing all lines (-n).
I also understand how sed -n '/director/p' emp.lst matches all the lines containing the string director, and outputs:
9876|jai sharma |director |production|12/03/50|7000
2365|barun sengupta |director |personnel |11/05/47|7800
1006|chanchal singhvi |director |sales |03/09/38|6700
6521|lalit chowdury |director |marketing |26/09/45|8200
Now, when I specify two patterns, as in sed -n '/director/,/executive/p' emp.lst, the output is:
9876|jai sharma |director |production|12/03/50|7000
5678|sumit chakrobarty|d.g.m. |marketing |19/04/43|6000
2365|barun sengupta |director |personnel |11/05/47|7800
5423|n.k. gupta |chairman |admin |30/08/56|5400
1006|chanchal singhvi |director |sales |03/09/38|6700
6213|karuna ganguly |g.m. |accounts |05/06/62|6300
1265|s.n. dasgupta |manager |sales |12/09/63|5600
4290|jayant Choudhury |executive|production|07/09/50|6000
6521|lalit chowdury |director |marketing |26/09/45|8200
3212|shyam saksena |d.g.m. |accounts |12/12/55|6000
3564|sudhir Agarwal |executive|personnel |06/07/47|7500
What does this output represent?
Is it all lines containing the patterns director and executive? Clearly not, as some lines contain neither pattern.
Is it all lines from the first one matching either pattern through the last one matching either pattern? No again: by that logic, one line (2476|anil aggarwal |manager |sales |01/05/59|5000) is missing from the output.
I have not been able to work out how the command sed -n '/director/,/executive/p' emp.lst works. I have gone through the sed man page and still cannot deduce it.
How should I go about understanding how it works?
For context, I am running the sed command built into macOS High Sierra 10.13.6, in Bash version 4.4.
Note: I am a sed newbie. Please edit any mistakes or incorrect terminology that I may have used.
2 Answers
https://www.gnu.org/software/sed/manual/sed.html#Range-Addresses:
An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively):
$ seq 10 | sed -n '4,6p'
4
5
6
Thus 1,2p does not mean "print lines 1 and 2" but "print all lines from line 1 through line 2". The difference becomes clearer with e.g. 3,7p, which will not just print lines 3 and 7, but lines 3, 4, 5, 6, 7.
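The contrast can be checked directly. This is a small sketch, assuming a POSIX shell with seq available (as in the manual's example above); the variable names are only for illustration:

```shell
# Two separate single-line addresses: prints only lines 3 and 7.
two_lines=$(seq 10 | sed -n -e '3p' -e '7p')

# One range address: prints every line from 3 through 7.
one_range=$(seq 10 | sed -n '3,7p')

printf 'separate addresses:\n%s\n' "$two_lines"
printf 'range address:\n%s\n' "$one_range"
```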
/director/,/executive/p prints all lines from a starting line (matching director) through an ending line (matching executive), inclusive. Once a range has ended, sed looks for the next line matching director and starts a new range there.
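In your case, you have two matching ranges (each starting with director and ending with executive): the first runs from the line for 9876 (jai sharma) through the line for 4290 (jayant Choudhury), and the second from the line for 6521 (lalit chowdury) through the line for 3564 (sudhir Agarwal). The line for 2476 (anil aggarwal) falls between the two ranges, which is why it is not printed.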
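The same behaviour can be reproduced on a smaller input. This is only a sketch: the eight-line sample is made up for illustration (it is not emp.lst), and the function and variable names are hypothetical.

```shell
# Hypothetical sample: two director..executive blocks with one line between them.
sample() {
  printf '%s\n' \
    '1|g.m.' \
    '2|director' \
    '3|director' \
    '4|executive' \
    '5|manager' \
    '6|director' \
    '7|executive' \
    '8|g.m.'
}

# Range opens at line 2, closes at line 4, reopens at line 6, closes at line 7.
# Line 3 matches director while the range is already open, so it is just
# another line inside the range. Lines 1, 5 and 8 are outside any range.
ranges=$(sample | sed -n '/director/,/executive/p')
printf '%s\n' "$ranges"

# If the closing pattern never matches, the range runs to the end of input.
open_ended=$(sample | sed -n '/manager/,/chairman/p')
printf '%s\n' "$open_ended"
```

The first command prints lines 2, 3, 4, 6 and 7 of the sample; the second prints lines 5 through 8.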
I have a follow-up query and felt it makes more sense to ask here than to write a separate question. Is there a way to select just the first, the last, or any intermediate one of the many matching blocks?
– Nimesh Neema
Sep 10 '18 at 10:47
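One possible approach for the first block only, offered as an assumption rather than an answer taken from this thread: print inside the range and quit as soon as the closing line has been printed. The sample input and the names below are hypothetical.

```shell
# Hypothetical sample with two director..executive blocks.
sample() {
  printf '%s\n' \
    '1|g.m.' \
    '2|director' \
    '3|executive' \
    '4|manager' \
    '5|director' \
    '6|executive'
}

# Inside the range, print each line; when the line that closes the range
# has been printed, quit so that no later block is ever started.
first_block=$(sample | sed -n '/director/,/executive/{p;/executive/q;}')
printf '%s\n' "$first_block"
```

This sketch covers only the first block; selecting the last or an intermediate block would need a different approach.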
From man sed:
0,addr2
Start out in "matched first address" state, until addr2 is found.
This is similar to 1,addr2, except that if addr2 matches the very
first line of input the 0,addr2 form will be at the end of its range,
whereas the 1,addr2 form will still be at the beginning of its range.
This works only when addr2 is a regular expression.
Not 100% sure if this is the manual section that applies but it looks like you have 2 blocks from "director" to "executive" in your output above.
There happen to be some other "director" lines between the first "director" and first succeeding "executive".
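For reference, the difference the excerpt describes only shows up when addr2 matches the very first line of input. A minimal sketch, assuming GNU sed (the 0,addr2 form is a GNU extension, and other sed implementations may not accept it); the three-line input and variable names are made up:

```shell
# 0,/a/: the range may end on line 1 itself, so only the first "a" is printed.
from_zero=$(printf '%s\n' a b a | sed -n '0,/a/p')

# 1,/a/: line 1 only opens the range; the end pattern is first tested on
# line 2, so the range closes at the second "a" on line 3.
from_one=$(printf '%s\n' a b a | sed -n '1,/a/p')

printf '0,/a/ gives:\n%s\n' "$from_zero"
printf '1,/a/ gives:\n%s\n' "$from_one"
```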
That section does not apply. There is no 0, in OP's code. – melpomene
Sep 9 '18 at 23:31
Thanks a lot. Your explanation was very helpful and I can now clearly understand how it works. I experimented with various placements of director and executive within the file. Man pages, although logically precise and concise, are generally hard for a newbie to understand. – Nimesh Neema
Sep 9 '18 at 23:39