grep doesn't output until EOF if piped through cat
Given this minimal example
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; )
it outputs LINE 1 and then, after one second, outputs LINE 2, as expected.
If we pipe this to grep LINE
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep LINE
the behavior is the same as in the previous case, as expected.
If, alternatively, we pipe this to cat
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | cat
the behavior is again the same, as expected.
However, if we pipe to grep LINE, and then to cat,
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep LINE | cat
there is no output until one second has passed, and then both lines appear at once, which I did not expect.
Why is this happening, and how can I make the last version behave the same way as the first three commands?
@DouglasHeld When called without arguments, cat simply reads stdin and outputs into stdout. Of course, I came up with this question with a lot of complex stuff in place of echo and cat, but these turned out to be irrelevant, since the problem shows up with much simpler examples.
– lisyarus
Sep 5 '18 at 21:11
@DouglasHeld: Piping to cat is often useful to force stdout to not be a terminal. For instance, this is an easy way to get many commands to not use colorized output.
– wchargin
Sep 7 '18 at 5:01
I swear this is a duplicate of another question on Stack Overflow!
– iBug
Sep 7 '18 at 6:46
@wchargin thank you very much, you have taught me something new about posix that I never knew.
– Douglas Held
Oct 12 '18 at 22:01
3 Answers
When (at least GNU) grep’s output is not a terminal, it buffers its output, which is what causes the behaviour you’re seeing. You can disable this either using GNU grep’s --line-buffered option:
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep --line-buffered LINE | cat
or the stdbuf utility:
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | stdbuf -oL grep LINE | cat
"Turn off buffering in pipe" has more on this topic.
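To see the difference rather than take it on trust, here is a minimal sketch that stamps each line with the number of whole seconds elapsed when it arrives at the end of the pipeline. The helper names produce and stamp are made up for this illustration, and it assumes a date that supports +%s and a grep that accepts --line-buffered (GNU grep does).

```shell
#!/bin/sh
# produce: hypothetical helper that emits two lines one second apart,
# exactly like the subshell in the question.
produce() { echo "LINE 1"; sleep 1; echo "LINE 2"; }

# stamp: hypothetical helper that prefixes every line it reads with the
# number of whole seconds elapsed since it started.
# Assumes `date +%s` is available (GNU and BSD date both provide it).
stamp() {
  start=$(date +%s)
  while IFS= read -r line; do
    echo "$(( $(date +%s) - start )) $line"
  done
}

echo "plain grep:"
produce | grep LINE | stamp

echo "grep --line-buffered:"
produce | grep --line-buffered LINE | stamp
```

With plain grep both lines should normally carry the same stamp, because they reach stamp together when grep exits. With --line-buffered the second stamp should be at least one second later than the first.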
Simplified explanation
Like many utilities (this is not something peculiar to one program), grep switches its standard output between being line buffered and fully buffered. In the former case, the C library buffers output data in memory until either the buffer holding those data is filled or a linefeed character is added to it (or the program ends cleanly), whereupon it calls write() to actually write the buffer contents. In the latter case, only the in-memory buffer becoming full (or the program ending cleanly) triggers the write().
More detailed explanation
This is the well-known, but slightly wrong, explanation. In fact, standard output is not line buffered but smart buffered in the GNU C library and BSD C library. Standard output is also flushed when reading standard input exhausts its in-memory buffer (of pre-read input) and the C library has to call read() to fetch some more input and it is reading the beginning of a new line. (One reason for this is to prevent deadlock when another program connects itself to both ends of a filter and expects to be able to operate line-by-line, alternating between writing to the filter and reading from it; like "coprocesses" in GNU awk for example.)
C library influence
grep and the other utilities do this (or, more strictly, the C libraries that they use do this, because this is a defined feature of programming in the C language) based upon what they detect their standard output to be. If (and only if) it is not an interactive device, they choose full buffering; otherwise they choose smart buffering. A pipe is considered not to be an interactive device, because the definition of being an interactive device, at least in the world of Unix and Linux, is essentially the isatty() call returning true for the relevant file descriptor.
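The same check can be made from the shell, where the test [ -t 1 ] succeeds when file descriptor 1 is open on a terminal. The sketch below is only an illustration of the decision described above (the function name describe_stdout is made up here); it is not what grep itself runs.

```shell
#!/bin/sh
# describe_stdout: hypothetical helper that reports what standard output
# looks like to the process. `[ -t 1 ]` is the shell's counterpart of
# calling isatty() on file descriptor 1.
describe_stdout() {
  if [ -t 1 ]; then
    echo "stdout is a terminal"
  else
    echo "stdout is not a terminal"
  fi
}

# Run directly: the answer depends on where the script's output goes.
describe_stdout

# Run with a pipe on standard output: the test fails, which is the
# situation in which the C library chooses full buffering.
describe_stdout | cat
```

Run from an interactive shell, the first call should report a terminal, while the piped call reports that it is not one.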
Workarounds to disable full buffering
Some utilities like grep have idiosyncratic options such as --line-buffered (which, as you can see, is mis-named) that change this decision. But a vanishingly small fraction of the filter programs that one could use actually have such an option.
More generally, one can use tools that dig into the specific internals of the C library and change its decision making. These have security problems if the program to be altered is set-UID, are specific to particular C libraries, and indeed are specific to programs written in or layered on top of the C language. Alternatively, one can use tools such as ptybandage that do not change the internals of the program but simply interpose a pseudo-terminal as standard output, so that the decision comes out as "interactive".
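As a sketch of the first kind of tool, GNU coreutils' stdbuf can be wrapped around a filter that has no buffering option of its own, such as cut. The wrapper name linebuf is made up for this example, and it falls back to running the command unchanged when stdbuf is not installed. The caveats above still apply: this is only expected to help for dynamically linked programs that use C stdio and do not set their own buffering.

```shell
#!/bin/sh
# linebuf: hypothetical wrapper that runs a command with line-buffered
# standard output when GNU coreutils' stdbuf is available, and runs it
# unchanged otherwise.
linebuf() {
  if command -v stdbuf >/dev/null 2>&1; then
    stdbuf -oL "$@"
  else
    "$@"
  fi
}

# cut has no --line-buffered option; with stdbuf in place, each field
# should appear as soon as its input line arrives.
( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | linebuf cut -d' ' -f2 | cat
```

The output content is the same either way (1, then 2); only the timing changes.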
Underrated answer. Thanks for the info!
– Délisson Junio
Sep 5 '18 at 17:37
If the phrase "line buffered" is a misnomer, then it's not really the fault of grep, but of the underlying library calls, setbuf/setvbuf. I don't know of a reliable online reference for the C standard, but e.g. the Linux and FreeBSD man pages along with the POSIX description of setvbuf call it "line buffered". Even the symbolic constant for it is _IOLBF.
– ilkkachu
Sep 5 '18 at 21:19
Well now you've learned better. This buffering strategy is described in the GNU C library doco, albeit briefly. Laurent Bercot is more forthright on the matter. I have mentioned it too.
– JdeBP
Sep 6 '18 at 0:35
@ilkkachu The C standard does indeed use "line buffered". Per 7.21.3 Files, paragraph 3: "When a stream is unbuffered, ... When a stream is fully buffered, ... When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when a new-line character is encountered. ..." In fact, the C Standard uses the exact phrase "line buffered" five times. So it's not a misnomer.
– Andrew Henle
Sep 6 '18 at 14:41
Furthermore, the approach described here as "smart buffering", as I understand it, seems to be just what the C standard describes as "line buffering". Specifically, in addition to flushing the buffer at newlines, "When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when [...] input is requested on an unbuffered stream, or when input is requested on a line buffered stream that requires the transmission of characters from the host environment." So this is not a GNU or BSD quirk, but rather what the language calls for.
– John Bollinger
Sep 6 '18 at 22:44
Use grep --line-buffered to make grep not buffer more than one line at a time.
cat concatenates files. What are you trying to do by piping into cat?
– Douglas Held
Sep 5 '18 at 20:09