Introduction
*-nix terminal programs are designed to be simple. This simplicity affords the system a kind of modularity that would not be possible if the programs were more complex. The idea is that each program does one job well, with few options, and produces consistently formatted output. Knowing when to use which program is a basic building block of Linux proficiency; knowing how to link those building blocks together lets you accomplish more advanced tasks and create intelligent workflows. The way we link these blocks (or program functionalities) together is output redirection: using the output of one program as the input of another.
Devices, Streams, and Buffers (oh my)
Before we go further, it is important to nail down some terminology. In discussions of output redirection, you will hear much talk of streams, buffers, and devices. To grossly oversimplify, a device is a physical object that either displays output from a program (e.g. a screen or printer) or handles input to it (e.g. a keyboard, mouse, or touchscreen). Devices are often represented by special files on Linux systems. You can see them by looking inside the `/dev` directory on your system:
❯ ls /dev
acpi_thermal_rel fuse loop17 null shm tty22 tty42 tty62 ttyS24 vboxdrv vcsu2
ashmem hidraw0 loop18 nvidia0 snapshot tty23 tty43 tty63 ttyS25 vboxdrvu vcsu3
autofs hidraw1 loop19 nvidiactl snd tty24 tty44 tty7 ttyS26 vboxnetctl vcsu4
block hidraw2 loop2 nvme0 **stderr** tty25 tty45 tty8 ttyS27 vboxusb vcsu5
btrfs-control hidraw3 loop20 nvme0n1 **stdin** tty26 tty46 tty9 ttyS28 vcs vcsu6
bus hidraw4 loop21 nvme0n1p1 **stdout** tty27 tty47 ttyS0 ttyS29 vcs1 vcsu63
cec0 hidraw5 loop22 nvme0n1p2 tty tty28 tty48 ttyS1 ttyS3 vcs2 vfio
char hpet loop23 nvme0n1p3 tty0 tty29 tty49 ttyS10 ttyS30 vcs3 vga_arbiter
console hugepages loop3 nvme0n1p4 tty1 tty3 tty5 ttyS11 ttyS31 vcs4 vhci
core input loop4 nvme0n1p5 tty10 tty30 tty50 ttyS12 ttyS4 vcs5 vhost-net
cpu kmsg loop5 nvme0n1p6 tty11 tty31 tty51 ttyS13 ttyS5 vcs6 vhost-vsock
cpu_dma_latency kvm loop6 nvme0n1p8 tty12 tty32 tty52 ttyS14 ttyS6 vcs63 video0
cuse log loop7 nvram tty13 tty33 tty53 ttyS15 ttyS7 vcsa video1
disk loop0 loop8 port tty14 tty34 tty54 ttyS16 ttyS8 vcsa1 watchdog
dma_heap loop1 loop9 ppp tty15 tty35 tty55 ttyS17 ttyS9 vcsa2 watchdog0
dri loop10 loop-control psaux tty16 tty36 tty56 ttyS18 udmabuf vcsa3 wmi
drm_dp_aux0 loop11 mapper ptmx tty17 tty37 tty57 ttyS19 uhid vcsa4 zero
drm_dp_aux1 loop12 media0 pts tty18 tty38 tty58 ttyS2 uinput vcsa5
drm_dp_aux2 loop13 mei0 random tty19 tty39 tty59 ttyS20 urandom vcsa6
fb0 loop14 mem rfkill tty2 tty4 tty6 ttyS21 usb vcsa63
fd loop15 mqueue rtc tty20 tty40 tty60 ttyS22 userio vcsu
full loop16 net rtc0 tty21 tty41 tty61 ttyS23 v4l vcsu1
You’ll notice many devices have names like `cpu` or `disk` which correspond to actual physical objects inside or connected to the computer. Conversely, there are also devices like `random` and `tty` that don’t have a strong physical correlate but rather refer to specific concepts. Those devices represent virtual objects (e.g. the `tty`s are virtual console screens and the `nvme0n*` entries are disk partitions). Regardless of these differences, the devices of most interest for this article are the `stdin`, `stdout`, and `stderr` buffers.
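Because devices are exposed as files, you can interact with them using ordinary file operations. A couple of quick experiments (safe to run on any Linux box; `od` is assumed to be available, as it is part of coreutils):

```shell
# /dev/null discards everything written to it and reads back as empty
echo "into the void" > /dev/null   # the output simply vanishes
cat /dev/null                      # prints nothing

# /dev/urandom is an endless stream of random bytes; grab four of them
# and render them as hex so they are printable
head -c 4 /dev/urandom | od -An -tx1
```

Writing to `/dev/null` and reading from `/dev/urandom` work exactly like writing to and reading from regular files, which is the whole point of representing devices this way.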
A stream is the concept used to describe the flow of data through a device. For example, when I run several commands in the terminal:
❯ cat example.txt
banana
❯ echo pumpernickel
pumpernickel
all of the output is displayed on the same terminal screen. That means the device, regardless of which program wrote the data, carried all of that output at some point in its stream. The same idea applies to input: if I viewed everything my keyboard typed as one flowing stream, I would see all the keystrokes regardless of which program I was interacting with.
Buffers are like streams in that they carry information to or from devices. However, instead of the data flowing freely and ephemerally, buffers hold it in memory so that chunks of data can be sent or read at the appropriate times. When we deal with the standard buffers, we are normally accessing or transmitting data from cached sources to make program input and output more manageable.
The Holy Trinity of Buffers
Before we can understand how to pass output between programs we need to know how the Linux operating system classifies input and output. Anyone who has programmed in C before should be familiar with the holy trinity of buffers: standard out, standard in, and standard error.
stdout
`stdout` is where all the normal, non-error output of a program should go. Whether you are writing your first hello-world program or printing fancy formatted tables to the console, you are most likely printing to `stdout`. For those familiar with C, you’ll recall that writing to a file requires a file descriptor, an `int` associated with that file. By default, descriptors 0-2 are reserved on *-nix systems because they are assigned to the three standard I/O buffers; 1 is the file descriptor number for `stdout`. To see this association explicitly on your Linux box, enter:
❯ ls -la /dev/stdout
lrwxrwxrwx 1 root root 15 Jun 21 09:36 /dev/**stdout** -> /proc/self/fd/**1**
Knowing the file descriptor number will provide some interesting shorthand for some of the advanced redirection commands we’ll do later.
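As a preview of that shorthand, note that the shell lets you reference descriptors directly. A minimal sketch (works in any POSIX shell):

```shell
# writing to /dev/stdout is equivalent to ordinary printing,
# since descriptor 1 is where echo was going to write anyway
echo "hello" > /dev/stdout

# >&1 names descriptor 1 explicitly; this is a no-op for echo,
# but the same syntax with other numbers becomes genuinely useful
echo "hello again" >&1
```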
stdin
`stdin` is the buffer through which data is sent to a program. You can think of your keyboard as the ultimate `stdin` device because it is how you ultimately enter information into the computer. Apart from your physical keyboard, you will see shortly that there are other ways to pass input into a program. The file descriptor for `stdin` is 0, as seen by executing:
❯ ls -la /dev/stdin
lrwxrwxrwx 1 root root 15 Jun 21 09:36 /dev/**stdin** -> /proc/self/fd/**0**
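Many classic utilities fall back to reading `stdin` when given no file argument, and `-` or `/dev/stdin` name the stream explicitly. A few equivalent ways to feed the same input:

```shell
echo "one two three" | cat     # cat with no argument reads stdin
echo "one two three" | cat -   # same thing, stdin named as '-'
echo "one two three" | wc -w   # wc counts the words arriving on stdin
```

All three commands read the same stream; the only difference is how (or whether) the stream is spelled out on the command line.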
stderr
`stderr` is a special buffer where all the error messages of a program should go. It is worth mentioning that plenty of programmers do not go out of their way to use this buffer when printing errors. By default, `stderr` is displayed on the terminal right alongside `stdout`, so the two are indistinguishable just by looking at the screen; the difference only becomes visible when you redirect the streams. Writing to `stderr` instead of `stdout` also requires different functions or parameters (e.g. in C using `perror` instead of `printf`, or in Python passing the `file=sys.stderr` option). But what is the utility of `stderr` if everything I want to communicate to the user can go to `stdout`? Having distinct buffers for distinct types of output lets you suppress unnecessary output in special cases. For example, if I put all non-critical errors on `stderr`, I could do some stream manipulation (see the section below) and get cleaner output at runtime. The file descriptor for `stderr` is 2, as seen by:
❯ ls -la /dev/stderr
lrwxrwxrwx 1 root root 15 Jun 21 09:36 /dev/**stderr** -> /proc/self/fd/**2**
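You don’t need C or Python to write to `stderr`; the shell can send an `echo` there using the descriptor shorthand `>&2`:

```shell
# >&2 redirects echo's output into descriptor 2 (stderr)
echo "something went wrong" >&2

# proof that it really landed on stderr:
{ echo "something went wrong" >&2; } 1>/dev/null   # message still appears
{ echo "something went wrong" >&2; } 2>/dev/null   # message is suppressed
```

This is the idiomatic way for shell scripts to report errors without polluting their normal output.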
Redirection Operators
Up to this point we have assumed the input for a program comes from your keyboard (i.e. you typing the values for its arguments) and that the output is printed to the terminal screen. With redirection operators, we can plug programs into each other (i.e. feed one’s output into another’s input).
Read/Write
The first helpful pair of operators to know are `<` and `>`. Think of them as arrows that indicate which way the data will flow. These operators always sit between a command and a file. If what follows the operator isn’t an existing file (e.g. a command like `echo`, which returns a string), the operator will either create a new file with that name or complain that the file to be read doesn’t exist.
Write (>)
For example, if I wanted to create a file that contained the list of contents of my current directory I could do:
❯ ls > folder_contents.txt
But wait a second, I didn’t have a file called `folder_contents.txt` before running that command! Why doesn’t this crash and tell me there is no such file? With the output redirection operator, if the target file does not exist, the shell creates it. To verify that `folder_contents.txt` actually contains the output of `ls`:
❯ cat folder_contents.txt
bkp_sss
blog
christopolise.github.io
clgui
....
But be careful! If you run another write (`>`) operation on this file, it will overwrite the contents:
❯ echo "Ooops :P" > folder_contents.txt
❯ cat folder_contents.txt
Ooops :P # Notice that there are no file directory contents anymore :(
If you want to add more to the existing contents of the file, you will need to use the append operator (see more below).
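If accidental overwrites worry you, bash and zsh (among others) offer a `noclobber` option that makes a plain `>` refuse to replace an existing file; `>|` then forces the overwrite explicitly. A quick sketch:

```shell
set -o noclobber            # '>' now refuses to overwrite existing files
echo "first" > safe.txt
echo "second" > safe.txt    # fails: cannot overwrite existing file
echo "second" >| safe.txt   # '>|' explicitly forces the overwrite
set +o noclobber            # restore the default behavior
rm safe.txt
```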
Read (<)
As you would expect, the read operator functions syntactically the same way as write. Reading also takes the form of a command + `<` + a file:
❯ cat < .bashrc
#
# ~/.bashrc
#
[[ $- != *i* ]] && return
colors() {
local fgc bgc vals seq0
printf "Color escapes are %s\n" '\e[${value};...;${value}m'
....
Oftentimes, if you are passing a file into a program that takes a file as input by default, the `<` operator is not necessary:
❯ cat .bashrc
#
# ~/.bashrc
#
[[ $- != *i* ]] && return
colors() {
local fgc bgc vals seq0
printf "Color escapes are %s\n" '\e[${value};...;${value}m'
Append (>>)
Sometimes you will want to append data to a file instead of overwriting it every time. An example would be a script with a for loop that needs to update a log file. Instead of the write operator, you would use the append operator, `>>`:
❯ echo "hello" > example.txt
❯ echo "world" >> example.txt
❯ cat example.txt
hello
world
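The log-file scenario mentioned above might look like this (the file name `build.log` is just illustrative):

```shell
# append one line per iteration; the log grows instead of being replaced
for i in 1 2 3; do
    echo "iteration $i complete" >> build.log
done
cat build.log
```

Had the loop used `>` instead of `>>`, the log would contain only the final iteration’s line.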
heredocs/strings
All of this creating files with commands can leave your directories a little messy. Making a file for each command you execute is wasteful and a nuisance. How can we take advantage of this inter-process communication if nothing is formatted right, especially when the output of the previous command is returned as a string instead of a file?
heredocs (<<)
A here-document, or heredoc for short, is a special way to format string output so that it acts as a document, making it compatible with commands that only accept documents. It is a powerful tool, even if the formatting is a little wonky. To perform a heredoc operation for a command, we use the `<<` operator, followed by a delimiter: a special word or value that marks the end of the intended input. In the example below I use the word DELIMITER as our delimiter (for redundancy’s sake):
❯ cat "I can't cat a string" # This fails because cat only accepts files
cat: "I can't cat a string": No such file or directory
❯ cat < echo "The input operator won't work either"
zsh: no such file or directory: echo
❯ cat << DELIMITER
heredoc> This should work!
heredoc> It will cat any text that I type
heredoc> Until I type the delimiter word above
heredoc> DELIMITER
This should work!
It will cat any text that I type
Until I type the delimiter word above
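Heredocs combine nicely with the write operator, which makes them a handy way to generate small files from inside a script. One detail worth knowing: quoting the delimiter suppresses variable expansion in the body.

```shell
name="world"

# unquoted delimiter: $name is expanded by the shell
cat << EOF
hello $name
EOF

# quoted delimiter: the body is passed through literally
cat << 'EOF'
hello $name
EOF
```

The first heredoc prints `hello world`; the second prints `hello $name` verbatim.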
here strings (<<<)
You’ll notice that heredocs support multiline input like a normal file would. However, this becomes inefficient and annoying if all you want to input is a single word or string. Here strings are exactly like heredocs, but with an abbreviated syntax that needs no delimiter:
❯ cat <<< "Hello world"
Hello world
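Here strings shine when a filter expects `stdin` and you just have a small value on hand (note that `<<<` is a bash/zsh feature, not plain POSIX sh):

```shell
# count words without creating a file or spawning an extra process
wc -w <<< "one two three"

# feed the contents of a variable to grep via stdin
fruits="apple banana cherry"
grep -o "banana" <<< "$fruits"
```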
Manipulating Streams
By default, the information a program prints to the terminal is whatever it put into its `stdout` and `stderr` streams. Why have two streams if both print by default? The separation keeps different event types apart when everything is spit out to the terminal (e.g. normal output versus debugging statements).
Let’s take the following Python script, which prints a string to both `stdout` and `stderr`:
import sys
print("I'm printing on stdout!") # This string goes to stdout
print("I'm printing on stderr!", file=sys.stderr) # This string goes to stderr
When we run the example with no stream merging modifiers, we get the following output:
❯ python3 buffers.py
I'm printing on stdout!
I'm printing on stderr!
What if I only want to see the errors of the program, so I can diagnose them quickly, and not the normal output? This is a perfect opportunity to send output to the garbage! In Linux, whenever you want to throw something away (whether a file or just the output of a program), you can send it to `/dev/null`. If we only want the info on `stderr`, we can do the following:
❯ python3 buffers.py 1> /dev/null
I'm printing on stderr!
By putting `1>` in front of the garbage location, we forced all the data that would print on file descriptor 1, or `stdout`, to the garbage location. If we wanted just the normal program output and no errors (wouldn’t that be nice), we can send the info on file descriptor 2 to `/dev/null` instead:
❯ python3 buffers.py 2> /dev/null
I'm printing on stdout!
But wait a minute, I’ve seen the `>` operator before. Isn’t that just writing something to a file? What’s with the numbers? Why didn’t I have to put one there before? It turns out that when you don’t include a number, the shell assumes `stdout`, or 1. This means `1>` and `>` are exactly the same thing:
❯ python3 buffers.py 1> /dev/null
I'm printing on stderr!
❯ python3 buffers.py > /dev/null
I'm printing on stderr!
Likewise, the I/O buffer file descriptors work with any normal write, which makes logging only the errors very easy:
❯ python3 buffers.py 2> buffer.log
I'm printing on stdout!
❯ cat buffer.log
I'm printing on stderr!
Notice how only the `stderr` output went to the log file, while the info on `stdout` still printed in the terminal. That is because we did not give descriptor 1 any special instructions. What if we want both streams to go to the same destination? We can combine them by redirecting the output to a file as usual and then pushing the contents of `stderr` into `stdout`, like so:
❯ python3 buffers.py > buffer.log 2>&1
❯ cat buffer.log
I'm printing on stderr!
I'm printing on stdout!
`2>&1` writes the contents of `stderr` into the location of `stdout`. (Think of the `&` symbol like a pointer reference in C. If you hate C, then think of it as a goofy-looking A that stands for “at”, i.e. “at `stdout`”.) By pushing `stderr` into `stdout` we have combined them into one write buffer that ends up in our `buffer.log` file, providing an example of possibly the only time it is ever okay to cross the streams.
Interprogram Operations
The last, but probably most important, redirection we’ll talk about in this article is passing information from one program directly to another.
Pipes (|)
A pipe is an operator that takes the `stdout` of one program and puts it into the `stdin` of another. This flow is very handy: instead of needing interim files to store the output of one program for use as input to another, the input/output lines are connected directly:
❯ # Way 1:
❯ ls > dir_contents.txt # Store contents of current directory in file
❯ cat dir_contents.txt
3DPrinting
Android
AndroidStudioProjects
ans.txt
Arduino
....
❯ rm dir_contents.txt # Remove to keep the folder clean
❯
❯ # Way 2 (with pipe):
❯ ls | cat
3DPrinting
Android
AndroidStudioProjects
ans.txt
Arduino
....
NOTE: Not all commands that take text as input will work with pipes. An example is `echo`, which prints its arguments, not input from `stdin`:
❯ ls | echo # Returns nothing because echo has no arguments added to it
❯ ls | echo banana # Just prints banana because that was its only argument, ignores stdout from ls
banana
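If you need a command like `echo` to treat piped data as arguments rather than `stdin`, the standard workaround is `xargs`, which reads `stdin` and converts it into an argument list:

```shell
# echo ignores stdin, so this prints only a blank line
ls / | echo

# xargs reads the piped data and hands it to echo as arguments
ls / | xargs echo
```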
mkfifo
`mkfifo` is an interesting addition to the interprocess toolchain. Much like a pipe (`|`), a file created by `mkfifo` acts as a tangible connection between the output of one process and the input of another. It differs from a plain pipe in that (1) it is named and (2) it can pipe information across terminal sessions.
To make a `mkfifo` file (or named pipe, as they’re officially known), do the following:
❯ mkfifo my_pipe_file
You can verify that this is a pipeline instead of a normal file by checking it with:
❯ ls -l my_pipe_file
prw-r--r-- 1 christopolise christopolise 0 Jun 23 10:43 my_pipe_file
The `p` at the beginning of the permissions block indicates that this file is a **p**ipe.
To use your brand new named pipeline across terminal sessions, you can write to it in one window:
❯ ls > my_pipe_file
Notice that after hitting enter, the command doesn’t finish executing; it appears to hang. This is because something is in the pipe, waiting to come out the other side. In another terminal window or tab, you can retrieve the info from the pipe with a command that reads from it:
❯ cat < my_pipe_file
3DPrinting
Android
AndroidStudioProjects
ans.txt
Arduino
....
Once the information in the pipe is read by the second program, both commands finish in their respective terminal windows. This also works in the reverse direction (i.e. reading from the pipe in the first window and writing in the second).
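If you only have one terminal handy, you can still exercise a named pipe by backgrounding the writer with `&` so the blocked write doesn’t tie up your prompt:

```shell
mkfifo demo_pipe
ls / > demo_pipe &     # the writer blocks in the background...
cat < demo_pipe        # ...until a reader opens the other end
rm demo_pipe           # named pipes persist on disk until removed
```

Note the cleanup step: unlike an anonymous `|` pipe, a FIFO is a real filesystem entry and sticks around after use.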
Conclusion
A solid knowledge of output redirection is part of the difference between a novice Linux user and a competent, advanced one. With it, you can build more complex workflows instead of relying on the utility of a single program.
As a quick example, `ls` lists the contents of a directory, and `grep` searches its input for patterns. Combining the two lets you find files matching a specific pattern in a given directory:
❯ ls /dev | grep stdin
stdin
With the variety of different programs that can be mixed and matched on a Linux machine, the possibilities of what you can do become virtually limitless.