Introduction
The first two points of the Unix philosophy, as documented by Doug McIlroy:
- Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
- Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
Downside of the Unix Philosophy
After you have used the terminal for a while, you will start to see that certain tasks are pretty repetitive. You’ve probably noticed, as is evident in the terminal article, that there are a myriad of programs living on your Linux box. However, you’ve also probably noticed that all of these programs tend to be pretty simple (e.g. cd is pretty good at changing directories, but not much else). This follows the Unix philosophy of having many programs that each do few things so they can link together like building blocks to do great things. Unfortunately, all of this modularity leads to a lot of typing.
Scripting to the Rescue
Scripts are programs written in the shell’s language to automate tasks. Think of it like a recipe; apart from your list of ingredients, there is also the procedure, or step by step instructions to make the meal correctly. When writing a script, you are writing the procedure the shell will execute, step by step until the list is completed.
The following is a simple script that prints out Hello world!, then the date, then your user name. You’ll notice that the main components of a script tend to be programs run in sequence.
#!/bin/bash
# Example Script
echo "Hello world!"; # Step one
date; # Step two
whoami; # Step three
Basics
Everything in Linux is a file, including shell scripts. By convention, these files end in .sh, but can also end in .zsh, .bash, etc. for specific shells. In this tutorial, we will focus on bash scripting. While sh, bash, zsh, and dash are similar to each other, there are idiosyncrasies that prevent them from being interchangeable. Those differences won’t be addressed in this tutorial.
Skeleton
When creating a script, the system needs to know which interpreter should run it. This is done on the first line of the script by putting a shebang (#!) followed by the path to the interpreter. For bash scripts you would put:
#!/bin/bash
Hint: This works for any scripting language, including Python!
#!/usr/bin/env python3
Comments
Scripts can get hairy fairly easily, especially if you are developing one over the course of days or weeks; a line that was clear to you before can become nothing short of an alien language. This is where comments come into play. By peppering your script with comments, you will save yourself countless hours of trying to relearn what you were doing before. Comments are pretty straightforward in bash. There are three types: inline, full-line, and block comments.
Inline
Good for a quick follow up clarifier, inline comments are put after a statement:
echo "This will be printed!"; # This comment will not be printed
Full-line
Full-line comments are good for when you want a nice header explanation for a block of code that follows or even as a popping visual separation:
# The following code will print out the contents of the directory and filter for a keyword
ls -al | grep 'keyword';
####################################################################
# This is the second section of the code
echo "This text won't print" > /dev/null;
Block
Sometimes specific blocks of information are necessary for someone who will view your script contents at a later time. Block comments are useful for function descriptions or even author information:
: << 'COMMENT'
Author: Christopher Kitras
Date modified: Apr 27, 2022
email: kchristm@byu.edu
COMMENT
echo "This text prints! But the multiline comment won't"
Variable Names
Assigning a value to a variable is as simple as writing the variable name, the equals sign, and the value. This works for any kind of value (e.g. integers, strings, etc.), although bash stores every value as a string and its built-in arithmetic handles integers only. NOTE: There are NO spaces between the variable name, the equals sign, and the value. If you put a space after the variable name, the shell will try to run the name as though it were a program.
foo="bar"
bar=2
baz=-1.75
Whenever we want to reference one of these variables, we do not use just the variable name, but rather $ followed by the variable name. The $ helps the shell distinguish between a program to be executed and a variable:
variable="hello world!"
echo variable # Will output the word "variable"
echo $variable # Will output hello world!
One caveat to note is that you CANNOT assign the output of a program by having the variable name, the equals sign, and then the command. Instead, you must execute the command in-line and assign that value to the variable. This can be done in a few ways:
foo=$(echo "bar") # foo will equal bar
bar=`cat text.txt` # bar will equal the contents of text.txt
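As a small sketch of the two substitution forms above, the following captures command output into variables. The variable names and the commands chosen (whoami, ls, basename, dirname) are only illustrative. The $(...) form is generally easier to work with than backticks because it can be nested without extra escaping.

```bash
#!/bin/bash
# Capture the output of a command with $(...)
current_user=$(whoami)

# The same idea with backticks (older style)
file_count=`ls | wc -l`

# $(...) can be nested without extra escaping
parent_name=$(basename "$(dirname "$PWD")")

echo "User: $current_user"
echo "Entries in this directory: $file_count"
echo "Parent directory name: $parent_name"
```

Wrapping the substitution in double quotes, as in the nested example, keeps output that contains spaces together as one value.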
Conditional Logic
Following a procedure in a recipe tends to be a linear experience. There is normally one path (steps 1 - n) that will allow you accomplish a singular outcome. However, scripts and tasks for a computer rarely tend to be that direct. Most times there are forks in the road and the path we take will be determined by the answer to a single question or condition. Take a look at the following script which sees if there are more than 100 files in the current folder:
#!/bin/bash
# Checks the number of files and folders in a directory and comments accordingly
if [ `ls | wc -l` -ge 100 ]; then
    echo "Woah there bucko, time to clean up"; # Runs when there are 100+ files/folders
else
    echo "I guess you're okay"; # Runs if there are fewer than 100 files/folders
fi
To let the shell know we are starting a conditional statement, we start with the if keyword. This is followed by a condition (covered in more detail in a later section) contained in square brackets [], with a space after the opening bracket and before the closing one. In our example, we execute a command inline that returns the number of files and folders in the folder where the script is run. Whenever the condition in the brackets is true, the part following the then keyword is executed; otherwise, the part following the else keyword is run. To close the scope of the conditional, the fi keyword (which is just if backwards) must be used (as opposed to a set of braces or specific indentation as found in other languages).
If you want a statement that checks multiple conditions sequentially, use the else-if keyword, elif. If the first condition is true, the statement following its then keyword is executed. If not, the script checks each of the remaining conditions in order until one is true or the else keyword is reached:
#!/bin/bash
# Checks the number of files and folders in a directory and comments accordingly
if [ `ls | wc -l` -le 10 ]; then
    echo "You have less than or equal to 10 files/folders"; # Runs when there are 0-10 files/folders
elif [ `ls | wc -l` -le 20 ]; then
    echo "You have less than or equal to 20 files/folders"; # Runs when there are 11-20 files/folders
elif [ `ls | wc -l` -le 30 ]; then
    echo "You have less than or equal to 30 files/folders"; # Runs when there are 21-30 files/folders
elif [ `ls | wc -l` -le 40 ]; then
    echo "You have less than or equal to 40 files/folders"; # Runs when there are 31-40 files/folders
else
    echo "You have more than 40 files/folders"; # Runs when there are 41+ files/folders
fi
For the more programmatically savvy among you who wish to branch out (pun intended), syntax also exists for a switch statement using the case and esac keywords. More information about that can be found here.
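To give a feel for the case and esac keywords, here is a minimal sketch that picks a branch based on a file name. The filename value and the kind variable are made up for the illustration; each pattern ends with ) and each branch ends with ;;.

```bash
#!/bin/bash
# Hypothetical example: react to a file extension with case/esac
filename="report.txt"

case "$filename" in
    *.sh)
        kind="shell script" ;;
    *.txt | *.md)
        kind="text document" ;;
    *)
        kind="unknown" ;;   # default branch, like "else"
esac

echo "$filename looks like a $kind"
```

Patterns are tried from top to bottom and only the first match runs, which is why the catch-all * goes last.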
Repeating Logic
Inevitably, you will want a script to repeat the same action several times within the same process. Whether repeating the same command, looping through numeric values, or iterating through the output of a program, there are several ways that looping can maximize the efficiency of your script.
For loops
When you want to loop a set number of times, you will want to use a for loop. In bash you can iterate (or pass through) either a range of numbers or the output of a program. Say we want to print the numbers one to ten in the console. Instead of using echo 10 times, we could simply do the following:
for i in {1..10}
do
echo $i
done
The for keyword indicates the type of loop, i is our iterator (it can be any variable name), and the in keyword lets us know that we are counting through the values that follow in the braces {}. True to bash form, you’ll notice that the scope of the loop is marked with words instead of special whitespace or punctuation. In a for loop, the do and done keywords encapsulate the code to be executed in one iteration.
But let’s say we don’t need to iterate through a range of numbers, but rather we want to repeat a command for every file/folder in a directory. For loops can take care of that too! Let’s print out the name of every file in the directory using a loop (yes, this can be done with just the ls command, but I’m trying to make a point here 😉):
for file in `echo *`
do
    echo "$file"
done
echo * prints all the files and folders in a directory in one block, while our loop lists each of them on its own line by printing every value in the block following in individually.
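One caveat with looping over the output of echo *: the shell splits that output on whitespace, so a file called my file.txt would be visited as two separate words. A minimal sketch of the usual alternative is to loop over the * pattern directly. The temporary directory, the file names, and the count variable here are made up for the illustration.

```bash
#!/bin/bash
# Work in a throwaway directory so the example does not touch real files
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/my file.txt"

count=0
for file in "$workdir"/*
do
    echo "Found: $file"      # quoting keeps "my file.txt" in one piece
    count=$((count + 1))     # $(( )) does integer arithmetic
done

echo "Loop ran $count times"
rm -r "$workdir"
```

The loop body runs once per file, even though one of the names contains a space.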
for loops are a powerful tool for minimizing the amount of repetitive code you write. Whether iterating through a range of numbers, a specific list, or the output of a command, large repetitive tasks can be condensed to a few lines of code. For a deeper dive into the different types of for loops, this article goes into fairly decent detail.
While loops
But what if the number of times we want to repeat something isn’t certain? Much like the conditional statements above, a while loop looks at a certain condition and acts upon it. While the condition evaluates to true, the code between the scope keywords keeps executing:
while [ `date +%H` -ne 22 ]
do
    echo "It's not 10pm!"
done
As mentioned earlier, conditions are placed within square brackets, and this condition checks whether the hour is something other than 22 (as in 22:00). As long as the time is not between 10pm and 11pm, this script will print out It's not 10pm! over and over. Pretty annoying, no? Similar to the for loop, the scope-encapsulating keywords for a while loop are also do and done.
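A while loop is easier to experiment with when the condition eventually becomes false by itself. The following is a small sketch with a made-up count variable that is decreased on every pass:

```bash
#!/bin/bash
# Count down from 3; the loop stops once the condition becomes false
count=3
while [ "$count" -gt 0 ]
do
    echo "T-minus $count"
    count=$((count - 1))     # without this line the loop would never end
done
echo "Liftoff!"
```

Forgetting to change whatever the condition tests is the most common way to end up with a loop that never finishes.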
NOTE: For the more programmatically savvy among you, it is also worth noting that basic loop keywords such as break and continue, and the logic that goes with them, are still valid in bash.
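As a quick sketch of both keywords, the loop below skips even numbers with continue and stops completely with break once it passes 8. The odds variable exists only to collect the results for the illustration.

```bash
#!/bin/bash
# Collect the odd numbers below 8, then stop
odds=""
for i in {1..10}
do
    if [ $((i % 2)) -eq 0 ]; then
        continue            # skip even numbers, go to the next iteration
    fi
    if [ "$i" -gt 8 ]; then
        break               # leave the loop entirely
    fi
    odds="$odds $i"
    echo "$i"
done
```

This prints 1, 3, 5, and 7, each on its own line; the loop never reaches 10 because break ends it at 9.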
Comparison Operators
As you’ve probably noticed in the last few sections, conditional statements are crucial for any logic that makes a script smart. Whether we are checking to see if the output of a program contains a desired string or making sure that a program exits with the right code, knowing the right comparison operator for the right occasion makes all the difference. Admittedly, the syntax for these in bash can get a little confusing, which is why it is important to know whether we are comparing string values or number values.
String comparison
A string is any value that contains a list of (or string of) characters (letters, symbols, and/or numbers) lumped together as a single object, usually encapsulated in quotes (e.g. “hello” or “b@nAn!123”). The following table shows the available string comparison operators:
Operator | Description | Example |
---|---|---|
= or == | Is Equal To | if [ "$foo" == "$bar" ] |
!= | Is Not Equal To | if [ "$foo" != "$bar" ] |
> | Is Greater Than (ASCII comparison) | if [ "$foo" \> "$bar" ] NOTE: Escaped inside single brackets so it is not read as a redirect |
(no >= operator) | Is Greater Than Or Equal To | if [ ! "$foo" \< "$bar" ] |
< | Is Less Than | if [ "$foo" \< "$bar" ] |
(no <= operator) | Is Less Than Or Equal To | if [ ! "$foo" \> "$bar" ] |
-n | Is Not Null | if [ -n "$foo" ] |
-z | Is Null (Zero Length String) | if [ -z "$foo" ] |
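The string operators can be tried out with a short sketch like the following. The values of foo, bar, and empty are made up. Note that inside single brackets the < and > characters are written as \< and \>, because otherwise the shell treats them as redirection.

```bash
#!/bin/bash
# Hypothetical values, chosen only for illustration
foo="apple"
bar="banana"
empty=""

if [ "$foo" = "$bar" ]; then
    result="same"
elif [ "$foo" \< "$bar" ]; then     # \< is escaped so it is not read as a redirect
    result="foo sorts first"
else
    result="bar sorts first"
fi
echo "$result"

if [ -z "$empty" ]; then
    echo "empty has zero length"
fi
```

Quoting the variables matters here: an unquoted empty variable disappears from the test entirely and changes what is being compared.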
Number comparison
For the math heads out there, you can also compare numbers to one another. Note that these operators work on integers only; bash has no built-in floating-point comparison. This tends to be useful when you are comparing the outputs of programs, or when you are checking the exit status of a program and want to act accordingly. The following table shows the operators that can be used with numbers:
Operator | Description | Example |
---|---|---|
-eq | Is Equal To | if [ $foo -eq 200 ] |
-ne | Is Not Equal To | if [ $foo -ne 1 ] |
-gt | Is Greater Than | if [ $foo -gt 15 ] |
-ge | Is Greater Than Or Equal To | if [ $foo -ge 10 ] |
-lt | Is Less Than | if [ $foo -lt 5 ] |
-le | Is Less Than Or Equal To | if [ $foo -le 0 ] |
== | Is Equal To | if (( $foo == $bar )) NOTE: Used within double parentheses |
!= | Is Not Equal To | if (( $foo != $bar )) |
< | Is Less Than | if (( $foo < $bar )) |
<= | Is Less Than Or Equal To | if (( $foo <= $bar )) |
> | Is Greater Than | if (( $foo > $bar )) |
>= | Is Greater Than Or Equal To | if (( $foo >= $bar )) |
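The two styles in the table can be compared side by side in a sketch like this one. The files and limit values are made up for the illustration.

```bash
#!/bin/bash
# Hypothetical values, chosen only for illustration
files=12
limit=10

# Single brackets use the letter-style operators
if [ "$files" -gt "$limit" ]; then
    bracket_result="over"
else
    bracket_result="under"
fi

# Double parentheses use the symbol-style operators; the $ is optional inside
if (( files > limit )); then
    paren_result="over"
else
    paren_result="under"
fi

echo "$bracket_result $paren_result"
```

Both tests agree, so this prints over over. Which style to use is mostly a matter of taste, though double parentheses are a bash feature and are not available in plain sh.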
Functions
Now that you understand how to carry out basic logical actions in a bash script, you might find yourself greedy with power. As you frantically write down conditional after conditional and loop after loop, you realize that your scripting has become monotonous again. Surveying your prized code, you realize despite all the tricks you’ve learned thus far, your script still looks repetitive. Functions to the rescue!
Assuming that you are familiar with functions from programming in general, bash functions are interesting mostly for their syntax (just like everything else in this tutorial). There are two ways to declare a function in bash:
function banana {
echo "banana";
}
### OR ####
apple() {
echo "apple";
}
banana # prints out 'banana'
apple # prints out 'apple'
Both are equally valid and neither has an advantage over the other. It is simply a matter of preference.
If you are looking for something a little more intelligent out of your functions (e.g. accepting parameters to calculate with), I’m afraid the novel bash syntax strikes again. Although the apple format has parentheses () following the function name, they are purely ornamental. Nothing can be put in them. Does this mean that bash functions don’t support parameters? Of course not! It’s just a little more obscure than that. Say I have a function that adds two numbers together:
add(){
echo `expr $1 + $2`
}
$1 and $2 are the first and second parameters passed into that function, respectively. So when I run
add 1 2 # That's right, no parentheses, just arguments like any other Unix program
I should get an output of 3. The convention of $ followed by a number is how bash refers to arguments: used in the scope of a function, these are the arguments passed into the function; used globally, they are the arguments passed into the script!
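Since a function prints its result rather than returning it, the usual way to use that result elsewhere is to capture it with $(...). The sketch below uses a hypothetical greet function; the local keyword and the ${2:-Hello} default are additions that do not appear earlier in this tutorial, and they are explained in the comments.

```bash
#!/bin/bash
# greet is a hypothetical function used only for this illustration
greet() {
    local name="$1"                # local keeps the variable inside the function
    local greeting="${2:-Hello}"   # second argument, defaulting to "Hello" if missing
    echo "$greeting, $name!"
}

greet "world"                      # prints: Hello, world!
message=$(greet "Linux" "Howdy")   # capture the output instead of printing it
echo "$message"                    # prints: Howdy, Linux!
```

Copying $1 and $2 into named variables at the top of a function tends to make longer functions much easier to read.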
Misc. Tips and Tricks
As you may have noticed, many of the topics covered in this scripting tutorial show the basics of the bash programming language. For the student who is looking for more information beyond the scope of the basics, a good direction would be to Google how to accomplish a given fundamental programming concept in bash specifically. If it exists, it can be thrown in a script. Below is a list of tutorials that are helpful in common situations.
Output Redirection
A lot of what you need to know about output redirection can be found in the following article. Redirection can be a wonderful means of suppressing output from programs in a script or of saving files from the script itself.
Input Output Redirection in Linux/Unix Examples
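As a taste of what the article covers, here is a minimal sketch of the most common redirection operators. The temporary file and the nonexistent path are made up for the illustration.

```bash
#!/bin/bash
outfile=$(mktemp)                  # temporary file, made up for the example

echo "first line" > "$outfile"     # > overwrites the file
echo "second line" >> "$outfile"   # >> appends to it

# 2> redirects error messages; here they are discarded
ls /nonexistent-path 2> /dev/null || echo "ls failed quietly"

line_count=$(wc -l < "$outfile")   # < feeds the file to the command as input
echo "Lines written: $line_count"
rm "$outfile"
```

Sending output to /dev/null is the usual way to silence a command whose messages you do not care about.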
Script Arguments
Much like the arguments passed to a function, the arguments passed to a script are referenced by $ followed by a number. When one of these variables is found in the general scope (i.e. not inside a function), it refers to a script argument. The following article goes more in depth on script arguments and the different types that exist.
Bash Beginner Series #3: Passing Arguments to Bash Scripts
Special Variables
By now you have noticed that bash programming is full of its own idiosyncratic components. As you get deeper into scripting, some of these special shorthand variables will help give your script the robustness it needs:
Special Variable | Description |
---|---|
$0 | The name of the bash script. |
$1 , $2… $n | The bash script arguments. |
$$ | The process id of the current shell. |
$# | The total number of arguments passed to the script. |
$@ | The value of all the arguments passed to the script. |
$? | The exit status of the last executed command. |
$! | The process id of the most recently started background command. |
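Several of these variables can be seen together in a short sketch. The arguments alpha and beta are made up, and set -- is used only to simulate them so the example behaves the same no matter how it is started; in a real script the values would come from the command line.

```bash
#!/bin/bash
# Simulate running the script as: ./script.sh alpha beta
# (set -- replaces the script's arguments with the ones given)
set -- alpha beta

echo "Script name: $0"
echo "Argument count: $#"
echo "First argument: $1"

for arg in "$@"               # "$@" expands to each argument as its own word
do
    echo "Argument: $arg"
done

true                          # a command that always succeeds
echo "Exit status of the last command: $?"   # prints 0
echo "Process id of this shell: $$"
```

Checking $# at the top of a script is a simple way to print a usage message when the caller forgot an argument.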
Conclusion
As a researcher, writing scripts can be one of the most valuable tools on your belt. Automating passive tasks will help you save the time you need for problem solving. Keep in mind that learning to write scripts is almost the same as learning to program in a new language, so it will likely require the same amount of effort. There is so much more to writing scripts than what could be covered in this tutorial, but with these basics and some practice, you will be an expert sysadmin in no time!