The Story Behind grep
The grep command is famous in Linux and Unix circles for three reasons. Firstly, it is tremendously useful. Secondly, the wealth of options can be overwhelming. Thirdly, it was written overnight to satisfy a particular need. The first two are bang on; the third is slightly off.
Ken Thompson had extracted the regular expression search capabilities from the ed editor (pronounced ee-dee) and created a little program, for his own use, to search through text files. His department head at Bell Labs, Doug McIlroy, approached Thompson and described the problem one of his colleagues, Lee McMahon, was facing.
McMahon was trying to identify the authors of the Federalist papers through textual analysis. He needed a tool that could search for phrases and strings within text files. Thompson spent about an hour that evening making his tool a general utility that could be used by others and renamed it grep. He took the name from the ed command string g/re/p, which translates as “global regular expression print.”
You can watch Thompson talking to Brian Kernighan about the birth of grep.
Simple Searches With grep
To search for a string within a file, pass the search term and the file name on the command line:
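As a minimal sketch, assuming a hypothetical sample file created just for the illustration (the file name and its contents are not from the original article), a simple search might look like this:

```shell
# Hypothetical sample file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'alpha one' 'beta two' 'gamma three' > "$workdir/sample.txt"

# Pass the search term first, then the file name.
result=$(grep "beta" "$workdir/sample.txt")
echo "$result"

rm -rf "$workdir"
```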
Matching lines are displayed. In this case, it is a single line. The matching text is highlighted. This is because on most distributions grep is aliased to:
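A typical definition is sketched here. The exact alias varies by distribution and shell configuration, so treat it as an assumption; running alias grep in your own shell shows what is actually in effect.

```shell
# Commonly found in shell startup files; --color=auto highlights matches
# only when the output goes to a terminal.
alias grep='grep --color=auto'
```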
Let’s look at results where there are multiple lines that match. We’ll look for the word “Average” in an application log file. Because we can’t recall if the word is in lowercase in the log file, we’ll use the -i (ignore case) option:
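A sketch of a case-insensitive search, assuming a small hypothetical log file (app.log and its contents are made up for the example):

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average load: 0.42' 'disk check ok' 'average temp: 41C' > "$workdir/app.log"

# -i ignores case, so "Average" and "average" both match.
result=$(grep -i "average" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```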
Every matching line is displayed, with the matching text highlighted in each one.
We can display the non-matching lines by using the -v (invert match) option.
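A sketch of an inverted match, again assuming a hypothetical log file:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average load: 0.42' 'disk check ok' 'average temp: 41C' > "$workdir/app.log"

# -v inverts the match: only lines WITHOUT "average" are printed.
result=$(grep -i -v "average" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```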
There is no highlighting because these are the non-matching lines.
We can cause grep to be completely silent by using the -q (quiet) option. The result is passed to the shell as a return value from grep. A result of zero means the string was found, and a result of one means it was not found. We can check the return code using the $? special parameter:
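A sketch of checking the return code, assuming a hypothetical log file and search terms chosen for the example:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average load: 0.42' 'disk check ok' > "$workdir/app.log"

# The term is present: grep prints nothing and returns 0.
grep -q "Average" "$workdir/app.log"
found_status=$?
echo "$found_status"

# The term is absent: grep returns 1. The || captures the status and
# keeps a strict (set -e) shell from exiting on the non-zero result.
missing_status=0
grep -q "Median" "$workdir/app.log" || missing_status=$?
echo "$missing_status"

rm -rf "$workdir"
```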
Recursive Searches With grep
To search through nested directories and subdirectories, use the -r (recursive) option. Note that you provide a directory path on the command line rather than a file name. Here we’re searching in the current directory “.” and any subdirectories:
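A sketch of a recursive search, assuming a hypothetical directory tree built for the example (the output is piped through sort only to make the order predictable):

```shell
# Hypothetical directory tree, created only for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/sub"
printf '%s\n' 'Average load: 0.42' > "$workdir/project/top.log"
printf '%s\n' 'Average temp: 41C' > "$workdir/project/sub/nested.log"

# -r searches "." and everything beneath it.
result=$(cd "$workdir/project" && grep -r -i "average" . | sort)
echo "$result"

rm -rf "$workdir"
```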
The output includes the directory and filename of each matching line.
We can make grep follow symbolic links by using the -R (recursive dereference) option. We’ve got a symbolic link in this directory, called logs-folder. It points to /home/dave/logs.
Let’s repeat our last search with the -R (recursive dereference) option:
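A sketch of the difference, assuming GNU grep and a hypothetical layout in which logs-folder is a symbolic link to a directory outside the one being searched:

```shell
# Hypothetical layout, created only for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/project" "$workdir/logs"
printf '%s\n' 'Average load: 0.42' > "$workdir/project/top.log"
printf '%s\n' 'Average temp: 41C' > "$workdir/logs/linked.log"
ln -s "$workdir/logs" "$workdir/project/logs-folder"

# -R follows the symbolic link while recursing, so linked.log is searched too.
deref=$(cd "$workdir/project" && grep -R -i "average" . | sort)
echo "$deref"

rm -rf "$workdir"
```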
The symbolic link is followed and the directory it points to is searched by grep too.
Searching for Whole Words
By default, grep will match a line if the search target appears anywhere in that line, including inside another string. Look at this example. We’re going to search for the word “free.”
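A sketch of a substring match, assuming a hypothetical log file and a case-insensitive search (the -i option is an assumption here, needed because the sample text uses a capital “F”):

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'MemFree: 1024 kB' 'MemTotal: 4096 kB' > "$workdir/app.log"

# "free" matches inside the longer string "MemFree".
result=$(grep -i "free" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```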
The results are lines that have the string “free” in them, but they’re not separate words. They’re part of the string “MemFree.”
To force grep to match separate “words” only, use the -w (word regexp) option.
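A sketch of a whole-word search, assuming a hypothetical log file in which “free” only ever occurs inside “MemFree”:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'MemFree: 1024 kB' 'MemTotal: 4096 kB' > "$workdir/app.log"

# -w requires "free" to stand alone as a word, so nothing matches.
# grep exits with status 1 when nothing matches; "|| true" keeps a
# strict (set -e) shell from stopping here.
result=$(grep -i -w "free" "$workdir/app.log" || true)
echo "matches: [$result]"

rm -rf "$workdir"
```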
This time there are no results because the search term “free” does not appear in the file as a separate word.
Using Multiple Search Terms
The -E (extended regexp) option allows you to search for multiple words. (The -E option replaces the deprecated egrep version of grep.)
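A sketch of searching for two terms at once, assuming a hypothetical log file. The terms are separated by the “|” alternation operator, and the -w and -i options are added here as an assumption so that whole words are matched regardless of case:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average load: 0.42' 'MemFree: 1024 kB' 'MemTotal: 4096 kB' > "$workdir/app.log"

# "average|memfree" matches a line containing either term.
result=$(grep -E -w -i "average|memfree" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```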
This command searches for two search terms, “average” and “memfree.”
All of the matching lines are displayed for each of the search terms.
You can also search for multiple terms that are not necessarily whole words, but they can be whole words too.
The -e (patterns) option allows you to use multiple search terms on the command line. We’re making use of the regular expression bracket feature to create a search pattern. It tells grep to match any one of the characters contained within the brackets “[].” This means grep will match either “kB” or “KB” as it searches.
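A sketch of two -e patterns, assuming a hypothetical log file. The first pattern, “MemFree”, is chosen only for illustration; the second is the bracket expression described above. Quoting the bracket pattern stops the shell from treating it as a file name wildcard:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'MemFree: 1024 kB' 'SwapTotal: 2048 KB' 'Average load: 0.42' > "$workdir/app.log"

# Each -e adds a pattern; a line matching any of them is printed.
result=$(grep -e "MemFree" -e "[kK]B" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```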
Both strings are matched, and, in fact, some lines contain both strings.
Matching Lines Exactly
The -x (line regexp) option will only match lines where the entire line matches the search term. Let’s search for a date and time stamp that we know appears only once in the log file:
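A sketch of a whole-line match, assuming a hypothetical log file in which the time stamp appears once on its own and once as part of a longer line:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' '2020-08-15 10:00:00' '2020-08-15 10:00:00 Average load: 0.42' > "$workdir/app.log"

# -x matches only when the pattern accounts for the entire line.
result=$(grep -x "2020-08-15 10:00:00" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```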
The single line that matches is found and displayed.
The opposite of that is only showing the lines that don’t match. This can be useful when you’re looking at configuration files. Comments are great, but sometimes it’s hard to spot the actual settings in amongst them all. Here’s the /etc/sudoers file:
We can effectively filter out the comment lines like this:
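A sketch of the idea, using a hypothetical stand-in file rather than the real /etc/sudoers (which normally needs sudo to read):

```shell
# Hypothetical configuration file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' '# This is a comment' 'Defaults env_reset' '# Another comment' 'root ALL=(ALL:ALL) ALL' > "$workdir/sample.conf"

# -v "#" drops every line containing "#", including lines with trailing
# comments; use "^#" to drop only lines that begin with "#".
result=$(grep -v "#" "$workdir/sample.conf")
echo "$result"

rm -rf "$workdir"
```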
That’s much easier to parse.
Only Displaying Matching Text
There may be an occasion when you don’t want to see the entire matching line, just the matching text. The -o (only matching) option does just that.
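A sketch of printing only the matching text, assuming a hypothetical log file:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'MemFree: 1024 kB' 'MemTotal: 4096 kB' > "$workdir/app.log"

# -o prints just the part of the line that matched.
result=$(grep -o "MemFree" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```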
The display is reduced to showing only the text that matches the search term, instead of the entire matching line.
Counting With grep
grep isn’t just about text, it can provide numerical information too. We can make grep count for us in different ways. If we want to know how many lines in a file contain a search term, we can use the -c (count) option. Note that it counts matching lines, not individual occurrences.
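A sketch of counting, assuming a hypothetical log file in which one line contains the term twice:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average and Average again' 'nothing here' 'Average once' > "$workdir/app.log"

# -c reports the number of matching lines, so the first line counts once.
count=$(grep -c "Average" "$workdir/app.log")
echo "$count"

rm -rf "$workdir"
```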
grep reports that the search term appears on 240 lines in this file.
You can make grep display the line number for each matching line by using the -n (line number) option.
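A sketch of line numbering, assuming a hypothetical log file:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Average load: 0.42' 'disk check ok' 'Average temp: 41C' > "$workdir/app.log"

# -n prefixes each matching line with its line number and a colon.
result=$(grep -n "Average" "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```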
The line number for each matching line is displayed at the start of the line.
To reduce the number of results that are displayed, use the -m (max count) option. We’re going to limit the output to five matching lines:
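A sketch of limiting the output, assuming a hypothetical log file with eight matching lines:

```shell
# Hypothetical log file with eight matching lines.
workdir=$(mktemp -d)
printf 'Average sample %s\n' 1 2 3 4 5 6 7 8 > "$workdir/app.log"

# -m 5 stops after the fifth matching line.
result=$(grep -m 5 "Average" "$workdir/app.log")
echo "$result"
shown=$(printf '%s\n' "$result" | wc -l)

rm -rf "$workdir"
```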
Adding Context
Being able to see some additional lines (possibly non-matching lines) for each matching line is often useful. It can help distinguish which of the matched lines are the ones you are interested in.
To show some lines after the matching line, use the -A (after context) option. We’re asking for three lines in this example:
To see some lines from before the matching line, use the -B (before context) option.
And to include lines from before and after the matching line use the -C (context) option.
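The three context options can be sketched together, assuming a hypothetical seven-line file with a single matching line in the middle:

```shell
# Hypothetical file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' one two three target five six seven > "$workdir/lines.txt"

# Three lines after the match.
after=$(grep -A 3 "target" "$workdir/lines.txt")
# Three lines before the match.
before=$(grep -B 3 "target" "$workdir/lines.txt")
# One line on each side of the match.
around=$(grep -C 1 "target" "$workdir/lines.txt")

echo "$after"
echo "$before"
echo "$around"

rm -rf "$workdir"
```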
Showing Matching Files
To see the names of the files that contain the search term, use the -l (files with match) option. To find out which C source code files contain references to the sl.h header file, use this command:
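A sketch of listing matching files, assuming two hypothetical C source files. The dot is escaped so that it matches a literal “.” rather than any character:

```shell
# Hypothetical source files, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' '#include "sl.h"' 'int main(void) { return 0; }' > "$workdir/main.c"
printf '%s\n' '#include <stdio.h>' > "$workdir/util.c"

# -l prints only the names of files that contain a match.
result=$(cd "$workdir" && grep -l 'sl\.h' *.c)
echo "$result"

rm -rf "$workdir"
```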
The file names are listed, not the matching lines.
And of course, we can look for files that don’t contain the search term. The -L (files without match) option does just that.
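A sketch of the inverse listing, assuming the same kind of hypothetical C source files:

```shell
# Hypothetical source files, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' '#include "sl.h"' 'int main(void) { return 0; }' > "$workdir/main.c"
printf '%s\n' '#include <stdio.h>' > "$workdir/util.c"

# -L prints only the names of files that contain NO match.
result=$(cd "$workdir" && grep -L 'sl\.h' *.c)
echo "$result"

rm -rf "$workdir"
```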
Start and End of Lines
We can force grep to only display matches that are either at the start or the end of a line. The “^” regular expression operator matches the start of a line. Practically all of the lines within the log file will contain spaces, but we’re going to search for lines that have a space as their first character:
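A sketch of anchoring to the start of a line, assuming a hypothetical log file with one indented line:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' ' indented entry' 'plain entry with spaces' > "$workdir/app.log"

# "^ " matches a space only when it is the first character of the line.
result=$(grep "^ " "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```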
The lines that have a space as the first character—at the start of the line—are displayed.
To match the end of the line, use the “$” regular expression operator. We’re going to search for lines that end with “00.”
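A sketch of anchoring to the end of a line, assuming a hypothetical log file. Single quotes keep the shell from interpreting the “$”:

```shell
# Hypothetical log file, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'Total: 1500' 'Total: 1501' 'at 00 hours' > "$workdir/app.log"

# '00$' matches "00" only when it ends the line.
result=$(grep '00$' "$workdir/app.log")
echo "$result"

rm -rf "$workdir"
```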
The display shows the lines that have “00” as their final characters.
Using Pipes with grep
Of course, you can pipe input to grep, pipe the output from grep into another program, and have grep nestled in the middle of a pipe chain.
Let’s say we want to see all occurrences of the string “ExtractParameters” in our C source code files. We know there’s going to be quite a few, so we pipe the output into less:
The output is presented in less.
This lets you page through the output and use less’s search facility.
If we pipe the output from grep into wc and use the -l (lines) option, we can count the number of lines in the source code files that contain “ExtractParameters”. (We could achieve this using the grep -c (count) option, but this is a neat way to demonstrate piping out of grep.)
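A sketch of piping grep into wc, assuming two hypothetical C source files that mention the function on three lines in total:

```shell
# Hypothetical source files, created only for this illustration.
workdir=$(mktemp -d)
printf '%s\n' 'ExtractParameters(argc, argv);' 'return 0;' 'ExtractParameters(0, NULL);' > "$workdir/main.c"
printf '%s\n' 'void ExtractParameters(int n, char **v) {}' > "$workdir/params.c"

# grep prints one line per match; wc -l counts those lines.
count=$(cd "$workdir" && grep "ExtractParameters" *.c | wc -l)
echo "$count"

rm -rf "$workdir"
```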
With the next command, we’re piping the output from ls into grep and piping the output from grep into sort. We’re listing the files in the current directory, selecting those with the string “Aug” in them, and sorting them by file size:
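A sketch of such a pipe chain, assuming hypothetical files whose modification dates are set with touch for the example. It uses the current -k5n key syntax for sort in place of the older +4n form, and LC_ALL=C so that ls prints English month names:

```shell
# Hypothetical files of different sizes, created only for this illustration.
workdir=$(mktemp -d)
printf '%030d' 0 > "$workdir/medium.txt"    # 30 bytes
printf '%05d' 0 > "$workdir/small.txt"      # 5 bytes
printf '%0200d' 0 > "$workdir/large.txt"    # 200 bytes
printf '%0100d' 0 > "$workdir/january.txt"  # 100 bytes, modified in January
touch -t 202008151200 "$workdir/medium.txt" "$workdir/small.txt" "$workdir/large.txt"
touch -t 202001151200 "$workdir/january.txt"

# List, keep the August lines, then sort numerically on column 5 (size).
result=$(LC_ALL=C ls -l "$workdir" | grep "Aug" | sort -k5n)
echo "$result"

# Pull out just the file names to show the resulting order.
names=$(printf '%s\n' "$result" | awk '{print $NF}')

rm -rf "$workdir"
```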
Let’s break that down:
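ls -l: Perform a long format listing of the files using ls.
grep “Aug”: Select the lines from the ls listing that have “Aug” in them. Note that this would also find files that have “Aug” in their names.
sort +4n: Sort the output from grep numerically on the file size column. The +4 is the older sort syntax for skipping the first four fields, which makes the fifth column of the ls -l listing the sort key; current versions of sort write this as -k5n.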
We get a sorted listing of all the files modified in August (regardless of year), in ascending order of file size.
grep: Less a Command, More of an Ally
grep is a terrific tool to have at your disposal. It dates from 1974 and is still going strong because we need what it does, and nothing does it better.
Coupling grep with some regular expressions-fu really takes it to the next level.