Find the colleges in the ranklist (grep, pipe and wc)

Find the colleges in the ranklist using grep, pipes, and wc.

Find the colleges

Let’s now proceed to our first analysis: listing all the lines in the data file that contain the word “college”. To do this, we need to introduce the grep command (global regular expression print). In a nutshell, grep looks through all the lines in a file and outputs only those that match a pattern. In our case, we want to find all the lines in the dataset that contain “college”. Here’s how we do it:

Bash (5.0.0)

Note the use of the csvlook command (install it by running pip install csvkit) to format the output as a table. Here, the grep command takes two command-line arguments: the first is the pattern, and the second is the file in which to search for that pattern. If you run this command, you should see the lines that contain the string “college”:
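As a minimal sketch of the argument order described above, the following runs grep on a small, made-up CSV file. The file contents and the variable names (data_file, matches, match_count) are assumptions for illustration only and are not the course's data set. The sketch also pipes the matches into wc -l to count them, which is one way to combine grep, a pipe, and wc.

```shell
# Hypothetical sample data; the course's real data file is not used here.
data_file="$(mktemp)"
cat > "$data_file" <<'EOF'
rank,name
1,Alpha College
2,Beta University
3,Gamma COLLEGE of Arts
4,Delta Institute
EOF

# First argument: the pattern. Second argument: the file to search.
# -i makes the match case-insensitive, so "COLLEGE" matches as well.
matches="$(grep -i "college" "$data_file")"
echo "$matches"

# Pipe the matching lines into wc -l to count them.
match_count="$(grep -i "college" "$data_file" | wc -l)"
echo "Matching lines: $match_count"

# Clean up the temporary file.
rm -f "$data_file"
```

If csvkit is installed, the grep output can also be piped into csvlook for a table-like display, as the lesson does.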

Image

Note that we used the -i option to make the matching case-insensitive. Also notice that this logic mistakenly identified two universities as colleges, because their names contain the string “college”. So be careful when using grep in data analysis, particularly before drawing conclusions from its output.
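The false-positive problem can be reproduced with a small, invented example. The institution names below are hypothetical, and the second grep with -v (which keeps only the lines that do not match) is just one possible refinement, not the lesson's own method. It can also discard a genuine college whose name happens to mention a university, so the results still need a manual check.

```shell
# Hypothetical sample data, invented to illustrate the false-positive issue.
sample_file="$(mktemp)"
cat > "$sample_file" <<'EOF'
rank,name
1,Alpha College
2,Beta University College
3,Gamma University
4,Delta College University
EOF

# Count every line containing "college" in any letter case,
# including universities whose names contain that word.
all_hits="$(grep -i "college" "$sample_file" | wc -l)"

# One possible refinement: drop lines that also mention "university".
# -v inverts the match, keeping only lines that do NOT match.
filtered_hits="$(grep -i "college" "$sample_file" | grep -v -i "university" | wc -l)"

echo "All matches: $all_hits"
echo "After excluding universities: $filtered_hits"

# Clean up the temporary file.
rm -f "$sample_file"
```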

Learn Practical Data Sciences with Bash Shell