Learn to Analyze Text Data in Bash Shell and Linux - Video Lectures by Learn Scientific Programming

An animated course illustrating the use of Bash, AWK and SED to analyze text data. Voiced by A. Collinwood; authored by Ahmed Arefin, PhD.

Get access for $10


More than 1000 students have taken this innovative project-based data learning course, which includes video lectures and an eBook with source code and data sets.

Can you build a script that counts the sequences in a big data set of hundreds of thousands of nucleotide sequences in 30 seconds? You may be surprised to learn that this takes no more than a few words in Bash! Three simple projects demonstrate the use of the Bash shell in processing CSV-formatted text data sets.
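
As a rough illustration of the "few words" claim, here is a minimal sketch. It assumes the sequences are stored in FASTA format, where every record begins with a `>` header line; the course text does not name the file format, and the file `sample.fasta` and its contents are made up for this example.

```shell
#!/usr/bin/env bash
# Toy FASTA file (made-up sequences, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/sample.fasta" <<'EOF'
>seq1
ACGTACGTAC
>seq2
TTGACCGTAA
GGCATC
>seq3
CCGGAATT
EOF

# Assumption: FASTA layout, so each record starts with a '>' header line.
# Counting those header lines counts the sequences.
seq_count=$(grep -c '^>' "$workdir/sample.fasta")
echo "Sequences: $seq_count"
```

The whole count is the single `grep -c '^>'` call; the rest only builds the toy input.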

This course starts with some practical bash-based flat file data mining projects involving: 
  • University ranking data
  • Facebook data
  • Crime data

Several of the examples follow a practical data mining flow: import specific data resources into flat text files, then let Bash run different programs (grep, sort, sed, and so on) on those files to clean and optimise the data and to extract preliminary views of it (cut, csvlook, view, cat, head, etc.). One part of data mining involves taking unstructured data and transforming it into structured data (awk, shell). A scripting language like Bash can be very useful for that transformation.
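
The clean-then-structure flow described above might look like the following sketch. The input file `notes.txt`, its `city: ... ; reports: ...` layout, and the numbers in it are hypothetical, invented only to show sed, awk and sort working together; they are not taken from the course data sets.

```shell
#!/usr/bin/env bash
# Toy semi-structured input (made-up values, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/notes.txt" <<'EOF'
city: Sydney   ; reports: 120
city: Perth ; reports: 45

city: Hobart;reports: 12
EOF

# 1. Clean: drop blank lines (sed).
# 2. Structure: split each line on ':' and ';' and emit CSV rows (awk).
# 3. Order: sort rows by the numeric second column, largest first (sort).
{
  echo "city,reports"
  sed '/^[[:space:]]*$/d' "$workdir/notes.txt" |
    awk -F'[:;]' '{
      gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim spaces around the city name
      gsub(/[ \t]/, "", $4)             # strip spaces from the count
      print $2 "," $4
    }' |
    sort -t, -k2,2nr
} > "$workdir/notes.csv"

# Preliminary view of the structured result.
head "$workdir/notes.csv"
```

Each stage reads plain text and writes plain text, so any step can be inspected on its own by cutting the pipeline short.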

What's included?

14 videos, 6 files


Project 1: US News University Ranking Data
  • Data preview (head and csvlook commands): 3 mins
  • Find the colleges in the ranklist (grep, pipe and wc): 3 mins
  • Find the number of Institutes from a given and all states (cut and sort): 3 mins
  • Finding the correlation between university tuition and ranks (tail and redirect): 3 mins
  • Commands demonstration: 2 mins
  • Project data set: unirank.csv (12.8 KB)
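
To give a feel for the commands named in the Project 1 lectures, here is a hedged sketch on a toy file. The column layout (`name,state,tuition,rank`) and every row are invented for this example; the real unirank.csv may be organised differently.

```shell
#!/usr/bin/env bash
# Toy ranking file (hypothetical columns and values, not the course data).
workdir=$(mktemp -d)
cat > "$workdir/toy_rank.csv" <<'EOF'
name,state,tuition,rank
Alpha College,NY,41000,3
Beta University,CA,38000,5
Gamma College,NY,29000,12
Delta Institute,TX,25000,20
EOF

# How many rows mention "College"? (grep, pipe and wc)
college_count=$(grep 'College' "$workdir/toy_rank.csv" | wc -l)
echo "Colleges: $college_count"

# Institutions per state: skip the header, keep column 2, count repeats.
state_counts=$(tail -n +2 "$workdir/toy_rank.csv" |
  cut -d, -f2 | sort | uniq -c | sort -k1,1nr)
echo "$state_counts"

# Save the tuition and rank columns to a new file for a later
# correlation step (tail and redirect).
tail -n +2 "$workdir/toy_rank.csv" | cut -d, -f3,4 > "$workdir/tuition_rank.csv"
```

Note that `cut -d,` splits on every comma, so this approach only suits CSV files without quoted fields that contain commas.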
Project 2: Facebook data mining
  • Data preview (head command): 2 mins
  • Find the number of status and most popular status entry (cut, sort, grep, awk): 4 mins
  • Building a function to find the most vibrant Facebook status (Bash functions): 3 mins
  • Commands demonstration: 3 mins
  • Project dataset: facebookdata.csv (791 KB)
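
The idea of wrapping a pipeline in a Bash function, as the Project 2 lectures mention, could be sketched as follows. The function name `top_by_column`, the columns `status_id,likes,comments`, and all values are hypothetical; how the course defines the "most vibrant" status is not stated here, so the sketch simply picks the row with the largest value in a chosen column.

```shell
#!/usr/bin/env bash
# Toy status file (hypothetical columns and values, not the course data).
workdir=$(mktemp -d)
cat > "$workdir/toy_status.csv" <<'EOF'
status_id,likes,comments
s1,120,14
s2,560,9
s3,87,40
EOF

# top_by_column FILE COLUMN
# Print the data row with the largest numeric value in COLUMN (1-based).
top_by_column() {
  local file=$1 col=$2
  # Skip the header, sort numerically (descending) on the column, keep row 1.
  tail -n +2 "$file" | sort -t, -k"${col},${col}nr" | head -n 1
}

top_by_column "$workdir/toy_status.csv" 2   # row with the most likes
top_by_column "$workdir/toy_status.csv" 3   # row with the most comments
```

Passing the column as an argument lets one function answer several "which row is highest" questions without repeating the pipeline.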
Project 3: Australian cities crime statistics
  • Data preview (head and csvlook commands): 5 mins
  • Finding rows and columns stats (wc, sed, csvstat): 4 mins
  • Finding the top most crime per city (awk): 3 mins
  • Finding the best city in Australia (Bash shell programming): 3 mins
  • Commands demonstration: 3 mins
  • Project dataset: crimedata-au.csv (1.81 KB)
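
A per-row maximum of the kind the Project 3 awk lecture points to might be sketched like this. The layout (one row per city, one column per crime category) and all counts are assumptions made up for the example; crimedata-au.csv may be structured differently.

```shell
#!/usr/bin/env bash
# Toy crime file (hypothetical layout and made-up counts).
workdir=$(mktemp -d)
cat > "$workdir/toy_crime.csv" <<'EOF'
city,theft,assault,fraud
Sydney,320,150,410
Perth,95,130,60
Hobart,40,22,18
EOF

# Row and column statistics: data rows (header removed) and field count.
rows=$(sed '1d' "$workdir/toy_crime.csv" | wc -l)
cols=$(head -n 1 "$workdir/toy_crime.csv" | awk -F, '{ print NF }')
echo "rows=$rows cols=$cols"

# For each city, report the category with the highest count.
top_per_city=$(awk -F, '
  NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }  # remember headers
  {
    best = 2                                   # start with the first category
    for (i = 3; i <= NF; i++)
      if ($i + 0 > $best + 0) best = i         # numeric comparison of fields
    print $1 "," name[best] "," $best
  }' "$workdir/toy_crime.csv")
echo "$top_per_city"
```

Storing the header names in an array on the first line lets the later rows print a readable category name instead of a column number.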
Bonus Contents
  • Learn to Analyze Data in Bash Shell and Linux (eBook: PDF): 3.73 MB
  • Source codes: 3.39 MB
3.39 MB


Do I get instant access?

You do indeed, mate! Simply sign up and you'll have instant access to all of the course resources right away.

Who is this course for?

This course is suitable for anyone who wants to work with data: students or researchers who want to add Bash and other command-line tools to their bag of tricks, scientists who want to learn to explore and analyze the data their lab generates, or even journalists who want to polish their reporting by analyzing publicly available data sets.

Why should I use Bash shell scripting for data mining?

Bash may not be the best way to handle every kind of data, but there often comes a time when all you have is a pure Bash environment, such as on common Linux-based supercomputers, and you just want an early result or view of the data before diving into real programming with Python, R, SQL, SPSS, and so on. Expertise in data-intensive languages comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn and perfect for mining textual data.

Is this course easy to follow?

Yes! I wanted to create super beginner-friendly reading material that helps people who are not very familiar with Bash/Linux but want to use its power. I hope you will enjoy it!