Day 21: Beginner guide to regex

Long awaited

egrep

This has been asked for by many of my mentees.

Regex can become overwhelming at times.

Particularly, when you have no clue what you are doing.

What is regex?

Wikipedia excerpt ~ A Regular expression is a sequence of characters that define a search pattern.

The secret

The secret technique to understand regex is simply by practicing.

This is a universal technique for all things in general.

The more you practice, the better you understand.

Generating numbers

We’ll start by generating non-random numbers between 1 to 1000.

seq 1 1000 > nonrandom_line_file.txt

Here we are using seq that generates numbers within a range.

Matching everything

This is the trick, if you are able to match everything.

You’ll be able to progress to filtering what you require.

cat nonrandom_line_file.txt | egrep '[1-9]'
997
998
999
1000
#We check if we match all the 1000 numbers
#By doing a word count on the number of lines
cat nonrandom_line_file.txt | egrep '[1-9]' | wc -l
1000

Match only 2-digit numbers

We know that 2-digit numbers would start with 10 and will end with 99.

The range 10 - 99

We split the range to 10 - 90 and 90 - 99.

So for the first digit we have:

[1-9]

And for the second digit, we have.

[0-9]

Combining the two we have the full range:

[1-9][0-9]
# if we try this we will get 100-900, as we are not specifying start and # end of text

Start of text is denoted by

Adding it in the mix:

^[1-9][0-9]

Here, what we are saying is that: 1st digit starts with 1 to 9. And 2nd digit is 0 to 9.

This time, we’re matching even more numbers.

998
999
1000
codax@gaming:~/Projects/tests/grep$ cat nonrandom_line_file.txt | egrep '^[1-9][0-9]' | wc -l
991

Adding an “end of text”

End of text denoted by:

Combining everything:

^[1-9][0-9]$

Adding in grep

# grep -e or egrep can be used, they are the same
95
96
97
98
99
codax@gaming:~/Projects/tests/grep$ cat nonrandom_line_file.txt | egrep '^[1-9][0-9]$'

Match only 3-digit numbers

We know that 3-digit numbers would start with 100 and will end with 999.

cat nonrandom_line_file.txt | egrep '^[1-9][0-9][0-9]$'

As seen above, we just add a 3rd for a 3rd digit between 0-9

Will add more examples in the coming week or so !

\Codarren/

Written on January 21, 2021