# Day 21: Beginner guide to regex

## Long awaited

This has been asked for by many of my mentees.

Regex can become overwhelming at times, particularly when you have no clue what you are doing.

## What is regex?

Wikipedia excerpt ~ A **Reg**ular **ex**pression is a sequence of characters that define a search pattern.

## The secret

The secret to understanding regex is simply practice.

This applies to learning almost anything.

The more you practice, the better you understand.

## Generating numbers

We’ll start by generating the (non-random) numbers from 1 to 1000.

```
seq 1 1000 > nonrandom_line_file.txt
```

Here we use `seq`, which prints the numbers in a given range, one per line.
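
Before writing any patterns, it can help to confirm the file looks the way we expect. This is a minimal sketch; the variable names (`first_line`, `last_line`, `line_count`) are only for illustration and are not part of the original steps.

```shell
# Regenerate the file and sanity-check its contents.
seq 1 1000 > nonrandom_line_file.txt

first_line=$(head -n 1 nonrandom_line_file.txt)   # first number: 1
last_line=$(tail -n 1 nonrandom_line_file.txt)    # last number: 1000
line_count=$(wc -l < nonrandom_line_file.txt)     # one line per number: 1000

echo "first=$first_line last=$last_line lines=$line_count"
```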

## Matching everything

Here is the trick: if you can match everything, you can then progress to filtering out only what you require.

```
cat nonrandom_line_file.txt | egrep '[1-9]'
# ... (output truncated; last lines shown)
997
998
999
1000
# We check that we match all 1000 numbers
# by counting the number of lines in the output
cat nonrandom_line_file.txt | egrep '[1-9]' | wc -l
1000
```
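
The "did we match everything?" check can also be scripted. The sketch below pipes `seq` straight into `grep -E` (equivalent to `egrep`) so it runs on its own, and uses `grep -c` to count matching lines; the variable names `total` and `matched` are made up for this example.

```shell
# Compare the number of input lines with the number of matching lines.
total=$(seq 1 1000 | wc -l)
matched=$(seq 1 1000 | grep -E -c '[1-9]')   # -c counts matching lines

if [ "$total" -eq "$matched" ]; then
    echo "pattern matches every line"
else
    echo "pattern misses $((total - matched)) line(s)"
fi
```

Note that `[1-9]` only works here because every number from 1 to 1000 contains at least one non-zero digit. A line containing just `0` would not match it.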

## Match only 2-digit numbers

We know that 2-digit numbers start at 10 and end at 99.

The range is 10 - 99.

We split each number into its two digits: the first digit runs from 1 to 9 and the second from 0 to 9.

**So for the first digit we have:**

```
[1-9]
```

**And for the second digit, we have:**

```
[0-9]
```

**Combining the two we have the full range:**

```
[1-9][0-9]
# if we try this, we will also match 100-1000, as we are not specifying the start and
# end of text
```

**Start of text is denoted by:**

```
^
```

**Adding it in the mix:**

```
^[1-9][0-9]
```

Here, what we are saying is: the line starts with a first digit from 1 to 9, followed by a second digit from 0 to 9.

**This still matches too many numbers: everything from 10 to 1000.**

```
cat nonrandom_line_file.txt | egrep '^[1-9][0-9]'
# ... (output truncated; last lines shown)
998
999
1000
cat nonrandom_line_file.txt | egrep '^[1-9][0-9]' | wc -l
991
```
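
To see why anchoring only the start is not enough, here is a small sketch on a hand-picked sample (the four sample numbers and the variable names are just for illustration). Anything that begins with two suitable digits gets through, no matter how many digits follow.

```shell
# A tiny sample: one 1-digit, one 2-digit, one 3-digit and one 4-digit number.
sample=$(printf '7\n42\n100\n1000\n')

# Anchor only the start of the line.
start_only=$(printf '%s\n' "$sample" | grep -E '^[1-9][0-9]')

printf '%s\n' "$start_only"
# prints 42, 100 and 1000, one per line; only 7 is filtered out
```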

**Adding an “end of text”**

End of text is denoted by:

```
$
```

**Combining everything:**

```
^[1-9][0-9]$
```

**Adding in grep**

```
# grep -E or egrep can be used, they are the same
cat nonrandom_line_file.txt | egrep '^[1-9][0-9]$'
# ... (output truncated; last lines shown)
95
96
97
98
99
```
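
As a quick check that the fully anchored pattern does what we want, we can count the matches and look at the first and last one. This is a sketch that pipes `seq` directly into `grep -E` so it runs on its own; the variable names are hypothetical.

```shell
# Collect every line that is exactly two digits long.
matches=$(seq 1 1000 | grep -E '^[1-9][0-9]$')

match_count=$(printf '%s\n' "$matches" | wc -l)      # 10 through 99 is 90 numbers
first_match=$(printf '%s\n' "$matches" | head -n 1)  # 10
last_match=$(printf '%s\n' "$matches" | tail -n 1)   # 99

echo "count=$match_count first=$first_match last=$last_match"
```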

## Match only 3-digit numbers

We know that 3-digit numbers start at 100 and end at 999.

```
cat nonrandom_line_file.txt | egrep '^[1-9][0-9][0-9]$'
```

As seen above, we just add a third `[0-9]` for the third digit, which can be anything from 0 to 9.
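
Repeating `[0-9]` works, but extended regex also offers a shorthand that may be worth knowing: `{2}` means "exactly two of the previous item". The sketch below is not part of the original walkthrough; it simply counts matches for both spellings to show that they agree (the variable names are made up).

```shell
# Count 3-digit numbers with the written-out pattern...
long_form=$(seq 1 1000 | grep -E -c '^[1-9][0-9][0-9]$')

# ...and with the interval shorthand {2}.
short_form=$(seq 1 1000 | grep -E -c '^[1-9][0-9]{2}$')

echo "long form: $long_form, short form: $short_form"
# prints: long form: 900, short form: 900
```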