Regex is the thing that you only learn when you need it. Unless you are processing a considerable amount of data, you likely won’t use it.
Does that imply that, as software engineers, we should forget about it and worry about it only when the time comes? Are we not supposed to take responsibility for learning it?
Many programmers think that Regex is hard. As with every skill, it takes practice to master. To help you with it, I wrote this article to cover the basics of Regex and show a simple application of it.
Content
- Reasons to learn Regex
- Understand Regex
- Regex structure and special characters
- Example using Regex and JavaScript
- Resources
Reasons to learn Regex
Stuck in limbo, googling for the Regex pattern that solves the problem at hand. Does this sound familiar? I bet at least one of you has been in a comparable situation before. But don't you think it would be easier to know the ins and outs of Regex? Indeed, this would have reduced the time spent searching for answers.
Regex provides a more concise way of solving problems that need some form of parsing. An example is the split function. Turning your string into tokens before applying some sort of logic is lengthy to put in place. It turns out that such an implementation is also limited compared to using Regex.
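To make the comparison concrete, here is a minimal sketch. The input string and variable names are made up for illustration, and the pattern uses a character set and the plus sign, both covered later in this article.

```javascript
// Hypothetical input: tokens separated by inconsistent delimiters
const line = 'apple, banana;cherry  date';

// A plain string separator only handles one exact delimiter
const byComma = line.split(', ');
console.log(byComma); // ['apple', 'banana;cherry  date']

// A Regex separator handles commas, semicolons and runs of whitespace at once
const tokens = line.split(/[,;\s]+/);
console.log(tokens); // ['apple', 'banana', 'cherry', 'date']
```

Without Regex, you would have to split several times or walk through the string character by character to get the same tokens.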
Hopefully, the next part excites you as we are going to cover more of Regex.
Understand Regex
Regex is short for regular expression. It is a sequence of characters that defines the pattern of data you are looking for. The idea has been around for a long time: it dates back to the 1950s, and regular expressions have been used in text-processing tools since the late 1960s, primarily for searching and parsing strings.
An example of a Regex that looks for an email address with a ".com" domain is: /.+@.+\.com/
Don't worry if it does not make sense now. In the next part I will cover what the characters in the above expression mean.
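If you want to see the expression in action first, the sketch below runs it with the test function, which returns true when the pattern is found in the string. The email addresses are invented for illustration.

```javascript
// The pattern from above: some characters, an @ sign, some characters, then ".com"
const dotComEmail = /.+@.+\.com/;

const isCom = dotComEmail.test('jane@example.com');
const isOrg = dotComEmail.test('jane@example.org');
const noAtSign = dotComEmail.test('example.com');

console.log(isCom);    // true
console.log(isOrg);    // false, the domain does not contain ".com"
console.log(noAtSign); // false, there is no @ sign
```

Keep in mind that this pattern is deliberately loose. It is good enough to illustrate the idea, not to validate real email addresses.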
Regex structure and special characters
The first thing to know is that there are two ways to define a Regex pattern:
- Using a regular expression literal
var pattern = /abc/
- Calling the RegExp constructor
var pattern = new RegExp('abc')
When to use which? Use a regular expression literal when you know the pattern in advance. Use the RegExp constructor when the pattern is built from dynamic data at runtime.
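As a rough sketch of the dynamic case, the snippet below builds a pattern from a value that is only known at runtime. The helper names escapeRegExp and containsText are my own and not part of JavaScript. The escaping step uses a backslash so that characters such as the period are treated literally.

```javascript
// Hypothetical helper: put a backslash in front of characters that are special in Regex
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the pattern at runtime from user-provided text
function containsText(sentence, text) {
  const pattern = new RegExp(escapeRegExp(text), 'i'); // 'i' ignores letter case
  return pattern.test(sentence);
}

const exact = containsText('Price: 3.50 USD', '3.50');
const different = containsText('Price: 3150 USD', '3.50');
const unescaped = new RegExp('3.50').test('Price: 3150 USD');

console.log(exact);     // true
console.log(different); // false, the escaped period only matches a real period
console.log(unescaped); // true, an unescaped period matches any character
```

A literal cannot do this, because the pattern between the two slashes is fixed when you write the code.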
Special characters in Regex extend the ability to create more complex Regex pattern. Let's look at some fundamental ones.
The string "From: dinys18@dinmon.tech" will be used in each of the scenarios below. An arrow (=>) shows the text that the pattern matches. The arrow is only a notation for this article, so these lines will not run as JavaScript.
^
- The caret symbol matches the start of a string
var re = /^From:/ => From:
$
- The dollar sign symbol matches the end of a string
var re = /tech$/ => tech
.
- The period character matches any single character
var re = /.@/ => 8@ // Any single character followed by the @ sign
[0-9]
- Character set. Matches any single character enclosed within the square brackets.
var re = /[0-9]/ => 1 or 8 individually, not 18 // Each match is a single digit, and the first one found is 1
*
- The asterisk matches the character or set before it zero or more times.
var re = /.*:/ => From: // Any number of characters up to the colon
+
- The plus sign matches the character or set before it one or more times.
var re = /@[a-z]+/ => @dinmon // Start at the @ sign, include one or more lowercase letters
Lastly, characters like the asterisk, plus sign and period are special characters in Regex. What if you want to match them literally in your pattern? Thankfully, there is a way: you escape them. That means adding \ (a backslash) in front of them, so that they are no longer treated as special characters but as regular characters.
var re = /\..*/ => .tech // Start at the period character, include any characters afterwards
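Since the arrow notation does not run as JavaScript, here is a small runnable sketch that checks the patterns above against the sample string. The helper firstMatch is a name I made up for this example. It relies on the match function, which is covered in more detail in the next section.

```javascript
const sample = 'From: dinys18@dinmon.tech';

// Hypothetical helper: return the first text matched by the pattern, or null
function firstMatch(pattern) {
  const result = sample.match(pattern);
  return result ? result[0] : null;
}

const results = {
  caret: firstMatch(/^From:/),      // 'From:'
  dollar: firstMatch(/tech$/),      // 'tech'
  period: firstMatch(/.@/),         // '8@'
  set: firstMatch(/[0-9]/),         // '1'
  asterisk: firstMatch(/.*:/),      // 'From:'
  plus: firstMatch(/@[a-z]+/),      // '@dinmon'
  escaped: firstMatch(/\..*/),      // '.tech'
};

console.log(results);
```

You can swap in your own string for sample to see how each special character behaves.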
Now that we have covered various ways to construct a regular expression, let's go ahead and combine it with JavaScript. That will allow us to perform more complex operations like extraction, replacement and so forth.
Example using Regex and JavaScript
In this section I will cover how to use Regex combined with JavaScript to perform an extraction on a string. For that, I will implement a file simulator that allows folders to be duplicated.
To avoid duplicate folder names, we need to append a string to the folder name so that the new folder's name is unique. For this, we will add an index enclosed in parentheses to represent the number of times the folder has been duplicated.
Before we start constructing the regular expression, let's break down the scenarios to handle:
- A folder's name with any characters, e.g., python
- A folder's name with any characters and a number enclosed in parentheses, e.g., python (0)
First, we need to get the name of the duplicated folder, which can contain any characters.
var regex = /.+/
Then look for a number enclosed in parentheses.
var regex2 = /\([0-9]+\)/
You will notice that we escaped the two parentheses that surround the number by using a backslash. Between the parentheses, we used a character set from zero to nine to match a digit. As the number can have more than one digit, we added the plus sign to cater for numbers of any length.
This sounds good, but isn't it redundant to use two Regex patterns on the single string we are trying to parse? What if we could do that in one line? To achieve this, we will extract both the folder's name and the number by wrapping each of them in parentheses, which creates capture groups.
The final expression will look like:
var regex = /(.+) \(([0-9]+)\)/
To run the Regex, call the match function on the string with the above expression as an argument.
var name = 'Folder (0)'
var matchFound = name.match(regex) => ['Folder (0)', 'Folder', '0']
The match function returns null if nothing matches; otherwise it returns an array with the extracted values. Check the match() function reference for more detail.
Note: The first value of the array is the full text that matched the pattern, and the rest are the values extracted by each pair of unescaped parentheses.
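Here is a short runnable sketch of the extraction, including the case where nothing matches. The variable names and the folder names are only for illustration.

```javascript
const folderPattern = /(.+) \(([0-9]+)\)/;

// A name with an index: the array holds the full match, then each extracted value
const found = 'Folder (0)'.match(folderPattern);
console.log(found[0]); // 'Folder (0)'
console.log(found[1]); // 'Folder'
console.log(found[2]); // '0'

// A name without an index: match returns null
const missing = 'Folder'.match(folderPattern);
console.log(missing); // null

// Falling back to an empty array keeps destructuring safe when there is no match
const [, baseName, index] = 'python (12)'.match(folderPattern) ?? [];
console.log(baseName, index); // 'python' '12'
```

Note that the extracted index is a string, so you need to convert it with Number before doing arithmetic on it.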
I leave the next part for you to complete, so that the function getDuplicateName returns the folder's name with the index at the end if the folder is a duplicate.
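function getDuplicateName(list, name) {
  // Capture the base name and the index from names such as "python (0)"
  var regex = /(.+) \(([0-9]+)\)/
  var matchFound = name.match(regex) ?? []
  var [, baseName, index] = matchFound
  // A name without an index is done as soon as it is not in the list;
  // a name with an index always moves on to the next free index
  var isDone = (matchFound.length > 0) ? !baseName : !list.includes(name)
  var count = index ? Number(index) + 1 : 0
  var newName = name
  baseName = baseName ?? name
  // Increase the index until the candidate name is not taken
  while (!isDone) {
    newName = `${baseName} (${count})`
    if (!list.includes(newName)) {
      isDone = true
      continue
    }
    count++
  }
  return newName
}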
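To try the function out, the sketch below restates it in a shorter form so that the block runs on its own, then calls it with a made-up list of folders. The shorter form is my own condensation and is meant to behave the same way as the function above. It is not the code from the file simulator.

```javascript
// Condensed restatement of getDuplicateName, for illustration only
function getDuplicateName(list, name) {
  const match = name.match(/(.+) \(([0-9]+)\)/);
  // A name without an index that is not taken can be used as-is
  if (!match && !list.includes(name)) return name;
  const baseName = match ? match[1] : name;
  let count = match ? Number(match[2]) + 1 : 0;
  // Move to the next index until the candidate name is free
  while (list.includes(`${baseName} (${count})`)) count++;
  return `${baseName} (${count})`;
}

// Hypothetical folder list
const folders = ['python', 'python (0)', 'notes'];

const fresh = getDuplicateName(folders, 'docs');
const firstCopy = getDuplicateName(folders, 'notes');
const secondCopy = getDuplicateName(folders, 'python');
const copyOfCopy = getDuplicateName(folders, 'python (0)');

console.log(fresh);      // 'docs'
console.log(firstCopy);  // 'notes (0)'
console.log(secondCopy); // 'python (1)'
console.log(copyOfCopy); // 'python (1)'
```

One thing to be aware of: the pattern is not anchored with ^ and $, so a name that has text after the closing parenthesis would still match on the part before it.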
Resources
- Regex Crossword - A fun way to learn Regex
- MDN Regular Expression - For additional reference to the content covered in here
If you want to look at the full source code, visit the GitHub repository or the demo of the file simulator.