Using Regular Expression in MySQL

Regular expression is a powerful tool which gives you a concise and flexible way to identify strings of text, for instance characters, words or patterns of characters. As an example, you can use regular expression to find email, IP address, phone number, social security number… which has their own patterns. In computing world, regular expression usually refers as regex in short. Regular expression uses its own language with special syntax which can be interpreted by a regular expression processor. Regular expression is now widely use in almost platform from programming language to database world. MySQL built-in supports regular expression which allows you to search for strings of text by using special operator call REGEXP. Here is the syntax:

SELECT column_list 
FROM table_name
WHERE column REGEXP pattern

The advantage of using regular expression is you are not limited to search for a string based on a fixed pattern in comparison with LIKE operator. In addition, patterns in regular expression can help you to reduce the lengthy SQL statements with AND and OR operators in condition of WHERE clause.

The disadvantage of using regular expression is that it is quite difficult to understand and maintain such a complicated pattern. Therefore you should write comment the meaning of regular expression when you use it in SQL statement. In some cases, the performance of data retrieval may degrade if you use complicated pattern in non-indexed column.

Let’s take a look at an example of using regular expression in MySQL. Suppose you want to find out all employees who has last name starting with M, B or T. we can use regular expression in SQL statement like this:

SELECT lastname,firstname 
FROM employees
WHERE lastname REGEXP '^(M|B|T)'

Here is the output:

lastname  firstname
-------- ---------
Murphy Diane
Bondur Gerard
Bow Anthony
Thompson Leslie
Tseng Foon Yue
Bondur Loui
Bott Larry
Marsh Peter

The pattern says find all last name which has the initial characters is started with M or (|) B or T.