
11 January 2012

Using Regular Expressions to Process Texts in Java

Regular expressions make text processing simpler and need less code; on the other hand, their syntax confuses developers. Counting the vowels, consonants, words, sentences and so on in a text is a good exercise for beginners who are trying to improve themselves in algorithms, a new programming language, or both. However, it is also a somewhat challenging question even for a programmer with 2+ years of "experience" if regular expressions must be used. Some developers who program in Java (and consider themselves among the best) don't know how or exactly when to use regular expressions, whether they are coding for J2ME, J2EE or J2SE. Superficial knowledge of Java fundamentals forces developers to settle for messy code, such as iterating over arrays of characters with tangled if and loop statements. This increases the complexity of the source code and makes it harder to maintain in later project phases.

Regular expressions are widely applicable across other platforms, programming languages and operating systems, but usage and implementations vary because of syntax differences between them. Finding validation patterns (expressions) for email addresses, domain names, citizen IDs, dates, IP addresses, phone numbers and so on is easier because proven patterns have already been published on the internet and can be reused.
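To make the idea of a reusable validation pattern concrete, here is a minimal sketch for dotted-decimal IPv4 addresses. The class name Ipv4Validator and the method isValid are made up for this illustration, and the pattern is just one common way to write the check, not an official or complete one (it rejects leading zeros, for example, which may or may not suit your case).

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are not from any library.
class Ipv4Validator {
    // One octet in the range 0-255, without leading zeros.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";

    // Four octets separated by dots. Compiled once and reused for every call.
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean isValid(String text) {
        // matches() requires the whole input to fit the pattern, not just a part of it.
        return text != null && IPV4.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.1.10")); // true
        System.out.println(isValid("256.1.1.1"));    // false
    }
}
```

Keeping the compiled Pattern in a static final field is what makes it cheap to reuse: the expression is parsed once, and each call only creates a new Matcher.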

What Exactly Is a Regular Expression (Regex or Regexp)?
A regular expression is written in a formal language that can be interpreted by a regular expression processor, which is a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Java has a regular expression library in the java.util.regex package, which plays the role of the regular expression processor in the formal definition. The library has four main types: the Pattern and Matcher classes, the MatchResult interface and the PatternSyntaxException exception class. Together they handle compiling patterns and matching operations. Example usage is shown in the Word Selector example below.
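As a small aside before the main example, the sketch below touches the two types that the example does not use directly: PatternSyntaxException and MatchResult. The class RegexTypesDemo and its two helper methods are hypothetical names chosen for this illustration; only the java.util.regex calls are real API.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the helper names are assumptions for illustration.
class RegexTypesDemo {
    // Returns true when the expression compiles, false when its syntax is invalid.
    static boolean isValidPattern(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns an immutable snapshot of the first match, or null when nothing matches.
    static MatchResult firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.toMatchResult() : null;
    }

    public static void main(String[] args) {
        System.out.println(isValidPattern("[a-zA-Z]+")); // true
        System.out.println(isValidPattern("[a-zA-Z"));   // false, unclosed character class

        MatchResult first = firstMatch("[a-zA-Z]+", "Why should i use regex?");
        System.out.println(first.group() + " " + first.start() + "-" + first.end()); // Why 0-3
    }
}
```

PatternSyntaxException is unchecked, so the compiler does not force you to catch it; catching it mostly makes sense when the expression comes from user input or configuration.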

Word Selector Regular Expression Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("[a-zA-Z]+"); //1
Matcher matcher = pattern.matcher("Why should i use regex?"); //2
while (matcher.find()) { //3
    System.out.println(matcher.group()); //4 one word per line
}
@Line 1: Compiles the pattern [a-zA-Z]+ and creates a Pattern object. If the pattern syntax is not valid, a PatternSyntaxException is thrown. This pattern matches runs of "one or more" "lower case" or "upper case" letters in a text.
@Line 2: The Pattern object creates a Matcher that searches the text passed as the parameter for the pattern.
@Line 3-4: find() moves to the next match; group() returns the matched text, which is printed to System.out.

Output
Why
should
i
use
regex
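The counting exercise mentioned at the start can be approached with the same find() loop. The following is only a rough sketch under simple assumptions: TextCounter and its methods are hypothetical names, only the ASCII letters a, e, i, o and u count as vowels, and a "sentence" is approximated as a run of '.', '!' or '?' characters, which will miscount abbreviations and decimals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the rules are deliberately simplistic.
class TextCounter {
    // Counts how many times the expression matches in the text.
    static int count(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    static int vowels(String text) {
        return count("[aeiouAEIOU]", text);
    }

    static int consonants(String text) {
        // && is Java's character class intersection: ASCII letters that are not vowels.
        return count("[a-zA-Z&&[^aeiouAEIOU]]", text);
    }

    static int words(String text) {
        return count("[a-zA-Z]+", text);
    }

    static int sentences(String text) {
        return count("[.!?]+", text);
    }

    public static void main(String[] args) {
        String text = "Why should i use regex?";
        System.out.println(vowels(text));     // 7
        System.out.println(consonants(text)); // 11
        System.out.println(words(text));      // 5
        System.out.println(sentences(text));  // 1
    }
}
```

Compared with iterating over a character array, each rule here is a single expression, so changing what counts as a vowel or a word means editing one string rather than a nest of if statements.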
