understanding regular expressions to finite state machines

i recently came across a question on designing a state machine for detecting a binary string sequence. the solution to such a problem has two approaches. the very commonly followed approach is analyze the “given” sequence over and over again and try to scribble state transitions and test the patterns on it to check if breaks. i believe this is the way hardware engineers work this out, primarily because they spend some time which is considerably less than what they spend for other problems they analyse :) i say hi to “theory of relativity” here..

the second approach is to have rules to analyze such problems because they only consist of binary patterns (0s and 1s) and a rule based solution is like some kind of a program, so you don’t necessarily have to do much testing or alter, adjust your state machine as you try to solve the problem. the theory that deals with such kind of problems of pattern matching and sequence detection is the famous regular expression or the re. regular expressions can be analyzed as finite automata, or what can be called as our commonly known finite state machine.

so the best way to deal with such problems is to represent the sequence as a regular expression and just try converting it to finite state machine based on rules.

So some notations w.r.t re:

a* : an a* would mean 0 or more matches of the letter “a” followed by successive letters. for instance a*b matches “ab”, “aab”, “aaab” or just “b”.

a+ : an a+ would mean 1 or more matches of the letter “a” followed by successive letters. for instance a+b matches “ab”, “aab”, “aaaab” but will not match just b.

a? : an a? would mean 0 or 1 matches of the letter “a” followed by successive letters. for instance a?b matches “ab” or just “b” but will not match “aaab”.

the problem statement mentioned the sequence detection for the string 101011, which also meant that the state machine should be able to sustain 101010111 or a 111101011 as a match since both these string involve the string 101011.

Looking more closely at the sequence, a few points to notice.

a. match as many 1s preceding this sequence so that we end up matching 11101011, or 11111111010111 or 11111111111111101011. to put this in the basic atom terminology, we match (1+)01011.

b. match as many 10s preceding this sequence so that we end up matching 10101011 or 101010101011 … so we can end up consuming any number of 10s once we get past the first occurrence of 10. to put this in the basic atom terminology, we match (1+)0(10)+11

so that is pretty much translating problem into a regex string. The following slideshow will try to illustrate a method to translate a regular expression into a state machine. There might have been mistakes in my understanding and representation. Comment or e-mail me if you think there is something wrongly interpreted.

Thanks
Shyam

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: