Chapter IV Regular Expressions



Motivation Describe languages




Example 	 Let  S  = {0, 1}.  Describe L = {0, 1}


Solution	 Let L = {1} U {0}.






Note We abbreviate this by L = language (1 Ú 0).  The book writes 


L = language (1 + 0).





Example 	Let à = {1}.  Describe L = {all strings which are empty or whose digits are all 1}.  
Solution Take all strings over à; i. e., L = language {1}*.

Note We abbreviate {1}* by 1*.
Example Let à = {0, 1}.  Describe L = {all strings of 0's and 1's}.

Solution Take L = language (0 + 1)*.  We can also write this as L = language (0*1*)* or as L = language (1*0*)*.

Note Many people use Ú rather than +.  In fact, + usually stands for another operation.

Example Let à = {0, 1}.  Describe L = {all positive binary numbers}.

Solution We have L = {w Ï à*| w begins with 1}.  We can write this as L = language 1(0 + 1)*

Example 	Let à = {0, 1}.  Describe L = {all positive binary even numbers}.  

Solution Take all strings which begin with 1 and end in 0; i. e., 
L = language 1(0 + 1)*0.

Example 	Let à = {0, 1}.  Describe L = {all non-negative binary even numbers}.  
Solution Take all strings which end in 0, so 
L = language (1(0 +1)*0 + 0).

Example 	Let à = {a, b}. Describe L = {strings of a's and b's | each string contains aa}.
Solution Let L = language ((a + b)* aa (a + b)*).  Note that some strings in L have more than one occurrence of aa.

Example 	Let à = {a, b}.  
Describe L = {strings of even length over {a, b}}.
Solution The shortest such strings are aa, ab, ba, bb, and ò.  Then
L = language (aa Ú ab Ú ba Ú bb)*.  Another way to write this language is L = language ((a + b) (a + b))*.

Note There are different ways to represent the same language.


Lemma 	(Distributive law) (a + b) c = ac + bc.
Proof 	Let L = language ((a + b) c), and let 
M = language (ac + bc).  Let w Ï L.  Then w begins with either a or b, and the rest of w consists of c; hence, w = ac or w = bc.  Thus 
w Ï M.  The converse is similar. ¾

Note In general, other laws of algebra do not hold; for example, 

ab ± ba
a + a = a

Definition The set of regular expressions is defined as follows:

Rule 1	If x Ï à, then x is a regular expression.

Rule 2	If r1, r2 are regular expressions, then so are
(r1)  r1r2  r1 + r2  r1*

Rule 3	Nothing else is a regular expression.

Definition 	

If S, T are languages over the same alphabet, then 
ST = {w1w2 | w1 Ï S, w2 Ï T}.

Note If S = language (r1) and T = language (r2) then 
ST = language (r1r2).

Note Not every language is defined by a regular expression; for example, L = {anbn | n ³ 0) is not defined by a regular expression.

Theorem  Every finite language can be described by a regular expression.
Proof	Let L = {w1, . . . wn}.  Then each wj can be described by a regular expression (write out the letters and show this is a regular expression).  Then take the union.  ¾

Example 	L = {xn | 0 ² n ² 5}.  Write r.
Solution	L = language (ò + x + x2 + x3 + x4 + x5)

Example	Let à = {b}.  Describe L = {w | w is any string of b's}.
Solution	L = language (b*).

Example	Let à = {b}.  Describe L = {w | w is a string of b's, w ± ò}.
Solution	L = language (bb*) or L = langauge (b*b).  This is often 
written L = language (b+).

Example	Let à = {a, b}.  Describe L = {w | w contains exactly 3 a's}.
Solution 	L = language (b*a b*a b*a b*).

Example	Let à = {a, b}.  Describe L = {w |  a appears in w as aaa}.
Solution	L = language (aaa + b)* or L = language ((aaa)*b*)*

Example	Let à = {a, b}.  Describe L = {w | w does not contain aaa}. 
Solution	Beginning of word 
			no a's bb*
			one a  a
		        two a's aa
Then each of these latter expressions must be followed by at least one b.  Hence, the word must begin with bb* or abb* or aabb*.

Middle of word -    (a + aa)bb*, repeated indefinitely

End of word -         b* or a or aa

We have 
L = language (bb* + abb* + aabb*) ((a + aa)bb*)*(b* + a + aa).

We can simplify this expression.  The first factor can be written
(ò + a + aa) bb*
The second factor can be written 
((ò + a + aa)bb*)*
Hence, we can write
L = language ( ((ò + a + aa)bb*)+ (b* + a + aa))

Problem 14, page 61	
Show that the following pairs of regular expressions define the same language.
(i)	(ab)*a	and      a(ba)*
(ii)	(a* + b)*	and      (a + b)*
(iii)	(a* + b*)*	and      (a + b)*
Solution	(i) Let S = language ((ab)*a) and let T = language (a(ba)*).  Suppose w Ï S.  Then w = (ab)na for some n Ï N, so 
we can rewrite w as a(ba)n.  Hence w Ï T.  The reverse inclusion is proven in a similar manner.
(ii) Let S = language ((a* + b)*) and let T = language ((a + b)*).  Let 
w Ï S.  Then w = an1 bk1 . . . Hence w Ï T.  The reverse inclusion is similar.
(iii) Let S = language ((a* + b*)*) and let T = language((a + b)*).  
Let w Ï S.  The argument is the same as for (ii).	

Problem 19, page 62
Let R, S, and T be languages and assume that ò µ S.  Then  
R = SR + T if and only if R = S*T.
Proof (=>) Let w Ï SR + T.  If w Ï T then w = òw Ï S*T.  If w Ï SR then w Ï S (SR + T) = S2R + ST.  If w Ï ST then we are done.  Otherwise, substitute SR + T for R again to get w Ï S3R + S2T.  Continue until we have w Ï SnT for some n.  If this does not happen, then the length of w is greater than n for any n Ï N, contradiction.
(<=) Let R = S*T and let w Ï S*T.  If w Ï T then we are done.  If 
w Ï ST, then since T Þ R we have w Ï SR.  If w Ï S2T then 
w Ï S(ST) Þ SR.  Continue this way to show that either w Ï SR or 
w Ï T.  Hence, w Ï SR + T in general. ¾