Equivalence of Regular Expressions and FSA (Chapter 7, cont'd)
--------------------------------------------------------------
  - We can use R.E. to denote specific languages. Similarly,
    FSA can be used to denote languages.
  - We show that for every R.E. R there is a FSA M such that the language
    accepted by M is the same as the language denoted by R, i.e.
    L(R) = L(M).
  - Also, we show that for every FSA M there is a R.E. R that denotes
    the language accepted by M, i.e. L(M) = L(R).
  - These two facts together imply that R.E. and FSA have the same *power*.
  
R.E. to FSA.
  - We use the recursive definition of R.E. and induction on the structure of R:
    . Basis: if R = empty then M contains only one state and no accepting state.
             if R = e then M contains only one state, which is also an accepting state.
             if R = a for some a in S then M contains two states (q_0 and q_1)
               where q_0 is the initial state, q_0 -a-> q_1, and q_1 is accepting.
    . Ind. Step: Suppose R = (R_1 + R_2), where R_1 and R_2 are regular expressions
        such that for each of them there is a FSA that accepts the language it denotes.
        Call these FSA M and M', where L(M) = L(R_1) and L(M') = L(R_2).
        Then L(R) = L(M) union L(M') is accepted by some FSA (as we saw for the union
        operation). Similarly, if R = R_1 R_2 then L(R) = L(M) o L(M') is accepted by
        some FSA, and if R = R_1^* then L(R) = L(M)^* is accepted by some FSA.
  - Example: construct a FSA that accepts the language denoted by (1(00)+0)^*.
    First consider 00. The FSA for this would be q_1 -0-> q_2 -0-> q_3, with q_3
    accepting. Then for 1(00) we get q_0 -1-> q_1 -0-> q_2 -0-> q_3. For 0 we have
    the FSA q'_0 -0-> q'_1, with q'_1 being the accepting state. To take the union
    of these we get the machine with initial state s, which has two e-transitions
    to q_0 and q'_0, and whose accepting states are q_3 and q'_1.
    To form the Kleene star, we add e-transitions from q_3 and q'_1 back to s and
    also make s accepting, so that the empty string is accepted (this is safe here
    because the only transitions into s come from the accepting states q_3 and q'_1).
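
    The inductive construction above can be sketched in Python as a function that
    builds an e-NFA from a regular expression given as a small syntax tree. The tuple
    encoding of regular expressions and the names build, e_closure and accepts are
    hypothetical choices for this sketch; e-transitions are stored under the label
    None. The concatenation and star cases use standard e-transition constructions,
    and whether they match the text's constructions exactly is an assumption (the
    star case adds a fresh accepting start state instead of reusing the old one).
    The last lines run it on the example (1(00)+0)^*.

```python
import itertools
from collections import defaultdict

# Hypothetical tuple encoding of R.E. syntax trees (an assumption of this sketch):
#   ("empty",)  ("eps",)  ("sym", a)  ("union", R1, R2)  ("concat", R1, R2)  ("star", R1)


def build(r, fresh, delta):
    """Add states/transitions for r to delta; return (start, set of accepting states).

    delta maps (state, label) -> set of states; label None is an e-transition.
    """
    kind = r[0]
    if kind == "empty":  # one state, no accepting state
        return next(fresh), set()
    if kind == "eps":  # one state, which is also accepting
        q = next(fresh)
        return q, {q}
    if kind == "sym":  # q0 -a-> q1, q1 accepting
        q0, q1 = next(fresh), next(fresh)
        delta[(q0, r[1])].add(q1)
        return q0, {q1}
    if kind not in ("union", "concat", "star"):
        raise ValueError(f"unknown node {kind!r}")
    s1, f1 = build(r[1], fresh, delta)
    if kind == "star":
        s = next(fresh)  # new start state, accepting so that e is accepted
        delta[(s, None)].add(s1)
        for f in f1:
            delta[(f, None)].add(s)  # after a complete word, loop back to s
        return s, {s}
    s2, f2 = build(r[2], fresh, delta)
    if kind == "union":  # new start state with e-transitions to both machines
        s = next(fresh)
        delta[(s, None)].update({s1, s2})
        return s, f1 | f2
    # kind == "concat": e-transitions from M's accepting states to M''s start
    for f in f1:
        delta[(f, None)].add(s2)
    return s1, f2


def e_closure(states, delta):
    """All states reachable from `states` using e-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, None), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def accepts(r, x):
    """Build the e-NFA for r and simulate it on the string x."""
    delta = defaultdict(set)
    start, accepting = build(r, itertools.count(), delta)
    current = e_closure({start}, delta)
    for a in x:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = e_closure(moved, delta)
    return bool(current & accepting)


# (1(00)+0)^* from the example above
ONE, ZERO = ("sym", "1"), ("sym", "0")
EXAMPLE = ("star", ("union", ("concat", ONE, ("concat", ZERO, ZERO)), ZERO))

for x in ["", "0", "100", "1000", "0100", "1", "10", "1001"]:
    print(repr(x), accepts(EXAMPLE, x))
```

    The first five strings are in (1(00)+0)^* and the last three are not, so the loop
    should report True for exactly the first five.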
     
FSA to R.E.
  - Without loss of generality, suppose that M is a DFSA with states Q = {1..n}.
  - For any two states i and j in Q and an integer k >= 0 we define the set
    L^k_{i,j} of all strings x in S^* such that the computation of M on x, started
    at state i, ends in state j, and every state M enters along the way, other
    than the final state j, is at most k.
  - We want to prove that the following predicate:
      P(k): for each i,j in Q, there is a R.E. R^k_{i,j} that denotes L^k_{i,j}
    holds for all k >= 0.
  - Why does proving this solve the problem?
  - Because: assume that f_1,...,f_t are the accepting states of M. Then
    the language accepted by M is the set of all strings that take M from
    the initial state s to one of f_1,...,f_t. With k = n every state is allowed
    along the way, so:
            { empty                                                if t = 0
     L(M) = {
            { L^n_{s,f_1} union L^n_{s,f_2} ... union L^n_{s,f_t}  if t > 0
    and since each L^n_{s,f_i} is denoted by R^n_{s,f_i}, L(M) is denoted by
    R = R^n_{s,f_1} + R^n_{s,f_2} + ... + R^n_{s,f_t} if t > 0, and by R = empty
    if t = 0.
  - To prove P(k), first we write a recursive relation for L^k_{i,j}:
                 {{a in S: d(i,a) = j}          if i <> j
     L^0_{i,j} = {
                 {{e} union {a in S: d(i,a)=j}  if i = j

     for 0 <= k < n:
     L^{k+1}_{i,j} = L^k_{i,j} union (L^k_{i,k+1} o (L^k_{k+1,k+1})^* o L^k_{k+1,j}) 
  - Here is an explanation: for L^0_{i,j} with i <> j, a string takes M from i to j
    without passing through any other state (because k = 0) only if it consists of
    a single symbol a on a transition arrow from i to j, i.e. d(i,a) = j.
    If i = j, M can go from i to i without passing through any other state either by
    reading the empty string (e) or by reading a symbol a with d(i,a) = i.

    For L^{k+1}_{i,j}, the strings that take M from i to j using only intermediate
    states from {1..k+1} can be divided into two groups: those that do NOT take M
    through k+1 at all, and those that take M through k+1 at least once. The first
    group is, by definition, L^k_{i,j}. For the second group, we divide the
    computation on each string into 3 parts: the first part goes from state i up to
    the point where M enters k+1 for the first time. This gives L^k_{i,k+1}. The
    second part is a concatenation of zero or more strings that take M from k+1
    back to k+1 without passing through any state higher than k, i.e.
    (L^k_{k+1,k+1})^*. The last part goes from the last visit to state k+1 to
    state j without passing through any state higher than k, i.e. L^k_{k+1,j}.
  - Having this recursive definition for L, it is easy to prove P(k) by induction.
    Basis: k = 0. In this case, L^0_{i,j} contains exactly those symbols a_1,...,a_p
      that take M from i to j, plus e if i = j. So R^0_{i,j} = a_1 + a_2 + ... + a_p
      if i <> j, and R^0_{i,j} = e + a_1 + ... + a_p if i = j (when p = 0 these
      become R^0_{i,j} = empty and R^0_{i,j} = e, respectively).
    Ind. Step: Assume that P(m) holds for an arbitrary m with 0 <= m < n. To prove
      P(m+1) we must give a R.E. R^{m+1}_{i,j} for L^{m+1}_{i,j}. From the recursive
      relation for L^{m+1}_{i,j} (with k = m), we get:
          R^{m+1}_{i,j} = R^m_{i,j} + R^m_{i,m+1}(R^m_{m+1,m+1})^* R^m_{m+1,j}
      where each R^m on the right-hand side exists by P(m).
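  - Example: consider the following DFSA with q_1 being the start state and q_2
    being the accepting state (it accepts exactly the strings with an odd number
    of 1's). In the table, state q_i is written as i.
    q_1 -0-> q_1,  q_1 -1-> q_2,  q_2 -0-> q_2,  q_2 -1-> q_1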

                 |  k = 0  |  k = 1          |  k = 2
       ------------------------------------------------------------------------------------
       R^k_{1,1} |  e + 0  |  0^*            |  0^* + 0^*1(0 + 10^*1)^*10^*
       R^k_{1,2} |  1      |  0^*1           |  0^*1 + 0^*1(0 + 10^*1)^*(e + 0 + 10^*1)
       R^k_{2,1} |  1      |  10^*           |  10^* + (e + 0 + 10^*1)(e + 0 + 10^*1)^*10^*
       R^k_{2,2} |  e + 0  |  e + 0 + 10^*1  |  (e+0+10^*1) + (e+0+10^*1)(0+10^*1)^*(e+0+10^*1)

    Since q_1 is the start state and q_2 is the only accepting state, L(M) is
    denoted by R^2_{1,2}.
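
    The recursive relation for R^k_{i,j} can be turned directly into a short Python
    sketch. Assumptions of this sketch: regular expressions are emitted in Python `re`
    syntax rather than the notation of these notes (union is |, e is the empty
    pattern, and (?!), a pattern that never matches, stands for empty), None
    represents the empty set internally, and all function and variable names are
    hypothetical. It runs the construction on the example DFSA above and, as a
    sanity check, compares the resulting expression with the DFSA on all short
    strings.

```python
import itertools
import re


def union(a, b):
    """a + b; None plays the role of the empty set."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"


def concat(*parts):
    """Concatenation; anything concatenated with the empty set is empty."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def star(a):
    """a^*; both empty^* and e^* are e, written as the empty pattern."""
    return f"(?:{a})*" if a else ""


def dfa_to_regex(n, alphabet, delta, start, accepting):
    """States are 1..n; delta maps (state, symbol) -> state."""
    states = range(1, n + 1)
    # Basis k = 0: symbols a with d(i,a) = j, plus e when i = j.
    R = {}
    for i in states:
        for j in states:
            r = "" if i == j else None
            for a in alphabet:
                if delta[(i, a)] == j:
                    r = union(r, re.escape(a))
            R[(i, j)] = r
    # Step with m = k+1: R^m_{i,j} = R^k_{i,j} + R^k_{i,m}(R^k_{m,m})^* R^k_{m,j}
    for m in range(1, n + 1):
        R = {(i, j): union(R[(i, j)], concat(R[(i, m)], star(R[(m, m)]), R[(m, j)]))
             for i in states for j in states}
    # L(M) is the union of R^n_{s,f} over the accepting states f.
    result = None
    for f in accepting:
        result = union(result, R[(start, f)])
    return "(?!)" if result is None else result


# The example DFSA: q_1 is the start state, q_2 the only accepting state.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
REGEX = dfa_to_regex(2, "01", DELTA, 1, [2])


def dfa_accepts(x):
    """Run the example DFSA on x."""
    q = 1
    for a in x:
        q = DELTA[(q, a)]
    return q == 2


# Sanity check: the regex and the DFSA agree on every string of length < 8.
for length in range(8):
    for t in itertools.product("01", repeat=length):
        x = "".join(t)
        assert bool(re.fullmatch(REGEX, x)) == dfa_accepts(x), x
print("agree on all strings of length < 8")
```

    The generated expression is much longer than the hand-simplified entries in the
    table, because the sketch performs no algebraic simplification beyond the rules
    for empty and e.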

Regular Languages
-----------------
  - We have shown that a language is accepted by a FSA iff there is a R.E. for it.
  - A language is called regular iff it is denoted by some R.E., or equivalently,
    iff it is accepted by a FSA (deterministic or non-deterministic).
  - This equivalence can be helpful for building a R.E. for a regular language
    from a FSA for it (or vice versa).
    Example: Suppose that L and L' are two regular languages denoted by the
      R.E.'s R and R', respectively. One way to construct a R.E. for the language
      L'' = (L intersection L') is to first construct, from R and R', a FSA M for L
      and a FSA M' for L'. Then, using the technique of Theorem 7.22 of the text,
      we construct a FSA M'' for L''. Finally, we construct a R.E. from M''.
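
    Theorem 7.22 is not reproduced in these notes; assuming it is the usual product
    construction, in which a pair state (p, q) tracks M and M' simultaneously and is
    accepting when both components are, a minimal Python sketch follows. The DFA
    encoding, the names intersect and run, and the two example machines (one for
    strings with an odd number of 1's, and a hypothetical one for strings ending
    in 0) are illustrative assumptions.

```python
from itertools import product


def intersect(dfa1, dfa2, alphabet):
    """Product DFA accepting L(dfa1) intersection L(dfa2).

    Each DFA is (states, delta, start, accepting) with delta[(q, a)] -> q'.
    """
    Q1, d1, s1, F1 = dfa1
    Q2, d2, s2, F2 = dfa2
    states = set(product(Q1, Q2))
    # Both machines read the same symbol in lockstep.
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in alphabet}
    # A pair is accepting only if both components are accepting.
    accepting = {(p, q) for (p, q) in states if p in F1 and q in F2}
    return states, delta, (s1, s2), accepting


def run(dfa, x):
    """Return True if the DFA accepts the string x."""
    _, delta, q, accepting = dfa
    for a in x:
        q = delta[(q, a)]
    return q in accepting


# M: odd number of 1's.
ODD_ONES = ({1, 2}, {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}, 1, {2})
# M' (hypothetical): strings ending in 0; state "z" means "last symbol was 0".
ENDS_IN_0 = ({"a", "z"}, {("a", "0"): "z", ("a", "1"): "a",
                          ("z", "0"): "z", ("z", "1"): "a"}, "a", {"z"})

BOTH = intersect(ODD_ONES, ENDS_IN_0, "01")
for x in ["10", "110", "0", "1"]:
    print(x, run(BOTH, x))
```

    Only "10" has an odd number of 1's and ends in 0, so it is the only string above
    that BOTH accepts. Feeding the product machine into the FSA-to-R.E. construction
    of the previous section would complete the recipe for a R.E. for L''.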

Non-regular languages
---------------------
  - It might seem that we can use R.E. to represent every language.
  - But this is not true! That is, there are languages that are not regular.
    For example, the language {0^n1^n: n >= 0} is not regular. Intuitively, since
    each FSA has a finite number of states, it has a bounded memory (the states are
    the *memory* of a FSA). If there is a FSA M for this language with X states, and
    we choose n large enough (say larger than 2^X), then there is no way for M
    to *remember* how many 0's it has seen in order to compare that number with the
    number of 1's that it sees.
  - These arguments can be made formal in the form of a theorem:
  - Theorem (Pumping Lemma): 
    Let L be a regular language over S. Then there is some n in N s.t. every string
    x in L with length at least n has the following property:
    
     There are strings u,v,w in S^* such that:
      . x = u v w  
      . v <> e
      . u and v together have length at *most* n, i.e |uv| <= n
      . u v^k w in L, for all k >= 0
  - Example: we want to show that L = {0^m 1^m: m >= 0} is not regular.
    By way of contradiction, assume that L is regular, and let n be the number
    stated in the Pumping Lemma. Then every string in L with length at least n, and
    in particular x = 0^n 1^n (which has length 2n), has the properties listed
    above. So x = u v w, with v <> e, |u v| <= n, and u v^k w in L for all k >= 0.
    Since |u v| <= n and the first n symbols of x are 0's, u and v only contain 0's,
    and therefore v = 0^i for some i >= 1 (since v <> e). But then u v^0 w = u w =
    0^{n-i} 1^n has fewer 0's than 1's and therefore is not in L, whereas the
    Pumping Lemma says it is in L: a contradiction.
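
    The case analysis in this proof can be checked mechanically for any particular n.
    The sketch below (the names in_L and pumping_down_always_fails are hypothetical)
    enumerates every split x = uvw of x = 0^n 1^n with v <> e and |uv| <= n, and
    confirms that pumping down (k = 0) always leaves L. It only illustrates the
    argument for the values of n tried; it is not a substitute for the proof, which
    covers every n.

```python
def in_L(x):
    """Membership in L = {0^m 1^m : m >= 0}."""
    m = len(x) // 2
    return x == "0" * m + "1" * m


def pumping_down_always_fails(n):
    """True if every split x = uvw of x = 0^n 1^n with v <> e and |uv| <= n
    gives u v^0 w = uw outside L."""
    x = "0" * n + "1" * n
    for i in range(n + 1):              # i = |u|
        for j in range(i + 1, n + 1):   # j = |uv|, so v = x[i:j] is non-empty
            u, v, w = x[:i], x[i:j], x[j:]
            assert set(v) == {"0"}      # |uv| <= n forces v to consist of 0's
            if in_L(u + w):
                return False
    return True


for n in range(1, 11):
    print(n, pumping_down_always_fails(n))
```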