--------------------- Algorithm Correctness (cont'd) ---------------------

Example 3: recursive binary search

    RecBinSearch(x,A,f,l)
        if f = l then
            if A[f] = x then return f;
            else return 0;
            end if
        else
            m := (f + l) div 2;   { integer division }
            if A[m] >= x then return RecBinSearch(x,A,f,m);
            else return RecBinSearch(x,A,m+1,l);
            end if
        end if
    End RecBinSearch

Precondition:
  - x and the elements of A are numbers, 1 <= f <= l <= length(A), and
    A[f..l] is sorted (in nondecreasing order).

Postcondition:
  - Return t such that f <= t <= l and A[t] = x, if such a t exists;
    otherwise return 0.

Proof of Correctness:
  - Termination and partial correctness are proved together by induction.
  - In this example, a good choice for the induction parameter is the number
    of elements of A[f..l], i.e., l-f+1.  Define n = l-f+1.
  - By induction on n, and assuming the precondition holds, we prove the
    postcondition.

  Note that the inductive structure of the proof follows the recursive
  structure of the algorithm (so the structure is easy to determine).  This
  is the general structure of correctness proofs for recursive programs.

  Now, define S(n) to be:
    "if f and l are integers such that 1 <= f <= l <= length(A), l-f+1 = n,
     and A[f..l] is sorted, then when RecBinSearch(x,A,f,l) is called, this
     call terminates and returns some t such that f <= t <= l and A[t] = x
     if such a t exists, and returns 0 otherwise."

  We prove by induction on n that S(n) holds for all n >= 1.  (Since f <= l,
  the subarray A[f..l] has at least one element, so n >= 1.)

  Base case: n = 1, i.e., f = l.  The algorithm terminates (the first branch
  of the outer 'if' statement performs only one test and returns; there is no
  loop and no recursive call), and it returns f if A[f] = x and 0 otherwise,
  which satisfies the postcondition.

  Ind. Hyp.: Let k >= 1 be an arbitrary integer and assume that S(j) holds
  for all j with 1 <= j <= k.

  Ind. Step: Consider the call RecBinSearch(x,A,f,l) when l-f+1 = k+1 >= 2.
  Since f =/= l, the "else" branch of the outer "if" statement is executed.
  By Lemma 2.2 of the textbook, f <= (f + l) div 2 < l.  Based on the
  comparison in the line "if A[m] >= x then", there are two cases:

  Case 1: A[m] >= x.  Then RecBinSearch(x,A,f,m) is called.  Let
  i = length(A[f..m]) = m-f+1.  Since m < l (because (f + l) div 2 < l),
  m-f+1 < l-f+1, i.e., i <= k.  So we can use the Ind. Hyp., provided the
  precondition required in the statement of S(i) is satisfied.  But we have
  1 <= f <= m <= l <= length(A).  Also, A[f..l] was sorted (by the
  precondition) and RecBinSearch does not change A, so A[f..m] is sorted
  when the call RecBinSearch(x,A,f,m) is made.  Therefore, by the Ind. Hyp.,
  S(i) holds: the recursive call terminates and returns an index of x in
  A[f..m], or 0 if x does not appear in A[f..m].  Moreover, if x appears
  anywhere in A[f..l], it appears in A[f..m]: any occurrence at an index
  greater than m would imply x >= A[m] (since A[f..l] is sorted), which
  together with A[m] >= x forces A[m] = x.  Hence the main call also
  terminates and returns a correct value.

  Case 2: A[m] < x.  Then RecBinSearch(x,A,m+1,l) is called.  Let
  i = length(A[m+1..l]) = l-m.  Since f <= m, l-m < l-f+1, i.e., i <= k.
  Also, 1 <= m+1 <= l <= length(A), and A[m+1..l] is sorted since A[f..l]
  is sorted.  So we can use the Ind. Hyp., by which S(i) holds: the call
  RecBinSearch(x,A,m+1,l) terminates and returns an index of x in A[m+1..l]
  (which is also an index of x in A[f..l]) if x appears there, and 0
  otherwise.  Since A[m] < x and A[f..l] is sorted, x cannot appear in
  A[f..m], so returning 0 when x is not in A[m+1..l] is correct for the main
  call.  Hence the main call also terminates and returns a correct value.

  In all cases, the postcondition is satisfied.  Therefore, by induction,
  RecBinSearch is correct.
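  Aside (not part of the notes): the following is a minimal Python rendering
  of RecBinSearch, kept 1-indexed to match the pseudocode; the function name
  rec_bin_search and the sample array are mine.  It assumes A[f..l] is sorted
  in nondecreasing order and 1 <= f <= l <= len(A), as in the precondition.

      def rec_bin_search(x, A, f, l):
          if f == l:
              # A[f-1] is the pseudocode's A[f] (Python lists are 0-indexed)
              return f if A[f - 1] == x else 0
          m = (f + l) // 2                       # integer division, f <= m < l
          if A[m - 1] >= x:
              return rec_bin_search(x, A, f, m)      # search left half A[f..m]
          else:
              return rec_bin_search(x, A, m + 1, l)  # search right half A[m+1..l]

      # Example: searching the whole array.
      A = [2, 3, 5, 7, 11, 13]
      print(rec_bin_search(7, A, 1, len(A)))   # 4  (7 is A[4], 1-indexed)
      print(rec_bin_search(6, A, 1, len(A)))   # 0  (6 does not appear)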
-------------------- Algorithm Complexity [Chapter 3 of textbook] --------------------

- Standard measures for algorithms: space (memory) and time (speed).  In this
  course, we will consider only time.
- We want a measure that is independent of any one machine or implementation,
  and related only to the algorithm in an abstract way.
- One candidate: t(x) = # steps of the algorithm on input x.
- More generally: worst-case running time T(n) = max{ t(x) : size(x) = n }.
- We need to define "step" and "size" precisely.
- Generally, 1 step = 1 basic operation (assignment, array access, arithmetic
  operation, test, etc.).  "Size" depends on the input (for lists or graphs,
  size = number of elements).

- Example:  LinearSearch(x,A)
      0.  n := length(A);
      1.  i := 1;
      2.  while i <= n do
      3.      if A[i] = x then
      4.          return i;
              end if
      5.      i := i + 1;
          end while
      6.  return 0;

  We count 1 step for line 0, 1 step for line 1, 1 step for the comparison on
  line 2, 2 steps for line 3 (to access A[i] and compare with x), 1 step for
  line 4, 2 steps (1 addition, 1 assignment) for line 5, and 1 step for
  line 6.  (An executable sketch of this counting appears after the NOTE
  below.)

- In general, for a given x: if x first appears at index j, then 5 steps are
  performed for every index smaller than j, 4 steps at index j, and 2 steps
  for initialization, for a total of 5(j-1) + 6 = 5j + 1 steps.  If x does
  not appear, then 5 steps are performed for every index in A, 1 step for the
  last loop test, 1 step for line 6, and 2 steps at the beginning, for a
  total of 5n + 4 steps.
- By definition, then, T(n) = 5n + 4 (the maximum value of t(x,A) over all
  inputs (x,A) of size n).
- If we had assigned different cost values to lines 0-6, the function T(n)
  could instead be (for example) 7n + 5, which is still linear in n.  We
  prefer to ignore the additive constants AND the constant factors.
- We need a formal mechanism to reason about functions while ignoring these
  constants.

- For any function g : N -> R^+ (i.e., g takes natural numbers as arguments
  and returns positive real numbers),
    . O(g)     = { f : N -> R^+ | exists n_0 in N, exists c in R^+,
                   forall n >= n_0, f(n) <= c g(n) }
    . Omega(g) = { f : N -> R^+ | exists n_0 in N, exists c in R^+,
                   forall n >= n_0, f(n) >= c g(n) }
    . Theta(g) = { f : N -> R^+ | exists n_0 in N, exists c_1,c_2 in R^+,
                   forall n >= n_0, c_1 g(n) <= f(n) <= c_2 g(n) }

- Intuitively, f is O(g) means "f grows slower than or as fast as g", f is
  Omega(g) means "f grows faster than or as fast as g", and f is Theta(g)
  means "f grows as fast as g".
- Another way of looking at these notations:
    . "f is O(g)" can be read "g is an asymptotic upper bound for f";
    . "f is Omega(g)" can be read "g is an asymptotic lower bound for f";
    . "f is Theta(g)" can be read "g is an asymptotic tight bound for f".

- Example 1: Prove 2n^2 + 6n + 2 is O(n^2).

  Scratch work (NOT part of proof): for all n >= 1, 2n^2 <= 2n^2 and
  6n <= 6n^2 and 2 <= 2n^2, so 2n^2 + 6n + 2 <= 2n^2 + 6n^2 + 2n^2 = 10n^2.
  [Alternatively, 2n^2 <= 2n^2 and 6n + 2 <= n^2 for all n >= 7, so we could
  pick n_0 = 7 and c = 3.]

  Proof: Let n_0 = 1 and c = 10.  Then for all n >= 1,
  2n^2 + 6n + 2 <= 2n^2 + 6n^2 + 2n^2 = 10n^2.
  Hence, by definition, 2n^2 + 6n + 2 is O(n^2).

  Moral: to prove that n_0 and c "exist", you must state *specific* values
  for them and prove that f(n) <= c g(n) for all n >= n_0, for those specific
  values of n_0 and c.

- NOTE: the values c and n_0 must be **fixed**.  In other words, finding a
  different value of c for different values of n does not prove anything
  useful!!
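  Aside (not part of the notes): a small Python sketch, with names of my own
  choosing, that counts steps of LinearSearch using the per-line costs chosen
  above; it lets you check the formulas 5j + 1 and 5j + 4 on small examples.

      def linear_search_steps(x, A):
          # Returns (result, steps) under the cost model above.
          # A is conceptually 1-indexed; A[i-1] stands for the pseudocode's A[i].
          steps = 0
          n = len(A); steps += 1          # line 0: 1 step
          i = 1;      steps += 1          # line 1: 1 step
          while True:
              steps += 1                  # line 2: loop test
              if not (i <= n):
                  break
              steps += 2                  # line 3: access A[i] and compare
              if A[i - 1] == x:
                  steps += 1              # line 4: return
                  return i, steps
              steps += 2                  # line 5: addition + assignment
              i = i + 1
          steps += 1                      # line 6: return 0
          return 0, steps

      # x found at index j = 3, n = 5:
      print(linear_search_steps(30, [10, 20, 30, 40, 50]))  # (3, 16) = (j, 5j+1)
      # x absent, n = 5:
      print(linear_search_steps(99, [10, 20, 30, 40, 50]))  # (0, 29) = (0, 5n+4)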
- Example 2: Prove n^4 is NOT O(n^2).

  Scratch work (NOT part of proof): to prove that n_0 and c do not exist, we
  must give a general argument that no values will work; usually, we prove by
  contradiction: assume n_0 and c exist such that for all n >= n_0,
  n^4 <= c n^2.  Then for all n >= n_0, n^2 <= c (divide by n^2).  But there
  are definitely values of k >= n_0 such that k^2 > c.  For example, pick
  k = max(c+1, n_0).

  Proof: For a contradiction, assume n^4 is O(n^2), i.e., there are n_0 in N
  and c in R^+ such that n^4 <= c n^2 for all n >= n_0, i.e., n^2 <= c for
  all n >= n_0.  Let k = max{n_0, c+1}.  Then k >= n_0, so we should have
  k^2 <= c.  But k^2 >= (c+1)^2 = c^2 + 2c + 1 > c, which is a contradiction.
  Hence, n^4 is not O(n^2).

  In general, to prove that f(n) is not O(g(n)), do a proof by contradiction,
  where the main step is to show how to find a value of k, as a function of
  arbitrary n_0 and c, such that k >= n_0 and f(k) > c g(k).

- Example 3: Let L be the function defined by
      L(n) = log_2(1) + log_2(2) + ... + log_2(n).
  (This is a very important function.)  Prove that L(n) is in Theta(n log_2 n).

  Proof: We must show that L(n) is in O(n log_2 n) and L(n) is in
  Omega(n log_2 n).

  Upper bound:
      L(n) = log_2 1 + log_2 2 + ... + log_2 n
          <= log_2 n + log_2 n + ... + log_2 n
           = n log_2 n
  So with n_0 = 1 and c = 1, L(n) is in O(n log_2 n).

  Lower bound: drop the first terms and keep only the terms from ceil(n/2)
  to n; there are n - ceil(n/2) + 1 >= n/2 of them, each at least
  log_2(ceil(n/2)):
      L(n) = log_2 1 + log_2 2 + ... + log_2 n
          >= log_2(ceil(n/2)) + log_2(ceil(n/2) + 1) + ... + log_2(n)
          >= log_2(ceil(n/2)) + log_2(ceil(n/2)) + ... + log_2(ceil(n/2))
           = (n - ceil(n/2) + 1) log_2(ceil(n/2))
          >= (n/2) log_2(n/2)
           = (1/2) n (log_2 n - log_2 2)
           = (1/2) n (log_2 n - 1)
           = (1/2) n log_2 n - (1/2) n
          >= (1/4) n log_2 n     for n >= 4, since for such n,
                                 (1/2) n <= (1/4) n log_2 n
  Therefore, with n_0 = 4 and c = 1/4, L(n) is in Omega(n log_2 n).
  (A quick numeric sanity check of these constants is sketched after the
  tips below.)

  Since L(n) is in O(n log_2 n) and L(n) is in Omega(n log_2 n), L(n) is in
  Theta(n log_2 n).

Some tips:
- Here are some tips that might be helpful to quickly check whether f is in
  O(g), Omega(g), or Theta(g), without giving a proof.  ***NOTE***: if you
  are asked to give a proof, you MUST give a formal proof, i.e., you must
  find the constants n_0 and c described in the definitions of big-O or
  Omega.
    . For any k >= 1 and any e > 0:  (log n)^k is in O(n^e).
      Example: (log n)^5 is in O(n), and (log n)^100 is in O(n^{1/2}).
    . If f and g are polynomials, say
          f = a_k*n^k + a_{k-1}*n^{k-1} + ... + a_1*n + a_0   and
          g = b_m*n^m + b_{m-1}*n^{m-1} + ... + b_1*n + b_0
      (with positive leading coefficients a_k and b_m), then
          - if k <= m then f is in O(g)
          - if k >= m then f is in Omega(g)
          - if k = m  then f is in Theta(g)
    . Read more properties of big-O and Omega on pages 95-98 of the textbook.
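  Aside (not part of the notes, and not a proof): a small Python check, with
  names of my own choosing, that the constants found in Example 3 behave as
  claimed on a few values of n >= 4, i.e., (1/4) n log_2 n <= L(n) <= n log_2 n.

      import math

      def L(n):
          # L(n) = log_2(1) + log_2(2) + ... + log_2(n)
          return sum(math.log2(i) for i in range(1, n + 1))

      for n in [4, 16, 100, 1000]:
          lower = 0.25 * n * math.log2(n)   # n_0 = 4, c = 1/4 (Omega bound)
          upper = 1.0 * n * math.log2(n)    # n_0 = 1, c = 1   (O bound)
          assert lower <= L(n) <= upper, n
          print(n, round(lower, 1), round(L(n), 1), round(upper, 1))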