CMPUT 325 - Assignment #2

Due Date: 11:59pm, Thursday 21/Oct/2004, using

try c325 a2 your_file
You should submit one file containing all your programs along with documentation. (Include answers to dry-lab questions as comments.) Your programs will be run against our solutions to provide feedback to your work. You may submit many versions; the last one before the deadline will be retained as your submission. (If you submit prior to the deadline, but plan to use one or two of your excused late days, you must tell the TAs to disregard your pre-deadline submission.) Questions about the try command can be directed to tommy@cs.ualberta.ca

Assignment Marks: There are a total of 85 points possible on this assignment.  It is worth 12% of the total course grade. Your programs must be readable and understandable as well as correct. You should read the guidelines as given in programming style and marking guideline. You could lose up to 1/2 of the marks for bad style even if your programs are correct.

Restrictions:

Except where stated otherwise, you may only use the built-in Lisp functions and special forms shown here

You may write one or more functions to solve any given problem below. In some cases it may be desirable to decompose a problem into smaller ones. However, if a problem has a straightforward single-function solution, it is a bad idea to solve it in a complex way by decomposition.



#1 (5 points):  Write the Lisp function
    (mir-rim L)
that takes a n-element list L as its argument, and returns a 2n-element list formed by doubling each element of L, in reversed order. The run-time for your code should be only linear in the length of the input list.

As examples

>(mir-rim '(3 1 t) )
(t t 1 1 3 3)

>(mir-rim '(() (a b) t) ) (t t (a b) (a b) () ())


#2 (5 points):  The class notes show a non-trivial Lisp form that, when evaluated, returns exactly the same form. Write a new Lisp form, Reverso that, when evaluated, returns a new form that is identical to the original EXCEPT its top-level elements are reversed. That is, if the original form is ((lambda ...) (b c) d) (where of course each of ..., b, c and d can itself be a non-trivial expression), then
>( (lambda ...) (b c) d)
(d (b c) (lambda ...))

>(setf Reverso '( (lambda ...) (b c) d))
( (lambda ...) (b c) d)

>(eval Reverso)
( d (b c) (lambda ...) )

>(equal (eval Reverso) (reverse Reverso))
t

Notes:
#3 (10 points):  Write the Lisp function
      (create-predicate bool_expr)
that takes a boolean expression (defined below), and returns a function, call it bool_fn. This bool-fn takes one argument assgn, which is an assignment, and returns the value the original bool_expr would have given to this assgn --- t or nil.

The boolean expression bool_expr should match the syntax:

    <Bool>  ::= <Var> | (and <Bool>*) | (or <Bool>*) | (not <Bool>)
where <Var> is a variable (a Lisp atom, which is typically of the form "x8" or "y"; you may assume it is not eq to and, or or not), moreover the Kleene "*" means 0 or more copies. Eg,
     (and (not x1) (or (not x5) x1)) x2)
which of course corresponds to
     -x1 & (-x5 v x1) & x2

An assignment is just as we used in HW#1: A list of 2-element lists, whose CAR is the name of a variable and whose CADR is either t or nil, to indicate that that variable is either True or False. Eg,

     ((x1 nil) (x5 nil) (x3 t) (x2 nil) ) 
means x1, x2, and x5 are each false and x3 is false. The bool_fn function produced should be specific to the bool_expr argument --- ie, it should NOT simply call GeneralBoolEval on bool_expr and the assignment. Of course, it may use some other functions, from the allowed list, and perhaps other that you have defined from that list.

As examples

> (create-predicate '(and (not x1) (or x5 (not x1)) (and x2)) )
(lambda (assn) (cond [[NOT SHOWN] ) ))

> (funcall (create-predicate '(and (not x1) (or (not x5) x1) x2)) '((x1 nil) (x5 nil) (x3 t) (x2 nil) ) ) nil
> (let ((my-fn (create-predicate '(and (not x1) (or (not x5) x1) x2)))) (funcall my-fn '((x2 t) (x1 nil) (x4 t) (x5 nil)) ) ) t

You may assume that the argument to bool_fn specifies the value of all of the variables that appeared in bool_expr (and perhaps others as well).


#4 (20 points):  Learning a (Simple) Classifier

In general, a "classifier" is a function that maps an unlabeled instance to a label -- eg, a "DiseaseX classifier" maps a description of a patient to either "DiseaseX = true" or "DiseaseX = false". There are many learning, or "induction", systems that produce such classification functions from a "labeled data sample" Data = { [e^i, c^i]}_i, where each e^i = [e_1^i, ... e_n^i] is a record listing n attributes of a specific patient (eg, Old=true, Male = false, Smoker = true, ...) and c^i is the label for this i-th patient (eg, DiseaseX = true).

One of the simplist types of learning is "Consistency Filtering":

Here, both attributes and labels are Boolean, and (initially) H is the class of simple conjunctions; eg, h_3(.) = +x1 & -x3.

There can, in general, be many functions in H consistent with the training data; we will focus on the most conservative ones -- ie, the ones that return t as infrequently as possible. In particular, given no positive training instances, it would never return t; and given a single positive training instances, it would return t only for that single instance, but no others. In general, given a set of positive instances, it should return the "smallest" conjunction that maps those instances to t, where "smallest" means it says t as rarely as possible. (Notice this is probably the conjunction containing the MOST literals.)

(RG added 17/Oct)
Here, the classifer h(.) is a boolean predicate,

   h: {0,1}^n -> {0,1}
whose "positive values" are
   POS(h)  =   h^{-1}(1)   =   {x \in {0,1}^n | h(x)=1 }.
We want classifier that keeps this POS(h) set as SMALL as possible, requiring only that POS(h) include every positive instance seen.

#4a (5 points):  Write the Lisp function

       (Learn1CNF data)
that takes as input a datasample data, and returns the boolean function suggested above -- corresponding to a "conservative" conjunction. (That function should take an assignment and return t or nil.)

This conservative consistency filtering algorithm is fairly straightforward, as it involves just doing something (for you to determine!) with the literals of the positive examples. It does not depend on the negative examples.
(Hint: it may help to view a conjunction as a set of literals.)

The data will be a list, whose elements are of the form

     (class . assignment)
where class is either + or -, and assignment is the same as above (and HW#1).

>(setf dataA '(
 (+ (x1 t)   (x2 t)   (x3 nil) (x4 t))
 (+ (x1 t)   (x2 nil) (x3 nil) (x4 nil))
 (- (x1 t)   (x2 nil) (x3 t)   (x4 nil))
 (- (x1 nil) (x2 nil) (x3 t)   (x4 nil))
))
...

>(setf fn (Learn1CNF dataA))
...

>(funcall fn '((x1 t)   (x2 nil) (x3 t)   (x4 t)))
nil

>(funcall fn '((x1 t)   (x2 t) (x3 nil)   (x4 nil)))
t
You may assume the data is complete -- in that each instance specifies the value for every attribute.
(RG added 15/Oct)
In addition, you may assume there is a 1CNF formula, h, that is consistent with the input dataset data --- ie, for each positive datapoint x, h(x) = true, and for each negative datapoint x, h(x) = false.

#4b (15 points):  The name "1CNF" refers to the class whose elements are each a conjunction of "1-element clauses", where each "1-element clause" contains (at most) 1 literal --- note this corresponds to a conjunction. (Recall a literal is either a positive or a negative variable.) Each member of the "2CNF" class is a conjunction of 2-element clauses, where each 2-element clause contains at most 2 literals. (Note there are (n choose 2) * 2^2 such 2-element clauses.)

Write the conservative consistency filtering Lisp function

   (Learn2CNF data)
that takes as input a datasample data, and returns a 2CNF function. As above, this function should return t for each positive example, but only as rarely as possible.
(RG added 15/Oct)
Again, you can assume the datasample is complete, and consist with some 2CNF formula.

Once again, it will help to think of a 2CNF function as a set of 2-element clauses, where this set should contain as many of these 2-element clauses as possible. (This is a hint!) In particular, you should think of when a positive example requires Learn2CNF to remove some specific 2-element clause. (Hint: Consider the positive instance that includes ((A t) (B t) ...). and consider any 2CNF formula that includes the 2-element clause (or (not A) (not B)). What value would that formula assign to this instance?)


(RG added 15/Oct)
To see an example where Learn1CNF and Learn2CNF differ, consider the dataset

>(setf dataB '(
 (+ (x1 t)   (x2 t)   )
 (+ (x1 nil) (x2 nil) ) 
))
...

>(setf fn1 (Learn1CNF dataB))
...
>(setf fn2 (Learn2CNF dataB))
...

>(funcall fn1 '((x1 t)   (x2 nil)) )
t
>(funcall fn2 '((x1 t)   (x2 nil)) )
nil

In fact, fn1 should return t for every assignment (ie, it is the 1CNF formula with 0 clauses), while fn2 should return t only for the two positive cases it saw ((x1 t) (x2 t)) and ((x1 nil) (x2 nil)), as it is the 2CNF formula (+x1 v -x2) & (-x1 v +x2).


(RG added actual formulas above on 17/Oct)


#5 (30 points):  The "sat" question in the last assignment (Question 6 in HW#1) suggested the very naive, and inefficient, "Generate-and-Test" way to seek a satisfying assignment: sequentially consider each COMPLETE assignment, until finding one that satisfies the formula.

In many cases, we can tell that a PARTIAL assignment is unsatisfiable. For example, if the formula is

    F1 = (-x1) & (x2 v -x7) & ...
we know that any assignment that includes
    A1 = ((x1 t) ... )
cannot satisfy F1. This observation means we can eliminate 2^(n-1) possible candidate assignments! (A clause with a single literal is called a "unit clause".)

This question explores ways to produce a function that is more effective in answering SAT questions.

#5a (5 points): Write the Lisp function

    (my-reduce cnf var val)
that produces a new CNF formula corresponding to the given cnf, but with the variable var set to the value val.
(RG changed from "reduce" to "my-reduce"; 14/Oct)
The algorithm should go through cnf = (c1 & c2 & ... & cr) clause by clause, and consider three cases for each clause c_i, depending on whether var appears in c_i with the same val (ie, positively if val = +, negatively if val = -), the opposite value, or does not appear. Eg
>(my-reduce '( ((+ x1) (- x2) (+ x3))
            ((+ x2) (+ x4) (- x5))
	    ((- x1) (+ x2))
	    ((- x3)) )
         'x1  t)
( ((+ x2) (+ x4) (- x5))
  ((+ x2))
  ((- x3)) )
Note the first clause ((+ x1) (- x2) (+ x3)) is removed as it involves "x1 is True", the second clause ((+ x2) (+ x4) (- x5)) is unaffected as it does not involve x1, (similarly for the 4th clause ((- x3))), and the third clause ((- x1) (+ x2)) is reduced to ((+ x2)) as it involves x1 is False.

#5b (25 points): Write the Lisp function

(clever-sat cnf)
that has the same behavior as sat: It returns either a satisfying assignment to cnf if one exists, and otherwise, return Unsat. (This value means a small change from HW#1.) Here, however, you are encouraged to use any ideas you wish, to produce code that is as efficient as possible.

There are a number of tricks one can use here. Eg, you may want to think of ways to use my-reduce on unit clauses you find, or use resolution and some other derivation schemes. (Feel free to peek at the notes for the 2nd part of the course, or to ask Google for advice --- look for "boolean satisfiability" or whatever. For this question, you are allowed to use Random, which is not in pure-Lisp.)

Note that you will get NO credit for using a simple Generate-and-Test routine; ie, you should not simply re-submit your earlier program (from HW#1, Q#6). (G-and-T routines will simply time-out on some of our testing cases.)

We will have a "bake-off", to see whose algorithm scores best across a large test-set, and announce the winner in class!
(You may want to use the (time [form]) macro to determine how long your functions are taking.)


#6 (5 points): Write the Lisp function

   (my-random max seed)
that returns an infinite lazy list of (pseudo)random numbers between 0 and max-1. The sequence of random numbers are generated using a simple linear congruential generator. It generates an internal sequence of values, where the k+1st number is computed from the k-th using the formula
   n_{k+1} : = ((alpha * n_k ) + beta) mod gamma
To create numbers in the range 0 to max-1, we take the mod of each term in the sequence:
   r_k = n_k  mod  max
(this is the value returned).
We use: alpha = 1366, beta= 150889 and gamma=714025, and start with n_0 = seed (specified as second argument to my-random). Eg,
>(setf r (my-random 20 46341))
(15 . (LAMBDA-CLOSURE ...))

>(setf r (lcdr r))
(4  . (LAMBDA-CLOSURE ...))

>(setf r (lcdr r))
(18  . (LAMBDA-CLOSURE ...))
Notice the first answer 15 is r_1, obtained from n_1.

(RG changed from "n_1" to "n_0" above.)


#7 (6 points)  Simplifications: Lambda Calculus ["Dry Lab"]

Reduce each of the following expressions to their normal forms, if possible.  (Be sure to show the intermediate steps.)   Or explain why this is not possible.

  1. (  ( x | (x a)) b)
  2. (  ( x | (x a))  (  y | b) )
  3. (  ( x | (x a))  (  x | b) )
  4. (  ( x | ( y| (x y)))  (  x | b) )
  5. (  ( x | ( x| (x a)))  (  y | b) )
  6. (  ( x | ( y| (x a)))  (b y) )
  7. (  ( x | (y x))   ( x | (y x)) )
  8. (  ( x | (x y))   ( x | (x y)) )
  9. (  ( x | (x y))   ( x | (x x)) )
  10. (  ( x | (x x))   ( x | (x y)) )
  11. (  ( x| (r (x x))) (( z | ( y | (y (z y)))) ( w|w)))
  12. (  ( x | (r (x x))) (( z | ( y | (y (z y)))) ( w|q)))
(Apparently the character "" does not show up on some browsers. It should be a LAMBDA. For your write-up, feel free to use "L", or whatever. Just be sure to specify this in your write-up.)
#8 (4 points)  Reduction Order: Lambda Calculus ["Dry Lab"]

Consider the following expression:

Show the first reduction if this was evaluated in Normal Order; and show the final result.
Show the first reduction if this was evaluated in Applicative Order; and show the final result.
 



End of Assignment 2.