home | copyright ©2019, tjmenzie@ncsu.edu


Homework 4

WARNING! Now things are getting...interesting. This might take 2 weeks to complete. But you must hand in something each week, even if it is incomplete (otherwise you will lose points).

Using the Abcd class, score the performance of these two learners on the weathernom and diabetes data:

  • ZeroR
  • Nb

In all cases:

  • Read rows, one at a time
  • For each row:
    • If you have already seen, say, 4 rows, then make a prediction about the class of the newest row.
    • Always update a Tbl with that row. If you are making a prediction, make it before the update. Why? This ensures that you never use data from a row to predict the class of that same row.

For a sample implementation of this, see Abcd1.

ZeroR

ZeroR predicts that the class of the next row is the mode of the classes seen so far.

  • A sample ZeroR implementation is given here.
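A minimal Python sketch of ZeroR, assuming the learner only needs the class labels seen so far. The `ZeroR` class and its `predict`/`train` methods are hypothetical names, not the linked sample implementation. `Counter.most_common` breaks ties in favor of the class that was seen first.

```python
from collections import Counter
from typing import Hashable, Optional


class ZeroR:
    """Predicts the mode (most common class) of all rows seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self) -> Optional[Hashable]:
        if not self.counts:
            return None                          # nothing seen yet
        return self.counts.most_common(1)[0][0]  # ties go to the class seen first

    def train(self, klass: Hashable) -> None:
        self.counts[klass] += 1


z = ZeroR()
preds = []
for klass in ["no", "yes", "yes", "no", "yes"]:
    preds.append(z.predict())  # predict first...
    z.train(klass)             # ...then update
print(preds)  # [None, 'no', 'no', 'yes', 'no']
```

Because it ignores every column except the class, ZeroR is a useful baseline: any learner that cannot beat it is not learning much.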

NB

Nb (Naive Bayes) builds one table for each separate class. So:

  • Each time a row is read
  • Look at its class
  • Ensure that you have a table for that class. That is:
    • If you have not seen that class before, then create a new table.
  • Add the new row to the table for that row's class.
  • Also, maintain a separate table for "overall". That is, each new row updates
    1. One table, just for the class of that row
      • So if the data contains 5 classes, we maintain five class tables.
    2. A second table that stores info on all rows
  • A sample Nb implementation is given here.
    • The main loop of this Nb looks at each row. For each row, it then looks at each class table and finds the one that "likes" this row the most.
    • Note that, at its core, this code asks each column of each class table how much it likes the corresponding value in the current row.
    • My code splits that "like" calculation between NumLike (for numeric columns) and SymLike (for symbolic columns).
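The Nb procedure above can be sketched in Python as follows. Everything here is an assumption for illustration: the `Num`, `Sym`, `Tbl`, and `Nb` classes are simplified, hypothetical stand-ins for the course code; the class is taken to be the last value of each row; the numeric "like" is a Gaussian density and the symbolic "like" is a Laplace-smoothed frequency, which may differ from what the author's NumLike and SymLike actually compute. Each class table's score is its smoothed prior (with a hypothetical smoothing constant `k=1`) times the product of its per-column likes, summed in log space to avoid underflow.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Union

Value = Union[float, str]


class Num:
    """Numeric column: incremental mean and standard deviation (Welford's method)."""
    def __init__(self) -> None:
        self.n, self.mu, self.m2 = 0, 0.0, 0.0

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mu
        self.mu += delta / self.n
        self.m2 += delta * (x - self.mu)

    def like(self, x: float) -> float:
        """Assumed NumLike: Gaussian probability density of x."""
        sd = (math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0) + 1e-6  # avoid sd == 0
        return math.exp(-((x - self.mu) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))


class Sym:
    """Symbolic column: counts of each value."""
    def __init__(self) -> None:
        self.n, self.counts = 0, Counter()

    def add(self, x: str) -> None:
        self.n += 1
        self.counts[x] += 1

    def like(self, x: str, distinct: int) -> float:
        """Assumed SymLike: Laplace-smoothed estimate of P(x | this table's class)."""
        return (self.counts[x] + 1) / (self.n + distinct)


class Tbl:
    """Minimal table: a row count plus one Num or Sym per independent column."""
    def __init__(self, kinds: Sequence[str]) -> None:
        self.n = 0
        self.cols = [Num() if k == "num" else Sym() for k in kinds]

    def add(self, xs: Sequence[Value]) -> None:
        self.n += 1
        for col, x in zip(self.cols, xs):
            col.add(x)


class Nb:
    """Naive Bayes with one Tbl per class plus an "overall" Tbl."""
    def __init__(self, kinds: Sequence[str], k: float = 1.0) -> None:
        self.kinds, self.k = kinds, k
        self.overall = Tbl(kinds)
        self.tables: Dict[str, Tbl] = {}

    def train(self, row: Sequence[Value]) -> None:
        xs, klass = row[:-1], row[-1]
        if klass not in self.tables:            # new class: create its table
            self.tables[klass] = Tbl(self.kinds)
        self.tables[klass].add(xs)
        self.overall.add(xs)

    def loglike(self, xs: Sequence[Value], tbl: Tbl) -> float:
        """How much `tbl` "likes" xs, as a log probability."""
        prior = (tbl.n + self.k) / (self.overall.n + self.k * len(self.tables))
        total = math.log(prior)
        for i, (col, x) in enumerate(zip(tbl.cols, xs)):
            if isinstance(col, Num):
                p = col.like(x)
            else:
                p = col.like(x, len(self.overall.cols[i].counts))
            total += math.log(max(p, 1e-300))   # floor keeps log() defined
        return total

    def predict(self, row: Sequence[Value]) -> str:
        """Return the class whose table likes the row most (call only after training)."""
        xs = row[:-1]
        return max(self.tables, key=lambda c: self.loglike(xs, self.tables[c]))


nb = Nb(["sym", "sym"])
for r in [("sunny", "hot", "no"), ("sunny", "hot", "no"), ("rainy", "cool", "yes"),
          ("rainy", "cool", "yes"), ("overcast", "cool", "yes")]:
    nb.train(r)
print(nb.predict(("rainy", "cool", "?")))  # yes
```

In this sketch the "overall" table supplies the total row count for the class priors and the number of distinct values per symbolic column for the smoothing.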

Output

Your code should print something like the following (and don't worry if the numbers do not exactly match).

Here is the AbcdReport describing what happens after ZeroR runs over weathernon and diabetes. In the following, we read 3 rows before doing any classification. Note that, in diabetes, tested_positive is the minority class, so its recall (pd) and false alarm rate (pf) are very low (since ZeroR rarely predicts it).

#--- zerorok -----------------------

weathernon
    db |    rx |   num |     a |     b |     c |     d |  acc |  pre |   pd |   pf |    f |    g | class
  ---- |  ---- |  ---- |  ---- |  ---- |  ---- |  ---- | ---- | ---- | ---- | ---- | ---- | ---- |-----
  data |    rx |    12 |     6 |     3 |     3 |       | 0.50 | 0.00 | 0.00 | 0.33 | 0.00 | 0.00 | no
  data |    rx |    12 |     0 |     3 |     3 |     6 | 0.50 | 0.67 | 0.67 | 1.00 | 0.67 | 0.00 | yes

diabetes
    db |    rx |   num |     a |     b |     c |     d |  acc |  pre |   pd |   pf |    f |    g | class
  ---- |  ---- |  ---- |  ---- |  ---- |  ---- |  ---- | ---- | ---- | ---- | ---- | ---- | ---- |-----
  data |    rx |   766 |    24 |    25 |   243 |   474 | 0.65 | 0.66 | 0.95 | 0.91 | 0.78 | 0.16 | tested_negative
  data |    rx |   766 |   474 |   243 |    25 |    24 | 0.65 | 0.49 | 0.09 | 0.05 | 0.15 | 0.16 | tested_positive

Here is the AbcdReport describing what happens after Nb runs over weathernon and diabetes. In the following, we read 4 rows of weathernon (and 20 rows of diabetes) before doing any classification. Note that the recall (pd) and false alarm rate (pf) for tested_positive are now much healthier.

#--- nbok -----------------------

weathernon
    db |    rx |   num |     a |     b |     c |     d |  acc |  pre |   pd |   pf |    f |    g | class
  ---- |  ---- |  ---- |  ---- |  ---- |  ---- |  ---- | ---- | ---- | ---- | ---- | ---- | ---- |-----
  data |    rx |    11 |     5 |     2 |     3 |     1 | 0.55 | 0.25 | 0.33 | 0.38 | 0.29 | 0.43 | no
  data |    rx |    11 |     1 |     3 |     2 |     5 | 0.55 | 0.71 | 0.62 | 0.67 | 0.67 | 0.43 | yes

diabetes
    db |    rx |   num |     a |     b |     c |     d |  acc |  pre |   pd |   pf |    f |    g | class
  ---- |  ---- |  ---- |  ---- |  ---- |  ---- |  ---- | ---- | ---- | ---- | ---- | ---- | ---- |-----
  data |    rx |   765 |   176 |   115 |    90 |   384 | 0.73 | 0.81 | 0.77 | 0.34 | 0.79 | 0.71 | tested_negative
  data |    rx |   765 |   384 |    90 |   115 |   176 | 0.73 | 0.60 | 0.66 | 0.23 | 0.63 | 0.71 | tested_positive
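The acc, pre, pd, pf, f, and g columns in these reports can be recomputed from the a, b, c, d cells. The sketch below is a hypothetical Python helper (`abcd_scores` is not the Abcd class's API). It assumes the usual convention that, for the class named in each report row, d counts true positives, b false negatives, c false positives, and a true negatives. Under that assumption these formulas reproduce the reports above, e.g. the tested_negative row of the ZeroR diabetes report.

```python
from typing import Dict


def abcd_scores(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Scores for one target class from its confusion-matrix cells.

    Assumed cell meanings:
      a: not target, predicted not target (true negatives)
      b: target, predicted not target     (false negatives)
      c: not target, predicted target     (false positives)
      d: target, predicted target         (true positives)
    """
    def ratio(x: float, y: float) -> float:
        return x / y if y else 0.0    # report 0 when the denominator is empty

    pd = ratio(d, b + d)              # recall, a.k.a. probability of detection
    pf = ratio(c, a + c)              # probability of false alarm
    pre = ratio(d, c + d)             # precision
    return {
        "acc": ratio(a + d, a + b + c + d),
        "pre": pre,
        "pd": pd,
        "pf": pf,
        "f": ratio(2 * pre * pd, pre + pd),              # harmonic mean of pre and pd
        "g": ratio(2 * pd * (1 - pf), pd + (1 - pf)),    # harmonic mean of pd and 1 - pf
    }


print({k: round(v, 2) for k, v in abcd_scores(24, 25, 243, 474).items()})
# {'acc': 0.65, 'pre': 0.66, 'pd': 0.95, 'pf': 0.91, 'f': 0.78, 'g': 0.16}
```

The g score explains why ZeroR looks poor even at 0.65 accuracy: a high pf drags 1 - pf, and hence g, toward zero.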