doco
timmenzies committed Sep 19, 2024
1 parent 85c072f commit 481712c
Showing 2 changed files with 160 additions and 0 deletions.
124 changes: 124 additions & 0 deletions docs/hw4.html
@@ -0,0 +1,124 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>HW4 : Concepts</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
/* The extra [class] is a hack that increases specificity enough to
override a similar rule in reveal.js */
ul.task-list[class]{list-style: none;}
ul.task-list li input[type="checkbox"] {
font-size: inherit;
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
</style>
<link rel="stylesheet" href="style.css" />

<link rel="icon" type="image/x-icon" href="favicon.ico">

<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<div class=wrapper>
<p>
csc 591-024, (8290)<br>
csc 791-024, (8291)<br>
fall 2024, special topics in computer science<br>
Tim Menzies, timm@ieee.org, com sci, nc state
<hr>
<a href="index.html">home</a>
:: <a href="timetable.html">timetable</a>
:: <a href="syllabus.html">syllabus</a>
:: <a href="https://docs.google.com/spreadsheets/d/17m4BWszQvmI3fINgs-C-zyAfavOta7K9qFol5yhmw8w/edit?usp=sharing">groups</a>
:: <a href="https://moodle-courses2425.wolfware.ncsu.edu/course/view.php?id=4181&bp=s">moodle</a>
:: <a href="https://github.com/txt/se4ai24/blob/main/LICENSE">license</a> </p>
<img src="img/brain.png" align=left width=280
style="padding: 10px; padding-right: 15px; -webkit-filter: drop-shadow(-10px 10px 10px #222); filter: drop-shadow(-10px 10px 10px #222); ">


<header id="title-block-header">
<h1 class="title">HW4 : Concepts</h1>
</header>
<ol type="1">
<li>What is the standard line on “how much data is enough?”
<ul>
<li>From Peter Norvig</li>
<li>From regression theory</li>
<li>From semi-supervised learning</li>
</ul></li>
<li>Describe each of the following. What are their implications for
human decision making?
<ul>
<li>Streaming over zero-diversity data</li>
<li>STM, LTM</li>
<li>Shrikanth’s early bird effect</li>
<li>two results (Valerdi’s work; repertory grids) commenting on the rate
at which we can extract considered opinions from humans</li>
</ul></li>
<li>In English, describe the following math results and their
implications for data mining
<ul>
<li>chessboard model</li>
<li>probable correctness theory.</li>
</ul></li>
<li>Few-shot learning (FSL):
<ul>
<li>describe it (use this as a guide:
https://www.promptingguide.ai/techniques/fewshot)</li>
<li>In what sense does FSL mean we can look at fewer examples?</li>
<li>In what sense does FSL require many, many examples?</li>
</ul></li>
<li>Describe an active learning cycle. Include the words acquisition
function, exploit, explore, warm/cold start.</li>
<li>Acquisition functions. Distinguish and define the following terms:
diversity, perversity, (population|surrogate|pool|stream)-based.</li>
<li>What is PCA? (Use this as a guide:
https://en.wikipedia.org/wiki/Principal_component_analysis)
<ul>
<li>How is ezr’s recursive use of “twoFar” an analog of PCA?</li>
<li>Recursive twoFar generates a tree of clusters. What does ezr’s leaf
function do? Assuming a balanced cluster tree is already built, in “big
O” notation, what is leaf’s runtime?</li>
<li>The ezr functions half and cluster support the sortp flag. What is the
impact on #evaluations of sortp=True?</li>
<li>(HARD): Currently, in 24Aug12, ezr’s half (called from branch)
requires two evals per depth of tree. Can you reduce that to one (for
all levels except the top one)?</li>
</ul></li>
<li>Describe, in English, the k-means algorithm (hint: see “kmeans” in
https://github.com/timm/noml/blob/main/src/mink.py)
<ul>
<li>How is it the same as, or different from, recursive twoFar clustering?</li>
</ul></li>
<li>What is “discretization?” (hint: see
https://www.blog.trainindata.com/data-discretization-in-machine-learning/)
<ul>
<li>What is gaussian discretization? Hint: unsuper, the “bin” function
in https://github.com/timm/noml/blob/main/src/unsuper.py</li>
<li>How should the gaussian discretizer handle symbolic columns
(hint: trick question)?</li>
<li>How does unsuper divide numeric columns? (hint: see COLS’ bins
function and its use of i.bin)</li>
<li>Why does unsuper need to merge some of the bins it
generated?</li>
<li>How does unsuper’s merges function combine two adjacent bins?</li>
</ul></li>
</ol>




</div>
</body>
</html>
36 changes: 36 additions & 0 deletions docs/hw4.md
@@ -0,0 +1,36 @@
% HW4 : Concepts


1. What is the standard line on “how much data is enough?”
* From Peter Norvig
* From regression theory
* From semi-supervised learning
2. Describe each of the following. What are their implications for human decision making?
* Streaming over zero-diversity data
* STM, LTM
* Shrikanth’s early bird effect
* two results (Valerdi’s work; repertory grids) commenting on the rate at which we can extract considered opinions from humans
3. In English, describe the following math results and their implications for data mining
* chessboard model
* probable correctness theory.
4. Few-shot learning (FSL):
- describe it (use this as a guide: https://www.promptingguide.ai/techniques/fewshot)
- In what sense does FSL mean we can look at fewer examples?
- In what sense does FSL require many, many examples?
5. Describe an active learning cycle. Include the words acquisition function, exploit, explore, warm/cold start.
6. Acquisition functions. Distinguish and define the following terms: diversity, perversity, (population|surrogate|pool|stream)-based.
7. What is PCA? (Use this as a guide: https://en.wikipedia.org/wiki/Principal_component_analysis)
- How is ezr’s recursive use of “twoFar” an analog of PCA?
- Recursive twoFar generates a tree of clusters. What does ezr’s leaf function do? Assuming a balanced cluster tree is already built, in “big O” notation, what is leaf’s runtime?
- The ezr functions half and cluster support the sortp flag. What is the impact on #evaluations of sortp=True?
- (HARD): Currently, in 24Aug12, ezr’s half (called from branch) requires two evals per depth of tree. Can you reduce that to one (for all levels except the top one)?
8. Describe, in English, the k-means algorithm (hint: see "kmeans" in https://github.com/timm/noml/blob/main/src/mink.py)
- How is it the same as, or different from, recursive twoFar clustering?
9. What is "discretization?" (hint: see https://www.blog.trainindata.com/data-discretization-in-machine-learning/)
- What is gaussian discretization? Hint: unsuper, the "bin" function in https://github.com/timm/noml/blob/main/src/unsuper.py
- How should the gaussian discretizer handle symbolic columns (hint: trick question)?
- How does unsuper divide numeric columns? (hint: see COLS' bins function and
its use of i.bin)
- Why does unsuper need to merge some of the bins it generated?
- How does unsuper's merges function combine two adjacent bins?
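
As orientation for question 8, the following is a minimal sketch of textbook k-means (Lloyd's algorithm). It is not the `kmeans` function in mink.py, which may differ in its distance measure, initialization, and stopping rule. The names `dist`, `kmeans`, `loops`, and `seed` are assumptions made here for illustration only.

```python
import random

def dist(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(rows, k=2, loops=10, seed=1):
    """Run at most `loops` rounds of Lloyd's algorithm; return (centroids, clusters).

    Hypothetical helper for illustration, not the course's implementation.
    """
    rng = random.Random(seed)
    # Initialization (an assumption): pick k distinct rows as starting centroids.
    centroids = [tuple(r) for r in rng.sample(rows, k)]
    clusters = []
    for _ in range(loops):
        # Assignment step: each row joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda j: dist(row, centroids[j]))
            clusters[nearest].append(row)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its old centroid.
        new = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = new
    return centroids, clusters
```

Calling `kmeans(rows, k=2)` on a list of numeric tuples returns the final centroids and the rows grouped by nearest centroid. Note that `k` is fixed in advance and every round touches every row, which is one useful point of comparison with recursive twoFar clustering.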
