Tasks #2

matthewr6 · 2019-11-24T21:44:15Z

Biterm model finalizing + experiments (Chris)
~~TKM experiments (Matthew)~~
~~K-means experiments (different # clusters)~~ (Sasankh)
LDA experiments (different featurizers/clusters) (Matthew)
Pseudo tf-idf topic extractor (based on clusters rather than documents: tf within cluster, idf against other clusters) (Chris)
~~Centroid vector extractor (Matthew)~~
MultinomialNB max probability-based extractor (Sasankh)
Brief descriptions/writeups of each featurizer, model (+ extractor), and experiment (if experiment is interesting; from results and visualizations)
Poster (Sasankh + Chris)
Start on paper (Chris + Sasankh)

Graphs (Matthew - dependent on data from experiments)

- accuracy differences
- for most successful model:
  - plot of clusterings
  - cluster topics
  - examples of articles
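
For reference, a rough sketch of what the pseudo tf-idf extractor from the task list might look like. This is only one reading of "tf within cluster, idf against other clusters": the function name `cluster_tfidf_topics`, the pre-tokenized input format, and the unsmoothed `log(n_clusters / cluster_freq)` idf are all assumptions, not what is in the repo.

```python
import math
from collections import Counter


def cluster_tfidf_topics(docs, labels, top_n=3):
    """Hypothetical helper. docs: list of token lists; labels: cluster id per doc."""
    # tf: pool token counts over every document assigned to the same cluster
    cluster_counts = {}
    for tokens, label in zip(docs, labels):
        cluster_counts.setdefault(label, Counter()).update(tokens)

    # "document frequency" at cluster level: in how many clusters a term appears
    n_clusters = len(cluster_counts)
    cluster_freq = Counter()
    for counts in cluster_counts.values():
        cluster_freq.update(counts.keys())

    topics = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        scores = {
            # assumed idf: a term present in every cluster scores 0
            term: (count / total) * math.log(n_clusters / cluster_freq[term])
            for term, count in counts.items()
        }
        # sort by score descending, break ties alphabetically for stable output
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        topics[label] = ranked[:top_n]
    return topics


docs = [
    ["goal", "match", "team"],
    ["team", "goal", "coach"],
    ["vote", "election", "team"],
    ["election", "vote", "senate"],
]
labels = [0, 0, 1, 1]
topics = cluster_tfidf_topics(docs, labels, top_n=2)
```

With this toy input, "team" appears in both clusters, so its idf is zero and it drops out of both topic lists. A smoothed idf would be the obvious variation if shared terms should still get some weight.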

matthewr6 · 2019-11-27T00:17:54Z

update on TKM model - very poor clustering (see PCA graphs), decided to drop it

update on centroid vector extractor - done
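
A minimal sketch of the centroid idea, for anyone writing it up. It assumes the extractor averages the feature vectors in each cluster and reports the highest-weighted vocabulary terms of that centroid; the name `centroid_topics` and the plain-list inputs are hypothetical and may not match the actual implementation.

```python
def centroid_topics(vectors, labels, vocab, top_n=3):
    """Hypothetical helper. vectors: one feature vector per doc, aligned with vocab."""
    sums = {}
    sizes = {}
    for vec, label in zip(vectors, labels):
        # accumulate the per-dimension sum for this cluster
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(vec):
            acc[i] += value
        sizes[label] = sizes.get(label, 0) + 1

    topics = {}
    for label, acc in sums.items():
        # centroid = mean vector of the cluster
        centroid = [value / sizes[label] for value in acc]
        # rank dimensions by centroid weight, ties broken alphabetically
        ranked = sorted(range(len(vocab)), key=lambda i: (-centroid[i], vocab[i]))
        topics[label] = [vocab[i] for i in ranked[:top_n]]
    return topics


vocab = ["goal", "team", "vote"]
vectors = [[0.9, 0.4, 0.0], [0.7, 0.6, 0.1], [0.0, 0.3, 0.8]]
labels = [0, 0, 1]
centroid_terms = centroid_topics(vectors, labels, vocab, top_n=2)
```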
