Explanations
texts related to voting and results, with some mentions of travel and jobs
references to personal actions or experiences
Negative Logits
withd          -0.84
sequently      -0.77
advoc          -0.65
Due            -0.64
shown          -0.63
Additionally   -0.62
ividual        -0.62
Furthermore    -0.61
similar        -0.61
Similarly      -0.60
Positive Logits
ain            0.82
damned         0.79
!)             0.72
roar           0.71
ya             0.70
)!             0.70
huh            0.70
?!             0.70
damn           0.70
kidding        0.69
Activations Density 1.076%