INDEX
Explanations
proper nouns related to political figures and organizations
Negative Logits
dup       -0.86
ramids    -0.86
robe      -0.81
oad       -0.72
ramid     -0.72
ength     -0.68
Sleeping  -0.64
atron     -0.64
towers    -0.63
oop       -0.63
Positive Logits
debated     1.11
discussion  0.97
unresolved  0.95
debate      0.93
discussed   0.89
dispute     0.86
hotly       0.86
topic       0.85
topics      0.85
moot        0.85
Activations Density 0.361%