Explanations
mentions of the name "Clinton"
mentions of a specific political figure, primarily "Clinton"
Negative Logits
teenth      -0.76
DIS         -0.70
AY          -0.68
oya         -0.66
Conan       -0.66
raints      -0.66
umar        -0.65
orescent    -0.65
semble      -0.64
Tu          -0.64
Positive Logits
Rodham      1.02
herself     0.97
Clinton     0.97
istas       0.94
INTON       0.94
aide        0.94
confid      0.91
Clinton     0.90
(blank)     0.90
aides       0.88
Activations Density 0.063%