INDEX
Explanations
mentions of politicians
references to politicians and political figures
Negative Logits
Token       Logit
urious      -0.74
actory      -0.73
ventory     -0.72
uran        -0.68
gged        -0.67
wered       -0.65
Cancel      -0.64
DEP         -0.64
uras        -0.63
Condition   -0.63
Positive Logits
Token        Logit
clinton      1.08
appoint      0.82
hips         0.82
correctness  0.76
icians       0.74
impe         0.68
woman        0.68
junk         0.67
elected      0.67
hip          0.67
Activations Density 0.030%