INDEX
Explanations
references to politicians and political figures
Negative Logits
actory          -0.73
uras            -0.68
ļé              -0.65
awar            -0.64
Condition       -0.64
Infinite        -0.64
Width           -0.64
ventory         -0.64
wered           -0.64
Resurrection    -0.63
Positive Logits
clinton         1.08
hips            0.91
hip             0.84
icians          0.80
appoint         0.80
woman           0.74
makers          0.73
correctness     0.72
elected         0.72
Franch          0.69
Activations Density 0.019%
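
For readers unfamiliar with the density figure, the sketch below shows one plausible way to read it: the fraction of token positions at which the feature fires. This is an assumption about the metric, not a description of how this page computed it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of 0.019%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 100,000 token positions, 19 of which activate the feature.
sample = [0.0] * 99_981 + [1.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% corresponds to roughly 19 firing positions per 100,000 tokens, which would make the feature very sparse.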