INDEX
Explanations
mentions of politicians
references to politicians
Negative Logits
Token         Logit
urious        -0.70
uras          -0.69
uran          -0.69
actory        -0.66
Condition     -0.66
lights        -0.66
âĶĢâĶĢâĶĢâĶĢ  -0.65
ventory       -0.64
zig           -0.64
gged          -0.64
Positive Logits
Token         Logit
clinton       1.06
hips          0.85
appoint       0.81
icians        0.80
correctness   0.79
impe          0.78
hip           0.77
elected       0.73
ervatives     0.73
woman         0.70
Activations Density: 0.024%
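
The entry `âĶĢâĶĢâĶĢâĶĢ` in the negative-logit list looks garbled, but it has the shape of a raw byte-level BPE token string. The source does not name the model or tokenizer, so the following is only a sketch under the assumption that the tokens come from a GPT-2 style byte-level vocabulary; `bytes_to_unicode` reproduces that byte-to-character table, and `decode_token` is a hypothetical helper name introduced here for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw byte-level token string back into readable text (assumed vocabulary)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("âĶĢâĶĢâĶĢâĶĢ"))  # prints ────
```

Under that assumption the token is a run of four box-drawing horizontal lines (U+2500), which would fit a feature that suppresses table or separator formatting in favor of political vocabulary. Plain ASCII tokens such as `clinton` pass through the decoder unchanged.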