Explanations
references to political figures, particularly presidents
mentions of the word "President."
Negative Logits
Token          Logit
sensit         -0.74
fitt           -0.74
magnitude      -0.71
bane           -0.70
IMAGES         -0.69
âĶĢâĶĢâĶĢâĶĢ       -0.69
conditioning   -0.67
legs           -0.65
intoler        -0.65
vain           -0.64
Positive Logits
Token          Logit
ial            1.19
ially          1.15
President      0.98
hip            0.94
IAL            0.88
clinton        0.88
doms           0.85
Donald         0.84
president      0.82
Barack         0.81
Activations Density: 0.013%