Explanations
mentions of political figures' last names
names of political figures and their affiliations
Negative Logits
Token           Logit
ossier          -0.78
displayText     -0.77
myster          -0.76
endorsements    -0.72
endors          -0.68
accompan        -0.66
proble          -0.65
vana            -0.64
ilater          -0.64
assistants      -0.64
Positive Logits
Token           Logit
bre              0.82
cone             0.72
rame             0.70
fruit            0.69
Nar              0.68
hend             0.66
hurst            0.65
pload            0.65
hoff             0.63
Robot            0.63
Activation Density: 0.359%