INDEX
Explanations
politicians' names and geographical locations
references to political figures and their affiliations
Negative Logits
heid        -0.84
xon         -0.80
estern      -0.79
abwe        -0.75
yout        -0.72
agate       -0.70
tty         -0.69
ascript     -0.69
allo        -0.69
Zot         -0.64
Positive Logits
ULAR        0.73
remission   0.61
metic       0.60
aez         0.58
utive       0.58
Marie       0.57
isolation   0.57
faint       0.56
Saiyan      0.56
Lauder      0.56
Activations Density 0.315%