INDEX
Explanations
names of political figures or organizations
Negative Logits
Ùĩ        -0.68
caution   -0.67
MORE      -0.62
oise      -0.62
Rica      -0.59
Hels      -0.59
hold      -0.59
Haz       -0.59
Albion    -0.58
Saud      -0.57
Positive Logits
ptoms       1.31
pson        1.30
posium      1.16
otor        1.03
aceutical   0.96
otion       0.93
ichael      0.91
otos        0.90
achine      0.90
iasm        0.89
Activations Density 0.017%