Explanations
Phrases related to political endorsements and their implications
Negative Logits
IPA        -0.16
382        -0.15
owski      -0.15
odia       -0.15
476        -0.14
ทาง        -0.14
cth        -0.14
bens       -0.14
Enlarge    -0.13
pell       -0.13
Positive Logits
iant        0.17
ient        0.16
шел         0.15
Hood        0.15
ients       0.14
noxious     0.14
nar         0.14
udy         0.14
fora        0.13
poisoned    0.13
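Lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking tokens by the resulting dot product. The dashboard does not state its exact procedure, so the sketch below assumes that plain projection (no layer-norm folding or bias terms). The names `logit_effects` and `top_logits`, and the toy vectors in the usage note, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Assumed method: dot product of the feature's decoder direction
    with each token's unembedding vector (a column of W_U)."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-dimensional example; these numbers are invented, not dashboard data.
    unembed = {
        "iant": [1.0, 0.0],
        "IPA": [-1.0, 0.0],
        "Hood": [0.5, 0.5],
        "pell": [0.0, -1.0],
    }
    decoder_dir = [0.2, 0.1]
    neg, pos = top_logits(logit_effects(decoder_dir, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With the toy vectors, `IPA` and `pell` come out as the most negative tokens and `iant` and `Hood` as the most positive, mirroring the two-sided layout of the dashboard lists.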
Activation density: 0.005%
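Activation density is usually reported as the fraction of token positions on which a feature fires; 0.005% corresponds to roughly 1 token in 20,000. The sketch below assumes "fires" means an activation strictly above zero, which the dashboard does not confirm. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold (default 0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return active / total

if __name__ == "__main__":
    # Invented example: one active position out of 20,000.
    acts = [0.0] * 19_999 + [1.3]
    print(f"{activation_density(acts) * 100:.3f}%")
```

Multiplying by 100 converts the fraction to the percentage form shown on the dashboard.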