INDEX
Explanations
phrases emphasizing freedom and rights-related themes
Negative Logits
icom       -0.16
achuset    -0.16
eral       -0.15
amak       -0.15
annes      -0.14
abant      -0.14
ges        -0.14
antis      -0.14
çĥ¦        -0.14
olla       -0.14
POSITIVE LOGITS
odo        0.19
y          0.16
ALES       0.14
à¤Ĥध       0.14
ODO        0.14
GÃľ        0.14
kö         0.13
eldorf     0.13
ëł         0.13
.nz        0.13
Activations Density 0.088%