Explanations
phrases that indicate advocacy, support, and enhancement of rights or well-being
Negative Logits
Token     Logit
ovol      -0.16
avid      -0.15
h         -0.14
ivirus    -0.14
obi       -0.14
milano    -0.13
ä»°       -0.13
Dummy     -0.13
eger      -0.13
ÑĥÑĢи     -0.13
Positive Logits
Token     Logit
.si       0.16
¼åIJĪ     0.16
Sharper   0.16
aed       0.16
alion     0.15
esub      0.15
à¹Īà¸ĩ    0.14
kaar      0.14
ahat      0.14
Shar      0.14
Activation Density: 0.328%