INDEX
Explanations
phrases expressing acceptance or permission
Negative Logits
maal      -0.19
ilee      -0.15
yll       -0.15
polator   -0.15
ILE       -0.15
kop       -0.14
ilet      -0.14
beauty    -0.14
hower     -0.14
út        -0.14
Positive Logits
aby       0.18
apy       0.15
ordova    0.14
/wait     0.14
्यप       0.13
ably      0.13
rum       0.13
pez       0.13
asper     0.13
ola       0.13
Activations Density 0.034%