INDEX
Explanations
negative contractions and phrases indicating refusal or inability
Negative Logits
923            -0.18
taire          -0.17
angement       -0.15
reet           -0.15
ыш             -0.14
_IL            -0.14
npos           -0.14
atrix          -0.13
eren           -0.13
NavController  -0.13
Positive Logits
chwitz   0.15
że       0.15
chs      0.14
-linear  0.14
gent     0.14
ate      0.14
ORTH     0.14
سته      0.13
chal     0.13
be       0.13
Activations Density 0.059%