Explanations
phrases indicating absence or negation
Negative Logits
кас  -0.17
istrovství  -0.15
фор  -0.15
neider  -0.15
edis  -0.15
å§  -0.15
rente  -0.14
riba  -0.14
typeof  -0.14
nào  -0.14
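Several token strings on this page, such as ÑĦоÑĢ or istrovstvÃŃ, look garbled because they appear to be raw vocabulary entries from a GPT-2-style byte-level BPE tokenizer. That tokenizer maps each UTF-8 byte to a printable character. The sketch below inverts that mapping and recovers readable text. It assumes the GPT-2 byte-to-unicode table. It also handles one further assumption: characters outside that table (such as the Cyrillic "о" in ÑĦоÑĢ) are treated as already decoded and re-encoded as UTF-8. Some entries are only part of a multi-byte character and cannot be fully decoded; å§ is an example.

```python
def bytes_to_unicode() -> dict:
    """GPT-2-style reversible table mapping each byte (0-255) to a printable character."""
    # Printable bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) map to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE vocabulary string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Assumption: this character was already shown decoded, so keep its UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
    # Incomplete multi-byte fragments become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["ÑĦоÑĢ", "istrovstvÃŃ", "nÃło", "ãģ£ãģį"]:
    print(tok, "->", decode_token(tok))
```

Plain ASCII entries such as ocker or typeof decode to themselves. This is because printable ASCII bytes map to themselves in the table.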
Positive Logits
ocker  0.16
ork  0.16
naments  0.14
Revel  0.14
retched  0.14
っき  0.13
ock  0.13
碼  0.13
ascimento  0.13
ire  0.13
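Dashboards like this one typically build the Positive and Negative Logits lists in two steps. First, they project the feature's decoder direction through the model's unembedding matrix. Then they report the vocabulary entries with the largest and smallest scores. The sketch below is minimal and rests on assumptions: it takes that definition, ignores any final layer norm or bias, and uses hypothetical toy weights and placeholder token names (tok_a and so on) instead of this feature's real parameters. The helper names logit_effects and top_and_bottom are likewise illustrative.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocabulary entry: score[t] = sum_i decoder_dir[i] * w_u[i][t].

    decoder_dir has length d_model; w_u is d_model x vocab_size (row-major).
    """
    vocab_size = len(w_u[0])
    return [sum(d * row[t] for d, row in zip(decoder_dir, w_u))
            for t in range(vocab_size)]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.2, 0.1, -0.1, 0.0],
    [0.1, 0.0, 0.2, 0.3],
]
scores = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(vocab, scores, k=2)
print("positive:", [name for name, _ in positive])  # prints ['tok_a', 'tok_b']
print("negative:", [name for name, _ in negative])  # prints ['tok_c', 'tok_d']
```

The scores measure how strongly the feature's direction pushes each token's logit up or down. They do not measure how often the feature fires on those tokens.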
Activations Density 0.067%