INDEX
Explanations
phrases indicating absence or negation
NEGATIVE LOGITS
Token     Logit
asl       -0.15
dh        -0.14
kses      -0.14
Brock     -0.14
enos      -0.14
gether    -0.14
al        -0.13
Marin     -0.13
otti      -0.13
IIC       -0.13
POSITIVE LOGITS
Token         Logit
任何          0.24
altogether    0.23
вообще        0.22
any           0.20
žádné         0.19
CKER          0.17
vůbec         0.17
هیچ           0.16
Alto          0.16
ANY           0.16
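
The page does not say how these logit values were produced. A common convention, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting scores. The sketch below illustrates that idea only; the names `logit_effects` and `top_and_bottom`, the placeholder tokens, and all numbers are hypothetical and are not taken from this feature.

```python
def logit_effects(decoder_direction, unembedding_rows, vocab):
    """Dot the feature direction with each token's unembedding row.

    Assumption: one unembedding row per vocabulary token, same width
    as the decoder direction.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_direction, row))
        for token, row in zip(vocab, unembedding_rows)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy, made-up values purely for illustration.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembedding_rows = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
direction = [0.2, 0.1]

effects = logit_effects(direction, unembedding_rows, vocab)
positive, negative = top_and_bottom(effects, k=2)
print([t for t, _ in positive])
print([t for t, _ in negative])
```

Under this reading, a positive value means the feature pushes the model toward predicting that token when it is active, and a negative value means it pushes away from it.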
Activations Density 0.263%
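
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of zero. The function name `activation_density` and the sample data are hypothetical, not the dashboard's own computation.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 3 active tokens out of 1000.
sample = [0.0] * 997 + [0.4, 1.2, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

On this reading, a density of 0.263% would mean the feature is active on roughly 1 in 380 tokens.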