INDEX
Explanations
phrases indicating ambiguity or uncertainty about specific criteria or definitions
Negative Logits
loh        -0.18
nees       -0.14
iset       -0.14
ican       -0.14
deal       -0.14
lero       -0.14
toc        -0.14
sea        -0.14
/parser    -0.14
sep        -0.14
Positive Logits
_DECLARE   0.16
ใด         0.16
particular 0.15
ingle      0.15
ripper     0.14
_specific  0.14
icit       0.14
-specific  0.14
_PROTO     0.14
pigeon     0.14
Activations Density 0.080%