Explanations
phrases that emphasize assurance and confirmation in various contexts
Negative Logits
inability   -0.18
Worse       -0.15
aminer      -0.14
USTER       -0.14
isor        -0.14
oref        -0.14
phin        -0.14
oup         -0.14
許          -0.13
许多        -0.13
Positive Logits
everyone     0.24
proper       0.24
adequate     0.23
sufficient   0.23
every        0.21
appropriate  0.20
enough       0.20
nothing      0.20
none         0.19
each         0.18
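The signed scores above read as each token's logit effect for this feature. A minimal sketch, assuming each score is the dot product of the feature's decoder vector with that token's unembedding vector (a common SAE-dashboard convention, not confirmed by this page), of how the positive and negative lists could be produced. The helpers `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
def logit_effects(decoder_vec, unembed):
    """Assumed definition: score each token as the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k
    lowest-scoring tokens (ascending) as (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example with a hypothetical 2-d decoder vector and vocabulary.
scores = logit_effects(
    [1.0, -0.5],
    {"tok_a": [0.2, -0.1], "tok_b": [0.1, 0.0], "tok_c": [-0.1, 0.1]},
)
positive, negative = top_and_bottom(scores, 1)
print(positive, negative)
```

A real dashboard would take the vectors from the trained SAE and the model's unembedding matrix; the ranking step is the same.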
Activation Density 0.101%
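Assuming activation density means the fraction of token positions at which the feature's activation is above zero, 0.101% would correspond to the feature firing on roughly one token in a thousand. A minimal sketch under that assumption; `activation_density` and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of sequences, each a list of per-token
    activation values for a single feature (assumed layout).
    """
    total = 0
    active = 0
    for seq in activations:
        for a in seq:
            total += 1
            if a > threshold:
                active += 1
    # Guard against an empty dataset instead of dividing by zero.
    return active / total if total else 0.0


# Two toy sequences: the feature fires on 2 of 8 positions.
density = activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.3]])
print(f"{density:.3%}")
```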