INDEX
Explanations
phrases that indicate success or guarantees in various contexts
Negative Logits
Token       Logit
kir         -0.14
ogo         -0.14
uch         -0.14
Gä          -0.14
orld        -0.13
uli         -0.13
lse         -0.13
sdale       -0.13
kv          -0.13
288         -0.13
Positive Logits
Token       Logit
/full        0.18
edly         0.18
pure         0.17
?url         0.16
Pure         0.16
fled         0.15
accurate     0.14
-addons      0.14
-ajax        0.14
chedulers    0.14
Activation Density: 0.030%