INDEX
Explanations
phrases indicating clarity or proof of expectations
NEGATIVE LOGITS
zl          -0.17
isses       -0.16
owitz       -0.15
νÏİ         -0.14
addock      -0.14
оÑĢÑĤÑĥ    -0.14
?url        -0.14
ãĥ³ãĥĪ      -0.14
_BORDER     -0.14
zzle        -0.13
POSITIVE LOGITS
wonder      0.61
Wonder      0.45
wondered    0.36
wonders     0.35
Wonder      0.35
wondering   0.31
unsur       0.27
onder       0.27
surprise    0.26
sur         0.25
Activations Density 0.118%