Explanations
phrases expressing doubt or uncertainty
Negative Logits
thon       -0.18
ún         -0.15
ukan       -0.14
illes      -0.13
wap        -0.13
_failure   -0.13
注意       -0.13
fucking    -0.13
æk         -0.13
пер        -0.13
Positive Logits
.Solid     0.14
Rica       0.14
:          0.14
iffies     0.14
ricks      0.14
uh         0.13
號         0.13
éry        0.13
613        0.13
erm        0.13
Activations Density 0.108%