Explanations
phrases that indicate purpose or intent
Negative Logits
.habbo       -0.15
WSC          -0.15
uses         -0.15
pp           -0.15
WithEmail    -0.14
amon         -0.14
rello        -0.14
kill         -0.14
udeau        -0.14
ivol         -0.14
Positive Logits
-called      0.36
oner         0.26
iled         0.26
oth          0.25
ley          0.25
aks          0.24
ìį¨          0.23
iling        0.23
apy          0.23
they         0.22
Activations Density: 0.038%