INDEX
Explanations
instances of personal pronouns and references to self
Negative Logits
iaux          -0.18
massaggi      -0.15
τών           -0.15
рім           -0.15
/or           -0.15
oodles        -0.15
ignon         -0.14
olia          -0.14
igation       -0.14
.onDestroy    -0.14
Positive Logits
else       0.18
simply     0.18
another    0.17
equival    0.16
phans      0.16
just       0.15
ner        0.15
ignal      0.15
s          0.14
any        0.14
Activations Density 0.120%
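
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes "density" means the fraction of token positions where the feature's activation is above zero; the page itself does not define the term, so treat that as an assumption. The function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (positions with activation > threshold) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 12 active positions out of 10,000 tokens.
toy = [0.0] * 9988 + [1.5] * 12
print(f"{activation_density(toy):.3%}")  # prints 0.120%
```

Under this reading, a density of 0.120% would mean the feature fires on roughly 12 of every 10,000 tokens.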