Explanations
words related to self-referential concepts and actions
Negative Logits
abwe      -0.16
ahoma     -0.16
iform     -0.15
akest     -0.14
geb       -0.14
argent    -0.14
Jensen    -0.14
tie       -0.14
idia      -0.13
éĪ        -0.13
Positive Logits
/self     0.41
Self      0.33
Self      0.30
self      0.29
self      0.28
(Self     0.28
SELF      0.27
-self     0.26
SELF      0.25
=self     0.23
Activations Density: 0.031%