Explanations
terms and concepts related to self-awareness and self-identity
Negative Logits
Token         Logit
vala          -0.17
lse           -0.16
ourselves     -0.15
自己          -0.15
lr            -0.14
zk            -0.14
ラス          -0.14
ous           -0.14
üb            -0.14
themselves    -0.14
Positive Logits
Token         Logit
hood          0.28
änd           0.22
/self         0.20
ishly         0.20
same          0.19
Portrait      0.18
hood          0.18
hoo           0.18
ständ         0.18
ridge         0.18
Activations Density 0.035%