Explanations
concepts related to self-identity and personal expression
Negative Logits
Token       Logit
ungle       -0.19
Gir         -0.18
urahan      -0.16
monds       -0.15
hands       -0.15
Length      -0.14
angan       -0.14
Minds       -0.14
yc          -0.13
wsp         -0.13
Positive Logits
Token       Logit
identity    0.31
Identity    0.31
identity    0.30
Identity    0.29
_identity   0.28
.Identity   0.26
identities  0.23
.identity   0.23
身份        0.20
.IDENTITY   0.20
Activations Density: 0.213%