INDEX
Explanations
expressions of pride and self-identity
Negative Logits
inne         -0.18
ichen        -0.17
aky          -0.16
̧            -0.15
wich         -0.15
ilk          -0.15
erosis       -0.15
erm          -0.15
Permanent    -0.15
cripts       -0.14
Positive Logits
pride        0.30
Pride        0.26
crest        0.20
onas         0.15
proud        0.15
.UIManager   0.15
pron         0.15
-pr          0.15
ACHI         0.14
eg           0.14
Activations Density 0.017%
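
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the percentage of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The function names are hypothetical and chosen only for illustration.

```python
# Assumption: density = percent of sampled tokens where the feature fires (> 0).
# The dashboard does not state this definition; treat it as a working guess.
DENSITY_PERCENT = 0.017  # value reported on the dashboard


def expected_active_tokens(n_tokens, density_percent=DENSITY_PERCENT):
    """Expected number of tokens with nonzero activation among n_tokens."""
    return n_tokens * density_percent / 100.0


def activation_density_percent(activations):
    """Percent of entries in a sequence of activations that are > 0."""
    activations = list(activations)
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Roughly how many tokens per million would fire at the reported density.
    print(expected_active_tokens(1_000_000))
```

Under that assumption, the feature would fire on roughly 170 of every million tokens, which fits a sparse feature tied to a narrow concept such as pride.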