INDEX
Explanations
concepts related to individual identities and their roles
Negative Logits
one      -0.24
lant     -0.19
.ends    -0.16
ÑĤÑı     -0.15
exo      -0.15
icorn    -0.14
ERY      -0.14
одно     -0.14
atu      -0.14
ixa      -0.14
Positive Logits
-dimensional    0.26
-sided          0.25
liners          0.23
-way            0.23
onta            0.22
-third          0.21
-direction      0.20
particular      0.20
SELF            0.19
's              0.19
Activations Density: 0.117%