INDEX
Explanations
familial relationships and connections between characters
NEGATIVE LOGITS
PC      -0.18
uper    -0.15
ovah    -0.14
maz     -0.14
ROY     -0.14
otor    -0.14
bé      -0.14
pent    -0.14
aliz    -0.13
abb     -0.13
POSITIVE LOGITS
son         0.29
brother     0.24
sons        0.20
sister      0.18
ë¥          0.18
.son        0.18
bro         0.18
his         0.18
daughter    0.17
son         0.16
Activations Density 0.116%
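
To make the density figure concrete: it is usually read as the share of token positions on which the feature activates at all. The sketch below assumes that definition (the page itself does not state how the figure is computed), and the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation; the dashboard does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 12 of them active.
toy = [0.0] * 9988 + [0.7] * 12
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.120%"
```

Under this reading, a density of 0.116% would mean the feature fires on roughly 1 token in every 860.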