Explanations
references to familial relationships and connections
Negative Logits
kla         -0.15
ahas        -0.15
tesy        -0.15
stan        -0.14
inge        -0.14
ampie       -0.13
afil        -0.13
ÑĥÑģÑĤи      -0.13
stery       -0.13
_LP         -0.13
Positive Logits
own             0.21
organisation    0.18
organization    0.17
own             0.16
owied           0.15
organizations   0.15
organisations   0.15
OWN             0.15
ola             0.14
Own             0.14
Activations Density 0.199%