INDEX
Explanations
references to familial relationships and parental roles
Negative Logits
babies    -0.18
ickey     -0.17
Babies    -0.16
juan      -0.16
infants   -0.15
Baby      -0.15
inson     -0.14
Baby      -0.14
bab       -0.14
erv       -0.14
Positive Logits
inactive  0.15
.eclipse  0.14
active    0.14
ìķħ       0.14
Dans      0.13
soon      0.13
blended   0.13
ÑĥÑĤи    0.13
cat       0.13
dog       0.13
Activations Density: 0.051%