Explanations
references to animal reproduction and life cycles
Negative Logits
Token      Logit
horses     -0.17
Married    -0.17
horse      -0.16
married    -0.16
vrouwen    -0.16
Horse      -0.16
Dogs       -0.16
Women      -0.15
Women      -0.15
dogs       -0.15
Positive Logits
Token      Logit
hatch      0.27
pup        0.26
suck       0.24
pups       0.24
fled       0.23
lings      0.22
nursing    0.22
kits       0.22
born       0.22
bro        0.22
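The logit lists above score each vocabulary token by how strongly this feature pushes that token's output logit up or down. One common way to build such lists, assumed here rather than confirmed by the page, is to take the dot product of the feature's decoder direction with each token's column of the unembedding matrix and keep the most positive and most negative scores. The sketch below uses a hypothetical 2-dimensional toy model; `logit_effects`, `top_and_bottom`, and every number are illustrative stand-ins, not the dashboard's code or data.

```python
# Hypothetical sketch: assumes logit scores = decoder direction . unembedding column.
# The decoder direction and unembedding below are toy values, not real weights.

def logit_effects(decoder_direction, unembedding):
    """Score each token by the dot product of the feature's decoder direction
    with that token's column of the unembedding matrix (shape d_model x vocab)."""
    d_model = len(decoder_direction)
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


# Toy inputs (hypothetical): d_model = 2, vocabulary of four tokens.
vocab = ["pup", "hatch", "horse", "dogs"]
decoder_direction = [1.0, -0.5]
unembedding = [
    [0.2, 0.3, -0.1, -0.1],
    [0.1, 0.0, 0.1, 0.05],
]

scores = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(scores, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

In this toy setup "hatch" and "pup" land in the positive list and "horse" and "dogs" in the negative list, mirroring the shape of the dashboard's lists without reproducing its values.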
Activation Density: 0.079%
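Activation density is the share of token positions on which the feature is active, so 0.079% corresponds to roughly 79 activations per 100,000 tokens. A minimal sketch, assuming "active" means an activation strictly above zero; the dashboard's actual threshold and sample corpus are not stated here, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: assumes a token counts as "active" when its
# activation is strictly greater than the threshold (default 0.0).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not the dashboard's): 79 firing positions out of 100,000 tokens.
acts = [0.0] * 99_921 + [1.0] * 79
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```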