Explanations
references to scientific experimentation involving primates
Negative Logits
sharks     -0.17
urray      -0.15
erin       -0.15
пунк       -0.15
Beit       -0.15
poultry    -0.15
pawn       -0.14
ghost      -0.14
魚         -0.14
mina       -0.14
Positive Logits
prim       0.38
ape        0.37
ap         0.34
Prim       0.34
monkey     0.33
monkeys    0.33
gor        0.32
Monkey     0.31
orang      0.30
chimp      0.29
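The page does not say how the logit values above were computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and list the most negative and most positive entries. The sketch below illustrates that assumption only; the names `logit_effects`, `w_dec_row` and `w_unembed` are hypothetical, and the toy numbers are made up rather than taken from this feature.

```python
import numpy as np


def logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects
    of one feature, as (token, value) pairs. (Hypothetical helper.)

    w_dec_row: decoder direction of the feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, d_vocab)
    vocab:     list of d_vocab token strings
    """
    # Direct effect of the feature on every vocabulary logit, shape (d_vocab,)
    effects = w_dec_row @ w_unembed
    order = np.argsort(effects)  # indices from most negative to most positive
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical numbers): d_model = 2, five-token vocabulary.
vocab = ["ape", "shark", "the", "monkey", "pawn"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([
    [0.4, -0.2, 0.0, 0.3, -0.1],  # the row the feature direction reads out
    [0.5, 0.5, 0.5, 0.5, 0.5],    # contributes nothing because w_dec_row[1] == 0
])
negative, positive = logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("negative:", negative)  # shark, then pawn
print("positive:", positive)  # ape, then monkey
```

Sorting once with `argsort` and reading both ends gives the two lists in a single pass; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.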
Activation Density: 0.061%
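Activation density is read here, as an assumption, as the share of sampled token positions on which the feature fires (activation above zero). The page states neither the sample nor the threshold. Under that reading, 0.061% means roughly 61 firing positions per 100,000 tokens. The helper name `activation_density` and the toy data below are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds
    threshold. (Hypothetical helper; the threshold is an assumption.)"""
    fired = np.asarray(activations) > threshold
    return float(100.0 * fired.mean())


# Toy example (hypothetical): the feature fires on 61 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:61] = 2.5
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.061%
```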