Explanations
references to chimpanzees and related experimental contexts
Negative Logits
urette     -0.17
itest      -0.17
grass      -0.17
eyh        -0.16
лич        -0.16
sword      -0.15
adla       -0.15
atak       -0.15
Karlov     -0.14
FetchType  -0.14
Positive Logits
monkeys     0.30
monkey      0.29
Monkey      0.25
chimpan     0.25
Monkey      0.24
ape         0.23
orang       0.23
monkey      0.23
gor         0.22
tree        0.20
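Dashboards of this kind commonly derive the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring vocabulary tokens. The source does not say how these particular values were computed, so the following is only a minimal pure-Python sketch under that assumption. The functions `logit_effects` and `top_and_bottom` and the toy vocabulary and matrices are hypothetical.

```python
from typing import List, Tuple

def logit_effects(decoder_vec: List[float],
                  unembed: List[List[float]]) -> List[float]:
    """Assumed method: compute decoder_vec @ unembed, one score per token.

    unembed is a (d_model x vocab_size) matrix stored as a list of rows.
    """
    vocab_size = len(unembed[0])
    scores = [0.0] * vocab_size
    for d, weight in enumerate(decoder_vec):
        row = unembed[d]
        for t in range(vocab_size):
            # Accumulate this model dimension's contribution to token t.
            scores[t] += weight * row[t]
    return scores

def top_and_bottom(scores: List[float], vocab: List[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative

# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["monkey", "ape", "grass", "sword"]
decoder_vec = [1.0, -0.5]
unembed = [[0.3, 0.2, -0.1, 0.0],
           [0.0, 0.1, 0.2, 0.6]]
positive, negative = top_and_bottom(logit_effects(decoder_vec, unembed), vocab, 2)
print(positive)  # tokens "monkey" and "ape"
print(negative)  # tokens "sword" and "grass"
```

On real models the repeated entries such as `monkey` and `Monkey` typically correspond to distinct vocabulary tokens (for example, with and without a leading space), which is why they can appear more than once.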
Activation Density: 0.040%
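A density of 0.040% equals 0.0004, or about 4 activating positions per 10,000 tokens. The snippet below is a minimal sketch of how such a figure can be computed. It assumes density is the percentage of token positions whose activation is strictly above a threshold of 0. The source does not state the threshold or the dataset used, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds threshold.

    The default threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# Toy data: 4 firing positions out of 10,000 tokens.
acts = [0.0] * 9996 + [1.2, 0.5, 3.1, 0.7]
print(f"{activation_density(acts):.3f}%")  # prints "0.040%"
```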