INDEX
Explanations
references to primates and their characteristics
Negative Logits
illes    -0.18
鱼       -0.16
草       -0.16
sword    -0.16
fir      -0.16
omes     -0.15
lems     -0.15
lies     -0.15
itel     -0.14
bau      -0.14
Positive Logits
monkey   0.23
monkeys  0.23
Monkey   0.22
Monkey   0.21
monkey   0.21
tree     0.20
ape      0.19
-human   0.19
Tree     0.17
arch     0.16
Activations Density 0.052%