INDEX
Explanations
words related to monkeys and gorillas
references to primates, particularly monkeys and gorillas
Negative Logits
sburgh    -0.90
kson      -0.84
oric      -0.80
sheet     -0.79
rences    -0.78
ties      -0.77
ensing    -0.73
HUD       -0.70
idents    -0.70
enza      -0.69
Positive Logits
zees           1.08
zee            1.05
chimpanzees    0.93
gorilla        0.93
apes           0.88
eering         0.83
primates       0.81
chimpan        0.80
elephant       0.77
gor            0.75
Activations Density: 0.087%