Explanations
references to animals and their interactions with humans
Negative Logits
Token       Logit
ickle       -0.17
pig         -0.16
Cater       -0.16
Boyd        -0.15
gh          -0.15
cater       -0.14
frog        -0.14
pigs        -0.14
889         -0.14
stagnant    -0.14
Positive Logits
Token       Logit
Roths       0.15
improvis    0.15
ella        0.15
kest        0.14
mange       0.14
ÐĬ          0.14
/inet       0.14
improvised  0.14
ê           0.14
molt        0.14
Activations Density 0.079%