INDEX
Explanations
references to different types of animals
references to animals
Negative Logits
Token         Logit
minster       -0.77
lder          -0.70
nance         -0.68
heit          -0.66
âĸ¬           -0.65
Bauer         -0.65
Sutherland    -0.64
nee           -0.63
nant          -0.63
nder          -0.62
Positive Logits
Token         Logit
animals       1.35
Animals       1.16
mammals       1.10
animal        1.03
carc          1.03
animal        1.02
apes          0.98
reptiles      0.95
brates        0.93
primates      0.93
Activations Density: 0.014%