Explanations
phrases related to various types of animals
references to animals
Negative Logits
Sutherland    -0.76
minster       -0.73
âĸ¬           -0.71
heit          -0.70
nance         -0.67
nder          -0.65
ãĥ´ãĤ¡        -0.64
pai           -0.64
ership        -0.62
stream        -0.61
Positive Logits
animals       1.17
mammals       0.98
carc          0.92
Animals       0.92
animal        0.90
primates      0.83
elephants     0.83
carniv        0.82
slaughtered   0.82
animal        0.80
Activations Density 0.015%