INDEX
Explanations
mentions of specific large animals
mentions of specific large animal species, particularly elephants and tigers
Negative Logits
ucer      -0.78
member    -0.75
tor       -0.73
Member    -0.73
ame       -0.68
ident     -0.68
ICLE      -0.67
joint     -0.67
rator     -0.67
user      -0.67
Positive Logits
cows         2.39
chickens     2.38
elephants    2.35
dolphins     2.25
horses       2.20
goats        2.08
tigers       2.05
ducks        2.00
lions        1.98
monkeys      1.98
Activations Density: 0.037%