INDEX
Explanations
references to lions and their conservation
Negative Logits
Egg       -0.19
bubble    -0.19
Ducks     -0.18
egg       -0.18
paddle    -0.18
Duck      -0.17
Eggs      -0.17
egg       -0.17
ponge     -0.16
duck      -0.16
Positive Logits
lions     0.42
lion      0.41
Lions     0.38
lion      0.37
Lion      0.34
Tigers    0.33
tiger     0.33
leo       0.32
cats      0.31
Tiger     0.31
Activations Density: 0.058%