Explanations
phrases related to animal behavior and care instruction
Negative Logits
fur         -0.16
kat         -0.16
_cats       -0.16
kat         -0.16
adoption    -0.15
Kat         -0.15
Fur         -0.15
fur         -0.15
vet         -0.15
710         -0.15
Positive Logits
shaping      0.21
retrieves    0.17
commands     0.17
training     0.17
Sit          0.17
foundation   0.17
reward       0.17
associ       0.17
sit          0.16
heel         0.16
Activations Density: 0.025%